Same weights, reversed. One circuit — encode turns a byte into a 16-d latent, decode runs it backwards to recover the byte. Like a speaker that also works as a microphone.
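The mirror-tied idea can be sketched in a few lines. Everything concrete below is an assumption for illustration — the byte-as-signed-bits input, tanh standing in for the C19 activation, random ±1 weights; only the shape of the idea (one matrix W, used forwards to encode and transposed to decode) comes from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
H = 16                                     # latent width (champion config)
W = np.sign(rng.standard_normal((H, 8)))   # binary (+/-1) weights, assumed random here

def byte_to_bits(b):
    """A byte as 8 signed bits in {-1, +1} (assumed input encoding)."""
    bits = np.array([(b >> i) & 1 for i in range(8)], dtype=np.float64)
    return 2.0 * bits - 1.0

def encode(b):
    # forward pass: byte -> 16-d latent (tanh is a stand-in for C19)
    return np.tanh(W @ byte_to_bits(b))

def decode(z):
    # same weights, transposed: latent -> byte
    bits = (np.sign(W.T @ z) + 1) / 2
    return int(sum(int(v) << i for i, v in enumerate(bits)))
```

With untrained random weights the round trip is not guaranteed lossless; the point is only the single shared W.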
02 · Exhaustive Dictionary
Every byte, precomputed
All 256 possible bytes with their 16-dimensional latents. Click any cell to inspect the
fingerprint the encoder emits for that byte — or search by character, name, or hex code.
Filter by byte class: Control · Printable · Extended (256 / 256 shown)
03 · Why this setup won
The sweep that picked the champion
Four axes of search — activation · weight precision · hidden dim · LUT readout.
Each tile below is a configuration that was actually run. Only one point lands at
100% lossless with the smallest footprint.
Activation × weight precision
min hidden-dim that holds 100% lossless
tanh · 2-bit: H = 12
tanh · ternary: H = 32
C19 · binary: ★ H = 16 · champion
ReLU · binary: 93.4% @ H = 16 · 17 bad
Weight precision (C19 · H=16)
stepping down from full precision
fp32: baseline · not a candidate
int4: ✓ 100% lossless
3-bit {±1,2,4,8}: ✓ 100% · 1.4s
ternary {−1,0,+1}: 96.9% · 8 bad
2-bit {±1,±3}: 91.4% · 22 bad
binary (±1): ★ 100% · smallest
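The precision ladder above can be read as a set of weight grids. A hedged sketch — not the training code (each precision was presumably trained natively, not post-quantized) — that snaps float weights to each grid's nearest level:

```python
import numpy as np

# Levels implied by each tile above; int4 shown as the signed range [-8, 7].
GRIDS = {
    "binary":  np.array([-1.0, 1.0]),
    "2-bit":   np.array([-3.0, -1.0, 1.0, 3.0]),
    "ternary": np.array([-1.0, 0.0, 1.0]),
    "3-bit":   np.array([-8.0, -4.0, -2.0, -1.0, 1.0, 2.0, 4.0, 8.0]),
    "int4":    np.arange(-8, 8, dtype=np.float64),
}

def quantize(w, scheme):
    """Snap each float weight to the nearest level of the chosen grid."""
    levels = GRIDS[scheme]
    idx = np.abs(np.asarray(w)[..., None] - levels).argmin(axis=-1)
    return levels[idx]
```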
Hidden dim (C19 · binary)
narrowing toward the smallest lossless width · best-of-4 seeds
H = 8: 25.0% · too small
H = 12: 91.4% · 22 bad
H = 16: ★ 100% · champion · min
H = 20: ✓ 100% · slack
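The best-of-4-seeds protocol in this card can be sketched as a generic sweep loop; `train_and_score` is a hypothetical stand-in for whatever the `tools/` script actually runs:

```python
# train_and_score(H, seed) -> bytes recovered out of 256 (hypothetical hook).
def best_of_seeds(train_and_score, H, seeds=(0, 1, 2, 3)):
    """Best byte-recovery count across a handful of seeds."""
    return max(train_and_score(H, seed) for seed in seeds)

def sweep(train_and_score, widths=(8, 12, 16, 20)):
    """Score each hidden width; return all scores and the smallest lossless H."""
    scores = {H: best_of_seeds(train_and_score, H) for H in widths}
    lossless = [H for H, s in scores.items() if s == 256]
    return scores, (min(lossless) if lossless else None)
```

Taking the best of several seeds is exactly what guards against single-seed artifacts like the one noted in Slide 5.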
LUT readout precision
champion weights baked into a table
int8: ★ 256 / 256
int4: 247 / 256 (96%)
1-bit sign: 179 / 256
int2 / int1: ~41 / 256
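Baking the encoder into its int8 table might look like the sketch below, with random stand-in latents in place of the real encoder outputs; the symmetric scale-and-round scheme is an assumption:

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for encode(b) over all 256 bytes; the real values come from the champion.
latents = np.tanh(rng.standard_normal((256, 16)))

scale = np.abs(latents).max() / 127.0             # symmetric int8 scale (assumed)
lut = np.clip(np.round(latents / scale), -128, 127).astype(np.int8)

def lookup(b):
    return lut[b]    # the only inference-time operation: one row read
```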
Full-binary attempt
binary weights and binary activations
H = 24: 32 / 256 · diverges
H = 64: worse · ~16 / 256
H = 256: 2 / 256 · collapses
Champion
C19 · binary (±1) · H = 16 · int8 LUT
256 / 256 lossless · ≈ 3 KB · one-script reproduce
04 · Try it live
Type anything — watch it become 16 numbers
The only operation the Byte Block does at inference time is a 256-entry table lookup.
Every byte maps to a fixed 16-dimensional latent. Type below to see each byte resolved in real time.
Byte sequence
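The demo's inner loop can be sketched under the same assumptions as above (a random int8 table stands in for the real LUT): UTF-8 text becomes a byte sequence, and each byte is one row read.

```python
import numpy as np

rng = np.random.default_rng(0)
lut = rng.integers(-128, 128, size=(256, 16), dtype=np.int8)  # stand-in LUT

def encode_text(text):
    """UTF-8 text -> one 16-d latent row per byte, via pure table lookup."""
    data = np.frombuffer(text.encode("utf-8"), dtype=np.uint8)
    return lut[data]
```

A multi-byte character simply yields multiple rows — "hé" is three UTF-8 bytes, so three latents.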
05 · What’s next
Beyond the Byte Block
Block A is frozen and lossless. The frontier is now the merger layers above it and
open architectural questions that the sweep left unanswered.
Active frontier
L1 Merger
byte pairs → one latent
The layer above Block A: two adjacent byte latents are compressed into a single merged latent. 2026-04-19 breakthrough: a single-W mirror, tied at H = 81, reaches 100% lossless on all 2592 cells at half the footprint of the old recipe, exposing the “73% hard ceiling” as a single-seed artifact.
tools/diag_byte_single_w_huffman_pack.py
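A heavily hedged sketch of the single-W mirror shape described above; the concatenated 32-d input, the tanh activation, and the binary weights are all guesses for illustration, with only H = 81 and the one-matrix tie taken from the text:

```python
import numpy as np

rng = np.random.default_rng(0)
W = np.sign(rng.standard_normal((81, 32)))   # H = 81; one matrix, binary assumed

def merge(z1, z2):
    # two adjacent 16-d byte latents -> one 81-d merged latent
    return np.tanh(W @ np.concatenate([z1, z2]))

def unmerge(m):
    # the same W, transposed, splits the pair back apart
    zz = W.T @ m
    return zz[:16], zz[16:]
```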
Exploration phase
L2 Merger
pairs of pairs → one latent
A second merger layer stacked on top of L1, compressing pairs of L1 merged latents. Currently in the geometry-probe phase: early signal shows the L1 merged latents carry enough structure to compress further, but the winning recipe has not yet been found.
tools/diag_byte_l2_phase0_geometry_probe.py
Experimental · active
Word / Subword Tokenizer V1
skip the byte level entirely
A parallel direction: tokenize directly into whole words or subword pieces,
the standard LLM approach. V1 on FineWeb-EDU shows 52% raw word coverage;
34.6% of the corpus falls back to byte encoding.
tools/diag_word_unit_sweep.py
Open questions
?
Full-binary weights AND activations.
The champion uses binary (±1) weights with a C19 float activation. True
full-binary collapses reconstruction (see Slide 3). Is there a hidden-width
or architecture tweak that unlocks it?
?
Integer-only inference path.
The LUT readout is already int8, but the C19 activation still needs float.
Can the whole forward pass become integer arithmetic for microcontroller deployment?
?
Cross-modal bytes.
Block A was trained on text byte distributions. Does the same recipe work on
audio bytes, image bytes, or arbitrary binary streams without retraining?