Same weights, reversed. One circuit — encode turns a byte into a 16-d latent, decode runs it backwards to recover the byte. Like a speaker that also works as a microphone.
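The mirror-tied idea can be sketched in a few lines. Everything concrete below is an assumption for illustration — the byte-as-signed-bits input, tanh standing in for the C19 activation, random ±1 weights; only the shape of the idea (one matrix W, used forwards to encode and transposed to decode) comes from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
H = 16                                     # latent width (champion config)
W = np.sign(rng.standard_normal((H, 8)))   # binary (+/-1) weights, assumed random here

def byte_to_bits(b):
    """A byte as 8 signed bits in {-1, +1} (assumed input encoding)."""
    bits = np.array([(b >> i) & 1 for i in range(8)], dtype=np.float64)
    return 2.0 * bits - 1.0

def encode(b):
    # forward pass: byte -> 16-d latent (tanh is a stand-in for C19)
    return np.tanh(W @ byte_to_bits(b))

def decode(z):
    # same weights, transposed: latent -> byte
    bits = (np.sign(W.T @ z) + 1) / 2
    return int(sum(int(v) << i for i, v in enumerate(bits)))
```

With untrained random weights the round trip is not guaranteed lossless; the point is only the single shared W.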
02 · Exhaustive Dictionary
Every byte, precomputed
All 256 possible bytes with their 16-dimensional latents. Click any cell to inspect the
fingerprint the encoder emits for that byte — or search by character, name, or hex code.
Filter by byte class: Control · Printable · Extended (256 / 256 shown)
03 · Why this setup won
The sweep that picked the champion
Four axes of search — activation · weight precision · hidden dim · LUT readout.
Each tile below is a configuration that was actually run. Only one point lands at
100% lossless with the smallest footprint.
Activation × weight precision
min hidden-dim that holds 100% lossless
tanh · 2-bit: H = 12
tanh · ternary: H = 32
C19 · binary: ★ H = 16 · champion
ReLU · binary: 93.4% @ H = 16 · 17 bad
Weight precision (C19 · H=16)
stepping down from full precision
fp32: baseline · not a candidate
int4: ✓ 100% lossless
3-bit {±1,2,4,8}: ✓ 100% · 1.4s
ternary {−1,0,+1}: 96.9% · 8 bad
2-bit {±1,±3}: 91.4% · 22 bad
binary (±1): ★ 100% · smallest
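The precision ladder above can be read as a set of weight grids. A hedged sketch — not the training code (each precision was presumably trained natively, not post-quantized) — that snaps float weights to each grid's nearest level:

```python
import numpy as np

# Levels implied by each tile above; int4 shown as the signed range [-8, 7].
GRIDS = {
    "binary":  np.array([-1.0, 1.0]),
    "2-bit":   np.array([-3.0, -1.0, 1.0, 3.0]),
    "ternary": np.array([-1.0, 0.0, 1.0]),
    "3-bit":   np.array([-8.0, -4.0, -2.0, -1.0, 1.0, 2.0, 4.0, 8.0]),
    "int4":    np.arange(-8, 8, dtype=np.float64),
}

def quantize(w, scheme):
    """Snap each float weight to the nearest level of the chosen grid."""
    levels = GRIDS[scheme]
    idx = np.abs(np.asarray(w)[..., None] - levels).argmin(axis=-1)
    return levels[idx]
```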
Hidden dim (C19 · binary)
narrowing toward the smallest lossless width · best-of-4 seeds
H = 8: 25.0% · too small
H = 12: 91.4% · 22 bad
H = 16: ★ 100% · champion · min
H = 20: ✓ 100% · slack
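The best-of-4-seeds protocol in this card can be sketched as a generic sweep loop; `train_and_score` is a hypothetical stand-in for whatever the `tools/` script actually runs:

```python
# train_and_score(H, seed) -> bytes recovered out of 256 (hypothetical hook).
def best_of_seeds(train_and_score, H, seeds=(0, 1, 2, 3)):
    """Best byte-recovery count across a handful of seeds."""
    return max(train_and_score(H, seed) for seed in seeds)

def sweep(train_and_score, widths=(8, 12, 16, 20)):
    """Score each hidden width; return all scores and the smallest lossless H."""
    scores = {H: best_of_seeds(train_and_score, H) for H in widths}
    lossless = [H for H, s in scores.items() if s == 256]
    return scores, (min(lossless) if lossless else None)
```

Taking the best of several seeds is exactly what guards against single-seed artifacts like the one noted in Slide 5.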
LUT readout precision
champion weights baked into a table
int8: ★ 256 / 256
int4: 247 / 256 (96%)
1-bit sign: 179 / 256
int2 / int1: ~41 / 256
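Baking the encoder into its int8 table might look like the sketch below, with random stand-in latents in place of the real encoder outputs; the symmetric scale-and-round scheme is an assumption:

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for encode(b) over all 256 bytes; the real values come from the champion.
latents = np.tanh(rng.standard_normal((256, 16)))

scale = np.abs(latents).max() / 127.0             # symmetric int8 scale (assumed)
lut = np.clip(np.round(latents / scale), -128, 127).astype(np.int8)

def lookup(b):
    return lut[b]    # the only inference-time operation: one row read
```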
Full-binary attempt
binary weights and binary activations
H = 24: 32 / 256 · diverges
H = 64: worse · ~16 / 256
H = 256: 2 / 256 · collapses
Champion
C19 · binary (±1) · H = 16 · int8 LUT
256 / 256 lossless · ≈ 3 KB · one-script reproduce
04 · Try it live
Type anything — watch it become 16 numbers
The only operation the Byte Block does at inference time is a 256-entry table lookup.
Every byte maps to a fixed 16-dimensional latent. Type below to see each byte resolved in real time.
Byte sequence
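The demo's inner loop can be sketched under the same assumptions as above (a random int8 table stands in for the real LUT): UTF-8 text becomes a byte sequence, and each byte is one row read.

```python
import numpy as np

rng = np.random.default_rng(0)
lut = rng.integers(-128, 128, size=(256, 16), dtype=np.int8)  # stand-in LUT

def encode_text(text):
    """UTF-8 text -> one 16-d latent row per byte, via pure table lookup."""
    data = np.frombuffer(text.encode("utf-8"), dtype=np.uint8)
    return lut[data]
```

A multi-byte character simply yields multiple rows — "hé" is three UTF-8 bytes, so three latents.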
05 · What’s next
Beyond the Byte Block
Block A is frozen and lossless. The frontier is now the merger layers above it and
open architectural questions that the sweep left unanswered.
Active frontier
L1 Merger
byte pairs → one latent
The layer above Block A: two adjacent byte latents are compressed into a single merged latent. 2026-04-19 breakthrough: a single-W mirror, tied at H = 81, reaches 100% lossless on all 2592 cells at half the footprint of the old recipe, exposing the “73% hard ceiling” as a single-seed artifact.
tools/diag_byte_single_w_huffman_pack.py
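A heavily hedged sketch of the single-W mirror shape described above; the concatenated 32-d input, the tanh activation, and the binary weights are all guesses for illustration, with only H = 81 and the one-matrix tie taken from the text:

```python
import numpy as np

rng = np.random.default_rng(0)
W = np.sign(rng.standard_normal((81, 32)))   # H = 81; one matrix, binary assumed

def merge(z1, z2):
    # two adjacent 16-d byte latents -> one 81-d merged latent
    return np.tanh(W @ np.concatenate([z1, z2]))

def unmerge(m):
    # the same W, transposed, splits the pair back apart
    zz = W.T @ m
    return zz[:16], zz[16:]
```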
Exploration phase
L2 Merger
pairs of pairs → one latent
A second merger layer stacked on top of L1, compressing pairs of L1 merged latents. Currently in the geometry-probe phase: early signal shows the L1 merged latents carry enough structure to compress further, but the winning recipe has not yet been found.
tools/diag_byte_l2_phase0_geometry_probe.py
Experimental · active
Word / Subword Tokenizer V1
skip the byte level entirely
A parallel direction: tokenize directly into whole words or subword pieces,
the standard LLM approach. V1 on FineWeb-EDU shows 52% raw word coverage;
34.6% of the corpus falls back to byte encoding.
tools/diag_word_unit_sweep.py
Open questions
?
Full-binary weights AND activations.
The champion uses binary (±1) weights with a C19 float activation. True
full-binary collapses reconstruction (see Slide 3). Is there a hidden-width
or architecture tweak that unlocks it?
?
Integer-only inference path.
The LUT readout is already int8, but the C19 activation still needs float.
Can the whole forward pass become integer arithmetic for microcontroller deployment?
?
Cross-modal bytes.
Block A was trained on text byte distributions. Does the same recipe work on
audio bytes, image bytes, or arbitrary binary streams without retraining?