VALIDATED ALT · v5.0.0-β.2

Block B · L1 Merger · Variant

Native 7-bit Identity Merger

An alternative L1 deploy candidate: linear identity autoencoder, H=120 hidden, 7-bit integer weights, 3-bit biases, one fp32 scalar α. Packs to 3,421 B natively — no Huffman decode step, yet still beats the packed champion by ~0.55%. 100% lossless across all 65,536 byte pairs.

VALIDATED ALTERNATIVE · Cluster 18
Lineage · Cluster 11 12 13 18

Variant · Architecture

Identity linear — no C19

L0-A out (16) + L0-B out (16) → Hidden (120) → Reconstruct (32)

y = (x · W + b1) · Wᵀ + b2, then scaled by α. No activation — just a linear identity map through the 7-bit weight field.

Two L0 outputs are concatenated into a 32-dim input, projected through a single shared W matrix (32×120) into a 120-dim hidden, then mirrored back. Unlike the Block B champion — which uses H=81 + C19 activation — this variant widens the hidden to H=120 and drops the activation entirely. The tradeoff: more cells, but every cell quantizes cleanly to a 7-bit integer without the C19 aux parameters (c, rho) fighting the quantizer.
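A minimal numpy sketch of this forward pass (shapes from the text; the weights, biases, and α below are random placeholders, not the trained artifact):

```python
import numpy as np

H, D = 120, 32                                            # hidden width, in/out dim
rng = np.random.default_rng(7)
W = rng.integers(-64, 64, size=(D, H)).astype(np.int32)   # 7-bit int cells
b1 = rng.integers(-4, 4, size=H).astype(np.int32)         # 3-bit int biases
b2 = rng.integers(-4, 4, size=D).astype(np.int32)
alpha = np.float32(1 / 4096)                              # the one fp32 scalar

def merge(x_a, x_b):
    """Concat two 16-dim L0 outputs, project through the shared W, mirror back."""
    x = np.concatenate([x_a, x_b])       # 32-dim input
    h = x @ W + b1                       # 120-dim hidden, pure integer math
    y = h @ W.T + b2                     # 32-dim reconstruction via the mirror
    return alpha * y.astype(np.float32)  # scale once by α at the end

y = merge(np.ones(16, dtype=np.int32), np.zeros(16, dtype=np.int32))
```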

Shape · 32 → 120 → 32
Weight matrix · 1 × (32×120)
Weight cells · 3,840
Activation · identity (none)
W precision · 7-bit int
Bias precision · 3-bit int
Alpha · fp32 scalar
Lossless on · 65,536 / 65,536

Variant · Pipeline position

Where this sits in the byte-level stack

This variant is not a separate layer — it is an alternative artifact for the same slot as the Block B champion. L0 reads one byte at a time; L1 (either champion or variant) takes two consecutive L0 outputs and returns one merged 32-dim latent. Downstream layers (Tokenizer → Embedder → Brain) are identical across both L1 artifacts.

Layer            Block A · L0        Block B · champion        Block B · variant
Input / output   1 byte → 16-dim     2×16 → 32-dim             2×16 → 32-dim
Hidden           H = 16              H = 81                    H = 120
Activation       C19                 C19                       identity
Weights          binary ±1           fp16 + Huffman pack       int7 native
Deploy size      4.0 KB · int8 LUT   3,440 B                   3,421 B
Decode step      none (LUT)          canonical Huffman walk    none (native)
Lossless         256 / 256           65,536 / 65,536           65,536 / 65,536

Both L1 artifacts plug into the same pipeline slot. Downstream Tokenizer / Embedder / Brain do not know which one fed them.
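A toy sketch of the slot contract (the `l0` embedding and `l1_merge` here are stand-ins, not the real layers): L0 maps one byte to 16 dims, L1 merges two consecutive outputs into 32, and downstream stages consume the merged latent without knowing which artifact produced it.

```python
def l0(byte: int) -> list:
    """Stand-in L0: one byte -> a 16-dim latent (toy bit-unpack embedding)."""
    return [float((byte >> i) & 1) for i in range(16)]

def l1_merge(a: list, b: list) -> list:
    """Either L1 artifact fills this slot: two 16-dim latents -> one 32-dim."""
    assert len(a) == len(b) == 16
    return a + b                       # placeholder for champion or variant

latents = [l0(x) for x in b"hi"]       # L0 reads one byte at a time
merged = l1_merge(latents[0], latents[1])
```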

Variant · Bit budget

3,421 B native footprint

3,421 B native · −19 B (−0.55%) vs the 3,440 B Huffman-packed champion

W (weight matrix) · 3,840 cells × 7 bit · 3,360 B
b1 + b2 (biases) · 152 cells × 3 bit · 57 B
α (global scale) · 1 × fp32 · 4 B
Total · 3,421 B · 3.34 KB

Native footprint · 3,421 B (3.34 KB)
Decode step needed · none
vs Huffman champion · −19 B
Replicated lossless · seeds 7, 42
Shannon floor · 2,422 B
Gap to floor · ~41%

The champion path (Block B proper) reaches 3,440 B by Huffman-coding an fp16 mirror model; the reader has to walk a canonical Huffman table to reconstruct weights before inference. This variant skips the decode entirely — the bytes on disk are the quantized weights. Smaller and simpler, at the cost of a different architecture (identity vs C19) and a larger hidden width (H=120 vs H=81). Both sit ~41% above the theoretical 2,422 B Shannon floor — the gap that arithmetic / range coding on the int7 stream could close next.
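The budget above is plain arithmetic; a few lines reproduce the stated totals:

```python
W_cells, W_bits = 3_840, 7        # 32x120 weight matrix, 7-bit cells
b_cells, b_bits = 152, 3          # b1 (120) + b2 (32) biases, 3-bit cells
alpha_bytes = 4                   # one fp32 scalar

w_bytes = W_cells * W_bits // 8              # 26,880 bits -> 3,360 B
b_bytes = -(-(b_cells * b_bits) // 8)        # 456 bits -> 57 B (ceil to bytes)
total = w_bytes + b_bytes + alpha_bytes      # 3,421 B native footprint
saving = 3_440 - total                       # 19 B under the Huffman champion
```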

Block B champion (Huffman)

  • 3,440 B packed · needs decode
  • H = 81 · cells = 2,592
  • fp16 W · C19 activation
  • canonical Huffman tree on disk

Native 7-bit variant

  • 3,421 B raw · zero decode
  • H = 120 · cells = 3,840
  • int7 W · identity (no act.)
  • bytes on disk are weights

Same lossless guarantee · same deploy slot · opposite compression strategy.

Variant · Experimental findings

Why this shape, not the champion's

The autonomous compression loop swept activation × codebook × hidden-width × single-W/dual-W over ~34 iterations. GPT ran three parallel probes on top: alpha ablation, post-hoc bias quantization ladder, and C19 aux-parameter quantization. The three findings below explain the exact shape of this variant.

1 · Biases quantize far tighter than expected · 3-bit exact
Post-hoc bias probe on the native 7-bit identity H=120 model:
  • seed 42: b1 + b2 still exact at 3-bit; 2-bit already produces 1 bad pair.
  • seed 7: b1 + b2 still exact all the way down to 1-bit.
The multi-seed-safe floor is 3-bit per bias — which is what the native budget above uses.
2 · The global α scalar is not removable · α-free fails
Alpha-ablation, per seed:
  • seed 42: 0.085% correct, 65,480 bad pairs.
  • seed 7: 0.069% correct, 65,491 bad pairs.
Post-hoc probe confirms even fp16 α already breaks the model (314–561 bad pairs). STE/QAT on its own does not absorb the scale — the one fp32 word stays. 4 bytes well spent.
3 · C19 aux params refuse post-hoc quantization · keep identity
Same probe on the float-exact C19 merger (the champion architecture, before quantization):
  • b1 at int8 → 12 bad, b2 at int8 → 41 bad.
  • c at int8 → 3 bad, rho at int4/5/6/7/8 all → 1 bad.
  • all_aux bundled at int8 → 13 bad.
So there is no regime where you "just quantize the C19 meta and stay exact." That was the gate that decided this variant: if the aux won't quantize, drop the activation entirely and widen the hidden until the identity map fits.
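A hedged sketch of the post-hoc ladder probe behind finding 1 (the real harness isn't shown here; `quantize` is a generic symmetric per-tensor snap, and the 65,536-pair lossless check is only indicated by a comment):

```python
import numpy as np

def quantize(v, bits):
    """Snap a float vector onto a signed n-bit grid with one per-tensor scale."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(v).max() / qmax
    q = np.clip(np.round(v / scale), -qmax - 1, qmax)
    return q * scale            # dequantized values the forward pass would see

b1 = np.array([0.30, -0.15, 0.075])       # illustrative biases, not the model's
for bits in (8, 4, 3, 2):                 # walk the ladder downward
    bq = quantize(b1, bits)
    # swap bq into the model here and re-count bad pairs out of 65,536
```

The per-rung error grows as the grid coarsens, which is why the ladder bottoms out at a seed-dependent floor.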

Per-seed detail

metric               seed 7                    seed 42
lossless @ 7-bit W   100% · 65,536 / 65,536    100% · 65,536 / 65,536
b1 + b2 floor        1-bit exact               3-bit exact · 2-bit → 1 bad
α-free rollback      0.069% · 65,491 bad       0.085% · 65,480 bad
α-fp16 post-hoc      561 bad                   314 bad

Seed 7 is strictly the easier seed on the bias floor — it tolerates 1-bit biases. The 3-bit budget above is the multi-seed-safe floor; seed 42 is the tighter constraint.

Variant · Why 7-bit

Codebook expressivity ladder

Before touching the optimizer, the bake probe measures the representation ceiling of each codebook: train the merger in float to 100%, snap the weights to the target codebook at every α in a grid, take the best snap. No polish, no QAT — just "can this codebook hold the solution at all?" Below 4 bit / weight the answer is no, and QAT cannot rescue what the codebook does not contain.

binary (1b) · 0.25%
ternary (2b) · 1.82%
3-bit int · 17.47%
4-bit int · 29.28%
5-bit int · 50.19%
6-bit int · 74.08%
7-bit int · 89.17%

Single-W H=81 bake probe, no polish. The last 10.83pp from 7-bit to 100% is what QAT + LBFGS close.

Not representable

Below 4 bit/weight the codebook cannot hold the merger solution. Bake ceiling ≤ 29% regardless of seed, architecture, or optimizer. A pure representation-space limit.

First viable width

7-bit is where the bake clears 89% without polish — enough slack for QAT to close to 100%. Smaller widths leave too much representational debt.
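The bake probe itself can be sketched in a few lines (assumed names; `accuracy` stands in for the full 65,536-pair lossless scorer, and the toy check uses a weight vector that happens to sit exactly on the 3-bit grid at α = 0.5):

```python
import numpy as np

def bake(W_float, bits, alphas, accuracy):
    """Snap float weights onto an n-bit codebook at each alpha; keep the best snap."""
    qmax = 2 ** (bits - 1) - 1
    best = 0.0
    for a in alphas:
        Wq = a * np.clip(np.round(W_float / a), -qmax - 1, qmax)
        best = max(best, accuracy(Wq))      # no polish, no QAT
    return best                             # the codebook's representation ceiling

W = np.array([0.5, -1.0])                   # lies on the 3-bit grid at a = 0.5
ceiling = bake(W, bits=3, alphas=[0.3, 0.5],
               accuracy=lambda Wq: float(np.allclose(Wq, W)))
```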

Variant · Artifacts

Deploy & reproduce

Deploy binary · pending native packer port
identity H=120 · W int7 · b1/b2 int3 · α fp32
3,421 B target · 100% lossless · 65,536 / 65,536
Current shipped champion (Block B main path)
3,440 B Huffman-packed · 100% lossless · canonical deploy

Reproduce — four commands

Training recipe

Phase 1 · Adam · float warmup
Phase 2 · LBFGS · float finish, strong-wolfe
Phase 3 · α-search · static scan, frozen W
Phase 4 · QAT Adam · STE 7-bit W
Phase 5 · QAT LBFGS · STE close to 100%

Phases 1–2 run in the float regime, phase 3 is a static scan/pin, phases 4–5 run in the quantized regime. Each phase carries the previous phase's weights forward; no restarts.
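A minimal straight-through-estimator sketch for the QAT phases (4–5), on a toy scalar loss — Adam/LBFGS and the real merger loss are replaced by plain gradient steps, and `quant7` assumes the −64…63 grid with a fixed α:

```python
import numpy as np

def quant7(w, alpha):
    """Snap float weights onto the 7-bit grid [-64, 63] at scale alpha."""
    return alpha * np.clip(np.round(w / alpha), -64, 63)

w = np.array([0.30, -0.70])       # float shadow weights
target = np.array([0.25, -0.75])  # both values representable on the grid
alpha, lr = 1 / 64, 0.1
for _ in range(200):
    wq = quant7(w, alpha)         # forward pass sees quantized weights
    grad = 2 * (wq - target)      # dLoss/dwq for loss = ||wq - target||^2
    w -= lr * grad                # STE: treat d(quant7)/dw as identity
```

The shadow weights drift until the quantized forward matches the target, which is the mechanism that closes the last gap left by the bake.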

Quantizer · STE (straight-through)
Seeds replicated · 7, 42
Cluster · Cluster 18

Speculative decode cost

Champion · Huffman

canonical Huffman walk → fp16 bit-unpack → fp16→fp32 cast → MAC

~O(N) table-driven decode cycles per weight load, then standard MAC

Variant · Native int7

read byte → int7 sign-extend → MAC against int32 accumulator → α·fp32 at the end

zero decode cycles; hot-path is pure load-and-MAC

Hypothesis only — no benchmark yet. The advantage compounds when weights churn out of cache; for hot weights kept in L1, the MAC dominates and the gap collapses. One of the first things to measure after the native packer port lands.
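A sketch of the native read path (assuming one weight per byte for clarity; the shipped packer presumably bit-packs 7-bit fields, which changes the unpack but not the idea):

```python
def int7(b: int) -> int:
    """Sign-extend the low 7 bits of a byte to an int in [-64, 63]."""
    return ((b & 0x7F) ^ 0x40) - 0x40

def dot_int7(weight_bytes, x, alpha: float) -> float:
    acc = 0                          # the int32 accumulator from the text
    for b, xi in zip(weight_bytes, x):
        acc += int7(b) * xi          # hot path: pure load-and-MAC, no decode
    return alpha * acc               # one fp32 scale at the very end
```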

Variant · Roadmap

What this unlocks

  • Port the native packer into both deploy SDKs (Python/block_b_merger/ and Rust/src/block_b_merger/). The read path becomes a pure int7 multiply-accumulate plus one fp32 scale — no Huffman table, no canonical tree walk.
  • Benchmark CPU decode + forward cost against the Huffman champion. The native path skips the decode entirely, so it should win on latency; the comparison tells us whether the identity / wider-hidden forward itself is competitive.
  • Try range / arithmetic coding over the native int7 W stream. If the 7-bit weights are entropy-redundant at all, that lifts the variant below the Huffman champion on size and keeps the no-activation forward path.
  • Cross-seed robustness: replicate on seeds 1000, 2024, 31337 (the Cluster 11/12 replication set). Two-seed convergence is suggestive but not decisive; promotion waits on a five-seed lossless carpet.
  • Close the loop: bubble one verdict back to the Block B canonical page — either promote the native 7-bit variant to champion, or record why the Huffman mirror stayed.
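For the range/arithmetic-coding item, a first-pass check is the empirical entropy of the int7 stream — if it lands well under 7 bits/symbol, an entropy coder has room (the symbols below are illustrative, not the shipped W):

```python
import math
from collections import Counter

def bits_per_symbol(symbols):
    """Empirical Shannon entropy of a symbol stream, in bits per symbol."""
    n = len(symbols)
    return -sum(c / n * math.log2(c / n) for c in Counter(symbols).values())

# real check: compare bits_per_symbol(int7_weight_stream) against 7.0
```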

Related clusters