
Block D · Embedder

Word Embedder V1

A 32,294 × 64 lookup table that converts token IDs into dense vectors. Xavier-initialized and int8-quantized, awaiting end-to-end training.

SCAFFOLD · v5.0.0-β.2


Block D · Architecture

Lookup table bridge

token ID (0 – 32,293) → embedding table (32,294 × 64) → 64-dim float vector

The table has one row per vocabulary slot. At inference, embedding a token is an O(1) row lookup. The upper model (Block E) will train the entire table end-to-end.
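A minimal Python sketch of the init and lookup described above. It assumes PyTorch's fan convention for 2-D weights (fan_in = 64, fan_out = 32,294) when computing the Xavier-uniform bound. `xavier_uniform_table` and `embed` are hypothetical names, and Python's `random` module will not reproduce the project's actual seed=42 weights.

```python
import math
import random

# Hypothetical helper: Xavier (Glorot) uniform init for a (vocab_size, dim) table.
# Assumes PyTorch's 2-D fan convention: fan_in = dim, fan_out = vocab_size.
def xavier_uniform_table(vocab_size, dim, seed=42):
    bound = math.sqrt(6.0 / (vocab_size + dim))
    rng = random.Random(seed)  # Python RNG: values will NOT match the project's seed=42 weights
    return [[rng.uniform(-bound, bound) for _ in range(dim)] for _ in range(vocab_size)]

# O(1) lookup: a token ID is simply a row index into the table.
def embed(table, token_id):
    if not 0 <= token_id < len(table):
        raise IndexError(f"token ID {token_id} outside 0..{len(table) - 1}")
    return table[token_id]

if __name__ == "__main__":
    # Small demo table; the full one would be xavier_uniform_table(32_294, 64).
    table = xavier_uniform_table(vocab_size=1_000, dim=64)
    vec = embed(table, 7)
    print(len(vec))  # 64
```

For the full table the bound works out to sqrt(6 / 32,358) ≈ 0.0136. Nested lists keep the sketch dependency-free; a real implementation would hold the table in one contiguous f32 (or int8) array.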

⚠

Scaffold status: the weights are Xavier-uniform random (seed=42), so the vectors carry no semantic meaning yet. Block E must be connected and trained on real data before these embeddings become useful.

Table shape: 32,294 × 64
Init: Xavier uniform
Seed: 42
Params: 2,066,816
f32 size: 8.27 MB
int8 size: 2.07 MB
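The parameter count and sizes follow directly from the table shape. A quick check, assuming decimal megabytes (1 MB = 10^6 bytes) and not counting any quantization scale factors stored alongside the int8 weights (the page does not say how those are stored):

```python
# Parameter count and storage for the 32,294 x 64 table (decimal MB).
VOCAB_SIZE = 32_294
EMBED_DIM = 64

params = VOCAB_SIZE * EMBED_DIM  # one weight per (row, column)
f32_bytes = params * 4           # float32: 4 bytes per weight
int8_bytes = params * 1          # int8: 1 byte per weight (scales not counted)

print(f"params    = {params:,}")                 # params    = 2,066,816
print(f"f32 size  = {f32_bytes / 1e6:.2f} MB")   # f32 size  = 8.27 MB
print(f"int8 size = {int8_bytes / 1e6:.2f} MB")  # int8 size = 2.07 MB
```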

Block D · Init baseline

Scaffold measurements

int8 on-disk size: 2.07 MB
Mean dequant error: ~0.0012
Total parameters: 2,066,816
f32 full precision: 8.27 MB

ℹ

The int8 quantization is already validated: a mean absolute dequantization error of ~0.0012 is negligible for training. The table is ready to plug into Block E as soon as end-to-end training begins.
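The page does not say which int8 scheme the embedder uses. As an illustration only, here is symmetric per-tensor quantization plus the mean-absolute-error measurement; the function names are hypothetical, and the error this sketch reports depends on the scheme and the weight range, so it is not expected to reproduce the ~0.0012 figure.

```python
import random

# Hypothetical scheme: symmetric per-tensor int8 (one float scale for the whole table).
def quantize_int8(weights):
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest step and clamp to the symmetric range [-127, 127].
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize_int8(q, scale):
    return [v * scale for v in q]

def mean_abs_error(original, restored):
    return sum(abs(a - b) for a, b in zip(original, restored)) / len(original)

if __name__ == "__main__":
    rng = random.Random(42)
    weights = [rng.uniform(-1.0, 1.0) for _ in range(10_000)]  # synthetic data, not the real table
    q, scale = quantize_int8(weights)
    mae = mean_abs_error(weights, dequantize_int8(q, scale))
    # Each element's rounding error is at most scale / 2, so the MAE cannot exceed that bound.
    print(f"scale = {scale:.6f}, MAE = {mae:.6f}")
```

Per-row scales (one per vocab slot) never widen the per-element error bound, since each row's scale is at most the global one, at the cost of storing 32,294 extra scale values.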

94.7% of Block E's total parameter count lives in this embedder. During training, gradients from the language-model head will flow back through Block E's transformer layers into this table, gradually encoding semantic structure.
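The input-side gradient path into the table can be sketched as follows. Because the forward pass is a row lookup, the backward pass scatter-adds each position's output gradient into the row of the token at that position, so only tokens present in a batch receive input-side updates. Plain SGD and the helper names are assumptions for illustration, and this covers only the lookup path, not the tied output head.

```python
from collections import defaultdict

# Hypothetical helper: backward pass of an embedding lookup as a sparse scatter-add.
def embedding_backward(token_ids, grad_outputs, dim):
    """Return {token_id: summed gradient row}; unseen tokens get no entry."""
    grads = defaultdict(lambda: [0.0] * dim)
    for tok, g in zip(token_ids, grad_outputs):
        row = grads[tok]
        for j in range(dim):
            row[j] += g[j]  # a token that appears twice accumulates both gradients
    return dict(grads)

# Hypothetical helper: plain SGD applied only to rows that received gradient.
def sgd_update(table, sparse_grads, lr):
    for tok, g in sparse_grads.items():
        row = table[tok]
        for j in range(len(row)):
            row[j] -= lr * g[j]

if __name__ == "__main__":
    grads = embedding_backward([1, 3, 1], [[1.0, 2.0], [0.5, 0.5], [1.0, -1.0]], dim=2)
    print(grads)  # {1: [2.0, 1.0], 3: [0.5, 0.5]}
```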

Block D · Artifacts

Deploy & reproduce

⚠

Scaffold artifact. The file exists and is valid, but the weights are still at random initialization. Deploy it only as part of the full training pipeline, not as a standalone production artifact.

Reproduce in one command

python tools/diag_word_embedding_v1.py

f32 size: 8.27 MB
int8 size: 2.07 MB
Dequant error: ~0.0012
Version: v5.0.0-β.2

Block D · Roadmap

What comes next

→

Connect the embedder to the Block E (Nano Brain) end-to-end training loop. The embedder and Block E's output head share tied weights, so both are trained jointly.
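With tied weights, the output head reuses the embedding table as its projection matrix: the logit for vocab slot v is the dot product of the final hidden state with row v. A minimal sketch, assuming Block E's final hidden state is 64-dimensional (direct tying requires it to match the embedding width); the helper names are hypothetical.

```python
import math

# Hypothetical helper: tied output head, logits = E @ h (one dot product per vocab row).
def tied_logits(table, hidden):
    return [sum(e * h for e, h in zip(row, hidden)) for row in table]

# Numerically stable softmax over the vocabulary logits.
def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [x / total for x in exps]

if __name__ == "__main__":
    toy_table = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3-token, 2-dim toy table
    print(tied_logits(toy_table, [2.0, 3.0]))  # [2.0, 3.0, 5.0]
```

Because the same rows serve both the input lookup and the output projection, each row receives gradient from both ends of the model during training.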

→

After the initial training pass, run nearest-neighbour probes on the embedding space to confirm that semantically related tokens cluster as expected.
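A nearest-neighbour probe can be as simple as ranking every other row by cosine similarity to a query row. This sketch works on token IDs only; mapping them back to token strings would need the Block C tokenizer's vocabulary, which is not modelled here. The helper names are hypothetical.

```python
import heapq
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0  # zero vectors get similarity 0

# Hypothetical probe: brute-force top-k rows by cosine similarity to the query token's row.
def nearest_neighbours(table, token_id, k=5):
    query = table[token_id]
    scored = ((cosine(query, row), tid) for tid, row in enumerate(table) if tid != token_id)
    return heapq.nlargest(k, scored)  # [(similarity, token_id), ...], best first

if __name__ == "__main__":
    demo = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]
    print([tid for _, tid in nearest_neighbours(demo, 0, k=2)])  # [1, 2]
```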

→

Post-training: re-quantize to int8 and measure the actual (not scaffold) mean dequantization error. Target: < 0.001.

Related PRs

#131
