BLOCKS OVERVIEW

v5.0.0-β.2 · VRAXION Architecture

Building Blocks

Five pipeline blocks carry raw bytes to predicted tokens. Each block has a defined interface and ships as a standalone artifact, either frozen or a scaffold.

A · Byte Unit: 8-bit → 16-dim latent
B · Byte-Pair Merger: 32-dim → 32-dim merged
C · Word Tokenizer: text → token IDs
D · Word Embedder: token ID → 64-dim vector
E · Nano Brain: sequence → logits


Block A · FROZEN · Byte Unit
Binary-weight mirror autoencoder. Maps every raw byte to a 16-dim latent. 100% lossless on all 256 byte values.
8→16→16 · C19 activation · binary {-1, +1}
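A minimal sketch of what Block A's forward pass could look like. Everything here is a stand-in: the weights are random ±1 matrices rather than the frozen artifact's, the byte-to-±1 bit coding is one plausible input convention, and a zero-threshold sign function substitutes for the unspecified C19 activation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical binary weights in {-1, +1}; the frozen artifact's real
# weights are not reproduced here.
W_enc = rng.choice([-1.0, 1.0], size=(16, 8)).astype(np.float32)   # 8 -> 16
W_lat = rng.choice([-1.0, 1.0], size=(16, 16)).astype(np.float32)  # 16 -> 16

def sgn(x: np.ndarray) -> np.ndarray:
    """Binarize to {-1, +1}; stands in for the unspecified C19 activation."""
    return np.where(x >= 0, 1.0, -1.0).astype(np.float32)

def byte_to_bits(b: int) -> np.ndarray:
    """One plausible input coding: the byte's 8 bits mapped to +/-1."""
    bits = np.array([(b >> i) & 1 for i in range(8)], dtype=np.float32)
    return 2.0 * bits - 1.0

def encode(b: int) -> np.ndarray:
    h = sgn(W_enc @ byte_to_bits(b))   # 8 -> 16
    return sgn(W_lat @ h)              # 16 -> 16 binary latent

lat = encode(0x41)
```

The lossless property claimed for the real block would be verified by decoding through the mirrored path and checking all 256 bytes round-trip; that decoder is omitted here.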
Block B · FROZEN · Byte-Pair Merger
Single-W mirror-tied autoencoder. Merges two Block A (L0) outputs into one 32-dim representation. 100% lossless on all 65,536 byte pairs.
32→32 · 2,592 cells · Huffman 3.4 KB
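A sketch of the mirror-tied idea: a single weight matrix W encodes, and its transpose decodes. Here a random orthogonal W stands in for the block's learned weights, since orthogonality is what makes the tied transpose an exact inverse on the linear path (the "lossless" property).

```python
import numpy as np

rng = np.random.default_rng(0)

# Orthogonal stand-in for the learned single weight matrix W, so that
# W.T exactly inverts W. The real frozen weights are not reproduced here.
W, _ = np.linalg.qr(rng.standard_normal((32, 32)))

def merge(a16: np.ndarray, b16: np.ndarray) -> np.ndarray:
    """Concatenate two 16-dim Block A latents and encode to 32 dims."""
    return W @ np.concatenate([a16, b16])

def unmerge(z32: np.ndarray) -> np.ndarray:
    """Mirror-tied decoder: reuse the encoder's transpose."""
    return W.T @ z32

x = rng.choice([-1.0, 1.0], size=32)   # two binary 16-dim latents
z = merge(x[:16], x[16:])
```

Tying the decoder to Wᵀ halves the parameter count and, with an orthogonal (or otherwise invertible-by-transpose) W, guarantees exact reconstruction.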
Block C · FROZEN · Word Tokenizer V2
Space-aware hybrid tokenizer: whole-word + subword + byte fallback. Beats gzip-9 by 7.19 pp on FineWeb-EDU.
32,294 vocab · DP segmentation · SuperBPE τ=0.9
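"DP segmentation" suggests a shortest-tokenization dynamic program: pick the segmentation that covers the text with the fewest vocabulary pieces. The sketch below uses a toy vocabulary and a single-character fallback standing in for byte fallback; the real block's 32,294-entry SuperBPE vocabulary and its merge rules are not modeled.

```python
def dp_segment(text: str, vocab: set, max_len: int = 16) -> list:
    """Segment text into the fewest vocab pieces, with 1-char fallback."""
    n = len(text)
    best = [0] + [float("inf")] * n     # best[i]: min tokens for text[:i]
    back = [None] * (n + 1)             # back[i]: start index of last token
    for i in range(1, n + 1):
        for j in range(max(0, i - max_len), i):
            piece = text[j:i]
            # a piece is usable if it is in-vocab, or length-1 (fallback)
            if (piece in vocab or i - j == 1) and best[j] + 1 < best[i]:
                best[i] = best[j] + 1
                back[i] = j
    out, i = [], n
    while i > 0:                        # walk back pointers to recover pieces
        j = back[i]
        out.append(text[j:i])
        i = j
    return out[::-1]

# Toy vocab with space-prefixed pieces, echoing the "space-aware" design.
vocab = {"the", " quick", " brown", " fox", "qu", "ick"}
pieces = dp_segment("the quick fox", vocab)   # -> ['the', ' quick', ' fox']
```

Space-prefixed pieces let whole words (including their leading space) win over subword splits whenever they are in the vocabulary, since they cover more text per token.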
Block D · SCAFFOLD · Word Embedder V1
32,294 × 64 lookup table, Xavier-initialized with seed 42. Int8-quantized at ~0.0012 dequant error. Awaits end-to-end training.
2.07 MB int8 · 64-dim · Xavier init
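A sketch of the table's footprint and a symmetric per-tensor int8 scheme. The exact Xavier fan interpretation and the block's actual quantization recipe are assumptions here (which is why the dequant error below need not match the quoted ~0.0012); the 2.07 MB int8 storage figure follows directly from the shape.

```python
import numpy as np

V, D = 32_294, 64
rng = np.random.default_rng(42)   # seed 42 per the spec; init recipe assumed

# Xavier-uniform stand-in; the fan-in/fan-out choice is an assumption.
limit = np.sqrt(6.0 / (V + D))
table = rng.uniform(-limit, limit, size=(V, D)).astype(np.float32)

# Symmetric per-tensor int8 quantization.
scale = np.abs(table).max() / 127.0
q = np.round(table / scale).astype(np.int8)
deq = q.astype(np.float32) * scale

storage_mb = q.nbytes / 1e6            # 32,294 * 64 bytes ~= 2.07 MB
max_err = float(np.abs(deq - table).max())
```

Per-row (rather than per-tensor) scales would shrink the dequant error further at the cost of one extra float per vocabulary entry.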
Block E · SCAFFOLD · Nano Brain V1
2-block causal Transformer, 64-dim, 4 heads, 2.18 M params. Forward pass verified. Awaits training-data pipeline.
2× TransformerBlock · GELU FFN · max_seq=256
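A back-of-the-envelope check that the quoted 2.18 M figure is plausible. The layout below is assumed, not confirmed: learned positional embeddings over max_seq=256, a 4× GELU FFN, biases on all projections, two LayerNorms per block plus a final one, and a weight-tied output head.

```python
# Hypothetical parameter budget for Block E; the real layout may differ.
V, D, L, S = 32_294, 64, 2, 256   # vocab, width, blocks, max sequence
F = 4 * D                         # FFN hidden width (4x expansion assumed)

tok_emb  = V * D                  # token embedding table
pos_emb  = S * D                  # learned positional embeddings (assumed)
attn     = 4 * (D * D + D)        # q, k, v, out projections with biases
ffn      = D * F + F + F * D + D  # GELU MLP with biases
norms    = 2 * 2 * D              # two LayerNorms per block (weight + bias)
block    = attn + ffn + norms
final_ln = 2 * D

total = tok_emb + pos_emb + L * block + final_ln   # tied head adds nothing
millions = total / 1e6
```

Under these assumptions the count lands at 2,183,296, i.e. ~2.18 M, dominated by the 2.07 M-parameter embedding table, which matches Block D's 32,294 × 64 shape.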