Block C · Tokenizer
A space-aware hybrid tokenizer (whole-word, subword, and byte fallback) that beats gzip-9 by 7.19 percentage points of compressed size on real FineWeb-EDU text (30.43% vs. 37.62% of the original).
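The three tiers named above can be sketched roughly as follows. This is a minimal illustration, not the Block C implementation: the vocabularies `WORD_VOCAB` and `SUBWORD_VOCAB`, the function names, and the greedy longest-match rule are all assumptions. "Space-aware" is interpreted here as attaching a single leading space to the word that follows it, which the source does not spell out.

```python
import re

# Hypothetical toy vocabularies, for illustration only.
WORD_VOCAB = {" tokenizer", " the", "the"}
SUBWORD_VOCAB = {" un", "token", "ize", "s"}
MAX_SUBWORD_LEN = max(len(piece) for piece in SUBWORD_VOCAB)


def encode(text):
    """Return (kind, value) tokens: whole word, then subword, then raw bytes."""
    tokens = []
    # Assumed chunking rule: a word keeps one leading space; any other
    # whitespace character becomes its own chunk, so no input is dropped.
    for chunk in re.findall(r" ?\S+|\s", text):
        if chunk in WORD_VOCAB:  # tier 1: whole-word hit
            tokens.append(("word", chunk))
            continue
        i = 0
        while i < len(chunk):
            # Tier 2: greedy longest-match subword starting at position i.
            for size in range(min(MAX_SUBWORD_LEN, len(chunk) - i), 0, -1):
                piece = chunk[i:i + size]
                if piece in SUBWORD_VOCAB:
                    tokens.append(("subword", piece))
                    i += size
                    break
            else:
                # Tier 3: byte fallback for one character (1 to 4 UTF-8 bytes).
                for byte in chunk[i].encode("utf-8"):
                    tokens.append(("byte", byte))
                i += 1
    return tokens


def decode(tokens):
    """Invert encode() by concatenating every token's UTF-8 bytes."""
    out = bytearray()
    for kind, value in tokens:
        if kind == "byte":
            out.append(value)
        else:
            out.extend(value.encode("utf-8"))
    return out.decode("utf-8")
```

Because the byte tier can represent any character, `decode(encode(text))` returns the input unchanged even for text that neither vocabulary covers. That lossless round trip is a precondition for comparing the scheme against general-purpose compressors.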
Block C · Architecture
Block C · Results
| Method | Compressed size (% of original, lower is better) | VRAXION C relative to method |
|---|---|---|
| VRAXION C (Huffman) | 30.43% | baseline |
| bzip2-9 | 29.97% | 0.46 pp worse |
| lzma-9e | 28.61% | 1.82 pp worse |
| gzip-9 | 37.62% | 7.19 pp better |
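The gaps in the table are plain differences between compressed-size percentages. The sketch below shows how such baseline figures could be measured with Python's standard library and how the percentage-point gaps follow from the reported numbers. It assumes that "gzip-9", "bzip2-9", and "lzma-9e" correspond to the stdlib settings shown; the helper names are hypothetical, and the exact corpus slice and tooling behind the table are not stated in the source.

```python
import bz2
import gzip
import lzma


def compressed_size_pct(data: bytes) -> dict:
    """Compressed size as a percentage of the original size (lower is better)."""
    if not data:
        raise ValueError("data must be non-empty")
    sizes = {
        "gzip-9": len(gzip.compress(data, compresslevel=9)),
        "bzip2-9": len(bz2.compress(data, compresslevel=9)),
        # Assumption: "9e" means preset 9 combined with the extreme flag.
        "lzma-9e": len(lzma.compress(data, preset=9 | lzma.PRESET_EXTREME)),
    }
    return {name: 100.0 * size / len(data) for name, size in sizes.items()}


def pp_gap(ours_pct: float, other_pct: float) -> float:
    """Percentage-point gap; positive means our output is larger (worse)."""
    return round(ours_pct - other_pct, 2)


# Figures copied from the results table above.
REPORTED = {
    "VRAXION C (Huffman)": 30.43,
    "bzip2-9": 29.97,
    "lzma-9e": 28.61,
    "gzip-9": 37.62,
}

if __name__ == "__main__":
    ours = REPORTED["VRAXION C (Huffman)"]
    for name in ("bzip2-9", "lzma-9e", "gzip-9"):
        print(f"{name}: {pp_gap(ours, REPORTED[name]):+.2f} pp")
```

With the reported figures, the gaps are +0.46 pp against bzip2-9, +1.82 pp against lzma-9e, and -7.19 pp against gzip-9, where a negative value means VRAXION C produces the smaller output.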
Block C · Artifacts
Block C · Roadmap
Related PRs & clusters