Block E · Language Model
A two-block causal Transformer with 64-dimensional embeddings and 4 attention heads. Forward pass verified at 81 ms on CPU; the model is awaiting its first training run.
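A minimal sketch of that configuration, assuming PyTorch. The layer count (2), embedding width (64), and head count (4) come from the text above; the vocabulary size, context length, feed-forward width, and learned positional embeddings are placeholder assumptions, not the project's actual choices.

```python
# Minimal sketch, assuming PyTorch. Values marked "assumption" are
# not specified in the text above.
import time
import torch
import torch.nn as nn

VOCAB_SIZE = 256   # assumption: byte-level vocabulary
BLOCK_SIZE = 128   # assumption: context length
D_MODEL = 64       # 64-dim embeddings (from the text)
N_HEAD = 4         # 4 attention heads (from the text)
N_LAYER = 2        # 2 Transformer blocks (from the text)

class TinyLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.tok_emb = nn.Embedding(VOCAB_SIZE, D_MODEL)
        self.pos_emb = nn.Embedding(BLOCK_SIZE, D_MODEL)  # assumption: learned positions
        layer = nn.TransformerEncoderLayer(
            d_model=D_MODEL, nhead=N_HEAD, dim_feedforward=4 * D_MODEL,
            batch_first=True, norm_first=True)             # assumption: pre-norm blocks
        self.blocks = nn.TransformerEncoder(layer, num_layers=N_LAYER)
        self.head = nn.Linear(D_MODEL, VOCAB_SIZE, bias=False)
        self.head.weight = self.tok_emb.weight             # weight tying (see Architecture)

    def forward(self, idx):
        t = idx.size(1)
        x = self.tok_emb(idx) + self.pos_emb(torch.arange(t, device=idx.device))
        # Causal mask: -inf above the diagonal blocks attention to future tokens.
        mask = torch.triu(torch.full((t, t), float("-inf"), device=idx.device), diagonal=1)
        x = self.blocks(x, mask=mask)
        return self.head(x)

# Timing check in the spirit of the verification above (wall-clock
# numbers will of course vary by machine).
model = TinyLM().eval()
x = torch.randint(0, VOCAB_SIZE, (1, BLOCK_SIZE))
with torch.no_grad():
    start = time.perf_counter()
    logits = model(x)
print(f"forward: {(time.perf_counter() - start) * 1e3:.1f} ms, logits {tuple(logits.shape)}")
```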
Block E · Architecture
Causal (left-to-right) attention mask, so each position attends only to itself and earlier tokens. The output head shares weights with the token embedder (weight tying); a sketch of both follows.
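A focused sketch of those two properties, assuming PyTorch; the toy sequence length and vocabulary size below are illustrative assumptions.

```python
# Illustrative only; sizes below are assumptions, not from the text.
import torch
import torch.nn as nn

t = 5  # assumption: toy sequence length
# Causal mask: -inf strictly above the diagonal means position i
# cannot attend to any position j > i (no looking at future tokens).
mask = torch.triu(torch.full((t, t), float("-inf")), diagonal=1)
print(mask)
# tensor([[0., -inf, -inf, -inf, -inf],
#         [0.,   0., -inf, -inf, -inf],
#         [0.,   0.,   0., -inf, -inf],
#         [0.,   0.,   0.,   0., -inf],
#         [0.,   0.,   0.,   0.,   0.]])

# Weight tying: Linear(d, vocab) and Embedding(vocab, d) both hold a
# (vocab, d) weight, so the output head can reuse the embedding matrix.
d_model, vocab = 64, 256  # vocab size is an assumption
tok_emb = nn.Embedding(vocab, d_model)
head = nn.Linear(d_model, vocab, bias=False)
head.weight = tok_emb.weight  # one shared parameter, trained once
assert head.weight.data_ptr() == tok_emb.weight.data_ptr()
```

Tying the head to the embedder keeps the parameter count down, which matters most at this scale, where the two vocab-sized matrices would otherwise dominate the model.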
Block E · Scaffold baseline
Block E · Artifacts
Block E · Roadmap
Related PRs