a hand-drawn walkthrough

How LLMs work

You already know Bayes, drift-diffusion, Gaussian mixtures. Transformers aren't magic: they're just another stack of parameters being squeezed by a loss function. This page follows one short sentence, "the cat sat on the", all the way through a language model, block by block, so you can see what's actually moving.
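To fix the starting point of that journey, here is a minimal sketch of the very first step: turning the sentence into token ids. The vocabulary below is hypothetical; real models use learned subword vocabularies, but the idea is the same.

```python
# Hypothetical toy vocabulary -- real tokenizers learn subword pieces.
vocab = {"the": 0, "cat": 1, "sat": 2, "on": 3}

sentence = "the cat sat on the"
tokens = sentence.split()          # naive whitespace tokenization
ids = [vocab[t] for t in tokens]   # look up each token's integer id
print(ids)                         # [0, 1, 2, 3, 0]
```

Everything that follows in the walkthrough operates on these integers, not on the words themselves.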

↓ press play on each section — nothing auto-starts.