Interactive explorer for core LLM concepts — tokenization, temperature, sampling strategies.
Tokenizer
Type text below to see how it gets split into tokens. Uses a BPE-like simulation with common English merges.
Live stats (update as you type): Characters · Tokens · Words · Tokens/Word · Avg Token Len
Token Visualization
Type text above…
Token ID Table (simulated BPE IDs) — columns: # | Token | ID | Bytes | Type
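The greedy merge procedure behind the simulation can be sketched in a few lines. This is a minimal BPE-style tokenizer, not the explorer's actual implementation: the merge table and its ranks here are toy assumptions standing in for the "common English merges" the page mentions.

```python
def bpe_tokenize(text, merges):
    """BPE-style tokenization sketch: start from single characters,
    then repeatedly apply the best-ranked (lowest number) adjacent
    merge until no listed pair remains."""
    tokens = list(text)
    while True:
        best_rank, best_i = None, None
        for i in range(len(tokens) - 1):
            pair = tokens[i] + tokens[i + 1]
            if pair in merges and (best_rank is None or merges[pair] < best_rank):
                best_rank, best_i = merges[pair], i
        if best_i is None:
            return tokens
        # Merge the chosen pair into a single token.
        tokens = tokens[:best_i] + [tokens[best_i] + tokens[best_i + 1]] + tokens[best_i + 2:]

# Toy merge table (rank = priority); real tokenizers learn tens of thousands.
merges = {"th": 0, "he": 1, "the": 2, "in": 3, "ing": 4}
print(bpe_tokenize("the thing", merges))  # → ['the', ' ', 'th', 'ing']
```

Note how "the" and "thing" tokenize differently even though both contain "th": merges apply in rank order, so frequent whole words collapse into single tokens while rarer words stay split.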
Temperature Explorer
Temperature scales the logits before softmax. Low values make the model more deterministic; high values flatten the distribution.
Temperature: 1.00
Prompt: "The sky is ___" — Next token probability distribution
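The scaling the slider performs can be written out directly. A minimal sketch, assuming illustrative logits for the prompt above (the explorer's actual values are not shown here):

```python
import math

def softmax_with_temperature(logits, temperature):
    """Divide logits by temperature, then softmax.
    T < 1 sharpens the distribution toward the top logit;
    T > 1 flattens it toward uniform."""
    if temperature <= 0:
        # Temperature 0 degenerates to argmax (greedy decoding).
        probs = [0.0] * len(logits)
        probs[logits.index(max(logits))] = 1.0
        return probs
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for "The sky is ___": blue, clear, falling, green
logits = [4.0, 2.5, 1.0, 0.5]
for t in (0.2, 1.0, 2.0):
    print(t, [round(p, 3) for p in softmax_with_temperature(logits, t)])
```

Running this shows the top token's probability climbing toward 1.0 as temperature drops and the distribution spreading out as it rises.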
Top-k / Top-p Sampling
Given the next-token probability distribution, see which words are included at different k and p values. Purple = included, grey = excluded.
k = 4
p = 0.85
Top-k Result
Top-p (Nucleus) Result
Top-k keeps only the k most likely tokens, then renormalizes. Top-p (nucleus) keeps the smallest set of tokens whose cumulative probability ≥ p.
Both strategies prevent the model from sampling very unlikely (weird) tokens.
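Both filters are easy to state in code. A minimal sketch with a made-up five-token distribution (the explorer's own distribution is not reproduced here), using the slider defaults k = 4 and p = 0.85:

```python
def top_k_filter(probs, k):
    """Keep the k most likely tokens, zero the rest, renormalize."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep = set(ranked[:k])
    total = sum(probs[i] for i in keep)
    return [probs[i] / total if i in keep else 0.0 for i in range(len(probs))]

def top_p_filter(probs, p):
    """Keep the smallest set of tokens (taken in descending probability)
    whose cumulative probability reaches p, then renormalize."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cumulative = set(), 0.0
    for i in ranked:
        keep.add(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    total = sum(probs[i] for i in keep)
    return [probs[i] / total if i in keep else 0.0 for i in range(len(probs))]

probs = [0.5, 0.2, 0.15, 0.1, 0.05]
print(top_k_filter(probs, 4))    # drops only the least likely token
print(top_p_filter(probs, 0.85))  # keeps 3 tokens: 0.5 + 0.2 + 0.15 = 0.85
```

Note the two filters can disagree: here top-k (k = 4) keeps four tokens while top-p (p = 0.85) keeps only three, because nucleus sampling adapts its cutoff to how concentrated the distribution is.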
Temperature Comparison
Same prompt, same model — different temperatures produce dramatically different outputs.
Prompt: "The old house at the end of the street…"
Why does temperature matter?
Creative tasks (stories, poems, brainstorming) benefit from higher temperature (0.8–1.4) because diversity and surprise are desirable.
Factual tasks (code, math, Q&A) benefit from lower temperature (0.0–0.4) because precision and reproducibility matter more than novelty.
At temperature 0, the model always picks the single most likely token — fully deterministic.