# Tokens & Tokenization

## What Is a Token?
A token is the smallest unit an LLM reads and writes. Not a character. Not a word. Something in between — typically a subword.
```text
"Hello world"        → ["Hello", " world"]            = 2 tokens
"tokenization"       → ["token", "ization"]           = 2 tokens
"Zineddine"          → ["Z", "ined", "dine"]          = 3 tokens (unusual word)
"I love programming" → ["I", " love", " programming"] = 3 tokens
```
Rule of thumb for English: 1 token ≈ 4 characters ≈ ¾ of a word
## Why Tokens and Not Characters?
| Approach | Vocabulary Size | Sequence Length | Efficiency |
|---|---|---|---|
| Characters | 256 (byte values) | Very long | Bad — model must learn that c-a-t spells "cat" |
| Words | 500,000+ | Short | Bad — can't handle new/rare words, huge vocabulary |
| Subwords (tokens) | ~100,000 | Balanced | Best — captures meaning, handles any text |
Subwords are the sweet spot: common words stay whole ("the", "hello"), rare words get split ("tokenization" → "token" + "ization"), and the model can handle ANY text.
## How Tokenizers Are Built (BPE)
Byte Pair Encoding — the most common algorithm:
Step 1: Start with individual characters

```text
"low"    = [l, o, w]
"lower"  = [l, o, w, e, r]
"newest" = [n, e, w, e, s, t]
```

Step 2: Count adjacent pairs, merge the most frequent

```text
(l, o), (o, w), and (w, e) each appear twice; merge one of them, say (o, w) → "ow"
"low"   = [l, ow]
"lower" = [l, ow, e, r]
```

Step 3: Repeat — merge the next most frequent pair, and keep merging until the vocabulary reaches its target size (~100K tokens)
Result: common sequences become single tokens, rare ones stay as multiple pieces.
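The merge loop above can be sketched in a few lines of Python. This is a toy illustration over the three-word corpus from Step 1, not a production tokenizer (real BPE also weights words by corpus frequency):

```python
from collections import Counter

def bpe_merges(words, num_merges):
    """Learn BPE merges from a toy corpus of words."""
    # Each word starts as a list of single characters.
    vocab = {w: list(w) for w in words}
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs across all words.
        pairs = Counter()
        for symbols in vocab.values():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += 1
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent pair (ties broken by first seen)
        merges.append(best)
        # Replace every occurrence of the best pair with the merged symbol.
        for w, symbols in vocab.items():
            merged, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            vocab[w] = merged
    return merges, vocab

merges, vocab = bpe_merges(["low", "lower", "newest"], num_merges=3)
print(merges)
print(vocab)
```

Each learned merge becomes one vocabulary entry; running more merges grows the vocabulary toward its target size.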
## Token Examples Across Types

### English Text

```text
"The quick brown fox jumps over the lazy dog"
= ["The", " quick", " brown", " fox", " jumps", " over", " the", " lazy", " dog"]
= 9 tokens
```
### Code

```text
"def hello_world():\n print('Hello')"
= ["def", " hello", "_", "world", "():", "\n", " ", "print", "('", "Hello", "')"]
= 11 tokens
```

Code tends to use more tokens per character than prose: punctuation, whitespace, and identifiers often split into separate pieces.
### Non-English Text

```text
"Bonjour le monde" = ~4 tokens  (French, similar to English)
"مرحبا بالعالم"      = ~8 tokens  (Arabic, less efficient)
"こんにちは世界"       = ~5 tokens  (Japanese, each character may be 1-2 tokens)
```
Important: Non-English text typically uses 1.5-3x more tokens than English for the same meaning. This means: higher cost and faster context window usage.
## Why Tokens Matter to YOU

### 1. Pricing

AI models charge per million tokens, with input and output billed separately:
| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| Claude Haiku 4.5 | $0.80 | $4.00 |
| Claude Sonnet 4.6 | $3.00 | $15.00 |
| Claude Opus 4.6 | $15.00 | $75.00 |
| GPT-4o | $2.50 | $10.00 |
| GPT-4o mini | $0.15 | $0.60 |
Example cost: A 2,000-word document ≈ 2,700 tokens. Sending it to Claude Sonnet = ~$0.008 input. The model's 500-word response ≈ 670 tokens = ~$0.01 output. Total: ~$0.018 per interaction.
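The arithmetic above is easy to script. A minimal sketch using the Sonnet rates from the table (the model key is just an illustrative label):

```python
# Per-million-token rates from the pricing table: (input $/1M, output $/1M)
PRICES = {"claude-sonnet": (3.00, 15.00)}

def interaction_cost(model, input_tokens, output_tokens):
    """Cost in dollars for one request/response pair."""
    in_rate, out_rate = PRICES[model]
    return input_tokens / 1e6 * in_rate + output_tokens / 1e6 * out_rate

# The 2,000-word document example: ~2,700 input tokens, ~670 output tokens
cost = interaction_cost("claude-sonnet", 2700, 670)
print(f"${cost:.4f}")  # roughly $0.018 per interaction
```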
### 2. Context Window Limits

Every model has a maximum number of tokens it can process at once:

```text
Context Window = System Prompt + Conversation History + Your Message + Model's Response
                 ↑ all of this counts against the limit
```
If you paste a 50,000-token codebase, you're using most of a 128K window before you even ask a question.
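A minimal budget check makes this concrete, assuming a 128K window and illustrative token counts:

```python
CONTEXT_WINDOW = 128_000  # assumed window size; varies by model

def remaining_budget(system_prompt_tokens, history_tokens,
                     message_tokens, max_output_tokens):
    """Tokens left in the window after everything that counts against it."""
    used = system_prompt_tokens + history_tokens + message_tokens + max_output_tokens
    return CONTEXT_WINDOW - used

# Illustrative numbers: a 50K-token codebase paste leaves little headroom.
print(remaining_budget(1_000, 50_000, 500, 4_000))  # 72500
```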
### 3. Latency

More tokens = longer response time. Output tokens are the bottleneck — each token is generated sequentially (~30-100 tokens/second depending on the model).

- Short response (100 tokens): ~1-3 seconds
- Long response (2,000 tokens): ~20-60 seconds
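Those estimates follow directly from the sequential-generation rate. A small sketch, assuming the ~30-100 tokens/second range above:

```python
def response_time_bounds(output_tokens, slow_tps=30, fast_tps=100):
    """(best, worst) response time in seconds from sequential token generation."""
    return output_tokens / fast_tps, output_tokens / slow_tps

best, worst = response_time_bounds(2000)
print(f"{best:.0f}-{worst:.0f} seconds")  # 20-67 seconds
```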
### 4. Cost Adds Up Fast

Every message in a conversation resends ALL previous messages:

```text
Message 1:  you send 100 tokens → model reads 100 tokens
Message 2:  you send 100 tokens → model reads 200 tokens (msg 1 + msg 2)
Message 3:  you send 100 tokens → model reads 300 tokens
...
Message 20: model reads 2,000+ tokens EVERY TIME
```
This is why long conversations get expensive.
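The growth above can be sketched directly. With a fixed 100 tokens per message (and ignoring the model's replies for simplicity), cumulative input grows quadratically with conversation length:

```python
def tokens_read(num_messages, tokens_per_message=100):
    """Total input tokens the model reads over a whole conversation."""
    # Message i resends messages 1..i, so the model reads i * tokens_per_message.
    return sum(i * tokens_per_message for i in range(1, num_messages + 1))

print(tokens_read(3))   # 600 (100 + 200 + 300)
print(tokens_read(20))  # 21000, vs. 2000 if each message were sent independently
```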
## How to Count Tokens

### Quick Mental Math
- English: words × 1.3 ≈ tokens
- Code: characters ÷ 3.5 ≈ tokens
- 1 page of text ≈ 400-500 tokens
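These rules of thumb translate into trivial helper functions (rough approximations only; exact counts require the model's actual tokenizer):

```python
def estimate_tokens_english(text):
    """English prose: words x 1.3."""
    return round(len(text.split()) * 1.3)

def estimate_tokens_code(code):
    """Code: characters / 3.5."""
    return round(len(code) / 3.5)

print(estimate_tokens_english("The quick brown fox jumps over the lazy dog"))  # 12
```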
### Programmatic

```python
# Anthropic: count tokens via the Messages API (requires an API key)
import anthropic

client = anthropic.Anthropic()
result = client.messages.count_tokens(
    model="claude-sonnet-4-5",
    messages=[{"role": "user", "content": "Your text here"}],
)
print(result.input_tokens)

# OpenAI: count tokens locally with tiktoken
import tiktoken

enc = tiktoken.encoding_for_model("gpt-4o")
tokens = enc.encode("Your text here")
print(len(tokens))  # number of tokens
```
### Online Tools

- OpenAI's interactive tokenizer (platform.openai.com/tokenizer) lets you paste text and see exactly how it splits into tokens
## Practical Tips
- Be concise in prompts — every word costs tokens
- Don't paste entire files if you only need a section
- Use smaller models for high-volume tasks (Haiku vs Opus is ~19x cheaper)
- Structured output (JSON) uses more tokens than plain text
- System prompts are resent with every message — keep them focused
- Summarize conversation history for long chats to save tokens
## Resources
- 🔗 Anthropic — Token Counting
- 🔗 OpenAI — Tokenizer Tool
- 🎥 Andrej Karpathy — Let's build the GPT Tokenizer
- 📖 Hugging Face — Byte Pair Encoding
Previous: 01 - What Are LLMs | Next: 03 - Context Window