Tokens & Tokenization

What Is a Token?

A token is the smallest unit an LLM reads and writes. Not a character. Not a word. Something in between — typically a subword.

"Hello world"        → ["Hello", " world"]           = 2 tokens
"tokenization"       → ["token", "ization"]           = 2 tokens  
"Zineddine"          → ["Z", "ined", "dine"]          = 3 tokens (unusual word)
"I love programming" → ["I", " love", " programming"] = 3 tokens

Rule of thumb for English: 1 token ≈ 4 characters ≈ ¾ of a word
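The rule of thumb is easy to turn into a quick estimator. This is a sketch, not a real tokenizer, and the function name is mine:

```python
def estimate_tokens(text: str) -> int:
    """Very rough token estimate using the ~4 characters per token rule."""
    return max(1, round(len(text) / 4))

# Real tokenizers will differ by a token or two either way.
print(estimate_tokens("The quick brown fox jumps over the lazy dog"))
```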


Why Tokens and Not Characters?

| Approach | Vocabulary Size | Sequence Length | Efficiency |
|----------|-----------------|-----------------|------------|
| Characters | 256 (raw bytes) | Very long | Bad — model must learn that c-a-t means cat |
| Words | 500,000+ | Short | Bad — can't handle new/rare words, huge vocabulary |
| Subwords (tokens) | ~100,000 | Balanced | Best — captures meaning, handles any text |

Subwords are the sweet spot: common words stay whole ("the", "hello"), rare words get split ("tokenization" → "token" + "ization"), and the model can handle ANY text.


How Tokenizers Are Built (BPE)

Byte Pair Encoding — the most common algorithm:

Step 1: Start with individual characters
  "low"  = [l, o, w]
  "lower" = [l, o, w, e, r]
  "newest" = [n, e, w, e, s, t]

Step 2: Count adjacent pairs, merge most frequent
  Most frequent pair: (e, w) → merge into "ew"
  "newest" = [n, ew, e, s, t]

Step 3: Repeat — merge next most frequent pair
  ... keep merging until vocabulary reaches target size (~100K tokens)

Result: common sequences become single tokens, rare ones stay as multiple pieces.
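The three steps above can be sketched in a few lines. This is a toy version: on a corpus this small the exact merge order depends on pair counts and tie-breaking, so it may not match the illustration, and real tokenizers also weight words by frequency.

```python
from collections import Counter

def bpe_merges(words, num_merges):
    """Toy BPE: repeatedly merge the most frequent adjacent symbol pair."""
    # Step 1: represent each word as a list of single-character symbols.
    corpus = [list(w) for w in words]
    merges = []
    for _ in range(num_merges):
        # Step 2: count adjacent pairs across the whole corpus.
        pairs = Counter()
        for symbols in corpus:
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += 1
        if not pairs:
            break
        (a, b), _count = pairs.most_common(1)[0]
        merges.append(a + b)
        # Step 3: apply the merge everywhere, then repeat.
        for symbols in corpus:
            i = 0
            while i < len(symbols) - 1:
                if symbols[i] == a and symbols[i + 1] == b:
                    symbols[i:i + 2] = [a + b]
                else:
                    i += 1
    return corpus, merges

corpus, merges = bpe_merges(["low", "lower", "newest"], 2)
print(merges)  # the two most frequent pairs, merged in order
print(corpus)  # "low" is now a single token
```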


Token Examples Across Types

English Text

"The quick brown fox jumps over the lazy dog"
= ["The", " quick", " brown", " fox", " jumps", " over", " the", " lazy", " dog"]
= 9 tokens

Code

"def hello_world():\n print('Hello')"
= ["def", " hello", "_", "world", "():", "\n", " ", "print", "('", "Hello", "')"]
= 11 tokens (code tends to be more token-dense)

Non-English Text

"Bonjour le monde"  = ~4 tokens (French, similar to English)
"مرحبا بالعالم"     = ~8 tokens (Arabic, less efficient)
"こんにちは世界"     = ~5 tokens (Japanese, each character may be 1-2 tokens)

Important: Non-English text typically uses 1.5-3x more tokens than English for the same meaning. This means: higher cost and faster context window usage.
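One reason for the gap: many modern tokenizers (byte-level BPE) start from UTF-8 bytes, and non-Latin scripts need more bytes per character before any merging happens. A rough illustration, using byte counts rather than actual token counts:

```python
# Byte counts are only a proxy; real token counts depend on the tokenizer.
for text in ["Hello world", "Bonjour le monde", "مرحبا بالعالم", "こんにちは世界"]:
    n_bytes = len(text.encode("utf-8"))
    print(f"{len(text):2d} chars -> {n_bytes:2d} UTF-8 bytes  {text!r}")
```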


Why Tokens Matter to YOU

1. Pricing

Model providers charge per million tokens (input and output are priced separately):

| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|-------|-----------------------|------------------------|
| Claude Haiku 4.5 | $0.80 | $4.00 |
| Claude Sonnet 4.6 | $3.00 | $15.00 |
| Claude Opus 4.6 | $15.00 | $75.00 |
| GPT-4o | $2.50 | $10.00 |
| GPT-4o mini | $0.15 | $0.60 |

Example cost: A 2,000-word document ≈ 2,700 tokens. Sending it to Claude Sonnet = ~$0.008 input. The model's 500-word response ≈ 670 tokens = ~$0.01 output. Total: ~$0.018 per interaction.
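The arithmetic above as a small helper (the function name is mine; prices are in USD per million tokens):

```python
def call_cost_usd(input_tokens, output_tokens, input_price, output_price):
    """Cost of one API call, with prices in USD per 1M tokens."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000

# The example above: 2,700 input + 670 output tokens at Sonnet prices.
print(f"${call_cost_usd(2700, 670, 3.00, 15.00):.4f}")
```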

2. Context Window Limits

Every model has a maximum number of tokens it can process at once:

Context Window = System Prompt + Conversation History + Your Message + Model's Response
                 ↑ all of this counts against the limit

If you paste a 50,000-token codebase, you're using most of a 128K window before you even ask a question.
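The budget arithmetic, spelled out (the numbers below are illustrative):

```python
def remaining_tokens(window, system_prompt, history, message):
    """Tokens left for the model's response after everything else is counted."""
    return window - (system_prompt + history + message)

# A 50,000-token paste into a 128K window, with a 2,000-token system prompt:
print(remaining_tokens(128_000, 2_000, 0, 50_000))  # 76000 tokens left
```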

3. Latency

More tokens = longer response time. Output tokens are the bottleneck — each token is generated sequentially (~30-100 tokens/second depending on model).

Short response (100 tokens): ~1-3 seconds
Long response (2000 tokens): ~20-60 seconds
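Those ranges fall straight out of the generation-speed figures above. A sketch, with the 30-100 tokens/second rates as defaults:

```python
def response_seconds(output_tokens, slow_tps=30, fast_tps=100):
    """Rough wall-clock range for generating output tokens sequentially."""
    return output_tokens / fast_tps, output_tokens / slow_tps

fast, slow = response_seconds(2000)
print(f"{fast:.0f}-{slow:.0f} s")  # 20-67 s
```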

4. Cost Adds Up Fast

Every message in a conversation resends ALL previous messages:

Message 1: You send 100 tokens → model reads 100 tokens
Message 2: You send 100 tokens → model reads 200 tokens (msg1 + msg2)
Message 3: You send 100 tokens → model reads 300 tokens
...
Message 20: model reads 2000+ tokens EVERY TIME

This is why long conversations get expensive.
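The growth above is an arithmetic series, so total reading cost is quadratic in conversation length, not linear. A sketch (ignoring the assistant's replies in the history, which also count):

```python
def tokens_read_total(num_messages, tokens_per_message=100):
    """Total input tokens across a chat where every turn resends the history."""
    return sum(n * tokens_per_message for n in range(1, num_messages + 1))

print(tokens_read_total(20))  # 100 + 200 + ... + 2000 = 21000
```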


How to Count Tokens

Quick Mental Math

  • English: words × 1.3 ≈ tokens
  • Code: characters ÷ 3.5 ≈ tokens
  • 1 page of text ≈ 400-500 tokens
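The mental-math rules as one-liners (estimates only; function names are mine):

```python
def tokens_from_english_words(words):
    """English: words x 1.3."""
    return round(words * 1.3)

def tokens_from_code_chars(chars):
    """Code: characters / 3.5."""
    return round(chars / 3.5)

print(tokens_from_english_words(2000))  # a 2,000-word document
print(tokens_from_code_chars(10_000))   # a 10,000-character source file
```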

Programmatic

```python
# Anthropic: in current SDKs, token counting lives under client.messages.
import anthropic

client = anthropic.Anthropic()
count = client.messages.count_tokens(
    model="claude-sonnet-4-5",  # use whichever model you are targeting
    messages=[{"role": "user", "content": "Your text here"}],
)
print(count.input_tokens)

# OpenAI: tiktoken tokenizes locally.
import tiktoken

enc = tiktoken.encoding_for_model("gpt-4o")
tokens = enc.encode("Your text here")
print(len(tokens))  # number of tokens
```

Online Tools


Practical Tips

  1. Be concise in prompts — every word costs tokens
  2. Don't paste entire files if you only need a section
  3. Use smaller, cheaper models for high-volume tasks (Haiku input is ~19x cheaper than Opus)
  4. Structured output (JSON) uses more tokens than plain text
  5. System prompts persist across every message — keep them focused
  6. Summarize conversation history for long chats to save tokens

Resources


Previous: 01 - What Are LLMs | Next: 03 - Context Window