Tokens & Tokenization

What Is a Token?

A token is the smallest unit an LLM reads and writes. Not a character. Not a word. Something in between — typically a subword.

"Hello world"        → ["Hello", " world"]           = 2 tokens
"tokenization"       → ["token", "ization"]           = 2 tokens  
"Zineddine"          → ["Z", "ined", "dine"]          = 3 tokens (unusual word)
"I love programming" → ["I", " love", " programming"] = 3 tokens

Rule of thumb for English: 1 token ≈ 4 characters ≈ ¾ of a word
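The rule of thumb is easy to turn into a quick estimator. This is a sketch, not a real tokenizer, and the function name is mine:

```python
def estimate_tokens(text: str) -> int:
    """Very rough token estimate using the ~4 characters per token rule."""
    return max(1, round(len(text) / 4))

# Real tokenizers will differ by a token or two either way.
print(estimate_tokens("The quick brown fox jumps over the lazy dog"))
```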


Why Tokens and Not Characters?

| Approach | Vocabulary Size | Sequence Length | Efficiency |
|----------|-----------------|-----------------|------------|
| Characters | 256 (raw bytes) | Very long | Bad — model must learn that c-a-t means cat |
| Words | 500,000+ | Short | Bad — can't handle new/rare words, huge vocabulary |
| Subwords (tokens) | ~100,000 | Balanced | Best — captures meaning, handles any text |

Subwords are the sweet spot: common words stay whole ("the", "hello"), rare words get split ("tokenization" → "token" + "ization"), and the model can handle ANY text.


How Tokenizers Are Built (BPE)

Byte Pair Encoding — the most common algorithm:

Step 1: Start with individual characters
  "low"  = [l, o, w]
  "lower" = [l, o, w, e, r]
  "newest" = [n, e, w, e, s, t]

Step 2: Count adjacent pairs, merge most frequent
  Most frequent pair: (e, w) → merge into "ew"
  "newest" = [n, ew, e, s, t]

Step 3: Repeat — merge next most frequent pair
  ... keep merging until vocabulary reaches target size (~100K tokens)

Result: common sequences become single tokens, rare ones stay as multiple pieces.
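The three steps above can be sketched in a few lines. This is a toy version: on a corpus this small the exact merge order depends on pair counts and tie-breaking, so it may not match the illustration, and real tokenizers also weight words by frequency.

```python
from collections import Counter

def bpe_merges(words, num_merges):
    """Toy BPE: repeatedly merge the most frequent adjacent symbol pair."""
    # Step 1: represent each word as a list of single-character symbols.
    corpus = [list(w) for w in words]
    merges = []
    for _ in range(num_merges):
        # Step 2: count adjacent pairs across the whole corpus.
        pairs = Counter()
        for symbols in corpus:
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += 1
        if not pairs:
            break
        (a, b), _count = pairs.most_common(1)[0]
        merges.append(a + b)
        # Step 3: apply the merge everywhere, then repeat.
        for symbols in corpus:
            i = 0
            while i < len(symbols) - 1:
                if symbols[i] == a and symbols[i + 1] == b:
                    symbols[i:i + 2] = [a + b]
                else:
                    i += 1
    return corpus, merges

corpus, merges = bpe_merges(["low", "lower", "newest"], 2)
print(merges)  # the two most frequent pairs, merged in order
print(corpus)  # "low" is now a single token
```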


Token Examples Across Types

English Text

"The quick brown fox jumps over the lazy dog"
= ["The", " quick", " brown", " fox", " jumps", " over", " the", " lazy", " dog"]
= 9 tokens

Code

"def hello_world():\n print('Hello')"
= ["def", " hello", "_", "world", "():", "\n", " ", "print", "('", "Hello", "')"]
= 11 tokens (code tends to be more token-dense)

Non-English Text

"Bonjour le monde"  = ~4 tokens (French, similar to English)
"مرحبا بالعالم"     = ~8 tokens (Arabic, less efficient)
"こんにちは世界"     = ~5 tokens (Japanese, each character may be 1-2 tokens)

Important: Non-English text typically uses 1.5-3x more tokens than English for the same meaning. This means: higher cost and faster context window usage.
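One reason for the gap: many modern tokenizers (byte-level BPE) start from UTF-8 bytes, and non-Latin scripts need more bytes per character before any merging happens. A rough illustration, using byte counts rather than actual token counts:

```python
# Byte counts are only a proxy; real token counts depend on the tokenizer.
for text in ["Hello world", "Bonjour le monde", "مرحبا بالعالم", "こんにちは世界"]:
    n_bytes = len(text.encode("utf-8"))
    print(f"{len(text):2d} chars -> {n_bytes:2d} UTF-8 bytes  {text!r}")
```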


Why Tokens Matter to YOU

1. Pricing

Model providers charge per million tokens (input and output are priced separately):

| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|-------|-----------------------|------------------------|
| Claude Haiku 4.5 | $0.80 | $4.00 |
| Claude Sonnet 4.6 | $3.00 | $15.00 |
| Claude Opus 4.6 | $15.00 | $75.00 |
| GPT-4o | $2.50 | $10.00 |
| GPT-4o mini | $0.15 | $0.60 |

Example cost: A 2,000-word document ≈ 2,700 tokens. Sending it to Claude Sonnet = ~$0.008 input. The model's 500-word response ≈ 670 tokens = ~$0.01 output. Total: ~$0.018 per interaction.
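The arithmetic above as a small helper (the function name is mine; prices are in USD per million tokens):

```python
def call_cost_usd(input_tokens, output_tokens, input_price, output_price):
    """Cost of one API call, with prices in USD per 1M tokens."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000

# The example above: 2,700 input + 670 output tokens at Sonnet prices.
print(f"${call_cost_usd(2700, 670, 3.00, 15.00):.4f}")
```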

2. Context Window Limits

Every model has a maximum number of tokens it can process at once:

Context Window = System Prompt + Conversation History + Your Message + Model's Response
                 ↑ all of this counts against the limit

If you paste a 50,000-token codebase, you're using most of a 128K window before you even ask a question.
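The budget arithmetic, spelled out (the numbers below are illustrative):

```python
def remaining_tokens(window, system_prompt, history, message):
    """Tokens left for the model's response after everything else is counted."""
    return window - (system_prompt + history + message)

# A 50,000-token paste into a 128K window, with a 2,000-token system prompt:
print(remaining_tokens(128_000, 2_000, 0, 50_000))  # 76000 tokens left
```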

3. Latency

More tokens = longer response time. Output tokens are the bottleneck — each token is generated sequentially (~30-100 tokens/second depending on model).

Short response (100 tokens): ~1-3 seconds
Long response (2000 tokens): ~20-60 seconds
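Those ranges fall straight out of the generation-speed figures above. A sketch, with the 30-100 tokens/second rates as defaults:

```python
def response_seconds(output_tokens, slow_tps=30, fast_tps=100):
    """Rough wall-clock range for generating output tokens sequentially."""
    return output_tokens / fast_tps, output_tokens / slow_tps

fast, slow = response_seconds(2000)
print(f"{fast:.0f}-{slow:.0f} s")  # 20-67 s
```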

4. Cost Adds Up Fast

Every message in a conversation resends ALL previous messages:

Message 1: You send 100 tokens → model reads 100 tokens
Message 2: You send 100 tokens → model reads 200 tokens (msg1 + msg2)
Message 3: You send 100 tokens → model reads 300 tokens
...
Message 20: model reads 2000+ tokens EVERY TIME

This is why long conversations get expensive.
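The growth above is an arithmetic series, so total reading cost is quadratic in conversation length, not linear. A sketch (ignoring the assistant's replies in the history, which also count):

```python
def tokens_read_total(num_messages, tokens_per_message=100):
    """Total input tokens across a chat where every turn resends the history."""
    return sum(n * tokens_per_message for n in range(1, num_messages + 1))

print(tokens_read_total(20))  # 100 + 200 + ... + 2000 = 21000
```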


How to Count Tokens

Quick Mental Math

  • English: words × 1.3 ≈ tokens
  • Code: characters ÷ 3.5 ≈ tokens
  • 1 page of text ≈ 400-500 tokens
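The mental-math rules as one-liners (estimates only; function names are mine):

```python
def tokens_from_english_words(words):
    """English: words x 1.3."""
    return round(words * 1.3)

def tokens_from_code_chars(chars):
    """Code: characters / 3.5."""
    return round(chars / 3.5)

print(tokens_from_english_words(2000))  # a 2,000-word document
print(tokens_from_code_chars(10_000))   # a 10,000-character source file
```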

Programmatic

```python
# Anthropic: in current SDKs, token counting lives under client.messages.
import anthropic

client = anthropic.Anthropic()
count = client.messages.count_tokens(
    model="claude-sonnet-4-5",  # use whichever model you are targeting
    messages=[{"role": "user", "content": "Your text here"}],
)
print(count.input_tokens)

# OpenAI: tiktoken tokenizes locally.
import tiktoken

enc = tiktoken.encoding_for_model("gpt-4o")
tokens = enc.encode("Your text here")
print(len(tokens))  # number of tokens
```

Online Tools


Practical Tips

  1. Be concise in prompts — every word costs tokens
  2. Don't paste entire files if you only need a section
  3. Use smaller, cheaper models for high-volume tasks (Haiku input is ~19x cheaper than Opus)
  4. Structured output (JSON) uses more tokens than plain text
  5. System prompts persist across every message — keep them focused
  6. Summarize conversation history for long chats to save tokens

Resources


Previous: 01 - What Are LLMs | Next: 03 - Context Window