39 - Design Typeahead & Autocomplete
Previous: 38 - Design Search Engine | Next: 40 - Design Distributed Cache
1. Problem Statement
Design a typeahead/autocomplete system that returns real-time search suggestions as the user types. Each keystroke should surface a ranked list of completions within ~100ms. Think of the Google search bar, Amazon product search, or Slack's channel/user finder.
2. Requirements
Functional
| Requirement | Detail |
|---|---|
| Prefix matching | Return top-K suggestions matching the typed prefix |
| Ranked results | Suggestions ordered by popularity, recency, or personalization |
| Low latency | Results within 100ms of each keystroke |
| Multi-language | Support Unicode, CJK characters, diacritics |
| Freshness | Trending queries surface within minutes to hours |
Non-Functional
| Requirement | Target |
|---|---|
| Availability | 99.99% |
| Latency (p99) | < 100ms |
| Scale | 5B+ queries/day (Google-scale), top-K = 5-10 |
| Read:Write ratio | ~1000:1 (reads dominate) |
3. Core Data Structure: Trie (Prefix Tree)
A trie stores every character of a query on an edge. Each node can hold aggregated metadata (frequency, rank).
              (root)
             /   |   \
            t    b    a
            |    |    |
            r    e    p
           / \   |    |
          e   i  s    p
          |   |  |    |
          e   p  t    l
         (4) (5) (12) |
                      e
                     (22)
Stored queries: tree(4), trip(5), best(12), apple(22). Numbers in parentheses = query frequency at the node that completes the query.
Trie Node Structure (Conceptual)
TrieNode:
children: Map<char, TrieNode> # branching edges
isEnd: bool # marks a complete query
topK: List<(query, score)> # precomputed top-K for this prefix
Key insight: Precomputing topK at every node turns a DFS traversal into an O(1) lookup at query time.
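This structure can be sketched in Python. The post-order pass below precomputes `top_k` at every node, which is exactly what turns the read path into an O(len(prefix)) walk plus an O(1) read. Names like `precompute_top_k` and the choice `K = 5` are illustrative, not from any particular library:

```python
from heapq import nlargest

class TrieNode:
    def __init__(self):
        self.children = {}   # char -> TrieNode
        self.is_end = False  # marks a complete query
        self.score = 0       # query frequency (set at terminal nodes)
        self.top_k = []      # precomputed [(query, score)] for this prefix

class Trie:
    K = 5

    def __init__(self):
        self.root = TrieNode()

    def insert(self, query, score):
        node = self.root
        for ch in query:
            node = node.children.setdefault(ch, TrieNode())
        node.is_end = True
        node.score = score

    def precompute_top_k(self):
        # Post-order walk: each node's top_k merges its children's top_k
        # lists, so serving never has to traverse a subtree.
        def walk(node, prefix):
            candidates = [(prefix, node.score)] if node.is_end else []
            for ch, child in node.children.items():
                candidates.extend(walk(child, prefix + ch))
            node.top_k = nlargest(self.K, candidates, key=lambda c: c[1])
            return node.top_k
        walk(self.root, "")

    def suggest(self, prefix):
        node = self.root
        for ch in prefix:
            if ch not in node.children:
                return []
            node = node.children[ch]
        return node.top_k  # O(len(prefix)) walk + O(1) read
```

Limiting each node to K candidates is safe because any query in a node's true top-K must also appear in the top-K of the child it passes through.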
4. Trie Optimizations
| Optimization | How | Benefit |
|---|---|---|
| Compressed trie (radix tree) | Merge single-child chains into one edge ("tre" instead of t->r->e) | 50-70% node reduction |
| Limit depth | Cap trie at depth 15-20 characters | Bound memory, queries rarely exceed this |
| Prune low-frequency | Remove nodes with frequency < threshold | Reduce memory footprint |
| Precompute top-K per node | Store sorted suggestions at each node | O(1) read at query time |
| Two-level trie | First 2 chars as hash key -> sub-trie | Parallelism, smaller individual tries |
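The compressed-trie row can be illustrated with a minimal sketch over plain nested dicts; the `"$"` end-of-query marker is an illustrative convention, not a standard:

```python
END = "$"  # illustrative end-of-query marker

def compress(trie):
    """Radix-tree compaction: merge single-child chains so one edge
    carries a whole substring ("tr" instead of t -> r)."""
    out = {}
    for label, sub in trie.items():
        if label != END:
            # Collapse while the child has exactly one non-terminal branch.
            while len(sub) == 1 and END not in sub:
                (next_label, sub), = sub.items()
                label += next_label
            sub = compress(sub)
        out[label] = sub
    return out
```

On a trie holding "tree" and "trip", the shared chain t->r collapses into a single "tr" edge, which is where the 50-70% node reduction comes from.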
5. Ranking Suggestions
Signals for Ranking
Score = w1 * global_frequency
+ w2 * recent_frequency (decay over time)
+ w3 * user_personal_score
+ w4 * location_relevance
+ w5 * language_match
| Signal | Source | Freshness |
|---|---|---|
| Global frequency | Aggregated query logs | Hourly/daily rebuild |
| Trending boost | Recent query spike detection | Minutes (streaming) |
| Personalization | User's past queries, clicks | Per-session or daily |
| Location | GeoIP | Request-time |
| Language | Accept-Language header, user settings | Request-time |
Interview Tip
Interviewers love hearing about the cold start problem: new users have no personalization signal. Fall back to global popularity + location.
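One way to combine the signals above in code. The weights, the log-damping of global frequency, and the 24-hour half-life are illustrative placeholders; real systems tune these offline via A/B tests:

```python
import math
import time

# Illustrative weights (w1..w5 from the formula above).
W_GLOBAL, W_RECENT, W_PERSONAL, W_GEO, W_LANG = 1.0, 2.0, 3.0, 0.5, 0.5
HALF_LIFE_HOURS = 24.0

def recency_decay(hit_timestamps):
    """Exponential time decay: a hit from age_hours ago counts for
    2^(-age / half_life) of a fresh one."""
    now = time.time()
    return sum(2 ** (-(now - t) / 3600.0 / HALF_LIFE_HOURS)
               for t in hit_timestamps)

def score(global_freq, recent_hits, personal, geo, lang):
    # log1p damps the heavy tail of global counts so one mega-query
    # doesn't drown out every other signal.
    return (W_GLOBAL * math.log1p(global_freq)
            + W_RECENT * recency_decay(recent_hits)
            + W_PERSONAL * personal
            + W_GEO * geo
            + W_LANG * lang)
```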
6. Data Collection Pipeline
User Query Logs
|
v
+-------------+ +------------------+ +----------------+
| Query Logger | ----> | Stream Processor | ----> | Aggregator |
| (Kafka) | | (Flink/Spark) | | (freq counts) |
+-------------+ +------------------+ +----------------+
|
v
+----------------+
| Trie Builder |
| (offline job) |
+----------------+
|
v
+----------------+
| Trie Snapshot |
| (serialized) |
+----------------+
|
deploy to serving nodes
Steps
- Log collection -- Every search query logged to Kafka with timestamp, user_id, geo
- Stream aggregation -- Flink/Spark Streaming counts query frequency in sliding windows (1h, 24h, 7d)
- Offline trie build -- Periodic MapReduce job builds full trie from aggregated counts
- Snapshot distribution -- Serialized trie pushed to serving nodes via blob storage (S3)
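The sliding-window counting in step 2 can be sketched as a toy in-process counter, a stand-in for what Flink/Spark Streaming does across a cluster. Bucket size and window length are illustrative:

```python
from collections import Counter, deque

class SlidingWindowCounter:
    """Count query frequency over the last `window` buckets of
    `bucket_secs` each (e.g. 24 hourly buckets). Toy version: assumes
    timestamps arrive roughly in order."""

    def __init__(self, bucket_secs=3600, window=24):
        self.bucket_secs = bucket_secs
        self.window = window
        self.buckets = deque()  # (bucket_id, Counter)

    def record(self, query, ts):
        bucket_id = int(ts) // self.bucket_secs
        if not self.buckets or self.buckets[-1][0] != bucket_id:
            self.buckets.append((bucket_id, Counter()))
            while len(self.buckets) > self.window:
                self.buckets.popleft()  # evict the oldest bucket
        self.buckets[-1][1][query] += 1

    def counts(self):
        total = Counter()
        for _, c in self.buckets:
            total += c
        return total
```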
7. Caching Strategy
Caching is critical because prefix distributions follow a power law: a small set of prefixes accounts for most lookups.
Client
|
v
+------------------+
| Browser Cache | <-- Cache "app" -> [...] for 5 min
| (localStorage) |
+------------------+
|
v
+------------------+
| CDN / Edge Cache | <-- Cache top prefixes globally
+------------------+
|
v
+------------------+
| Application Cache| <-- Redis/Memcached: prefix -> top-K
| (prefix -> topK) |
+------------------+
|
v
+------------------+
| Trie Servers | <-- In-memory trie lookup
+------------------+
| Cache Layer | TTL | Hit Rate |
|---|---|---|
| Browser/client | 5-15 min | 30-40% |
| CDN edge | 1-5 min | 20-30% |
| Application (Redis) | 10-60 min | 25-35% |
| Combined | -- | ~80% |
Interview Tip
Mention that you can cache all 1-2 character prefixes aggressively (only 26 + 26^2 = 702 combinations for lowercase English letters) since they're queried most.
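The application-cache layer boils down to a cache-aside read path. A minimal sketch, assuming `backend` is a callable standing in for the trie servers; a real deployment would use Redis with per-key TTLs instead of a dict:

```python
import time

class PrefixCache:
    """Cache-aside: serve from cache while fresh, otherwise fall
    through to the trie backend and repopulate."""

    def __init__(self, backend, ttl_secs=600):
        self.backend = backend        # callable: prefix -> top-K list
        self.ttl = ttl_secs
        self.store = {}               # prefix -> (expires_at, results)
        self.hits = self.misses = 0

    def get(self, prefix):
        entry = self.store.get(prefix)
        if entry and entry[0] > time.time():
            self.hits += 1
            return entry[1]
        self.misses += 1
        results = self.backend(prefix)
        self.store[prefix] = (time.time() + self.ttl, results)
        return results
```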
8. Full System Architecture
+-------------------+
| Mobile / Web |
| Client |
+--------+----------+
|
debounce 50-150ms
|
+--------v----------+
| CDN / Edge Cache |
+--------+----------+
| cache miss
+--------v----------+
| API Gateway / |
| Load Balancer |
+--------+----------+
|
+------------------+------------------+
| |
+--------v----------+ +-----------v--------+
| Typeahead Service | | Typeahead Service |
| (Stateless) | | (Stateless) |
+--------+----------+ +--------------------+
|
+--------v----------+
| Redis Cache Layer | prefix -> top-K (serialized)
+--------+----------+
| cache miss
+--------v----------+
| Trie Server | in-memory trie, sharded by prefix range
| (Partition: a-m) |
+--------------------+
| Trie Server |
| (Partition: n-z) |
+--------------------+
---- Offline Pipeline ----
+--------------------+ +--------------------+
| Kafka Query Logs | --> | Flink Aggregator |
+--------------------+ +--------+-----------+
|
+--------v-----------+
| Trie Builder (MR) |
+--------+-----------+
|
+--------v-----------+
| Blob Store (S3) |
| trie snapshots |
+--------------------+
9. Distributed Trie (Partitioning)
A single trie won't fit in one machine's memory at Google scale. Partition by prefix range.
| Strategy | How | Pros | Cons |
|---|---|---|---|
| Range-based | Shard by first 1-2 chars (a-f, g-m, ...) | Simple, predictable | Uneven if some prefixes are hot |
| Hash-based | Hash the prefix, mod N | Even distribution | Loses prefix locality |
| Hybrid | Range for first char, hash within | Balanced | More complex routing |
Handling Hot Prefixes
- Replicate hot shards (e.g., "how", "what", "why") to multiple nodes
- Use a scatter-gather pattern: query all replicas, merge results
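Range-based routing with replicated hot shards might look like the sketch below; shard names and the hot-prefix list are illustrative:

```python
import random

# Illustrative shard map: first character picks a partition.
SHARDS = {"a-m": "trie-1", "n-z": "trie-2"}
# Hot prefixes are replicated; a replica is picked at random per request.
HOT_REPLICAS = {"how": ["trie-1", "trie-1b", "trie-1c"]}

def route(prefix):
    for hot, replicas in HOT_REPLICAS.items():
        # Match while the user is still typing the hot prefix, too.
        if prefix.startswith(hot) or hot.startswith(prefix):
            return random.choice(replicas)
    return SHARDS["a-m"] if prefix[0] <= "m" else SHARDS["n-z"]
```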
10. Update Strategy
| Approach | Mechanism | Latency | Complexity |
|---|---|---|---|
| Offline rebuild | Full trie rebuilt every 1-6 hours from aggregated data | Hours | Low |
| Online update | Streaming updates modify trie in place | Minutes | High (concurrency) |
| Hybrid | Offline rebuild + real-time overlay trie for trending queries | Minutes | Medium |
Hybrid Approach (Recommended for Interviews)
Query arrives --> check real-time overlay trie first
--> merge results with base trie
--> return combined top-K
Every 6 hours: rebuild base trie, flush overlay
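The merge step of this hybrid read path, assuming both tries return `[(query, score)]` lists as the precomputed top-K lookup above does (queries and scores here are hypothetical):

```python
def merge_top_k(base_top_k, overlay_top_k, k=5):
    """Combine base-trie and overlay-trie results, keeping the best
    score seen for each query, and return the merged top-K."""
    best = {}
    for query, score in base_top_k + overlay_top_k:
        best[query] = max(best.get(query, 0), score)
    return sorted(best.items(), key=lambda qs: qs[1], reverse=True)[:k]
```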
11. Client-Side Optimizations
| Technique | Description |
|---|---|
| Debouncing | Wait 50-150ms after last keystroke before sending request |
| Request cancellation | Cancel in-flight request when new character typed |
| Prefetching | On focus, prefetch top suggestions for empty prefix |
| Local caching | Cache prefix -> results in memory/localStorage |
| Optimistic rendering | Show cached results immediately, update when fresh data arrives |
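Debouncing plus implicit cancellation can be sketched with `threading.Timer`; browser clients use `setTimeout`/`AbortController` instead, but the mechanism is the same:

```python
import threading

class Debouncer:
    """Fire the callback only after `delay` seconds of keystroke
    silence; each new keystroke cancels the pending send."""

    def __init__(self, delay, callback):
        self.delay = delay
        self.callback = callback
        self._timer = None

    def keystroke(self, text):
        if self._timer:
            self._timer.cancel()  # new character: drop the stale request
        self._timer = threading.Timer(self.delay, self.callback, args=(text,))
        self._timer.start()
```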
12. Multi-Language Support
| Challenge | Solution |
|---|---|
| Unicode characters | Use Unicode-aware trie with multi-byte edge labels |
| CJK (no spaces) | Tokenize with n-grams or dictionary segmentation |
| Diacritics | Normalize to base form (é -> e, à -> a) for matching, display original |
| RTL languages | Logical prefix matching (not visual), UI handles display direction |
| Transliteration | Romanized input ("tokyo") matches the native-script query ("東京") |
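Diacritic folding is a short function with Unicode NFD normalization: decompose each character, then drop the combining marks. Index on the folded form, display the original:

```python
import unicodedata

def normalize_for_matching(text):
    """Fold case and strip diacritics for trie keys (é -> e, à -> a);
    the original string is kept separately for display."""
    decomposed = unicodedata.normalize("NFD", text.lower())
    return "".join(ch for ch in decomposed
                   if not unicodedata.combining(ch))
```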
13. Key Trade-offs Discussion
| Decision | Option A | Option B |
|---|---|---|
| Storage | In-memory trie (fast, expensive) | SSD-backed trie (slower, cheaper) |
| Update freshness | Real-time (complex) | Hourly batch (simple, slight staleness) |
| Ranking | Global popularity only (simple) | Personalized (complex, better UX) |
| Partitioning | By prefix range (locality) | By hash (even distribution) |
| Caching | Aggressive multi-layer (high hit rate) | Minimal (simpler, more trie load) |
14. Capacity Estimation
Assumptions:
- 5B queries/day = ~58K QPS
- Average query length: 4 words, 20 chars
- Average 4 keystrokes trigger suggestions (with debounce)
- Suggestion QPS: 58K * 4 = ~232K QPS
Storage:
- 100M unique queries, avg 20 chars each
- Raw: 100M * 20B = 2GB (just strings)
- Trie with metadata: ~10-20GB per shard
- With top-K precomputed: ~30-50GB total across shards
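The arithmetic above, written out so it can be checked:

```python
# Back-of-envelope numbers from the estimate above.
queries_per_day = 5_000_000_000
query_qps = queries_per_day // 86_400    # ~58K completed searches/sec
suggestion_qps = query_qps * 4           # ~4 debounced keystrokes each

unique_queries = 100_000_000
raw_bytes = unique_queries * 20          # avg 20 chars ≈ 20 B per query
```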
15. Interview Checklist
- Clarified: real-time vs batch, personalized vs global, scale
- Explained trie data structure and why it fits prefix search
- Discussed trie optimizations (compressed, pruned, precomputed top-K)
- Covered ranking signals and how they combine
- Designed the offline data pipeline (logs -> aggregation -> trie build)
- Multi-layer caching strategy with hit rate estimates
- Distributed trie partitioning and hot prefix handling
- Client-side optimizations (debounce, cancellation, local cache)
- Update strategy (offline + real-time overlay hybrid)
- Mentioned multi-language considerations
16. Resources
- Grokking the System Design Interview -- Typeahead Suggestion chapter
- System Design Interview (Alex Xu, Vol 1) -- Chapter 13: Design Autocomplete
- Google Research Blog -- How Google Autocomplete Works
- Paper: "Efficient Auto-Completion via Trie-Based Data Structures" (ACM)
- YouTube: Gaurav Sen -- Typeahead System Design
- YouTube: Tech Dummies -- Design Autocomplete