39 - Design Typeahead & Autocomplete

Previous: 38 - Design Search Engine | Next: 40 - Design Distributed Cache


1. Problem Statement

Design a typeahead/autocomplete system that returns search suggestions in real time as the user types. Each keystroke should surface a ranked list of completions within ~100ms. Think of the Google search bar, Amazon product search, or Slack's channel/user finder.


2. Requirements

Functional

| Requirement | Detail |
| --- | --- |
| Prefix matching | Return top-K suggestions matching the typed prefix |
| Ranked results | Suggestions ordered by popularity, recency, or personalization |
| Low latency | Results within 100ms of each keystroke |
| Multi-language | Support Unicode, CJK characters, diacritics |
| Freshness | Trending queries surface within minutes to hours |

Non-Functional

| Requirement | Target |
| --- | --- |
| Availability | 99.99% |
| Latency (p99) | < 100ms |
| Scale | 5B+ queries/day (Google-scale), top-K = 5-10 |
| Read:Write ratio | ~1000:1 (reads dominate) |

3. Core Data Structure: Trie (Prefix Tree)

A trie stores one character of a query per edge, so each node represents a prefix. Nodes can hold aggregated metadata (frequency, rank).

           (root)
          /   |   \
         t    b    a
         |    |    |
         r    e    p
        / \   |    |
       e   i  s    p
       |   |  |    |
       e   p  t    l
      (4) (5) (12) |
                   e
                  (22)

Numbers in parentheses = frequency of the completed query: "tree" (4), "trip" (5), "best" (12), "apple" (22).

Trie Node Structure (Conceptual)

TrieNode:
  children: Map<char, TrieNode>    # branching edges
  isEnd: bool                       # marks a complete query
  topK: List<(query, score)>        # precomputed top-K for this prefix

Key insight: Precomputing topK at every node turns a DFS traversal into an O(1) lookup at query time.
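The conceptual node above can be sketched as runnable Python; the heap-based truncation and the example scores are illustrative, not a production build path:

```python
import heapq

class TrieNode:
    def __init__(self):
        self.children = {}   # char -> TrieNode (branching edges)
        self.is_end = False  # marks a complete query
        self.top_k = []      # precomputed [(score, query)], highest score first

def build_trie(queries, k=5):
    """Build a trie and precompute top-k suggestions at every node.
    `queries` maps query string -> frequency score."""
    root = TrieNode()
    for query, score in queries.items():
        node = root
        for ch in query:
            node = node.children.setdefault(ch, TrieNode())
            # Keep each prefix node's list sorted by score, truncated to k.
            node.top_k.append((score, query))
            node.top_k = heapq.nlargest(k, node.top_k)
        node.is_end = True
    return root

def suggest(root, prefix):
    """O(len(prefix)) walk, then O(1) read of the precomputed list."""
    node = root
    for ch in prefix:
        node = node.children.get(ch)
        if node is None:
            return []
    return [q for _, q in node.top_k]
```

With `build_trie({"bee": 3, "best": 12, "tree": 4})`, `suggest(root, "be")` returns `["best", "bee"]`: the DFS-and-sort work was paid at build time.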


4. Trie Optimizations

| Optimization | How | Benefit |
| --- | --- | --- |
| Compressed trie (radix tree) | Merge single-child chains into one edge ("tre" instead of t->r->e) | 50-70% node reduction |
| Limit depth | Cap trie at depth 15-20 characters | Bounds memory; queries rarely exceed this |
| Prune low-frequency | Remove nodes with frequency < threshold | Reduced memory footprint |
| Precompute top-K per node | Store sorted suggestions at each node | O(1) read at query time |
| Two-level trie | First 2 chars as hash key -> sub-trie | Parallelism, smaller individual tries |
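As a sketch of the first optimization, single-child non-terminal chains can be collapsed bottom-up into multi-character edges; the node layout mirrors the conceptual TrieNode above:

```python
class Node:
    def __init__(self):
        self.children = {}   # edge label (one or more chars) -> Node
        self.is_end = False

def insert(root, word):
    node = root
    for ch in word:
        node = node.children.setdefault(ch, Node())
    node.is_end = True

def compress(node):
    """Collapse single-child, non-terminal chains into one edge (radix tree)."""
    new_children = {}
    for edge, child in node.children.items():
        compress(child)  # compress subtrees first
        # Fold grandchild edges into this edge while the chain is unbranched
        # and does not terminate a query.
        while len(child.children) == 1 and not child.is_end:
            (sub_edge, grandchild), = child.children.items()
            edge += sub_edge
            child = grandchild
        new_children[edge] = child
    node.children = new_children
    return node
```

After inserting "tree" and "trie" and compressing, the root has a single edge "tr" that branches into "ee" and "ie".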

5. Ranking Suggestions

Signals for Ranking

Score = w1 * global_frequency
      + w2 * recent_frequency (decay over time)
      + w3 * user_personal_score
      + w4 * location_relevance
      + w5 * language_match

| Signal | Source | Freshness |
| --- | --- | --- |
| Global frequency | Aggregated query logs | Hourly/daily rebuild |
| Trending boost | Recent query spike detection | Minutes (streaming) |
| Personalization | User's past queries, clicks | Per-session or daily |
| Location | GeoIP | Request-time |
| Language | Accept-Language header, user settings | Request-time |
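A minimal scoring function combining these signals might look like the following; the field names, half-life, and weight scheme are illustrative assumptions, not a production schema:

```python
import math
import time

def score(query_stats, weights, now=None):
    """Combine ranking signals into a single score.
    `query_stats` holds the raw per-query signals; `weights` maps w1..w5."""
    now = now if now is not None else time.time()
    # Exponential decay: a query's recency boost halves every `half_life` seconds.
    half_life = 3600.0
    age = now - query_stats.get("last_seen", now)
    recency = query_stats.get("recent_count", 0) * 0.5 ** (age / half_life)
    return (weights["global"] * math.log1p(query_stats.get("total_count", 0))
            + weights["recent"] * recency
            + weights["personal"] * query_stats.get("user_affinity", 0.0)
            + weights["location"] * query_stats.get("geo_match", 0.0)
            + weights["language"] * query_stats.get("lang_match", 0.0))
```

The log on global frequency keeps head queries from drowning out every other signal; the decay lets a trending query outrank a historically popular one for a while.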

Interview Tip

Interviewers love hearing about the cold start problem: new users have no personalization signal. Fall back to global popularity + location.


6. Data Collection Pipeline

User Query Logs
      |
      v
+--------------+      +------------------+      +----------------+
| Query Logger | ---> | Stream Processor | ---> | Aggregator     |
| (Kafka)      |      | (Flink/Spark)    |      | (freq counts)  |
+--------------+      +------------------+      +----------------+
                                                        |
                                                        v
                                                +----------------+
                                                | Trie Builder   |
                                                | (offline job)  |
                                                +----------------+
                                                        |
                                                        v
                                                +----------------+
                                                | Trie Snapshot  |
                                                | (serialized)   |
                                                +----------------+
                                                        |
                                             deploy to serving nodes

Steps

  1. Log collection -- Every search query logged to Kafka with timestamp, user_id, geo
  2. Stream aggregation -- Flink/Spark Streaming counts query frequency in sliding windows (1h, 24h, 7d)
  3. Offline trie build -- Periodic MapReduce job builds full trie from aggregated counts
  4. Snapshot distribution -- Serialized trie pushed to serving nodes via blob storage (S3)
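Step 2's windowed counting can be approximated in-process with per-bucket tumbling counts; this is a stand-in for what a Flink job would maintain, and the window/bucket sizes are illustrative:

```python
from collections import Counter, deque

class SlidingWindowCounter:
    """Query frequencies over the last `window` seconds, approximated
    with `window / bucket` tumbling sub-buckets."""

    def __init__(self, window=3600, bucket=60):
        self.bucket = bucket
        self.num_buckets = window // bucket
        self.buckets = deque()          # (bucket_id, Counter) pairs, oldest first

    def record(self, query, ts):
        bucket_id = int(ts) // self.bucket
        if not self.buckets or self.buckets[-1][0] != bucket_id:
            self.buckets.append((bucket_id, Counter()))
        self.buckets[-1][1][query] += 1
        # Drop buckets that have fallen out of the window.
        while self.buckets[0][0] <= bucket_id - self.num_buckets:
            self.buckets.popleft()

    def counts(self):
        total = Counter()
        for _, c in self.buckets:
            total.update(c)
        return total
```

A real pipeline would keep one such window per horizon (1h, 24h, 7d) and checkpoint the counts for the trie builder.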

7. Caching Strategy

Caching is critical because prefix distributions follow a power law: a small set of prefixes accounts for most lookups.

Client
  |
  v
+------------------+
| Browser Cache    |  <-- Cache "app" -> [...] for 5 min
| (localStorage)   |
+------------------+
  |
  v
+------------------+
| CDN / Edge Cache |  <-- Cache top prefixes globally
+------------------+
  |
  v
+------------------+
| Application Cache|  <-- Redis/Memcached: prefix -> top-K
| (prefix -> topK) |
+------------------+
  |
  v
+------------------+
| Trie Servers     |  <-- In-memory trie lookup
+------------------+

| Cache Layer | TTL | Hit Rate |
| --- | --- | --- |
| Browser/client | 5-15 min | 30-40% |
| CDN edge | 1-5 min | 20-30% |
| Application (Redis) | 10-60 min | 25-35% |
| Combined | -- | ~80% |

Interview Tip

Mention that you can cache all 1-2 character prefixes aggressively (only 26 + 26^2 = 702 combinations for lowercase English) since they're queried most.
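The application-cache layer is a classic cache-aside pattern; the key scheme and the in-process Redis stand-in below are illustrative:

```python
import json

def get_suggestions(prefix, cache, trie_lookup, ttl_s=600):
    """Cache-aside read path: try the cache, fall back to a trie shard,
    then populate the cache for subsequent readers."""
    key = f"ta:{prefix}"
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)
    results = trie_lookup(prefix)
    cache.set(key, json.dumps(results), ttl_s)
    return results

class DictCache:
    """In-process stand-in for Redis/Memcached (TTL ignored for brevity)."""
    def __init__(self):
        self.store = {}
    def get(self, key):
        return self.store.get(key)
    def set(self, key, value, ttl_s):
        self.store[key] = value
```

With the power-law prefix distribution, even a short TTL keeps most reads off the trie servers.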


8. Full System Architecture

                        +--------------------+
                        |   Mobile / Web     |
                        |   Client           |
                        +---------+----------+
                                  |
                          debounce 50-150ms
                                  |
                        +---------v----------+
                        |  CDN / Edge Cache  |
                        +---------+----------+
                                  |  cache miss
                        +---------v----------+
                        |  API Gateway /     |
                        |  Load Balancer     |
                        +---------+----------+
                                  |
               +------------------+------------------+
               |                                      |
     +---------v----------+                +---------v----------+
     | Typeahead Service  |                | Typeahead Service  |
     | (Stateless)        |                | (Stateless)        |
     +---------+----------+                +--------------------+
               |
     +---------v----------+
     | Redis Cache Layer  |  prefix -> top-K (serialized)
     +---------+----------+
               |  cache miss
     +---------v----------+
     | Trie Server        |  in-memory trie, sharded by prefix range
     | (Partition: a-m)   |
     +--------------------+
     | Trie Server        |
     | (Partition: n-z)   |
     +--------------------+

              ---- Offline Pipeline ----

     +--------------------+     +--------------------+
     | Kafka Query Logs   | --> | Flink Aggregator   |
     +--------------------+     +--------+-----------+
                                         |
                                +--------v-----------+
                                | Trie Builder (MR)  |
                                +--------+-----------+
                                         |
                                +--------v-----------+
                                | Blob Store (S3)    |
                                | trie snapshots     |
                                +--------------------+

9. Distributed Trie (Partitioning)

A single trie won't fit in one machine's memory at Google scale. Partition by prefix range.

| Strategy | How | Pros | Cons |
| --- | --- | --- | --- |
| Range-based | Shard by first 1-2 chars (a-f, g-m, ...) | Simple, predictable | Uneven if some prefixes are hot |
| Hash-based | Hash the prefix, mod N | Even distribution | Loses prefix locality |
| Hybrid | Range for first char, hash within | Balanced | More complex routing |

Handling Hot Prefixes

  • Replicate hot shards (e.g., "how", "what", "why") to multiple nodes
  • Use a scatter-gather pattern: query all replicas, merge results
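A scatter-gather read might look like this sketch, where each replica is modeled as a callable returning scored suggestions (an assumed interface):

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

def scatter_gather(prefix, replicas, k=5):
    """Query every replica of a hot shard in parallel, then merge the
    per-replica top-k lists into one global top-k."""
    with ThreadPoolExecutor(max_workers=len(replicas)) as pool:
        partials = pool.map(lambda replica: replica(prefix), replicas)
        # Deduplicate by query, keeping the highest score seen.
        best = {}
        for partial in partials:
            for query, score in partial:
                if score > best.get(query, float("-inf")):
                    best[query] = score
    return heapq.nlargest(k, best.items(), key=lambda kv: kv[1])
```

The fan-out cost is bounded because each replica returns at most k items, so the merge touches O(replicas * k) entries.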

10. Update Strategy

| Approach | Mechanism | Latency | Complexity |
| --- | --- | --- | --- |
| Offline rebuild | Full trie rebuilt every 1-6 hours from aggregated data | Hours | Low |
| Online update | Streaming updates modify trie in place | Minutes | High (concurrency) |
| Hybrid | Offline rebuild + real-time overlay trie for trending queries | Minutes | Medium |

Hybrid Approach (Recommended for Interviews)

Query arrives --> check real-time overlay trie first
                  --> merge results with base trie
                  --> return combined top-K

Every 6 hours: rebuild base trie, flush overlay
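The merge step of the hybrid read path can be sketched as follows; the lookup interfaces and the trending boost factor are assumptions for illustration:

```python
import heapq

def merged_top_k(prefix, base_lookup, overlay_lookup, k=5, trend_boost=2.0):
    """Hybrid read path: consult the small real-time overlay trie,
    boost its entries so trending queries can outrank stale base scores,
    then merge with the base trie's precomputed list."""
    combined = {}
    for query, score in base_lookup(prefix):
        combined[query] = score
    for query, score in overlay_lookup(prefix):
        combined[query] = max(combined.get(query, 0), score * trend_boost)
    return heapq.nlargest(k, combined.items(), key=lambda kv: kv[1])
```

Flushing the overlay after each base rebuild keeps it small enough that the extra lookup stays within the latency budget.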

11. Client-Side Optimizations

| Technique | Description |
| --- | --- |
| Debouncing | Wait 50-150ms after last keystroke before sending request |
| Request cancellation | Cancel in-flight request when new character typed |
| Prefetching | On focus, prefetch top suggestions for empty prefix |
| Local caching | Cache prefix -> results in memory/localStorage |
| Optimistic rendering | Show cached results immediately, update when fresh data arrives |
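Debouncing plus cancellation is easiest to show in code; this asyncio sketch mirrors the pattern a browser client implements with setTimeout and AbortController (class and parameter names are illustrative):

```python
import asyncio

class Debouncer:
    """Each keystroke cancels the pending (or in-flight) request for the
    previous prefix, then waits `delay` seconds before fetching."""

    def __init__(self, fetch, delay=0.1):
        self.fetch = fetch          # async callable: prefix -> suggestions
        self.delay = delay
        self._task = None

    def on_keystroke(self, prefix):
        if self._task is not None:
            self._task.cancel()     # request cancellation
        self._task = asyncio.ensure_future(self._debounced(prefix))
        return self._task

    async def _debounced(self, prefix):
        await asyncio.sleep(self.delay)   # debounce: wait out further keys
        return await self.fetch(prefix)
```

Typing "a", "ap", "app" in quick succession results in a single request for "app".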

12. Multi-Language Support

| Challenge | Solution |
| --- | --- |
| Unicode characters | Use Unicode-aware trie with multi-byte edge labels |
| CJK (no spaces) | Tokenize with n-grams or dictionary segmentation |
| Diacritics | Normalize to base form (é -> e, à -> a) for matching; display the original |
| RTL languages | Logical prefix matching (not visual); UI handles display direction |
| Transliteration | Romanized input (e.g., "tokyo") also matches queries in the native script |
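Diacritic folding from the table can be done with Unicode normalization; this sketch uses Python's standard unicodedata module:

```python
import unicodedata

def normalize_for_matching(text):
    """Strip diacritics and fold case for index/lookup keys; the original
    string is kept separately for display. NFD decomposes 'é' into 'e'
    plus a combining accent, and combining marks (category Mn) are dropped."""
    decomposed = unicodedata.normalize("NFD", text.casefold())
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
```

Index the normalized form as the trie key and store the original spelling in the suggestion payload.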

13. Key Trade-offs Discussion

| Decision | Option A | Option B |
| --- | --- | --- |
| Storage | In-memory trie (fast, expensive) | SSD-backed trie (slower, cheaper) |
| Update freshness | Real-time (complex) | Hourly batch (simple, slight staleness) |
| Ranking | Global popularity only (simple) | Personalized (complex, better UX) |
| Partitioning | By prefix range (locality) | By hash (even distribution) |
| Caching | Aggressive multi-layer (high hit rate) | Minimal (simpler, more trie load) |

14. Capacity Estimation

Assumptions:
- 5B queries/day = ~58K QPS
- Average query length: 4 words, 20 chars
- Average 4 keystrokes trigger suggestions (with debounce)
- Suggestion QPS: 58K * 4 = ~232K QPS

Storage:
- 100M unique queries, avg 20 chars each
- Raw: 100M * 20B = 2GB (just strings)
- Trie with metadata: ~10-20GB per shard
- With top-K precomputed: ~30-50GB total across shards

15. Interview Checklist

  • Clarified: real-time vs batch, personalized vs global, scale
  • Explained trie data structure and why it fits prefix search
  • Discussed trie optimizations (compressed, pruned, precomputed top-K)
  • Covered ranking signals and how they combine
  • Designed the offline data pipeline (logs -> aggregation -> trie build)
  • Multi-layer caching strategy with hit rate estimates
  • Distributed trie partitioning and hot prefix handling
  • Client-side optimizations (debounce, cancellation, local cache)
  • Update strategy (offline + real-time overlay hybrid)
  • Mentioned multi-language considerations

16. Resources

  • Grokking the System Design Interview -- Typeahead Suggestion chapter
  • System Design Interview (Alex Xu, Vol 1) -- Chapter 13: Design Autocomplete
  • Google Research Blog -- How Google Autocomplete Works
  • Paper: "Efficient Auto-Completion via Trie-Based Data Structures" (ACM)
  • YouTube: Gaurav Sen -- Typeahead System Design
  • YouTube: Tech Dummies -- Design Autocomplete

Previous: 38 - Design Search Engine | Next: 40 - Design Distributed Cache