39 - Design Typeahead & Autocomplete

Previous: 38 - Design Search Engine | Next: 40 - Design Distributed Cache


1. Problem Statement

Design a typeahead/autocomplete system that returns search suggestions in real time as the user types. Each keystroke should surface a ranked list of completions within ~100ms. Think of the Google search bar, Amazon product search, or Slack's channel/user finder.


2. Requirements

Functional

| Requirement | Detail |
| --- | --- |
| Prefix matching | Return top-K suggestions matching the typed prefix |
| Ranked results | Suggestions ordered by popularity, recency, or personalization |
| Low latency | Results within 100ms of each keystroke |
| Multi-language | Support Unicode, CJK characters, diacritics |
| Freshness | Trending queries surface within minutes to hours |

Non-Functional

| Requirement | Target |
| --- | --- |
| Availability | 99.99% |
| Latency (p99) | < 100ms |
| Scale | 5B+ queries/day (Google-scale), top-K = 5-10 |
| Read:Write ratio | ~1000:1 (reads dominate) |

3. Core Data Structure: Trie (Prefix Tree)

A trie stores one character of a query per edge, so each node represents a prefix. Nodes can hold aggregated metadata (frequency, rank).

           (root)
          /   |   \
         t    b    a
         |    |    |
         r    e    p
        / \   |    |
       e   i  s    p
       |   |  |    |
       e   p  t    l
      (4) (5) (12) |
                   e
                  (22)

Numbers in parentheses = frequency of the completed query: "tree" (4), "trip" (5), "best" (12), "apple" (22).

Trie Node Structure (Conceptual)

TrieNode:
  children: Map<char, TrieNode>    # branching edges
  isEnd: bool                       # marks a complete query
  topK: List<(query, score)>        # precomputed top-K for this prefix

Key insight: Precomputing topK at every node turns a DFS traversal into an O(1) lookup at query time.
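The conceptual node above can be sketched as runnable Python; the heap-based truncation and the example scores are illustrative, not a production build path:

```python
import heapq

class TrieNode:
    def __init__(self):
        self.children = {}   # char -> TrieNode (branching edges)
        self.is_end = False  # marks a complete query
        self.top_k = []      # precomputed [(score, query)], highest score first

def build_trie(queries, k=5):
    """Build a trie and precompute top-k suggestions at every node.
    `queries` maps query string -> frequency score."""
    root = TrieNode()
    for query, score in queries.items():
        node = root
        for ch in query:
            node = node.children.setdefault(ch, TrieNode())
            # Keep each prefix node's list sorted by score, truncated to k.
            node.top_k.append((score, query))
            node.top_k = heapq.nlargest(k, node.top_k)
        node.is_end = True
    return root

def suggest(root, prefix):
    """O(len(prefix)) walk, then O(1) read of the precomputed list."""
    node = root
    for ch in prefix:
        node = node.children.get(ch)
        if node is None:
            return []
    return [q for _, q in node.top_k]
```

With `build_trie({"bee": 3, "best": 12, "tree": 4})`, `suggest(root, "be")` returns `["best", "bee"]`: the DFS-and-sort work was paid at build time.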


4. Trie Optimizations

| Optimization | How | Benefit |
| --- | --- | --- |
| Compressed trie (radix tree) | Merge single-child chains into one edge ("tre" instead of t->r->e) | 50-70% node reduction |
| Limit depth | Cap trie at depth 15-20 characters | Bounds memory; queries rarely exceed this |
| Prune low-frequency | Remove nodes with frequency < threshold | Reduced memory footprint |
| Precompute top-K per node | Store sorted suggestions at each node | O(1) read at query time |
| Two-level trie | First 2 chars as hash key -> sub-trie | Parallelism, smaller individual tries |
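As a sketch of the first optimization, single-child non-terminal chains can be collapsed bottom-up into multi-character edges; the node layout mirrors the conceptual TrieNode above:

```python
class Node:
    def __init__(self):
        self.children = {}   # edge label (one or more chars) -> Node
        self.is_end = False

def insert(root, word):
    node = root
    for ch in word:
        node = node.children.setdefault(ch, Node())
    node.is_end = True

def compress(node):
    """Collapse single-child, non-terminal chains into one edge (radix tree)."""
    new_children = {}
    for edge, child in node.children.items():
        compress(child)  # compress subtrees first
        # Fold grandchild edges into this edge while the chain is unbranched
        # and does not terminate a query.
        while len(child.children) == 1 and not child.is_end:
            (sub_edge, grandchild), = child.children.items()
            edge += sub_edge
            child = grandchild
        new_children[edge] = child
    node.children = new_children
    return node
```

After inserting "tree" and "trie" and compressing, the root has a single edge "tr" that branches into "ee" and "ie".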

5. Ranking Suggestions

Signals for Ranking

Score = w1 * global_frequency
      + w2 * recent_frequency (decay over time)
      + w3 * user_personal_score
      + w4 * location_relevance
      + w5 * language_match

| Signal | Source | Freshness |
| --- | --- | --- |
| Global frequency | Aggregated query logs | Hourly/daily rebuild |
| Trending boost | Recent query spike detection | Minutes (streaming) |
| Personalization | User's past queries, clicks | Per-session or daily |
| Location | GeoIP | Request-time |
| Language | Accept-Language header, user settings | Request-time |
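A minimal scoring function combining these signals might look like the following; the field names, half-life, and weight scheme are illustrative assumptions, not a production schema:

```python
import math
import time

def score(query_stats, weights, now=None):
    """Combine ranking signals into a single score.
    `query_stats` holds the raw per-query signals; `weights` maps w1..w5."""
    now = now if now is not None else time.time()
    # Exponential decay: a query's recency boost halves every `half_life` seconds.
    half_life = 3600.0
    age = now - query_stats.get("last_seen", now)
    recency = query_stats.get("recent_count", 0) * 0.5 ** (age / half_life)
    return (weights["global"] * math.log1p(query_stats.get("total_count", 0))
            + weights["recent"] * recency
            + weights["personal"] * query_stats.get("user_affinity", 0.0)
            + weights["location"] * query_stats.get("geo_match", 0.0)
            + weights["language"] * query_stats.get("lang_match", 0.0))
```

The log on global frequency keeps head queries from drowning out every other signal; the decay lets a trending query outrank a historically popular one for a while.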

Interview Tip

Interviewers love hearing about the cold start problem: new users have no personalization signal. Fall back to global popularity + location.


6. Data Collection Pipeline

User Query Logs
      |
      v
+--------------+      +------------------+      +----------------+
| Query Logger | ---> | Stream Processor | ---> | Aggregator     |
| (Kafka)      |      | (Flink/Spark)    |      | (freq counts)  |
+--------------+      +------------------+      +----------------+
                                                        |
                                                        v
                                                +----------------+
                                                | Trie Builder   |
                                                | (offline job)  |
                                                +----------------+
                                                        |
                                                        v
                                                +----------------+
                                                | Trie Snapshot  |
                                                | (serialized)   |
                                                +----------------+
                                                        |
                                             deploy to serving nodes

Steps

  1. Log collection -- Every search query logged to Kafka with timestamp, user_id, geo
  2. Stream aggregation -- Flink/Spark Streaming counts query frequency in sliding windows (1h, 24h, 7d)
  3. Offline trie build -- Periodic MapReduce job builds full trie from aggregated counts
  4. Snapshot distribution -- Serialized trie pushed to serving nodes via blob storage (S3)
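Step 2's windowed counting can be approximated in-process with per-bucket tumbling counts; this is a stand-in for what a Flink job would maintain, and the window/bucket sizes are illustrative:

```python
from collections import Counter, deque

class SlidingWindowCounter:
    """Query frequencies over the last `window` seconds, approximated
    with `window / bucket` tumbling sub-buckets."""

    def __init__(self, window=3600, bucket=60):
        self.bucket = bucket
        self.num_buckets = window // bucket
        self.buckets = deque()          # (bucket_id, Counter) pairs, oldest first

    def record(self, query, ts):
        bucket_id = int(ts) // self.bucket
        if not self.buckets or self.buckets[-1][0] != bucket_id:
            self.buckets.append((bucket_id, Counter()))
        self.buckets[-1][1][query] += 1
        # Drop buckets that have fallen out of the window.
        while self.buckets[0][0] <= bucket_id - self.num_buckets:
            self.buckets.popleft()

    def counts(self):
        total = Counter()
        for _, c in self.buckets:
            total.update(c)
        return total
```

A real pipeline would keep one such window per horizon (1h, 24h, 7d) and checkpoint the counts for the trie builder.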

7. Caching Strategy

Caching is critical because prefix distributions follow a power law: a small set of prefixes accounts for most lookups.

Client
  |
  v
+------------------+
| Browser Cache    |  <-- Cache "app" -> [...] for 5 min
| (localStorage)   |
+------------------+
  |
  v
+------------------+
| CDN / Edge Cache |  <-- Cache top prefixes globally
+------------------+
  |
  v
+------------------+
| Application Cache|  <-- Redis/Memcached: prefix -> top-K
| (prefix -> topK) |
+------------------+
  |
  v
+------------------+
| Trie Servers     |  <-- In-memory trie lookup
+------------------+

| Cache Layer | TTL | Hit Rate |
| --- | --- | --- |
| Browser/client | 5-15 min | 30-40% |
| CDN edge | 1-5 min | 20-30% |
| Application (Redis) | 10-60 min | 25-35% |
| Combined | -- | ~80% |

Interview Tip

Mention that you can cache all 1-2 character prefixes aggressively (only 26 + 26^2 = 702 combinations for lowercase English) since they're queried most.
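The application-cache layer is a classic cache-aside pattern; the key scheme and the in-process Redis stand-in below are illustrative:

```python
import json

def get_suggestions(prefix, cache, trie_lookup, ttl_s=600):
    """Cache-aside read path: try the cache, fall back to a trie shard,
    then populate the cache for subsequent readers."""
    key = f"ta:{prefix}"
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)
    results = trie_lookup(prefix)
    cache.set(key, json.dumps(results), ttl_s)
    return results

class DictCache:
    """In-process stand-in for Redis/Memcached (TTL ignored for brevity)."""
    def __init__(self):
        self.store = {}
    def get(self, key):
        return self.store.get(key)
    def set(self, key, value, ttl_s):
        self.store[key] = value
```

With the power-law prefix distribution, even a short TTL keeps most reads off the trie servers.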


8. Full System Architecture

                        +--------------------+
                        |   Mobile / Web     |
                        |   Client           |
                        +---------+----------+
                                  |
                          debounce 50-150ms
                                  |
                        +---------v----------+
                        |  CDN / Edge Cache  |
                        +---------+----------+
                                  |  cache miss
                        +---------v----------+
                        |  API Gateway /     |
                        |  Load Balancer     |
                        +---------+----------+
                                  |
               +------------------+------------------+
               |                                      |
     +---------v----------+                +---------v----------+
     | Typeahead Service  |                | Typeahead Service  |
     | (Stateless)        |                | (Stateless)        |
     +---------+----------+                +--------------------+
               |
     +---------v----------+
     | Redis Cache Layer  |  prefix -> top-K (serialized)
     +---------+----------+
               |  cache miss
     +---------v----------+
     | Trie Server        |  in-memory trie, sharded by prefix range
     | (Partition: a-m)   |
     +--------------------+
     | Trie Server        |
     | (Partition: n-z)   |
     +--------------------+

              ---- Offline Pipeline ----

     +--------------------+     +--------------------+
     | Kafka Query Logs   | --> | Flink Aggregator   |
     +--------------------+     +--------+-----------+
                                         |
                                +--------v-----------+
                                | Trie Builder (MR)  |
                                +--------+-----------+
                                         |
                                +--------v-----------+
                                | Blob Store (S3)    |
                                | trie snapshots     |
                                +--------------------+

9. Distributed Trie (Partitioning)

A single trie won't fit in one machine's memory at Google scale. Partition by prefix range.

| Strategy | How | Pros | Cons |
| --- | --- | --- | --- |
| Range-based | Shard by first 1-2 chars (a-f, g-m, ...) | Simple, predictable | Uneven if some prefixes are hot |
| Hash-based | Hash the prefix, mod N | Even distribution | Loses prefix locality |
| Hybrid | Range for first char, hash within | Balanced | More complex routing |

Handling Hot Prefixes

  • Replicate hot shards (e.g., "how", "what", "why") to multiple nodes
  • Use a scatter-gather pattern: query all replicas, merge results
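A scatter-gather read might look like this sketch, where each replica is modeled as a callable returning scored suggestions (an assumed interface):

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

def scatter_gather(prefix, replicas, k=5):
    """Query every replica of a hot shard in parallel, then merge the
    per-replica top-k lists into one global top-k."""
    with ThreadPoolExecutor(max_workers=len(replicas)) as pool:
        partials = pool.map(lambda replica: replica(prefix), replicas)
        # Deduplicate by query, keeping the highest score seen.
        best = {}
        for partial in partials:
            for query, score in partial:
                if score > best.get(query, float("-inf")):
                    best[query] = score
    return heapq.nlargest(k, best.items(), key=lambda kv: kv[1])
```

The fan-out cost is bounded because each replica returns at most k items, so the merge touches O(replicas * k) entries.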

10. Update Strategy

| Approach | Mechanism | Latency | Complexity |
| --- | --- | --- | --- |
| Offline rebuild | Full trie rebuilt every 1-6 hours from aggregated data | Hours | Low |
| Online update | Streaming updates modify trie in place | Minutes | High (concurrency) |
| Hybrid | Offline rebuild + real-time overlay trie for trending queries | Minutes | Medium |

Hybrid Approach (Recommended for Interviews)

Query arrives --> check real-time overlay trie first
                  --> merge results with base trie
                  --> return combined top-K

Every 6 hours: rebuild base trie, flush overlay
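The merge step of the hybrid read path can be sketched as follows; the lookup interfaces and the trending boost factor are assumptions for illustration:

```python
import heapq

def merged_top_k(prefix, base_lookup, overlay_lookup, k=5, trend_boost=2.0):
    """Hybrid read path: consult the small real-time overlay trie,
    boost its entries so trending queries can outrank stale base scores,
    then merge with the base trie's precomputed list."""
    combined = {}
    for query, score in base_lookup(prefix):
        combined[query] = score
    for query, score in overlay_lookup(prefix):
        combined[query] = max(combined.get(query, 0), score * trend_boost)
    return heapq.nlargest(k, combined.items(), key=lambda kv: kv[1])
```

Flushing the overlay after each base rebuild keeps it small enough that the extra lookup stays within the latency budget.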

11. Client-Side Optimizations

| Technique | Description |
| --- | --- |
| Debouncing | Wait 50-150ms after last keystroke before sending request |
| Request cancellation | Cancel in-flight request when new character typed |
| Prefetching | On focus, prefetch top suggestions for empty prefix |
| Local caching | Cache prefix -> results in memory/localStorage |
| Optimistic rendering | Show cached results immediately, update when fresh data arrives |
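Debouncing plus cancellation is easiest to show in code; this asyncio sketch mirrors the pattern a browser client implements with setTimeout and AbortController (class and parameter names are illustrative):

```python
import asyncio

class Debouncer:
    """Each keystroke cancels the pending (or in-flight) request for the
    previous prefix, then waits `delay` seconds before fetching."""

    def __init__(self, fetch, delay=0.1):
        self.fetch = fetch          # async callable: prefix -> suggestions
        self.delay = delay
        self._task = None

    def on_keystroke(self, prefix):
        if self._task is not None:
            self._task.cancel()     # request cancellation
        self._task = asyncio.ensure_future(self._debounced(prefix))
        return self._task

    async def _debounced(self, prefix):
        await asyncio.sleep(self.delay)   # debounce: wait out further keys
        return await self.fetch(prefix)
```

Typing "a", "ap", "app" in quick succession results in a single request for "app".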

12. Multi-Language Support

| Challenge | Solution |
| --- | --- |
| Unicode characters | Use Unicode-aware trie with multi-byte edge labels |
| CJK (no spaces) | Tokenize with n-grams or dictionary segmentation |
| Diacritics | Normalize to base form (é -> e, à -> a) for matching; display the original |
| RTL languages | Logical prefix matching (not visual); UI handles display direction |
| Transliteration | Romanized input (e.g., "tokyo") also matches queries in the native script |
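Diacritic folding from the table can be done with Unicode normalization; this sketch uses Python's standard unicodedata module:

```python
import unicodedata

def normalize_for_matching(text):
    """Strip diacritics and fold case for index/lookup keys; the original
    string is kept separately for display. NFD decomposes 'é' into 'e'
    plus a combining accent, and combining marks (category Mn) are dropped."""
    decomposed = unicodedata.normalize("NFD", text.casefold())
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
```

Index the normalized form as the trie key and store the original spelling in the suggestion payload.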

13. Key Trade-offs Discussion

| Decision | Option A | Option B |
| --- | --- | --- |
| Storage | In-memory trie (fast, expensive) | SSD-backed trie (slower, cheaper) |
| Update freshness | Real-time (complex) | Hourly batch (simple, slight staleness) |
| Ranking | Global popularity only (simple) | Personalized (complex, better UX) |
| Partitioning | By prefix range (locality) | By hash (even distribution) |
| Caching | Aggressive multi-layer (high hit rate) | Minimal (simpler, more trie load) |

14. Capacity Estimation

Assumptions:
- 5B queries/day = ~58K QPS
- Average query length: 4 words, 20 chars
- Average 4 keystrokes trigger suggestions (with debounce)
- Suggestion QPS: 58K * 4 = ~232K QPS

Storage:
- 100M unique queries, avg 20 chars each
- Raw: 100M * 20B = 2GB (just strings)
- Trie with metadata: ~10-20GB per shard
- With top-K precomputed: ~30-50GB total across shards

15. Interview Checklist

  • Clarified: real-time vs batch, personalized vs global, scale
  • Explained trie data structure and why it fits prefix search
  • Discussed trie optimizations (compressed, pruned, precomputed top-K)
  • Covered ranking signals and how they combine
  • Designed the offline data pipeline (logs -> aggregation -> trie build)
  • Multi-layer caching strategy with hit rate estimates
  • Distributed trie partitioning and hot prefix handling
  • Client-side optimizations (debounce, cancellation, local cache)
  • Update strategy (offline + real-time overlay hybrid)
  • Mentioned multi-language considerations

16. Resources

  • Grokking the System Design Interview -- Typeahead Suggestion chapter
  • System Design Interview (Alex Xu, Vol 1) -- Chapter 13: Design Autocomplete
  • Google Research Blog -- How Google Autocomplete Works
  • Paper: "Efficient Auto-Completion via Trie-Based Data Structures" (ACM)
  • YouTube: Gaurav Sen -- Typeahead System Design
  • YouTube: Tech Dummies -- Design Autocomplete

Previous: 38 - Design Search Engine | Next: 40 - Design Distributed Cache