36 - Design Twitter
Series: System Design & Distributed Systems Previous: 35 - Design Uber or Lyft | Next: 37 - Design Dropbox or Google Drive
1. Requirements
Functional Requirements
| Feature | Details |
|---|---|
| Post tweet | Text (280 chars), images, video |
| Home timeline | Aggregated tweets from followed users |
| User timeline | All tweets by a specific user |
| Follow/unfollow | Asymmetric social graph |
| Search | Full-text search across tweets |
| Trending topics | Real-time trending hashtags/topics |
| Like/retweet | Engagement actions |
| Notifications | Mentions, likes, follows, retweets |
Non-Functional Requirements
- Scale: 500M tweets/day, 400M DAU
- Read-heavy: 100:1 read-to-write ratio
- Latency: Timeline loads in < 500ms
- Availability: 99.99%
- Freshness: Tweets appear in followers' timelines within seconds
2. Capacity Estimation
DAU: 400M
Tweets/day: 500M
Avg tweet size: 300 bytes (text + metadata)
Tweet storage/day: 500M x 300B = 150GB
Timeline reads/day: 400M users x 5 reads = 2B reads/day
Timeline read QPS: ~23K QPS (avg), ~50K peak
Tweet write QPS: ~5.8K (avg), ~12K peak
Avg followers: 200
Fan-out volume: 500M tweets x 200 = 100B fan-out ops/day
Media tweets: ~20% with images = 100M/day x 1MB = 100TB/day
| Resource | Estimate |
|---|---|
| Tweet write QPS | ~6K |
| Timeline read QPS | ~23K |
| Fan-out ops/day | ~100B |
| Storage/year (text) | ~55TB |
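The estimates above can be sanity-checked with a few lines of arithmetic; this is a sketch of the back-of-envelope math, using only numbers stated in this section.

```python
# Back-of-envelope capacity math from the estimates above.
SECONDS_PER_DAY = 86_400

tweets_per_day = 500_000_000
timeline_reads_per_day = 400_000_000 * 5   # 400M DAU x 5 timeline loads each
avg_tweet_bytes = 300
avg_followers = 200

write_qps = tweets_per_day / SECONDS_PER_DAY              # ~5.8K avg
read_qps = timeline_reads_per_day / SECONDS_PER_DAY       # ~23K avg
text_storage_per_day_gb = tweets_per_day * avg_tweet_bytes / 1e9   # 150 GB
text_storage_per_year_tb = text_storage_per_day_gb * 365 / 1000    # ~55 TB
fanout_ops_per_day = tweets_per_day * avg_followers                # 100B

print(f"write QPS ~{write_qps:,.0f}, read QPS ~{read_qps:,.0f}")
print(f"text storage/year ~{text_storage_per_year_tb:.1f} TB")
```

Note the read-to-write QPS ratio here (~4:1) counts only timeline loads; the stated 100:1 ratio includes all reads (profiles, search, hydration fan-in).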
3. High-Level Architecture
+--------+ +-----------+
| Client |-------->| API |
| (App) | | Gateway |
+--------+ +-----+-----+
|
+---------------+---------------+
| | |
+-----+-----+ +-----+-----+ +------+------+
| Tweet | | Timeline | | User |
| Service | | Service | | Service |
+-----+-----+ +-----+-----+ +------+------+
| | |
+-----+-----+ +-----+-----+ +------+------+
| Tweet DB | | Timeline | | Social |
| (MySQL | | Cache | | Graph |
| sharded) | | (Redis) | | (MySQL/ |
+-----------+ +-----------+ | Graph DB) |
+-------------+
|
+-----+------+
| Fan-out |
| Service |
| (Workers) |
+-----+------+
|
+-----+------+
| Kafka |
+------------+
4. Post Tweet Flow
Client Tweet Service Fan-out Service
| | |
|-- POST /tweet -------->| |
| { text, media_ids } | |
| |-- Validate ----------->|
| |-- Store in Tweet DB -->|
| |-- Enqueue to Kafka --->|
| | |
|<-- 201 Created --------| |
| { tweet_id } | |
| | +------+------+
| | | Read poster |
| | | follower |
| | | list |
| | +------+------+
| | |
| | +------+------+
| | | For each |
| | | follower: |
| | | LPUSH to |
| | | timeline |
| | | cache |
| | +-------------+
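The fan-out worker loop above can be sketched in a few lines. This is an in-memory stand-in: in production the follower list comes from the social graph store and each timeline push is a Redis LPUSH/LTRIM; here plain dicts and a bounded deque play those roles, and the cap of 800 entries is an assumption.

```python
from collections import defaultdict, deque

TIMELINE_CAP = 800  # keep only the most recent N tweet IDs per user (assumption)

# In-memory stand-ins for the social graph and the Redis timeline cache.
followers_of = defaultdict(set)   # user_id -> set of follower IDs
timelines = defaultdict(deque)    # user_id -> deque of tweet IDs, newest first

def fan_out(poster_id: int, tweet_id: int) -> int:
    """Push a new tweet ID onto every follower's cached timeline.

    Mirrors the Kafka-consumer fan-out worker: each append/trim pair
    corresponds to LPUSH + LTRIM on timeline:{follower} in Redis.
    """
    for follower in followers_of[poster_id]:
        tl = timelines[follower]
        tl.appendleft(tweet_id)           # LPUSH timeline:{follower} tweet_id
        while len(tl) > TIMELINE_CAP:     # LTRIM timeline:{follower} 0 CAP-1
            tl.pop()
    return len(followers_of[poster_id])
```

Capping each cached timeline bounds Redis memory: users who want to scroll past the cap fall back to a DB-backed query.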
Tweet Storage Schema

```sql
CREATE TABLE tweets (
  tweet_id      BIGINT PRIMARY KEY,  -- Snowflake ID
  user_id       BIGINT,
  content       VARCHAR(280),
  media_urls    JSON,
  reply_to      BIGINT NULL,
  retweet_of    BIGINT NULL,
  created_at    TIMESTAMP,
  like_count    INT DEFAULT 0,
  retweet_count INT DEFAULT 0,
  reply_count   INT DEFAULT 0,
  INDEX idx_user_time (user_id, created_at DESC)
);
```
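Since tweet_id is a Snowflake ID, here is a minimal sketch of a Snowflake-style generator. The 41/10/12 bit layout and the epoch constant follow Twitter's published scheme; the locking and wait-for-next-millisecond details are simplified assumptions.

```python
import threading
import time

# Snowflake-style 64-bit IDs: 41 bits of milliseconds since a custom epoch,
# 10 bits of worker ID, 12 bits of per-millisecond sequence.
EPOCH_MS = 1_288_834_974_657  # Twitter's published snowflake epoch

class Snowflake:
    def __init__(self, worker_id: int):
        assert 0 <= worker_id < 1024   # must fit in 10 bits
        self.worker_id = worker_id
        self.last_ms = -1
        self.seq = 0
        self.lock = threading.Lock()

    def next_id(self) -> int:
        with self.lock:
            now = int(time.time() * 1000)
            if now == self.last_ms:
                self.seq = (self.seq + 1) & 0xFFF
                if self.seq == 0:                  # sequence exhausted this ms:
                    while now <= self.last_ms:     # spin until the next ms
                        now = int(time.time() * 1000)
            else:
                self.seq = 0
            self.last_ms = now
            return ((now - EPOCH_MS) << 22) | (self.worker_id << 12) | self.seq
```

Because the timestamp occupies the high bits, IDs sort by creation time, which is exactly what the (user_id, created_at DESC) access pattern wants.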
5. Home Timeline: The Core Challenge
Fan-Out on Write (Push) -- Default for Normal Users
User A (200 followers) posts tweet T1
|
v
Fan-out workers push T1 to 200 followers' timeline caches:
timeline:follower_1 = [T1, T5, T3, T8, ...] (most recent first)
timeline:follower_2 = [T1, T7, T2, T9, ...]
...
timeline:follower_200 = [T1, ...]
Reading timeline: a single cache lookup per request -- no fan-in across followed users at read time.
GET /timeline --> Redis LRANGE timeline:{user_id} 0 49 --> 50 tweet IDs
|
v
Hydrate: fetch full tweet objects by ID (multi-get)
|
v
Return rendered timeline
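The read path above can be sketched as follows; dicts stand in for the Redis timeline cache (LRANGE) and the tweet store (multi-get), and all names are illustrative.

```python
# Read path sketch: fetch cached tweet IDs, then hydrate via multi-get.
timeline_cache = {42: [103, 102, 101]}   # user_id -> newest-first tweet IDs
tweet_store = {
    101: {"id": 101, "text": "first"},
    102: {"id": 102, "text": "second"},
    103: {"id": 103, "text": "third"},
}

def read_timeline(user_id: int, page_size: int = 50) -> list[dict]:
    ids = timeline_cache.get(user_id, [])[:page_size]   # LRANGE 0 page_size-1
    # MGET-style hydration; skip IDs whose tweets were deleted since fan-out.
    return [tweet_store[i] for i in ids if i in tweet_store]
```

Storing only IDs in the cache keeps it small and lets a tweet edit or delete take effect everywhere without rewriting millions of cached timelines.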
Fan-Out on Read (Pull) -- For Celebrities
User opens timeline:
1. Fetch pre-computed timeline (from push model) = base feed
2. Fetch recent tweets from followed celebrities
3. Merge and sort by timestamp
4. Return top N
Hybrid Model (Twitter's Actual Approach)
Is the poster a celebrity?
(> 100K followers)
|
+------+------+
| |
YES NO
| |
Skip fan-out Fan-out to all
(too expensive) followers' caches
| |
v v
Pulled at Pre-cached in
read time timeline cache
Timeline Read:
1. Get cached timeline (pushed tweets) --> [T1, T3, T5]
2. Get followed celebrities' recent tweets (pull) --> [T2, T4]
3. Merge by timestamp --> [T1, T2, T3, T4, T5]
4. Return top 50
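The merge step can be sketched with a k-way merge; since both inputs arrive newest-first, heapq.merge preserves that order without a full re-sort. The timestamp field name is an assumption.

```python
from heapq import merge
from itertools import islice

def merge_timeline(cached: list[dict], celebrity: list[dict], n: int = 50) -> list[dict]:
    """Merge the pushed (cached) feed with pulled celebrity tweets.

    Both inputs are assumed sorted newest-first by created_at, so a
    streaming merge suffices; islice stops after the top n.
    """
    merged = merge(cached, celebrity, key=lambda t: t["created_at"], reverse=True)
    return list(islice(merged, n))
```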
Interview Tip: This hybrid approach is THE key insight for Twitter design. Demonstrate you understand why pure push fails for celebrities (Lady Gaga with 80M followers = 80M writes per tweet).
6. User Timeline (Simple)
GET /users/{user_id}/tweets
Query: SELECT * FROM tweets
WHERE user_id = {user_id}
ORDER BY created_at DESC
LIMIT 50 OFFSET 0
Optimization:
- Index on (user_id, created_at DESC)
- Cache recent tweets per user in Redis
- No fan-out needed -- just a filtered query
7. Search: Inverted Index on Tweets
Architecture
Tweet Created --> Kafka --> Search Indexer --> Elasticsearch
|
User Search Query --> Search Service --> ES Query --+
|
Ranked Results
Real-Time Search Index
Tweet: "Just landed in Paris! #travel #europe"
Tokens: ["just", "landed", "paris", "travel", "europe"]
Inverted Index Update:
"paris" -> [..., tweet_98765]
"travel" -> [..., tweet_98765]
"europe" -> [..., tweet_98765]
Search Ranking
| Signal | Weight |
|---|---|
| Text relevance (BM25) | High |
| Recency | High (tweets decay fast) |
| Engagement (likes, retweets) | Medium |
| Author verified/authority | Medium |
| User personalization | Low |
Hot/Cold Index Tiering
- For real-time search, only index last 7 days of tweets
- Archive older tweets to cold index (searched separately)
- Twitter indexes ~500M tweets/day with sub-second latency
8. Trending Topics
Architecture
Tweet Stream (Kafka)
|
v
+------+----------+
| Stream Processor |
| (Flink / Storm) |
+------+----------+
|
1. Extract hashtags and entities
2. Count occurrences in sliding windows
3. Detect anomalies (sudden spikes)
|
v
+------+----------+
| Trending Topics |
| Cache (Redis) |
+------+----------+
|
v
Served to clients
Count-Min Sketch for Efficient Counting
Problem: Counting exact occurrences of millions of hashtags in real-time
requires too much memory.
Count-Min Sketch:
- Probabilistic data structure
- Uses multiple hash functions and a 2D array
- Space: O(1/epsilon * log(1/delta))
- Overestimates counts slightly, never underestimates
h1 h2 h3 (hash functions)
| | |
+--+---+---+--+
| 5 | 0 | 3 | 1 | row 1 (h1)
+--+---+---+--+
| 2 | 7 | 0 | 4 | row 2 (h2)
+--+---+---+--+
| 1 | 3 | 8 | 0 | row 3 (h3)
+--+---+---+--+
count("#travel") = min(row1[h1(travel)], row2[h2(travel)], row3[h3(travel)])
Sliding Window
Window: 1 hour, slide every 5 minutes
|------ window 1 ------|
|------ window 2 ------|
|------ window 3 ------|
For each window:
Count hashtag occurrences
Compare to historical baseline
If count >> baseline: trending
Trending Score
trending_score = (current_count - baseline_count) / baseline_count
Example:
#WorldCup: current_count = 500K/hr, baseline = 10K/hr
score = (500K - 10K) / 10K = 49.0 --> TRENDING
#goodmorning: current_count = 50K/hr, baseline = 45K/hr
score = (50K - 45K) / 45K = 0.11 --> NOT trending (normal volume)
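The scoring formula above translates directly; the `floor` guard for brand-new hashtags with no baseline is an added assumption.

```python
def trending_score(current: float, baseline: float, floor: float = 1.0) -> float:
    """Relative spike over the historical baseline for the same window.

    floor guards against division by zero when a hashtag has no history.
    """
    return (current - baseline) / max(baseline, floor)
```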
Interview Tip: The key insight is that trending is about velocity of change, not absolute volume. A hashtag that suddenly spikes is trending even if its total count is modest.
9. Follow/Unfollow Mechanics
Social Graph Storage

```sql
CREATE TABLE follows (
  follower_id BIGINT,
  followee_id BIGINT,
  created_at  TIMESTAMP,
  PRIMARY KEY (follower_id, followee_id),
  INDEX idx_followee (followee_id)
);
```
Operations
| Operation | Query |
|---|---|
| Follow | INSERT INTO follows (follower_id, followee_id) |
| Unfollow | DELETE FROM follows WHERE follower_id=X AND followee_id=Y |
| Get following | SELECT followee_id FROM follows WHERE follower_id=X |
| Get followers | SELECT follower_id FROM follows WHERE followee_id=X |
| Follower count | SELECT COUNT(*) ... (or denormalized counter) |
Unfollow and Timeline Cache
When user A unfollows user B:
- Delete from follows table
- Async: remove B's tweets from A's timeline cache
- Or: lazy filter -- skip B's tweets when rendering A's timeline
10. Like/Retweet Counters
Challenge
Popular tweets get millions of likes. A naive UPDATE tweets SET like_count = like_count + 1 WHERE tweet_id = ... funnels every like for a viral tweet onto a single row, creating hot-row contention.
Solution: Sharded Counters
tweet_12345_likes:
shard_0: 15,234
shard_1: 14,892
shard_2: 15,108
shard_3: 14,766
total_likes = sum(all shards) = 60,000
Write: randomly pick a shard, increment
Read: sum all shards (cached with 30s TTL)
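A sharded counter is a thin wrapper; in production each shard would be a separate Redis key or DB row, and here a plain list stands in for them.

```python
import random

class ShardedCounter:
    """Spread increments for one hot key across N shards to avoid contention."""

    def __init__(self, num_shards: int = 4):
        self.shards = [0] * num_shards

    def increment(self, amount: int = 1) -> None:
        # Random shard choice spreads concurrent writers across rows/keys.
        self.shards[random.randrange(len(self.shards))] += amount

    def total(self) -> int:
        # Exact sum; in practice this read is cached with a short TTL (e.g. 30s)
        # so viral tweets don't hammer every shard on every render.
        return sum(self.shards)
```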
Like/Unlike with Dedup

```sql
CREATE TABLE likes (
  user_id    BIGINT,
  tweet_id   BIGINT,
  created_at TIMESTAMP,
  PRIMARY KEY (user_id, tweet_id)
);
-- Like: INSERT (idempotent with PK constraint)
-- Unlike: DELETE
-- Check if liked: SELECT EXISTS(... WHERE user_id=X AND tweet_id=Y)
```
11. Media Attachments
Tweet with Image:
Client --> Upload Service --> S3
|
v
Generate:
- Thumbnail (150x150)
- Small (680px wide)
- Large (1200px wide)
|
v
Return media_id
Client --> POST /tweet { text: "...", media_ids: ["m_123"] }
|
v
Store tweet with media_urls
Media Storage
| Type | Max Size | Processing |
|---|---|---|
| Image | 5MB | Resize, compress, WebP/AVIF |
| GIF | 15MB | Convert to MP4 for efficiency |
| Video | 512MB | Transcode to HLS segments |
12. Notification System
Events (Kafka):
- mention: "@user was mentioned in a tweet"
- like: "user liked your tweet"
- retweet: "user retweeted your tweet"
- follow: "user followed you"
- reply: "user replied to your tweet"
|
v
+------+----------+
| Notification |
| Service |
+------+----------+
|
+----+----+----+
| | |
v v v
In-App Push Email
(badge, (APNs, (digest)
bell) FCM)
Notification Aggregation
Instead of:
"Alice liked your tweet"
"Bob liked your tweet"
"Charlie liked your tweet"
Show:
"Alice, Bob, and 47 others liked your tweet"
- Aggregate within time windows (5 minutes)
- Group by tweet + action type
- Show count + sample names
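The aggregation rule above might be sketched like this; the event shape and the two-name sample size are assumptions for illustration.

```python
from collections import defaultdict

def aggregate(events: list[dict], sample: int = 2) -> list[str]:
    """Collapse events into one line per (tweet, action), showing a few names."""
    groups = defaultdict(list)
    for e in events:                       # e.g. {"tweet_id": 1, "verb": "liked", "actor": "Alice"}
        groups[(e["tweet_id"], e["verb"])].append(e["actor"])
    lines = []
    for (tweet_id, verb), actors in groups.items():
        shown, rest = actors[:sample], len(actors) - sample
        if rest > 0:
            lines.append(f"{', '.join(shown)}, and {rest} others {verb} your tweet")
        else:
            lines.append(f"{' and '.join(shown)} {verb} your tweet")
    return lines
```

In the real pipeline this grouping happens inside the 5-minute window before any push is sent, so one push replaces fifty.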
13. Analytics
Real-Time Metrics
Tweet Impressions Pipeline:
Client view event --> Kafka --> Flink --> Metrics Store (Druid/ClickHouse)
|
Tweet analytics dashboard:
- Impressions
- Engagements
- Link clicks
- Profile visits
Batch Analytics
- Daily aggregation via Spark jobs
- User growth, engagement trends
- Content moderation signals (spam detection)
14. Complete System Diagram
+---------+ +---------+
| Client | | Client |
| (Read) | | (Write) |
+----+----+ +----+----+
| |
v v
+----+-----------+ +------+------+
| CDN | | API Gateway |
| (media, static)| +------+------+
+----------------+ |
+--------------+---------------+
| | |
+-----+----+ +------+-----+ +-----+-----+
| Tweet | | Timeline | | Search |
| Service | | Service | | Service |
+-----+----+ +------+-----+ +-----+-----+
| | |
+-----+----+ +------+-----+ +-----+-----+
| Tweet DB | | Timeline | | Elastic- |
| (MySQL | | Cache | | search |
| sharded)| | (Redis) | +-----------+
+----------+ +------+-----+
| ^
+-----+----+ |
| Kafka +---------+
+--+--+--+-+
| | |
+----------+ | +----------+
| | |
+------+---+ +-----+----+ +-----+------+
| Fan-out | | Search | | Trending |
| Workers | | Indexer | | Service |
+----------+ +----------+ | (Flink) |
+-----+------+
|
+-----+------+
| Trending |
| Cache |
+------------+
+------------------+ +------------------+ +------------------+
| Social Graph DB | | Notification Svc | | Analytics |
| (follows) | | (Kafka + Push) | | (Spark + Druid) |
+------------------+ +------------------+ +------------------+
15. Interview Tips
- Hybrid fan-out is THE answer: Pure push fails for celebrities, pure pull is too slow
- Timeline cache in Redis: Pre-computed for fast reads, key data structure
- Trending = velocity, not volume: Count-min sketch + sliding window
- Search is real-time: Tweets indexed within seconds, last 7 days hot index
- Snowflake IDs: Mention Twitter's ID generation (time-ordered, distributed)
- Sharded counters: For like/retweet counts on viral tweets
- Read-heavy optimization: Cache aggressively, timeline is the hot path
- Don't conflate home timeline and user timeline: They have completely different data paths
16. Resources
- Alex Xu - System Design Interview Vol. 1, Chapter 11 (news feed design; the same fan-out trade-offs apply)
- Twitter Engineering Blog: "The Infrastructure Behind Twitter: Scale"
- "How Twitter Uses Redis to Scale" (RedisConf talk)
- Twitter Snowflake: Distributed ID generation
- "Timelines at Scale" (QCon talk by Raffi Krikorian)
- Count-Min Sketch paper (Cormode & Muthukrishnan, 2005)
- Martin Kleppmann - Designing Data-Intensive Applications, Chapter 11 (Stream Processing)