36 - Design Twitter
Series: System Design & Distributed Systems Previous: 35 - Design Uber or Lyft | Next: 37 - Design Dropbox or Google Drive
1. Requirements
Functional Requirements
| Feature | Details |
|---|---|
| Post tweet | Text (280 chars), images, video |
| Home timeline | Aggregated tweets from followed users |
| User timeline | All tweets by a specific user |
| Follow/unfollow | Asymmetric social graph |
| Search | Full-text search across tweets |
| Trending topics | Real-time trending hashtags/topics |
| Like/retweet | Engagement actions |
| Notifications | Mentions, likes, follows, retweets |
Non-Functional Requirements
- Scale: 500M tweets/day, 400M DAU
- Read-heavy: 100:1 read-to-write ratio
- Latency: Timeline loads in < 500ms
- Availability: 99.99%
- Freshness: Tweets appear in followers' timelines within seconds
2. Capacity Estimation
DAU: 400M
Tweets/day: 500M
Avg tweet size: 300 bytes (text + metadata)
Tweet storage/day: 500M x 300B = 150GB
Timeline reads/day: 400M users x 5 reads = 2B reads/day
Timeline read QPS: ~23K QPS (avg), ~50K peak
Tweet write QPS: ~5.8K (avg), ~12K peak
Avg followers: 200
Fan-out volume: 500M tweets x 200 = 100B fan-out ops/day
Media tweets: ~20% with images = 100M/day x 1MB = 100TB/day
| Resource | Estimate |
|---|---|
| Tweet write QPS | ~6K |
| Timeline read QPS | ~23K |
| Fan-out ops/day | ~100B |
| Storage/year (text) | ~55TB |
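The estimates above can be sanity-checked with a few lines of arithmetic; this is a sketch of the back-of-envelope math, using only numbers stated in this section.

```python
# Back-of-envelope capacity math from the estimates above.
SECONDS_PER_DAY = 86_400

tweets_per_day = 500_000_000
timeline_reads_per_day = 400_000_000 * 5   # 400M DAU x 5 timeline loads each
avg_tweet_bytes = 300
avg_followers = 200

write_qps = tweets_per_day / SECONDS_PER_DAY              # ~5.8K avg
read_qps = timeline_reads_per_day / SECONDS_PER_DAY       # ~23K avg
text_storage_per_day_gb = tweets_per_day * avg_tweet_bytes / 1e9   # 150 GB
text_storage_per_year_tb = text_storage_per_day_gb * 365 / 1000    # ~55 TB
fanout_ops_per_day = tweets_per_day * avg_followers                # 100B

print(f"write QPS ~{write_qps:,.0f}, read QPS ~{read_qps:,.0f}")
print(f"text storage/year ~{text_storage_per_year_tb:.1f} TB")
```

Note the read-to-write QPS ratio here (~4:1) counts only timeline loads; the stated 100:1 ratio includes all reads (profiles, search, hydration fan-in).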
3. High-Level Architecture
+--------+ +-----------+
| Client |-------->| API |
| (App) | | Gateway |
+--------+ +-----+-----+
|
+---------------+---------------+
| | |
+-----+-----+ +-----+-----+ +------+------+
| Tweet | | Timeline | | User |
| Service | | Service | | Service |
+-----+-----+ +-----+-----+ +------+------+
| | |
+-----+-----+ +-----+-----+ +------+------+
| Tweet DB | | Timeline | | Social |
| (MySQL | | Cache | | Graph |
| sharded) | | (Redis) | | (MySQL/ |
+-----------+ +-----------+ | Graph DB) |
+-------------+
|
+-----+------+
| Fan-out |
| Service |
| (Workers) |
+-----+------+
|
+-----+------+
| Kafka |
+------------+
4. Post Tweet Flow
Client Tweet Service Fan-out Service
| | |
|-- POST /tweet -------->| |
| { text, media_ids } | |
| |-- Validate ----------->|
| |-- Store in Tweet DB -->|
| |-- Enqueue to Kafka --->|
| | |
|<-- 201 Created --------| |
| { tweet_id } | |
| | +------+------+
| | | Read poster |
| | | follower |
| | | list |
| | +------+------+
| | |
| | +------+------+
| | | For each |
| | | follower: |
| | | LPUSH to |
| | | timeline |
| | | cache |
| | +-------------+
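The fan-out worker loop above can be sketched in a few lines. This is an in-memory stand-in: in production the follower list comes from the social graph store and each timeline push is a Redis LPUSH/LTRIM; here plain dicts and a bounded deque play those roles, and the cap of 800 entries is an assumption.

```python
from collections import defaultdict, deque

TIMELINE_CAP = 800  # keep only the most recent N tweet IDs per user (assumption)

# In-memory stand-ins for the social graph and the Redis timeline cache.
followers_of = defaultdict(set)   # user_id -> set of follower IDs
timelines = defaultdict(deque)    # user_id -> deque of tweet IDs, newest first

def fan_out(poster_id: int, tweet_id: int) -> int:
    """Push a new tweet ID onto every follower's cached timeline.

    Mirrors the Kafka-consumer fan-out worker: each append/trim pair
    corresponds to LPUSH + LTRIM on timeline:{follower} in Redis.
    """
    for follower in followers_of[poster_id]:
        tl = timelines[follower]
        tl.appendleft(tweet_id)           # LPUSH timeline:{follower} tweet_id
        while len(tl) > TIMELINE_CAP:     # LTRIM timeline:{follower} 0 CAP-1
            tl.pop()
    return len(followers_of[poster_id])
```

Capping each cached timeline bounds Redis memory: users who want to scroll past the cap fall back to a DB-backed query.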
Tweet Storage Schema

```sql
CREATE TABLE tweets (
  tweet_id      BIGINT PRIMARY KEY,  -- Snowflake ID
  user_id       BIGINT,
  content       VARCHAR(280),
  media_urls    JSON,
  reply_to      BIGINT NULL,
  retweet_of    BIGINT NULL,
  created_at    TIMESTAMP,
  like_count    INT DEFAULT 0,
  retweet_count INT DEFAULT 0,
  reply_count   INT DEFAULT 0,
  INDEX idx_user_time (user_id, created_at DESC)
);
```
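Since tweet_id is a Snowflake ID, here is a minimal sketch of a Snowflake-style generator. The 41/10/12 bit layout and the epoch constant follow Twitter's published scheme; the locking and wait-for-next-millisecond details are simplified assumptions.

```python
import threading
import time

# Snowflake-style 64-bit IDs: 41 bits of milliseconds since a custom epoch,
# 10 bits of worker ID, 12 bits of per-millisecond sequence.
EPOCH_MS = 1_288_834_974_657  # Twitter's published snowflake epoch

class Snowflake:
    def __init__(self, worker_id: int):
        assert 0 <= worker_id < 1024   # must fit in 10 bits
        self.worker_id = worker_id
        self.last_ms = -1
        self.seq = 0
        self.lock = threading.Lock()

    def next_id(self) -> int:
        with self.lock:
            now = int(time.time() * 1000)
            if now == self.last_ms:
                self.seq = (self.seq + 1) & 0xFFF
                if self.seq == 0:                  # sequence exhausted this ms:
                    while now <= self.last_ms:     # spin until the next ms
                        now = int(time.time() * 1000)
            else:
                self.seq = 0
            self.last_ms = now
            return ((now - EPOCH_MS) << 22) | (self.worker_id << 12) | self.seq
```

Because the timestamp occupies the high bits, IDs sort by creation time, which is exactly what the (user_id, created_at DESC) access pattern wants.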
5. Home Timeline: The Core Challenge
Fan-Out on Write (Push) -- Default for Normal Users
User A (200 followers) posts tweet T1
|
v
Fan-out workers push T1 to 200 followers' timeline caches:
timeline:follower_1 = [T1, T5, T3, T8, ...] (most recent first)
timeline:follower_2 = [T1, T7, T2, T9, ...]
...
timeline:follower_200 = [T1, ...]
Reading timeline: a single cache lookup per request -- no fan-in across followed users at read time.
GET /timeline --> Redis LRANGE timeline:{user_id} 0 49 --> 50 tweet IDs
|
v
Hydrate: fetch full tweet objects by ID (multi-get)
|
v
Return rendered timeline
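The read path above can be sketched as follows; dicts stand in for the Redis timeline cache (LRANGE) and the tweet store (multi-get), and all names are illustrative.

```python
# Read path sketch: fetch cached tweet IDs, then hydrate via multi-get.
timeline_cache = {42: [103, 102, 101]}   # user_id -> newest-first tweet IDs
tweet_store = {
    101: {"id": 101, "text": "first"},
    102: {"id": 102, "text": "second"},
    103: {"id": 103, "text": "third"},
}

def read_timeline(user_id: int, page_size: int = 50) -> list[dict]:
    ids = timeline_cache.get(user_id, [])[:page_size]   # LRANGE 0 page_size-1
    # MGET-style hydration; skip IDs whose tweets were deleted since fan-out.
    return [tweet_store[i] for i in ids if i in tweet_store]
```

Storing only IDs in the cache keeps it small and lets a tweet edit or delete take effect everywhere without rewriting millions of cached timelines.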
Fan-Out on Read (Pull) -- For Celebrities
User opens timeline:
1. Fetch pre-computed timeline (from push model) = base feed
2. Fetch recent tweets from followed celebrities
3. Merge and sort by timestamp
4. Return top N
Hybrid Model (Twitter's Actual Approach)
Is the poster a celebrity?
(> 100K followers)
|
+------+------+
| |
YES NO
| |
Skip fan-out Fan-out to all
(too expensive) followers' caches
| |
v v
Pulled at Pre-cached in
read time timeline cache
Timeline Read:
1. Get cached timeline (pushed tweets) --> [T1, T3, T5]
2. Get followed celebrities' recent tweets (pull) --> [T2, T4]
3. Merge by timestamp --> [T1, T2, T3, T4, T5]
4. Return top 50
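The merge step can be sketched with a k-way merge; since both inputs arrive newest-first, heapq.merge preserves that order without a full re-sort. The timestamp field name is an assumption.

```python
from heapq import merge
from itertools import islice

def merge_timeline(cached: list[dict], celebrity: list[dict], n: int = 50) -> list[dict]:
    """Merge the pushed (cached) feed with pulled celebrity tweets.

    Both inputs are assumed sorted newest-first by created_at, so a
    streaming merge suffices; islice stops after the top n.
    """
    merged = merge(cached, celebrity, key=lambda t: t["created_at"], reverse=True)
    return list(islice(merged, n))
```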
Interview Tip: This hybrid approach is THE key insight for Twitter design. Demonstrate you understand why pure push fails for celebrities (Lady Gaga with 80M followers = 80M writes per tweet).
6. User Timeline (Simple)
GET /users/{user_id}/tweets
Query: SELECT * FROM tweets
WHERE user_id = {user_id}
ORDER BY created_at DESC
LIMIT 50 OFFSET 0
Optimization:
- Index on (user_id, created_at DESC)
- Cache recent tweets per user in Redis
- No fan-out needed -- just a filtered query
7. Search: Inverted Index on Tweets
Architecture
Tweet Created --> Kafka --> Search Indexer --> Elasticsearch
|
User Search Query --> Search Service --> ES Query --+
|
Ranked Results
Real-Time Search Index
Tweet: "Just landed in Paris! #travel #europe"
Tokens: ["just", "landed", "paris", "travel", "europe"]
Inverted Index Update:
"paris" -> [..., tweet_98765]
"travel" -> [..., tweet_98765]
"europe" -> [..., tweet_98765]
Search Ranking
| Signal | Weight |
|---|---|
| Text relevance (BM25) | High |
| Recency | High (tweets decay fast) |
| Engagement (likes, retweets) | Medium |
| Author verified/authority | Medium |
| User personalization | Low |
Hot/Cold Index Tiering
- For real-time search, only index last 7 days of tweets
- Archive older tweets to cold index (searched separately)
- Twitter indexes ~500M tweets/day with sub-second latency
8. Trending Topics
Architecture
Tweet Stream (Kafka)
|
v
+------+----------+
| Stream Processor |
| (Flink / Storm) |
+------+----------+
|
1. Extract hashtags and entities
2. Count occurrences in sliding windows
3. Detect anomalies (sudden spikes)
|
v
+------+----------+
| Trending Topics |
| Cache (Redis) |
+------+----------+
|
v
Served to clients
Count-Min Sketch for Efficient Counting
Problem: Counting exact occurrences of millions of hashtags in real-time
requires too much memory.
Count-Min Sketch:
- Probabilistic data structure
- Uses multiple hash functions and a 2D array
- Space: O(1/epsilon * log(1/delta))
- Overestimates counts slightly, never underestimates
h1 h2 h3 (hash functions)
| | |
+--+---+---+--+
| 5 | 0 | 3 | 1 | row 1 (h1)
+--+---+---+--+
| 2 | 7 | 0 | 4 | row 2 (h2)
+--+---+---+--+
| 1 | 3 | 8 | 0 | row 3 (h3)
+--+---+---+--+
count("#travel") = min(row1[h1(travel)], row2[h2(travel)], row3[h3(travel)])
Sliding Window
Window: 1 hour, slide every 5 minutes
|------ window 1 ------|
|------ window 2 ------|
|------ window 3 ------|
For each window:
Count hashtag occurrences
Compare to historical baseline
If count >> baseline: trending
Trending Score
trending_score = (current_count - baseline_count) / baseline_count
Example:
#WorldCup: current_count = 500K/hr, baseline = 10K/hr
score = (500K - 10K) / 10K = 49.0 --> TRENDING
#goodmorning: current_count = 50K/hr, baseline = 45K/hr
score = (50K - 45K) / 45K = 0.11 --> NOT trending (normal volume)
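The scoring formula above translates directly; the `floor` guard for brand-new hashtags with no baseline is an added assumption.

```python
def trending_score(current: float, baseline: float, floor: float = 1.0) -> float:
    """Relative spike over the historical baseline for the same window.

    floor guards against division by zero when a hashtag has no history.
    """
    return (current - baseline) / max(baseline, floor)
```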
Interview Tip: The key insight is that trending is about velocity of change, not absolute volume. A hashtag that suddenly spikes is trending even if its total count is modest.
9. Follow/Unfollow Mechanics
Social Graph Storage

```sql
CREATE TABLE follows (
  follower_id BIGINT,
  followee_id BIGINT,
  created_at  TIMESTAMP,
  PRIMARY KEY (follower_id, followee_id),
  INDEX idx_followee (followee_id)
);
```
Operations
| Operation | Query |
|---|---|
| Follow | INSERT INTO follows (follower_id, followee_id) |
| Unfollow | DELETE FROM follows WHERE follower_id=X AND followee_id=Y |
| Get following | SELECT followee_id FROM follows WHERE follower_id=X |
| Get followers | SELECT follower_id FROM follows WHERE followee_id=X |
| Follower count | SELECT COUNT(*) ... (or denormalized counter) |
Unfollow and Timeline Cache
When user A unfollows user B:
- Delete from follows table
- Async: remove B's tweets from A's timeline cache
- Or: lazy filter -- skip B's tweets when rendering A's timeline
10. Like/Retweet Counters
Challenge
Popular tweets get millions of likes. A naive UPDATE tweets SET like_count = like_count + 1 WHERE tweet_id = ... funnels every like for a viral tweet onto a single row, creating hot-row contention.
Solution: Sharded Counters
tweet_12345_likes:
shard_0: 15,234
shard_1: 14,892
shard_2: 15,108
shard_3: 14,766
total_likes = sum(all shards) = 60,000
Write: randomly pick a shard, increment
Read: sum all shards (cached with 30s TTL)
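A sharded counter is a thin wrapper; in production each shard would be a separate Redis key or DB row, and here a plain list stands in for them.

```python
import random

class ShardedCounter:
    """Spread increments for one hot key across N shards to avoid contention."""

    def __init__(self, num_shards: int = 4):
        self.shards = [0] * num_shards

    def increment(self, amount: int = 1) -> None:
        # Random shard choice spreads concurrent writers across rows/keys.
        self.shards[random.randrange(len(self.shards))] += amount

    def total(self) -> int:
        # Exact sum; in practice this read is cached with a short TTL (e.g. 30s)
        # so viral tweets don't hammer every shard on every render.
        return sum(self.shards)
```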
Like/Unlike with Dedup

```sql
CREATE TABLE likes (
  user_id    BIGINT,
  tweet_id   BIGINT,
  created_at TIMESTAMP,
  PRIMARY KEY (user_id, tweet_id)
);
-- Like: INSERT (idempotent with PK constraint)
-- Unlike: DELETE
-- Check if liked: SELECT EXISTS(... WHERE user_id=X AND tweet_id=Y)
```
11. Media Attachments
Tweet with Image:
Client --> Upload Service --> S3
|
v
Generate:
- Thumbnail (150x150)
- Small (680px wide)
- Large (1200px wide)
|
v
Return media_id
Client --> POST /tweet { text: "...", media_ids: ["m_123"] }
|
v
Store tweet with media_urls
Media Storage
| Type | Max Size | Processing |
|---|---|---|
| Image | 5MB | Resize, compress, WebP/AVIF |
| GIF | 15MB | Convert to MP4 for efficiency |
| Video | 512MB | Transcode to HLS segments |
12. Notification System
Events (Kafka):
- mention: "@user was mentioned in a tweet"
- like: "user liked your tweet"
- retweet: "user retweeted your tweet"
- follow: "user followed you"
- reply: "user replied to your tweet"
|
v
+------+----------+
| Notification |
| Service |
+------+----------+
|
+----+----+----+
| | |
v v v
In-App Push Email
(badge, (APNs, (digest)
bell) FCM)
Notification Aggregation
Instead of:
"Alice liked your tweet"
"Bob liked your tweet"
"Charlie liked your tweet"
Show:
"Alice, Bob, and 47 others liked your tweet"
- Aggregate within time windows (5 minutes)
- Group by tweet + action type
- Show count + sample names
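The aggregation rule above might be sketched like this; the event shape and the two-name sample size are assumptions for illustration.

```python
from collections import defaultdict

def aggregate(events: list[dict], sample: int = 2) -> list[str]:
    """Collapse events into one line per (tweet, action), showing a few names."""
    groups = defaultdict(list)
    for e in events:                       # e.g. {"tweet_id": 1, "verb": "liked", "actor": "Alice"}
        groups[(e["tweet_id"], e["verb"])].append(e["actor"])
    lines = []
    for (tweet_id, verb), actors in groups.items():
        shown, rest = actors[:sample], len(actors) - sample
        if rest > 0:
            lines.append(f"{', '.join(shown)}, and {rest} others {verb} your tweet")
        else:
            lines.append(f"{' and '.join(shown)} {verb} your tweet")
    return lines
```

In the real pipeline this grouping happens inside the 5-minute window before any push is sent, so one push replaces fifty.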
13. Analytics
Real-Time Metrics
Tweet Impressions Pipeline:
Client view event --> Kafka --> Flink --> Metrics Store (Druid/ClickHouse)
|
Tweet analytics dashboard:
- Impressions
- Engagements
- Link clicks
- Profile visits
Batch Analytics
- Daily aggregation via Spark jobs
- User growth, engagement trends
- Content moderation signals (spam detection)
14. Complete System Diagram
+---------+ +---------+
| Client | | Client |
| (Read) | | (Write) |
+----+----+ +----+----+
| |
v v
+----+-----------+ +------+------+
| CDN | | API Gateway |
| (media, static)| +------+------+
+----------------+ |
+--------------+---------------+
| | |
+-----+----+ +------+-----+ +-----+-----+
| Tweet | | Timeline | | Search |
| Service | | Service | | Service |
+-----+----+ +------+-----+ +-----+-----+
| | |
+-----+----+ +------+-----+ +-----+-----+
| Tweet DB | | Timeline | | Elastic- |
| (MySQL | | Cache | | search |
| sharded)| | (Redis) | +-----------+
+----------+ +------+-----+
| ^
+-----+----+ |
| Kafka +---------+
+--+--+--+-+
| | |
+----------+ | +----------+
| | |
+------+---+ +-----+----+ +-----+------+
| Fan-out | | Search | | Trending |
| Workers | | Indexer | | Service |
+----------+ +----------+ | (Flink) |
+-----+------+
|
+-----+------+
| Trending |
| Cache |
+------------+
+------------------+ +------------------+ +------------------+
| Social Graph DB | | Notification Svc | | Analytics |
| (follows) | | (Kafka + Push) | | (Spark + Druid) |
+------------------+ +------------------+ +------------------+
15. Interview Tips
- Hybrid fan-out is THE answer: Pure push fails for celebrities, pure pull is too slow
- Timeline cache in Redis: Pre-computed for fast reads, key data structure
- Trending = velocity, not volume: Count-min sketch + sliding window
- Search is real-time: Tweets indexed within seconds, last 7 days hot index
- Snowflake IDs: Mention Twitter's ID generation (time-ordered, distributed)
- Sharded counters: For like/retweet counts on viral tweets
- Read-heavy optimization: Cache aggressively, timeline is the hot path
- Don't conflate home timeline and user timeline: They have completely different data paths
16. Resources
- Alex Xu - System Design Interview Vol. 1, Chapter 11 (news feed design; the same fan-out trade-offs apply)
- Twitter Engineering Blog: "The Infrastructure Behind Twitter: Scale"
- "How Twitter Uses Redis to Scale" (RedisConf talk)
- Twitter Snowflake: Distributed ID generation
- "Timelines at Scale" (QCon talk by Raffi Krikorian)
- Count-Min Sketch paper (Cormode & Muthukrishnan, 2005)
- Martin Kleppmann - Designing Data-Intensive Applications, Chapter 11 (Stream Processing)