31 - Design Chat System
Series: System Design & Distributed Systems Previous: 30 - Design Notification System | Next: 32 - Design News Feed
1. Requirements
Functional Requirements
| Feature | Details |
|---|---|
| 1:1 chat | Real-time messaging between two users |
| Group chat | Up to 500 members per group |
| Online presence | Show online/offline/last-seen status |
| Read receipts | Delivered, read indicators |
| Media sharing | Images, video, files up to 100MB |
| Message history | Persistent, searchable |
| Push notifications | For offline users |
Non-Functional Requirements
- Latency: < 100ms message delivery for online users
- Availability: 99.99% uptime
- Ordering: Messages appear in correct order per conversation
- Durability: Zero message loss
- Scale: 500M DAU, 20B messages/day (WhatsApp scale)
2. Capacity Estimation
DAU: 500M
Avg messages/user: 40/day
Total messages: 20B/day ~ 230K msg/sec
Avg message size: 100 bytes text
Storage/day: 20B x 100B = 2TB text
Media messages: ~5% = 1B/day, avg 200KB = 200TB/day
| Resource | Estimate |
|---|---|
| Write QPS | ~230K msg/sec |
| Peak QPS | ~460K msg/sec |
| Text storage/year | ~730TB |
| Media storage/year | ~73PB |
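The arithmetic behind these estimates can be checked with a few lines of Python. This is only a back-of-envelope sketch: the inputs are the figures quoted above, units are decimal (1 TB = 10^12 bytes), and the 2x peak factor is an assumption inferred from the peak QPS row.

```python
# Back-of-envelope capacity check; inputs are the estimates from this section.
SECONDS_PER_DAY = 86_400

dau = 500_000_000
msgs_per_user_per_day = 40
avg_text_bytes = 100
media_percent = 5            # share of messages that carry media
avg_media_bytes = 200_000    # 200 KB, decimal units
peak_factor = 2              # assumption: peak traffic is about 2x the average

total_msgs_per_day = dau * msgs_per_user_per_day              # 20 billion
avg_qps = total_msgs_per_day / SECONDS_PER_DAY                # ~231K msg/sec
peak_qps = avg_qps * peak_factor                              # ~463K msg/sec

text_bytes_per_day = total_msgs_per_day * avg_text_bytes      # 2 TB
media_msgs_per_day = total_msgs_per_day * media_percent // 100
media_bytes_per_day = media_msgs_per_day * avg_media_bytes    # 200 TB

text_tb_per_year = text_bytes_per_day * 365 // 10**12         # 730 TB
media_pb_per_year = media_bytes_per_day * 365 // 10**15       # 73 PB

print(f"avg QPS ~{avg_qps:,.0f}, peak QPS ~{peak_qps:,.0f}")
print(f"text {text_tb_per_year} TB/year, media {media_pb_per_year} PB/year")
```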
3. High-Level Architecture
+-------------------+
| Load Balancer |
+--------+----------+
|
+--------------+--------------+
| | |
+------+------+ +----+-----+ +-----+------+
| API Gateway | | API GW 2 | | API GW N |
+------+------+ +----------+ +------------+
|
+------------+-------------+
| | |
+----+----+ +----+-----+ +----+------+
| Chat | | Presence | | Media |
| Service | | Service | | Service |
+----+----+ +----+-----+ +----+------+
| | |
| +-------+-------+ |
| | User Session | |
| | Registry | |
| +----------------+ |
| |
+----+-------------------------+----+
| Message Queue |
| (Kafka / RabbitMQ) |
+----+-----+----+---+----+---------+
| | | | |
+--+--+ | +--+--+| +-+--------+
|Store| | |Push || | Fan-out |
|Svc | | |Notif || | Service |
+--+--+ | +------+| +----------+
| | |
+----+-----+----------+----+
| Message Database | +------------------+
| (Cassandra / HBase) | | Object Store |
+---------------------------+ | (S3 / GCS) |
+------------------+
4. Communication Protocol: WebSocket
Why WebSocket over HTTP?
| Feature | HTTP Polling | Long Polling | WebSocket |
|---|---|---|---|
| Latency | High (poll interval) | Medium | Low (real-time) |
| Server load | Very high | Medium | Low |
| Bidirectional | No | No | Yes |
| Connection overhead | Per request | Per timeout | Once |
Connection Flow
Client Server
|--- HTTP Upgrade Request ------>|
|<--- 101 Switching Protocols ---|
| |
|<====== Bidirectional WS ======>|
| send message |
| receive message |
| presence updates |
| typing indicators |
Interview Tip: Mention that HTTP is still used for non-real-time operations like profile updates, media uploads, and login. Only real-time traffic (chat messages, presence updates, typing indicators) goes over the WebSocket.
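To make the upgrade step in the connection flow concrete, here is a small sketch of how a server derives the `Sec-WebSocket-Accept` header that accompanies the 101 response. The GUID and the SHA-1 plus Base64 steps come from RFC 6455; the function name `websocket_accept` is just an illustrative choice, and in practice a WebSocket library does this for you.

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for the opening handshake.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def websocket_accept(sec_websocket_key: str) -> str:
    """Compute the Sec-WebSocket-Accept value for a client's Sec-WebSocket-Key."""
    digest = hashlib.sha1((sec_websocket_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


# Sample key taken from the example handshake in RFC 6455.
print(websocket_accept("dGhlIHNhbXBsZSBub25jZQ=="))
```

The client verifies this value before treating the connection as upgraded, which guards against intermediaries that do not actually understand WebSocket.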
5. Message Flow: Send, Store, Deliver, Acknowledge
Sender Chat Server Database Receiver
| | | |
|-- 1. Send msg ----->| | |
| |-- 2. Store msg ---->| |
| |<-- 3. ACK stored ---| |
|<-- 4. ACK sent -----| | |
| | | |
| |-- 5. Route to receiver's server ---->|
| | | |
| | 6. Is receiver online?
| | | |
| | [ONLINE] --------->|-- 7. Push WS ->|
| | | |
| | [OFFLINE] -------->| 8. Push Notif |
| | | |
| |<-------- 9. Delivered ACK -----------|
| | | |
|<-- 10. Delivered ---| | |
| |<-------- 11. Read ACK ---------------|
|<-- 12. Read --------| | |
Message States
SENT --> DELIVERED --> READ
  |          |
  +-> FAILED +-> EXPIRED (for ephemeral)
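One way to enforce these states is a small transition table, so that a late or duplicate ACK cannot move a message backwards (for example, a Delivered ACK arriving after Read). The sketch below assumes the transitions are exactly the arrows in the diagram; the names `MessageState`, `ALLOWED`, and `advance` are hypothetical.

```python
from enum import Enum


class MessageState(Enum):
    SENT = "sent"
    DELIVERED = "delivered"
    READ = "read"
    FAILED = "failed"
    EXPIRED = "expired"


# Allowed transitions, read directly off the state diagram.
ALLOWED = {
    MessageState.SENT: {MessageState.DELIVERED, MessageState.FAILED},
    MessageState.DELIVERED: {MessageState.READ, MessageState.EXPIRED},
    MessageState.READ: set(),
    MessageState.FAILED: set(),
    MessageState.EXPIRED: set(),
}


def advance(current: MessageState, new: MessageState) -> MessageState:
    """Return the new state, or raise ValueError if the transition is not allowed."""
    if new not in ALLOWED[current]:
        raise ValueError(f"illegal transition {current.name} -> {new.name}")
    return new


state = MessageState.SENT
state = advance(state, MessageState.DELIVERED)
state = advance(state, MessageState.READ)
```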
6. Chat Server Architecture
User Session Registry
Tracks which chat server each user is connected to.
+---------------------------+
| User Session Registry |
| (Redis Cluster) |
+---------------------------+
| user_id | server_id | ws |
|---------|-----------|-----|
| U1 | CS-3 | ws7 |
| U2 | CS-1 | ws2 |
| U3 | CS-3 | ws9 |
+---------------------------+
When a message arrives for U2, the system looks up the registry to find U2 is on CS-1, then routes the message there.
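The lookup can be sketched with an in-memory dictionary standing in for the Redis cluster. This is an illustration only: `SessionRegistry`, `Session`, and the method names are hypothetical, and a real deployment would also need entry expiry and handling for users with several devices.

```python
from typing import Dict, NamedTuple, Optional


class Session(NamedTuple):
    server_id: str  # chat server holding the connection
    ws_id: str      # connection handle on that server


class SessionRegistry:
    """In-memory stand-in for the Redis-backed registry (assumption)."""

    def __init__(self) -> None:
        self._sessions: Dict[str, Session] = {}

    def connect(self, user_id: str, server_id: str, ws_id: str) -> None:
        self._sessions[user_id] = Session(server_id, ws_id)

    def disconnect(self, user_id: str) -> None:
        self._sessions.pop(user_id, None)

    def locate(self, user_id: str) -> Optional[Session]:
        """Return the user's session, or None if the user is offline."""
        return self._sessions.get(user_id)


registry = SessionRegistry()
registry.connect("U1", "CS-3", "ws7")
registry.connect("U2", "CS-1", "ws2")
registry.connect("U3", "CS-3", "ws9")

target = registry.locate("U2")  # message for U2 is routed to CS-1
```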
Scaling Chat Servers
- Each server holds ~50K-100K concurrent WebSocket connections
- No user-to-server affinity: any server can accept any user's connection, though each server holds state for its own open sockets
- Session registry in Redis provides location lookup
- Consistent hashing or random assignment for initial connection
7. Message Storage
Option A: Per-User Inbox (Write-Heavy)
inbox_user_123:
{ msg_id: M1, from: U456, text: "hello", ts: 1700000001 }
{ msg_id: M2, from: U789, text: "hey", ts: 1700000005 }
- Pros: Fast reads for a user's messages
- Cons: Group messages duplicated N times (one per member)
Option B: Message Table + Conversation Index (Balanced)
messages table:
msg_id | conversation_id | sender_id | content | timestamp
-------|-----------------|-----------|---------|----------
M1 | conv_AB | A | hello | t1
M2 | conv_AB | B | hi | t2
conversation_index table:
user_id | conversation_id | last_read_msg_id | unread_count
--------|-----------------|------------------|------------
A | conv_AB | M2 | 0
B | conv_AB | M1 | 1
- Pros: No duplication, efficient for group chat
- Cons: Requires join or secondary index for user's view
Recommended: Hybrid
- Use a wide-column store (Cassandra/HBase) partitioned by conversation_id
- Partition key: conversation_id, clustering key: timestamp
- Separate sync table per user for unread tracking
Interview Tip: WhatsApp uses custom Erlang + Mnesia. For interviews, Cassandra is the standard answer -- it handles high write throughput and time-series data well.
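The effect of partitioning by conversation_id and clustering by timestamp can be modelled in plain Python: each conversation is its own partition, and rows inside it stay sorted by time, so "latest N messages" is a cheap slice. This is an in-memory analogy rather than Cassandra code, and `MessageStore` and its methods are hypothetical names.

```python
import bisect
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Message(NamedTuple):
    timestamp: int   # clustering key; first field so tuples sort by time
    msg_id: str
    sender_id: str
    content: str


class MessageStore:
    """Toy model: one sorted list per conversation_id (the partition key)."""

    def __init__(self) -> None:
        self._partitions: Dict[str, List[Message]] = defaultdict(list)

    def append(self, conversation_id: str, msg: Message) -> None:
        # Keep the partition ordered by timestamp even if writes arrive late.
        bisect.insort(self._partitions[conversation_id], msg)

    def recent(self, conversation_id: str, limit: int = 50) -> List[Message]:
        """Return up to `limit` newest messages, oldest first."""
        return self._partitions.get(conversation_id, [])[-limit:]


store = MessageStore()
store.append("conv_AB", Message(2, "M2", "B", "hi"))
store.append("conv_AB", Message(1, "M1", "A", "hello"))  # arrives late
store.append("conv_CD", Message(3, "M3", "C", "hey"))

history = store.recent("conv_AB")
```

Reading a conversation touches a single partition, which is the property that makes this layout attractive for chat history.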
8. Group Chat Fan-Out
Small Groups (up to 500 members)
Sender --> Chat Server --> Fan-out Service
                                  |
                      +-----------+-----------+
                      |           |           |
                  Member 1    Member 2    Member N
                  (online:    (online:    (offline:
                  WS push)    WS push)    push notif)
- Store message once in group conversation
- Fan-out delivery to each member via their chat server
- Push notification for offline members
Why Not Fan-Out on Write for Large Groups?
At the 500-member cap, fan-out on write would store 500 copies of every message (one per member inbox), which multiplies write load and storage. Instead:
- Store once, fan-out on delivery only
- Offline members pull on reconnect (lazy loading)
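Fan-out on delivery can be sketched as a pure planning function: the message is already stored once, so the fan-out step only decides who gets a WebSocket push and who gets a push notification. The function name `plan_fan_out` and the idea of grouping recipients by chat server (so each server receives one routed request) are assumptions made for this illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def plan_fan_out(
    members: List[str],
    sender_id: str,
    online_sessions: Dict[str, str],  # user_id -> server_id, from the session registry
) -> Tuple[Dict[str, List[str]], List[str]]:
    """Split recipients into per-server WebSocket pushes and offline push notifications."""
    ws_pushes: Dict[str, List[str]] = defaultdict(list)
    push_notifications: List[str] = []
    for user_id in members:
        if user_id == sender_id:
            continue  # the sender already has the message
        server_id = online_sessions.get(user_id)
        if server_id is None:
            push_notifications.append(user_id)    # offline: notify, pull on reconnect
        else:
            ws_pushes[server_id].append(user_id)  # online: deliver via their chat server
    return dict(ws_pushes), push_notifications


ws_plan, offline = plan_fan_out(
    members=["U1", "U2", "U3", "U4"],
    sender_id="U1",
    online_sessions={"U1": "CS-3", "U2": "CS-1", "U3": "CS-3"},
)
```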
9. Online Presence (Heartbeat)
Client --> Heartbeat every 5s --> Presence Service --> Redis
                                         |
                                +--------+--------+
                                |                 |
                          Set TTL = 30s      Pub/Sub to
                            in Redis       friends' clients
Presence States
| State | Condition |
|---|---|
| Online | Heartbeat received within TTL |
| Offline | No heartbeat past TTL expiry |
| Away | No app interaction > 5 min, heartbeat still active |
| Last seen | Timestamp of last heartbeat before going offline |
Optimization for Large Friend Lists
- Don't push presence to all friends immediately
- Use lazy evaluation: check presence when user opens a chat
- For group chats: only fetch presence of visible members
Interview Tip: Facebook designed a presence system that batches updates and uses a fan-out budget -- each user's presence change fans out to at most N friends.
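The heartbeat-plus-TTL rule can be sketched without Redis by recording the last heartbeat time and comparing it to the TTL on read, which also matches the lazy "check when the chat is opened" approach. `PresenceTracker` is a hypothetical name, and the injectable `clock` parameter is an assumption added so the behaviour can be tested deterministically.

```python
import time
from typing import Callable, Dict, Optional


class PresenceTracker:
    """Toy presence store: a user is online if a heartbeat arrived within the TTL."""

    def __init__(self, ttl_seconds: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._last_heartbeat: Dict[str, float] = {}

    def heartbeat(self, user_id: str) -> None:
        # In Redis this would be a SET with a 30s expiry, refreshed every 5s.
        self._last_heartbeat[user_id] = self._clock()

    def is_online(self, user_id: str) -> bool:
        last = self._last_heartbeat.get(user_id)
        return last is not None and (self._clock() - last) < self._ttl

    def last_seen(self, user_id: str) -> Optional[float]:
        """Time of the most recent heartbeat, or None if never seen."""
        return self._last_heartbeat.get(user_id)


# Usage with a fake clock so time can be advanced by hand.
now = [0.0]
presence = PresenceTracker(ttl_seconds=30.0, clock=lambda: now[0])
presence.heartbeat("U1")
now[0] = 25.0
online_at_25s = presence.is_online("U1")
now[0] = 31.0
online_at_31s = presence.is_online("U1")
```

With a 5s heartbeat and a 30s TTL, a client has to miss several consecutive heartbeats before it is shown as offline, which avoids flapping on brief network drops.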
10. End-to-End Encryption (E2EE)
Alice Server Bob
| | |
| Generate key pair | Generate key pair |
| (public + private) | (public + private) |
| | |
|-- Upload public key --->|<--- Upload public key ---|
| | |
|<-- Bob's public key ----|---- Alice's public key ->|
| | |
| Encrypt with | Decrypt with |
| Bob's public key | Bob's private |
|-- Encrypted msg ------->|--------- Encrypted msg ->|
| | |
| Server CANNOT read | |
- Signal Protocol (used by WhatsApp, Signal)
- Double Ratchet Algorithm for forward secrecy
- Server stores only ciphertext
11. Message Ordering
Challenge
Messages from different senders may arrive out of order due to network delays.
Solution: Per-Conversation Sequence Numbers
conversation_id | sequence_num | msg_id | sender | timestamp
----------------|--------------|--------|--------|----------
conv_AB | 1 | M1 | A | t1
conv_AB | 2 | M2 | B | t2
conv_AB | 3 | M3 | A | t3
- Each conversation has a monotonically increasing sequence counter
- Chat server assigns sequence number atomically
- Client sorts by sequence number, not timestamp
- For 1:1 chats, server clock ordering is sufficient
- For group chats, use a centralized sequencer per conversation
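A per-conversation sequencer can be sketched as a counter guarded by a lock. This single-process version only illustrates the idea; in a distributed deployment the counter would live in one authoritative place per conversation (for example an atomic increment in a shared store), which is an assumption beyond what this section specifies. `ConversationSequencer` is a hypothetical name.

```python
import threading
from collections import defaultdict
from typing import Dict


class ConversationSequencer:
    """Hands out monotonically increasing sequence numbers per conversation."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._counters: Dict[str, int] = defaultdict(int)

    def next_seq(self, conversation_id: str) -> int:
        with self._lock:  # makes increment-and-read atomic
            self._counters[conversation_id] += 1
            return self._counters[conversation_id]


sequencer = ConversationSequencer()
m1 = {"msg_id": "M1", "seq": sequencer.next_seq("conv_AB")}
m2 = {"msg_id": "M2", "seq": sequencer.next_seq("conv_AB")}
m3 = {"msg_id": "M3", "seq": sequencer.next_seq("conv_AB")}

# The client may receive messages out of order; it sorts by seq, not timestamp.
arrived = [m3, m1, m2]
displayed = sorted(arrived, key=lambda m: m["seq"])
```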
12. Push Notifications for Offline Users
Chat Server --> Message Queue --> Push Notification Service
                                              |
                                 +------------+------------+
                                 |            |            |
                                APNs         FCM        Web Push
                               (iOS)      (Android)    (Browser)
Flow
- Chat server checks user session registry
- User not connected --> enqueue to push notification service
- Push service formats payload per platform
- Sends via APNs (Apple), FCM (Google), or Web Push
- On reconnect, client fetches undelivered messages from DB
Optimization
- Batch notifications: "You have 5 new messages from Alice"
- Rate limit: don't buzz for every message in an active group
- Silent push for data sync, audible push for direct messages
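The batching optimization can be sketched as collapsing a user's pending messages into one notification per sender. The function name `batch_notifications` and the singular wording for a lone message are illustrative assumptions; platform payload formats for APNs, FCM, and Web Push are deliberately left out.

```python
from collections import Counter
from typing import List, Tuple


def batch_notifications(pending: List[Tuple[str, str]]) -> List[str]:
    """Collapse (sender, text) pairs into one notification line per sender."""
    counts = Counter(sender for sender, _text in pending)
    notifications = []
    for sender, count in counts.items():  # Counter keeps first-seen order
        if count == 1:
            notifications.append(f"New message from {sender}")
        else:
            notifications.append(f"You have {count} new messages from {sender}")
    return notifications


pending = [
    ("Alice", "hi"), ("Bob", "lunch?"), ("Alice", "are you there"),
    ("Alice", "call me"), ("Alice", "??"), ("Alice", "ok later"),
]
batched = batch_notifications(pending)
```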
13. Media Sharing
Sender Media Service S3/Blob Store
| | |
|-- Upload media --------->| |
| |-- Store blob ----------->|
| |<-- Return CDN URL -------|
|<-- Media URL + metadata--| |
| | |
|-- Send chat message with media URL to Chat Server --|
| (normal message flow with media_url field) |
Media Handling
| Step | Detail |
|---|---|
| Upload | Client uploads to Media Service via HTTP (not WebSocket) |
| Processing | Compress, generate thumbnail, virus scan |
| Storage | S3 with CDN (CloudFront) for fast delivery |
| Message | Chat message contains media_url, not the binary |
| Download | Receiver fetches from CDN using the URL |
Optimization
- Client-side compression before upload
- Resumable uploads for large files (tus protocol)
- Progressive image loading (blur placeholder -> full image)
14. Message Search
Architecture
Messages DB --> Change Data Capture --> Elasticsearch
(Debezium / CDC)
Search Index Design
- Index by: conversation_id, sender_id, content, timestamp
- Full-text search on message content
- Filter by conversation, date range, sender
- E2EE caveat: server cannot index encrypted messages -- search happens client-side for E2EE chats
Search Flow
- User types query in search bar
- API call to Search Service
- Query Elasticsearch with filters
- Return matching messages with conversation context
- Client navigates to the message in conversation view
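The index design and search flow can be illustrated with a tiny inverted index: tokens map to message ids, and the conversation, sender, and date filters are applied to the matches. This is a toy stand-in for Elasticsearch, not its API; `MessageSearchIndex`, `tokenize`, and the parameter names are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, List, Optional, Set


def tokenize(text: str) -> List[str]:
    return re.findall(r"\w+", text.lower())


class MessageSearchIndex:
    """Toy inverted index over message content with metadata filters."""

    def __init__(self) -> None:
        self._postings: Dict[str, Set[str]] = defaultdict(set)  # token -> msg_ids
        self._docs: Dict[str, dict] = {}

    def index(self, msg_id: str, conversation_id: str, sender_id: str,
              content: str, timestamp: int) -> None:
        self._docs[msg_id] = {"conversation_id": conversation_id,
                              "sender_id": sender_id, "timestamp": timestamp}
        for token in tokenize(content):
            self._postings[token].add(msg_id)

    def search(self, query: str, conversation_id: Optional[str] = None,
               sender_id: Optional[str] = None, since: Optional[int] = None,
               until: Optional[int] = None) -> List[str]:
        tokens = tokenize(query)
        if not tokens:
            return []
        # A message matches only if it contains every query token.
        ids = set.intersection(*(self._postings.get(t, set()) for t in tokens))
        hits = []
        for msg_id in ids:
            doc = self._docs[msg_id]
            if conversation_id is not None and doc["conversation_id"] != conversation_id:
                continue
            if sender_id is not None and doc["sender_id"] != sender_id:
                continue
            if since is not None and doc["timestamp"] < since:
                continue
            if until is not None and doc["timestamp"] > until:
                continue
            hits.append(msg_id)
        return sorted(hits, key=lambda m: self._docs[m]["timestamp"])


idx = MessageSearchIndex()
idx.index("M1", "conv_AB", "A", "hello there", 1)
idx.index("M2", "conv_AB", "B", "hi, hello!", 2)
idx.index("M3", "conv_CD", "A", "Hello world", 3)
```

For E2EE chats the same structure would have to live on the client, since the server never sees plaintext to tokenize.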
15. Complete System Diagram
+-------+ +-------+ +-------+
|Client | |Client | |Client |
| (WS) | | (WS) | | (WS) |
+---+---+ +---+---+ +---+---+
| | |
+------+-----+-----+-----+
| |
+------+------+ |
| L4 Load | |
| Balancer | |
+------+------+ |
| |
+--------+--------+ | +------------------+
| Chat Server | | | API Gateway |
| Cluster (WS) | | | (HTTP REST) |
+---+----+----+---+ | +---+----+---------+
| | | | | |
| | +-------+----------+ |
| | | |
+----+----+---+ +-----+-------+ +---+----------+
| User Session| | Presence | | Media |
| Registry | | Service | | Service |
| (Redis) | | (Redis+Pub) | | (HTTP upload)|
+---------+---+ +------+------+ +---+----------+
| | |
+-----+----+ +---+---+ +----+-----+
| Kafka / | | Redis | | S3 + CDN |
| Msg Queue| +-------+ +----------+
+---+------+
|
+------+------+------+
| | | |
+-+-+ +--+-+ +--+-+ +--+--+
|DB | |Push| |Fan | |Search|
|Svc| |Ntf | |Out | | Svc |
+---+ +----+ +----+ +--+--+
| |
+-+-----------+ +----+--------+
| Cassandra / | |Elasticsearch|
| HBase | +-------------+
+-------------+
16. Interview Tips
- Start with requirements: Clarify 1:1 vs group, scale, E2EE needs
- WebSocket is the answer for real-time: but explain HTTP for other operations
- Message ordering: Per-conversation sequence numbers, not global clocks
- Storage choice matters: Cassandra/HBase for high-write chat workloads, not MySQL
- Fan-out strategy: Distinguish between small groups (fan-out on delivery) and large groups (lazy pull)
- Don't forget: Read receipts, typing indicators, presence -- these are differentiators
- E2EE is a trade-off: Server-side search becomes impossible
17. Resources
- Alex Xu - System Design Interview Vol. 1, Chapter 12: Design a Chat System
- WhatsApp Architecture: Erlang + Mnesia + FreeBSD tuning
- Facebook Messenger Architecture (2015 engineering blog)
- Signal Protocol documentation (signal.org/docs)
- Discord Engineering Blog: How Discord Stores Billions of Messages
- Martin Kleppmann - Designing Data-Intensive Applications, Chapter 11 (Stream Processing)