Message Queues & Event Streaming
Why This Matters
Asynchronous messaging is how large-scale systems decouple services, absorb traffic spikes, and enable event-driven architectures. Nearly every large-scale production system uses message queues somewhere.
Core Concepts
Synchronous vs Asynchronous
Synchronous (tight coupling):
Order Service → Payment Service → Inventory Service → Notification Service
(each waits for the next, one failure breaks the chain)
Asynchronous (loose coupling):
Order Service → Message Queue → Payment Service (consumes when ready)
→ Inventory Service (consumes when ready)
→ Notification Service (consumes when ready)
Benefits of Async Messaging
- Decoupling — producer doesn't know/care about consumers
- Buffering — absorb traffic spikes
- Reliability — messages persist until processed
- Scalability — add more consumers independently
- Failure isolation — consumer failure doesn't affect producer
Message Queue Patterns
Point-to-Point (Queue)
Producer → Queue → Consumer (one consumer per message)
- Competing consumers — multiple consumers pull from same queue
- Message removed after acknowledgment
- Use for: task distribution, work queues, job processing
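The competing-consumers idea can be sketched with the standard library alone: several workers pull from one shared queue, and each message is delivered to exactly one of them. The names (`worker`, `processed_by`) are illustrative, not any broker's API.

```python
import queue
import threading

# Minimal competing-consumers sketch: one shared queue, three workers.
task_queue = queue.Queue()
processed_by = {}            # task -> worker that handled it
lock = threading.Lock()

def worker(name):
    while True:
        task = task_queue.get()
        if task is None:     # sentinel: shut down this worker
            task_queue.task_done()
            return
        with lock:
            processed_by[task] = name
        task_queue.task_done()   # acknowledge: message removed from queue

# Producer enqueues ten tasks; three competing consumers drain them.
for i in range(10):
    task_queue.put(i)

workers = [threading.Thread(target=worker, args=(f"w{n}",)) for n in range(3)]
for w in workers:
    w.start()
for _ in workers:            # one shutdown sentinel per worker
    task_queue.put(None)
for w in workers:
    w.join()

print(len(processed_by))     # 10 -- every task handled exactly once
```

Note that the producer never names a consumer: adding a fourth worker requires no change on the producing side.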
Publish/Subscribe (Topic)
Publisher → Topic → Subscriber A (gets all messages)
→ Subscriber B (gets all messages)
→ Subscriber C (gets all messages)
- One message delivered to ALL subscribers
- Use for: event notification, fan-out, broadcasting
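For contrast, a toy pub/sub topic where every subscriber gets its own copy of every message (the `Topic` class and handler names are illustrative, not a real broker's API):

```python
# Toy publish/subscribe: one message is fanned out to ALL subscribers,
# unlike the queue above where consumers compete for messages.
class Topic:
    def __init__(self):
        self.subscribers = {}

    def subscribe(self, name, handler):
        self.subscribers[name] = handler

    def publish(self, message):
        for handler in self.subscribers.values():
            handler(message)          # deliver to every subscriber

received = {"A": [], "B": []}
topic = Topic()
topic.subscribe("A", received["A"].append)
topic.subscribe("B", received["B"].append)

topic.publish("OrderCreated")
topic.publish("OrderShipped")

print(received["A"])  # ['OrderCreated', 'OrderShipped']
print(received["B"])  # ['OrderCreated', 'OrderShipped']
```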
Consumer Groups (Kafka-style)
Producer → Topic (3 partitions) → Consumer Group A (3 consumers → 1 per partition)
→ Consumer Group B (2 consumers)
- Within a group: each partition consumed by exactly one consumer
- Across groups: each group gets all messages
- Combines queue semantics (within group) with pub/sub (across groups)
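A small sketch of how partitions are shared out: within a group, each partition goes to exactly one consumer, while a second group independently receives all partitions. The round-robin `assign` helper is illustrative (real Kafka has pluggable assignment strategies).

```python
# Kafka-style partition assignment sketch: round-robin partitions
# across the consumers of one group.
def assign(partitions, consumers):
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

partitions = [0, 1, 2]
group_a = assign(partitions, ["a1", "a2", "a3"])  # 3 consumers -> 1 partition each
group_b = assign(partitions, ["b1", "b2"])        # 2 consumers share 3 partitions

print(group_a)  # {'a1': [0], 'a2': [1], 'a3': [2]}
print(group_b)  # {'b1': [0, 2], 'b2': [1]}
```

Both groups together cover every partition, which is the "each group gets all messages" half of the semantics.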
Key Technologies
Apache Kafka
Distributed event streaming platform — the industry standard.
Architecture:
Producers → Kafka Cluster (Brokers) → Consumers
↓
Topics → Partitions → Segments (on disk)
↓
ZooKeeper / KRaft (metadata management)
Key Concepts:
- Topic — logical channel for events
- Partition — ordered, immutable append-only log within a topic
- Offset — position of a message within a partition
- Consumer group — set of consumers that share partitions
- Broker — Kafka server that stores partitions
- Replication factor — copies of each partition (typically 3)
Why Kafka is special:
- Persistent log — messages stored on disk, retained for days/weeks/forever
- Replay — consumers can re-read from any offset
- High throughput — millions of messages/sec (sequential disk I/O)
- Ordering — guaranteed within a partition
- Exactly-once semantics — via idempotent producers plus transactions, with consumers reading only committed data
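The persistent-log and replay properties can be modeled in a few lines: a partition is just an append-only list, the offset is the index, and reading never removes anything, so any consumer can re-read from any offset. This is an in-memory illustration, not Kafka's actual API.

```python
# Minimal in-memory model of one Kafka partition: an append-only log
# where the offset is simply the list index. Consumers track their own
# offsets; reads are non-destructive, which is what makes replay possible.
class Partition:
    def __init__(self):
        self.log = []                 # messages persist after being read

    def append(self, message):
        self.log.append(message)
        return len(self.log) - 1      # offset of the new message

    def read_from(self, offset):
        return self.log[offset:]

p = Partition()
for event in ["created", "paid", "shipped"]:
    p.append(event)

print(p.read_from(0))   # ['created', 'paid', 'shipped']  (full replay)
print(p.read_from(2))   # ['shipped']                     (seek to offset 2)
print(p.read_from(0))   # unchanged -- reading never removes messages
```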
When to use Kafka:
- Event streaming, log aggregation, metrics pipelines
- Event sourcing, CQRS
- Real-time analytics
- Data integration between systems
- Change Data Capture (CDC)
RabbitMQ
Traditional message broker with rich routing.
Key Features:
- AMQP protocol — standardized messaging
- Exchange types: Direct, Fanout, Topic, Headers
- Message acknowledgment — consumer confirms processing
- Dead Letter Queue — failed messages routed to DLQ
- Priority queues — process urgent messages first
- Flexible routing — route by header, key, pattern
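Topic-exchange routing is worth seeing concretely. In AMQP topic patterns, `*` matches exactly one dot-separated word and `#` matches zero or more. The matcher below is a simplified sketch of that rule, not RabbitMQ's implementation.

```python
def matches(pattern, routing_key):
    """AMQP-style topic binding match: '*' = exactly one word,
    '#' = zero or more words. Simplified illustration."""
    def walk(pat, key):
        if not pat:
            return not key
        if pat[0] == "#":
            # '#' may swallow zero or more words
            return any(walk(pat[1:], key[i:]) for i in range(len(key) + 1))
        if not key:
            return False
        if pat[0] in ("*", key[0]):
            return walk(pat[1:], key[1:])
        return False
    return walk(pattern.split("."), routing_key.split("."))

print(matches("order.*", "order.created"))       # True
print(matches("order.*", "order.eu.created"))    # False ('*' = one word)
print(matches("order.#", "order.eu.created"))    # True  ('#' = any depth)
```

A topic exchange applies this test between each binding's pattern and the message's routing key to decide which queues receive a copy.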
When to use RabbitMQ:
- Complex routing requirements
- Task queues with priority
- RPC over messaging
- When you need per-message acknowledgment
- Smaller scale than Kafka
Amazon SQS
Fully managed queue service.
- Standard queue: at-least-once delivery, best-effort ordering, unlimited throughput
- FIFO queue: exactly-once processing, strict ordering per message group, ~300 msg/s (3,000 with batching)
- Dead Letter Queue — automatic move after N failures
- Visibility timeout — message invisible to others while being processed
- Long polling — reduce empty responses
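The visibility-timeout mechanic can be modeled directly: a received message becomes invisible to other consumers until the timeout expires; deleting it (the ack) removes it for good. This toy class is illustrative and is not the boto3/SQS API.

```python
import time

# Toy SQS-style queue: receive() hides a message for `timeout` seconds;
# delete() is the acknowledgment. If the consumer crashes without
# deleting, the message reappears after the timeout.
class VisibilityQueue:
    def __init__(self, visibility_timeout):
        self.timeout = visibility_timeout
        self.messages = {}                  # id -> (body, invisible_until)
        self.next_id = 0

    def send(self, body):
        self.messages[self.next_id] = (body, 0.0)
        self.next_id += 1

    def receive(self):
        now = time.monotonic()
        for mid, (body, until) in self.messages.items():
            if until <= now:                # visible (or visible again)
                self.messages[mid] = (body, now + self.timeout)
                return mid, body
        return None

    def delete(self, mid):                  # ack: processing succeeded
        self.messages.pop(mid, None)

q = VisibilityQueue(visibility_timeout=0.05)
q.send("charge card")
mid, body = q.receive()
print(q.receive())          # None: message is in flight, invisible to others
time.sleep(0.06)            # consumer "crashed"; timeout expires
print(q.receive()[1])       # 'charge card' is redelivered
```

This is also why at-least-once delivery produces duplicates: a slow consumer that finishes after its visibility timeout will find the message already redelivered elsewhere.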
Comparison
| Feature | Kafka | RabbitMQ | SQS |
|---|---|---|---|
| Model | Log-based streaming | Traditional broker | Managed queue |
| Throughput | Very high (millions/s) | Medium (tens of K/s) | High (standard queues effectively unlimited) |
| Ordering | Per partition | Per queue | FIFO or best-effort |
| Retention | Configurable (days to forever) | Until consumed | 14 days max |
| Replay | Yes (seek to any offset) | No | No |
| Routing | Partition key | Complex exchange routing | Simple queue |
| Operations | Complex (brokers, ZooKeeper/KRaft) | Medium | Zero (managed) |
| Use case | Streaming, pipelines | Task queues, routing | Simple async tasks |
Message Delivery Guarantees
| Guarantee | Meaning | How | Trade-off |
|---|---|---|---|
| At-most-once | Message may be lost, never duplicated | Fire and forget, no ack | Fastest, lossy |
| At-least-once | Message never lost, may be duplicated | Ack after processing, retry on failure | Most common |
| Exactly-once | Message processed exactly once | Idempotent processing + transactional commits | Slowest, most complex |
Interview tip: "At-least-once delivery with idempotent consumers" is the standard answer. True exactly-once is expensive and rarely needed if consumers are idempotent.
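The idempotent-consumer half of that answer is a small amount of code: deduplicate on a message ID before applying side effects. Here an in-memory set stands in for what would be a unique-key check in a real database.

```python
# At-least-once delivery + idempotent consumer: redelivered duplicates
# become no-ops because we record which message IDs were already applied.
processed_ids = set()
balance = {"acct-123": 0}

def handle(message):
    if message["id"] in processed_ids:    # duplicate: already applied
        return False
    balance[message["acct"]] += message["amount"]
    processed_ids.add(message["id"])      # in production: same transaction
    return True

deposit = {"id": "msg-1", "acct": "acct-123", "amount": 100}
handle(deposit)
handle(deposit)        # broker redelivered the same message

print(balance["acct-123"])  # 100, not 200 -- the duplicate was a no-op
```

In a real system the dedup record and the state change must be committed atomically (e.g., in one database transaction), or a crash between them reintroduces the problem.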
Dead Letter Queues (DLQ)
Main Queue → Consumer → Fails 3 times → Dead Letter Queue
↓
Manual inspection or
automated retry later
Why DLQ matters:
- Prevents poison messages from blocking the queue
- Allows investigation of failures
- Can set up alerts on DLQ growth
- Enables manual or automated retry
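The retry-then-DLQ flow above can be sketched in a few lines: a message that fails processing three times is parked in a dead letter queue instead of blocking the main queue forever. Names (`drain`, `MAX_ATTEMPTS`) are illustrative.

```python
MAX_ATTEMPTS = 3

# Process each message with up to MAX_ATTEMPTS tries; messages that
# keep failing (poison messages) are moved to the DLQ.
def drain(messages, process):
    dead_letter_queue = []
    for message in messages:
        for attempt in range(MAX_ATTEMPTS):
            try:
                process(message)
                break
            except Exception:
                pass                       # retry
        else:                              # all attempts failed: park it
            dead_letter_queue.append(message)
    return dead_letter_queue

def process(message):
    if message == "poison":                # a message that can never succeed
        raise ValueError("cannot parse")

dlq = drain(["ok-1", "poison", "ok-2"], process)
print(dlq)  # ['poison'] -- healthy messages were not blocked behind it
```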
Event-Driven Architecture
Event Notification
Order Service publishes: "OrderCreated"
→ Payment Service: charges card
→ Inventory Service: reserves stock
→ Email Service: sends confirmation
Producers don't know about consumers. Add new consumers without changing producers.
Event Sourcing
Instead of storing current state, store all events:
Events:
1. AccountCreated { id: "123", name: "Zineddine" }
2. FundsDeposited { id: "123", amount: 1000 }
3. FundsWithdrawn { id: "123", amount: 200 }
Current state (derived): balance = 800
Pros: complete audit trail, can rebuild state, temporal queries
Cons: complex queries, eventual consistency, storage growth
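Rebuilding current state is a fold over the stored events. This sketch mirrors the account example above; the event shapes are illustrative.

```python
# The event log is the source of truth; current state is derived by
# replaying it from the beginning.
events = [
    {"type": "AccountCreated", "id": "123", "name": "Zineddine"},
    {"type": "FundsDeposited", "id": "123", "amount": 1000},
    {"type": "FundsWithdrawn", "id": "123", "amount": 200},
]

def replay(events):
    state = {}
    for e in events:
        if e["type"] == "AccountCreated":
            state = {"id": e["id"], "name": e["name"], "balance": 0}
        elif e["type"] == "FundsDeposited":
            state["balance"] += e["amount"]
        elif e["type"] == "FundsWithdrawn":
            state["balance"] -= e["amount"]
    return state

print(replay(events)["balance"])  # 800 -- derived, never stored directly
```

Replaying a prefix of the log gives the state at an earlier point in time, which is what makes temporal queries possible.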
CQRS (Command Query Responsibility Segregation)
Commands (writes) → Write Model (normalized) → Events → Read Model (denormalized)
Queries (reads) → Read Model (optimized for queries)
Why: optimize read and write models independently
When: read and write patterns are very different (read-heavy with complex queries)
Backpressure
Problem
Producer is faster than consumer → queue grows unbounded → OOM
Solutions
- Bounded queue — reject/block when full
- Rate limiting — limit producer speed
- Consumer scaling — auto-scale consumers based on queue depth
- Sampling — drop or sample messages under load
- Kafka approach — consumer controls read speed (pull-based)
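The bounded-queue option is the simplest to demonstrate: cap the buffer and shed load instead of growing memory without bound when the producer outruns the consumer.

```python
import queue

# Bounded queue: the producer fails fast (or could block) when the
# buffer is full, rather than buffering forever and running out of memory.
q = queue.Queue(maxsize=3)
accepted, rejected = 0, 0

for i in range(10):              # producer is faster than any consumer
    try:
        q.put_nowait(i)          # non-blocking put: raises when full
        accepted += 1
    except queue.Full:
        rejected += 1            # shed load / apply backpressure upstream

print(accepted, rejected)  # 3 7
```

Using a blocking `q.put(i)` instead of `put_nowait` is the other choice: the producer slows down to the consumer's pace rather than dropping messages.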
Interview Patterns
Async Processing
User → API → Queue → Worker → DB
↓
Return 202 Accepted
(provide status endpoint: GET /tasks/{id})
Fan-Out
User posts → Fan-out service → Write to all followers' feeds
(use queue to handle millions of writes)
Saga Pattern (distributed transactions)
Order → Payment → Inventory → Shipping
↓ (if any fails)
Compensating transactions roll back previous steps
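A saga coordinator can be sketched as a loop that records each completed step with its compensating action, and on failure runs those compensations in reverse order. Step names mirror the diagram; everything else is illustrative.

```python
# Saga sketch: run steps in order; on failure, undo completed steps
# in reverse with their compensating transactions.
def run_saga(steps):
    completed = []
    for name, action, compensate in steps:
        try:
            action()
            completed.append((name, compensate))
        except Exception:
            for _done_name, undo in reversed(completed):
                undo()                    # compensate in reverse order
            return False, [n for n, _ in completed]
    return True, [n for n, _ in completed]

log = []

def fail_shipping():
    raise RuntimeError("no courier available")

steps = [
    ("payment",   lambda: log.append("charged"),  lambda: log.append("refunded")),
    ("inventory", lambda: log.append("reserved"), lambda: log.append("released")),
    ("shipping",  fail_shipping,                  lambda: None),
]

ok, done = run_saga(steps)
print(ok)   # False
print(log)  # ['charged', 'reserved', 'released', 'refunded']
```

Note the compensations are themselves business operations (refund, release stock), not database rollbacks: there is no distributed transaction here, only eventual consistency.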
Resources
- 📖 DDIA Chapter 11: Stream Processing
- 📖 "Kafka: The Definitive Guide" by Narkhede, Shapira & Palino
- 📖 "Enterprise Integration Patterns" by Hohpe & Woolf
- 🔗 Kafka documentation
- 🎥 Martin Kleppmann — Event Sourcing
- 🔗 AWS SQS Best Practices
Previous: 07 - Databases Deep Dive | Next: 09 - Consistent Hashing & Data Partitioning