Message Queues & Event Streaming
Why This Matters
Asynchronous messaging is how large-scale systems decouple services, absorb traffic spikes, and enable event-driven architectures. Nearly every large-scale production system uses message queues somewhere.
Core Concepts
Synchronous vs Asynchronous
Synchronous (tight coupling):
Order Service → Payment Service → Inventory Service → Notification Service
(each waits for the next, one failure breaks the chain)
Asynchronous (loose coupling):
Order Service → Message Queue → Payment Service (consumes when ready)
→ Inventory Service (consumes when ready)
→ Notification Service (consumes when ready)
Benefits of Async Messaging
- Decoupling — producer doesn't know/care about consumers
- Buffering — absorb traffic spikes
- Reliability — messages persist until processed
- Scalability — add more consumers independently
- Failure isolation — consumer failure doesn't affect producer
Message Queue Patterns
Point-to-Point (Queue)
Producer → Queue → Consumer (one consumer per message)
- Competing consumers — multiple consumers pull from same queue
- Message removed after acknowledgment
- Use for: task distribution, work queues, job processing
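The competing-consumers idea can be sketched with the standard library alone: several workers pull from one shared queue, and each message is delivered to exactly one of them. The names (`worker`, `processed_by`) are illustrative, not any broker's API.

```python
import queue
import threading

# Minimal competing-consumers sketch: one shared queue, three workers.
task_queue = queue.Queue()
processed_by = {}            # task -> worker that handled it
lock = threading.Lock()

def worker(name):
    while True:
        task = task_queue.get()
        if task is None:     # sentinel: shut down this worker
            task_queue.task_done()
            return
        with lock:
            processed_by[task] = name
        task_queue.task_done()   # acknowledge: message removed from queue

# Producer enqueues ten tasks; three competing consumers drain them.
for i in range(10):
    task_queue.put(i)

workers = [threading.Thread(target=worker, args=(f"w{n}",)) for n in range(3)]
for w in workers:
    w.start()
for _ in workers:            # one shutdown sentinel per worker
    task_queue.put(None)
for w in workers:
    w.join()

print(len(processed_by))     # 10 -- every task handled exactly once
```

Note that the producer never names a consumer: adding a fourth worker requires no change on the producing side.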
Publish/Subscribe (Topic)
Publisher → Topic → Subscriber A (gets all messages)
→ Subscriber B (gets all messages)
→ Subscriber C (gets all messages)
- One message delivered to ALL subscribers
- Use for: event notification, fan-out, broadcasting
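For contrast, a toy pub/sub topic where every subscriber gets its own copy of every message (the `Topic` class and handler names are illustrative, not a real broker's API):

```python
# Toy publish/subscribe: one message is fanned out to ALL subscribers,
# unlike the queue above where consumers compete for messages.
class Topic:
    def __init__(self):
        self.subscribers = {}

    def subscribe(self, name, handler):
        self.subscribers[name] = handler

    def publish(self, message):
        for handler in self.subscribers.values():
            handler(message)          # deliver to every subscriber

received = {"A": [], "B": []}
topic = Topic()
topic.subscribe("A", received["A"].append)
topic.subscribe("B", received["B"].append)

topic.publish("OrderCreated")
topic.publish("OrderShipped")

print(received["A"])  # ['OrderCreated', 'OrderShipped']
print(received["B"])  # ['OrderCreated', 'OrderShipped']
```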
Consumer Groups (Kafka-style)
Producer → Topic (3 partitions) → Consumer Group A (3 consumers → 1 per partition)
→ Consumer Group B (2 consumers)
- Within a group: each partition consumed by exactly one consumer
- Across groups: each group gets all messages
- Combines queue semantics (within group) with pub/sub (across groups)
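A small sketch of how partitions are shared out: within a group, each partition goes to exactly one consumer, while a second group independently receives all partitions. The round-robin `assign` helper is illustrative (real Kafka has pluggable assignment strategies).

```python
# Kafka-style partition assignment sketch: round-robin partitions
# across the consumers of one group.
def assign(partitions, consumers):
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

partitions = [0, 1, 2]
group_a = assign(partitions, ["a1", "a2", "a3"])  # 3 consumers -> 1 partition each
group_b = assign(partitions, ["b1", "b2"])        # 2 consumers share 3 partitions

print(group_a)  # {'a1': [0], 'a2': [1], 'a3': [2]}
print(group_b)  # {'b1': [0, 2], 'b2': [1]}
```

Both groups together cover every partition, which is the "each group gets all messages" half of the semantics.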
Key Technologies
Apache Kafka
Distributed event streaming platform — the industry standard.
Architecture:
Producers → Kafka Cluster (Brokers) → Consumers
↓
Topics → Partitions → Segments (on disk)
↓
ZooKeeper / KRaft (metadata management)
Key Concepts:
- Topic — logical channel for events
- Partition — ordered, immutable append-only log within a topic
- Offset — position of a message within a partition
- Consumer group — set of consumers that share partitions
- Broker — Kafka server that stores partitions
- Replication factor — copies of each partition (typically 3)
Why Kafka is special:
- Persistent log — messages stored on disk, retained for days/weeks/forever
- Replay — consumers can re-read from any offset
- High throughput — millions of messages/sec (sequential disk I/O)
- Ordering — guaranteed within a partition
- Exactly-once semantics — via idempotent producers plus transactions, with consumers reading only committed data
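The persistent-log and replay properties can be modeled in a few lines: a partition is just an append-only list, the offset is the index, and reading never removes anything, so any consumer can re-read from any offset. This is an in-memory illustration, not Kafka's actual API.

```python
# Minimal in-memory model of one Kafka partition: an append-only log
# where the offset is simply the list index. Consumers track their own
# offsets; reads are non-destructive, which is what makes replay possible.
class Partition:
    def __init__(self):
        self.log = []                 # messages persist after being read

    def append(self, message):
        self.log.append(message)
        return len(self.log) - 1      # offset of the new message

    def read_from(self, offset):
        return self.log[offset:]

p = Partition()
for event in ["created", "paid", "shipped"]:
    p.append(event)

print(p.read_from(0))   # ['created', 'paid', 'shipped']  (full replay)
print(p.read_from(2))   # ['shipped']                     (seek to offset 2)
print(p.read_from(0))   # unchanged -- reading never removes messages
```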
When to use Kafka:
- Event streaming, log aggregation, metrics pipelines
- Event sourcing, CQRS
- Real-time analytics
- Data integration between systems
- Change Data Capture (CDC)
RabbitMQ
Traditional message broker with rich routing.
Key Features:
- AMQP protocol — standardized messaging
- Exchange types: Direct, Fanout, Topic, Headers
- Message acknowledgment — consumer confirms processing
- Dead Letter Queue — failed messages routed to DLQ
- Priority queues — process urgent messages first
- Flexible routing — route by header, key, pattern
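Topic-exchange routing is worth seeing concretely. In AMQP topic patterns, `*` matches exactly one dot-separated word and `#` matches zero or more. The matcher below is a simplified sketch of that rule, not RabbitMQ's implementation.

```python
def matches(pattern, routing_key):
    """AMQP-style topic binding match: '*' = exactly one word,
    '#' = zero or more words. Simplified illustration."""
    def walk(pat, key):
        if not pat:
            return not key
        if pat[0] == "#":
            # '#' may swallow zero or more words
            return any(walk(pat[1:], key[i:]) for i in range(len(key) + 1))
        if not key:
            return False
        if pat[0] in ("*", key[0]):
            return walk(pat[1:], key[1:])
        return False
    return walk(pattern.split("."), routing_key.split("."))

print(matches("order.*", "order.created"))       # True
print(matches("order.*", "order.eu.created"))    # False ('*' = one word)
print(matches("order.#", "order.eu.created"))    # True  ('#' = any depth)
```

A topic exchange applies this test between each binding's pattern and the message's routing key to decide which queues receive a copy.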
When to use RabbitMQ:
- Complex routing requirements
- Task queues with priority
- RPC over messaging
- When you need per-message acknowledgment
- Smaller scale than Kafka
Amazon SQS
Fully managed queue service.
- Standard queue: at-least-once delivery, best-effort ordering, unlimited throughput
- FIFO queue: exactly-once processing, strict ordering per message group, ~300 msg/s (3,000 with batching)
- Dead Letter Queue — automatic move after N failures
- Visibility timeout — message invisible to others while being processed
- Long polling — reduce empty responses
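The visibility-timeout mechanic can be modeled directly: a received message becomes invisible to other consumers until the timeout expires; deleting it (the ack) removes it for good. This toy class is illustrative and is not the boto3/SQS API.

```python
import time

# Toy SQS-style queue: receive() hides a message for `timeout` seconds;
# delete() is the acknowledgment. If the consumer crashes without
# deleting, the message reappears after the timeout.
class VisibilityQueue:
    def __init__(self, visibility_timeout):
        self.timeout = visibility_timeout
        self.messages = {}                  # id -> (body, invisible_until)
        self.next_id = 0

    def send(self, body):
        self.messages[self.next_id] = (body, 0.0)
        self.next_id += 1

    def receive(self):
        now = time.monotonic()
        for mid, (body, until) in self.messages.items():
            if until <= now:                # visible (or visible again)
                self.messages[mid] = (body, now + self.timeout)
                return mid, body
        return None

    def delete(self, mid):                  # ack: processing succeeded
        self.messages.pop(mid, None)

q = VisibilityQueue(visibility_timeout=0.05)
q.send("charge card")
mid, body = q.receive()
print(q.receive())          # None: message is in flight, invisible to others
time.sleep(0.06)            # consumer "crashed"; timeout expires
print(q.receive()[1])       # 'charge card' is redelivered
```

This is also why at-least-once delivery produces duplicates: a slow consumer that finishes after its visibility timeout will find the message already redelivered elsewhere.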
Comparison
| Feature | Kafka | RabbitMQ | SQS |
|---|---|---|---|
| Model | Log-based streaming | Traditional broker | Managed queue |
| Throughput | Very high (millions/s) | Medium (tens of K/s) | High (standard queues effectively unlimited) |
| Ordering | Per partition | Per queue | FIFO or best-effort |
| Retention | Configurable (days to forever) | Until consumed | 14 days max |
| Replay | Yes (seek to any offset) | No | No |
| Routing | Partition key | Complex exchange routing | Simple queue |
| Operations | Complex (brokers, ZooKeeper/KRaft) | Medium | Zero (managed) |
| Use case | Streaming, pipelines | Task queues, routing | Simple async tasks |
Message Delivery Guarantees
| Guarantee | Meaning | How | Trade-off |
|---|---|---|---|
| At-most-once | Message may be lost, never duplicated | Fire and forget, no ack | Fastest, lossy |
| At-least-once | Message never lost, may be duplicated | Ack after processing, retry on failure | Most common |
| Exactly-once | Message processed exactly once | Idempotent processing + transactional commits | Slowest, most complex |
Interview tip: "At-least-once delivery with idempotent consumers" is the standard answer. True exactly-once is expensive and rarely needed if consumers are idempotent.
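The idempotent-consumer half of that answer is a small amount of code: deduplicate on a message ID before applying side effects. Here an in-memory set stands in for what would be a unique-key check in a real database.

```python
# At-least-once delivery + idempotent consumer: redelivered duplicates
# become no-ops because we record which message IDs were already applied.
processed_ids = set()
balance = {"acct-123": 0}

def handle(message):
    if message["id"] in processed_ids:    # duplicate: already applied
        return False
    balance[message["acct"]] += message["amount"]
    processed_ids.add(message["id"])      # in production: same transaction
    return True

deposit = {"id": "msg-1", "acct": "acct-123", "amount": 100}
handle(deposit)
handle(deposit)        # broker redelivered the same message

print(balance["acct-123"])  # 100, not 200 -- the duplicate was a no-op
```

In a real system the dedup record and the state change must be committed atomically (e.g., in one database transaction), or a crash between them reintroduces the problem.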
Dead Letter Queues (DLQ)
Main Queue → Consumer → Fails 3 times → Dead Letter Queue
↓
Manual inspection or
automated retry later
Why DLQ matters:
- Prevents poison messages from blocking the queue
- Allows investigation of failures
- Can set up alerts on DLQ growth
- Enables manual or automated retry
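The retry-then-DLQ flow above can be sketched in a few lines: a message that fails processing three times is parked in a dead letter queue instead of blocking the main queue forever. Names (`drain`, `MAX_ATTEMPTS`) are illustrative.

```python
MAX_ATTEMPTS = 3

# Process each message with up to MAX_ATTEMPTS tries; messages that
# keep failing (poison messages) are moved to the DLQ.
def drain(messages, process):
    dead_letter_queue = []
    for message in messages:
        for attempt in range(MAX_ATTEMPTS):
            try:
                process(message)
                break
            except Exception:
                pass                       # retry
        else:                              # all attempts failed: park it
            dead_letter_queue.append(message)
    return dead_letter_queue

def process(message):
    if message == "poison":                # a message that can never succeed
        raise ValueError("cannot parse")

dlq = drain(["ok-1", "poison", "ok-2"], process)
print(dlq)  # ['poison'] -- healthy messages were not blocked behind it
```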
Event-Driven Architecture
Event Notification
Order Service publishes: "OrderCreated"
→ Payment Service: charges card
→ Inventory Service: reserves stock
→ Email Service: sends confirmation
Producers don't know about consumers. Add new consumers without changing producers.
Event Sourcing
Instead of storing current state, store all events:
Events:
1. AccountCreated { id: "123", name: "Zineddine" }
2. FundsDeposited { id: "123", amount: 1000 }
3. FundsWithdrawn { id: "123", amount: 200 }
Current state (derived): balance = 800
Pros: complete audit trail, can rebuild state, temporal queries
Cons: complex queries, eventual consistency, storage growth
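Rebuilding current state is a fold over the stored events. This sketch mirrors the account example above; the event shapes are illustrative.

```python
# The event log is the source of truth; current state is derived by
# replaying it from the beginning.
events = [
    {"type": "AccountCreated", "id": "123", "name": "Zineddine"},
    {"type": "FundsDeposited", "id": "123", "amount": 1000},
    {"type": "FundsWithdrawn", "id": "123", "amount": 200},
]

def replay(events):
    state = {}
    for e in events:
        if e["type"] == "AccountCreated":
            state = {"id": e["id"], "name": e["name"], "balance": 0}
        elif e["type"] == "FundsDeposited":
            state["balance"] += e["amount"]
        elif e["type"] == "FundsWithdrawn":
            state["balance"] -= e["amount"]
    return state

print(replay(events)["balance"])  # 800 -- derived, never stored directly
```

Replaying a prefix of the log gives the state at an earlier point in time, which is what makes temporal queries possible.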
CQRS (Command Query Responsibility Segregation)
Commands (writes) → Write Model (normalized) → Events → Read Model (denormalized)
Queries (reads) → Read Model (optimized for queries)
Why: optimize read and write models independently
When: read and write patterns are very different (read-heavy with complex queries)
Backpressure
Problem
Producer is faster than consumer → queue grows unbounded → OOM
Solutions
- Bounded queue — reject/block when full
- Rate limiting — limit producer speed
- Consumer scaling — auto-scale consumers based on queue depth
- Sampling — drop or sample messages under load
- Kafka approach — consumer controls read speed (pull-based)
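The bounded-queue option is the simplest to demonstrate: cap the buffer and shed load instead of growing memory without bound when the producer outruns the consumer.

```python
import queue

# Bounded queue: the producer fails fast (or could block) when the
# buffer is full, rather than buffering forever and running out of memory.
q = queue.Queue(maxsize=3)
accepted, rejected = 0, 0

for i in range(10):              # producer is faster than any consumer
    try:
        q.put_nowait(i)          # non-blocking put: raises when full
        accepted += 1
    except queue.Full:
        rejected += 1            # shed load / apply backpressure upstream

print(accepted, rejected)  # 3 7
```

Using a blocking `q.put(i)` instead of `put_nowait` is the other choice: the producer slows down to the consumer's pace rather than dropping messages.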
Interview Patterns
Async Processing
User → API → Queue → Worker → DB
↓
Return 202 Accepted
(provide status endpoint: GET /tasks/{id})
Fan-Out
User posts → Fan-out service → Write to all followers' feeds
(use queue to handle millions of writes)
Saga Pattern (distributed transactions)
Order → Payment → Inventory → Shipping
↓ (if any fails)
Compensating transactions roll back previous steps
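A saga coordinator can be sketched as a loop that records each completed step with its compensating action, and on failure runs those compensations in reverse order. Step names mirror the diagram; everything else is illustrative.

```python
# Saga sketch: run steps in order; on failure, undo completed steps
# in reverse with their compensating transactions.
def run_saga(steps):
    completed = []
    for name, action, compensate in steps:
        try:
            action()
            completed.append((name, compensate))
        except Exception:
            for _done_name, undo in reversed(completed):
                undo()                    # compensate in reverse order
            return False, [n for n, _ in completed]
    return True, [n for n, _ in completed]

log = []

def fail_shipping():
    raise RuntimeError("no courier available")

steps = [
    ("payment",   lambda: log.append("charged"),  lambda: log.append("refunded")),
    ("inventory", lambda: log.append("reserved"), lambda: log.append("released")),
    ("shipping",  fail_shipping,                  lambda: None),
]

ok, done = run_saga(steps)
print(ok)   # False
print(log)  # ['charged', 'reserved', 'released', 'refunded']
```

Note the compensations are themselves business operations (refund, release stock), not database rollbacks: there is no distributed transaction here, only eventual consistency.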
Resources
- 📖 DDIA Chapter 11: Stream Processing
- 📖 "Kafka: The Definitive Guide" by Narkhede, Shapira & Palino
- 📖 "Enterprise Integration Patterns" by Hohpe & Woolf
- 🔗 Kafka documentation
- 🎥 Martin Kleppmann — Event Sourcing
- 🔗 AWS SQS Best Practices
Previous: 07 - Databases Deep Dive | Next: 09 - Consistent Hashing & Data Partitioning