Message Queues & Event Streaming

Why This Matters

Asynchronous messaging is how large-scale systems decouple services, absorb traffic spikes, and enable event-driven architectures. Virtually every large-scale system design features a message queue somewhere.


Core Concepts

Synchronous vs Asynchronous

Synchronous (tight coupling):

Order Service → Payment Service → Inventory Service → Notification Service
(each waits for the next, one failure breaks the chain)

Asynchronous (loose coupling):

Order Service → Message Queue → Payment Service (consumes when ready)
                              → Inventory Service (consumes when ready)
                              → Notification Service (consumes when ready)

Benefits of Async Messaging

  • Decoupling — producer doesn't know/care about consumers
  • Buffering — absorb traffic spikes
  • Reliability — messages persist until processed
  • Scalability — add more consumers independently
  • Failure isolation — consumer failure doesn't affect producer
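A minimal in-memory sketch of this decoupling, with Python's `queue.Queue` standing in for a real broker (service and event names are illustrative):

```python
import queue
import threading

# The "broker": the producer only knows about the queue, not the consumers.
order_events = queue.Queue()

def order_service(order_id):
    # Producer: publish and return immediately; no waiting on downstream.
    order_events.put({"type": "OrderCreated", "order_id": order_id})

processed = []

def payment_worker():
    # Consumer: pulls when ready; a slow consumer never blocks the producer.
    while True:
        event = order_events.get()
        if event is None:          # sentinel used here to stop the worker
            break
        processed.append(("payment", event["order_id"]))

worker = threading.Thread(target=payment_worker)
worker.start()
order_service(1)
order_service(2)
order_events.put(None)
worker.join()
print(processed)  # [('payment', 1), ('payment', 2)]
```

The producer returns before any consumer runs; the queue buffers in between.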

Message Queue Patterns

Point-to-Point (Queue)

Producer → Queue → Consumer (one consumer per message)
  • Competing consumers — multiple consumers pull from same queue
  • Message removed after acknowledgment
  • Use for: task distribution, work queues, job processing
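Competing consumers can be sketched with threads pulling from one shared queue; each message is handed to exactly one worker:

```python
import queue
import threading

tasks = queue.Queue()
for i in range(6):
    tasks.put(i)

results = []                 # each task should land here exactly once
lock = threading.Lock()

def worker():
    while True:
        try:
            task = tasks.get_nowait()   # competing pull from the shared queue
        except queue.Empty:
            return                      # queue drained, worker exits
        with lock:
            results.append(task)

workers = [threading.Thread(target=worker) for _ in range(3)]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(sorted(results))  # [0, 1, 2, 3, 4, 5] — no task processed twice
```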

Publish/Subscribe (Topic)

Publisher → Topic → Subscriber A (gets all messages)
                  → Subscriber B (gets all messages)
                  → Subscriber C (gets all messages)
  • One message delivered to ALL subscribers
  • Use for: event notification, fan-out, broadcasting
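In contrast to the queue above, a toy pub/sub dispatcher delivers a copy of every message to every subscriber (topic and message names are illustrative):

```python
from collections import defaultdict

# topic -> list of subscriber callbacks
subscribers = defaultdict(list)

def subscribe(topic, callback):
    subscribers[topic].append(callback)

def publish(topic, message):
    # Pub/sub: every subscriber receives its own copy of the message.
    for callback in subscribers[topic]:
        callback(message)

inbox_a, inbox_b = [], []
subscribe("orders", inbox_a.append)
subscribe("orders", inbox_b.append)
publish("orders", "OrderCreated:42")
print(inbox_a, inbox_b)  # ['OrderCreated:42'] ['OrderCreated:42']
```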

Consumer Groups (Kafka-style)

Producer → Topic (3 partitions) → Consumer Group A (3 consumers → 1 per partition)
                                 → Consumer Group B (2 consumers)
  • Within a group: each partition consumed by exactly one consumer
  • Across groups: each group gets all messages
  • Combines queue semantics (within group) with pub/sub (across groups)
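The two mechanics behind this can be sketched in a few lines: keyed messages hash to a fixed partition (per-key ordering), and partitions are divided among a group's consumers, round-robin here as a simplification of a real rebalance protocol:

```python
import zlib

NUM_PARTITIONS = 3

def partition_for(key):
    # A keyed message always lands on the same partition -> per-key ordering.
    return zlib.crc32(key.encode()) % NUM_PARTITIONS

def assign(partitions, consumers):
    # Each partition goes to exactly one consumer in the group; with fewer
    # consumers than partitions, some consumers own several partitions.
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

print(assign([0, 1, 2], ["a", "b", "c"]))  # {'a': [0], 'b': [1], 'c': [2]}
print(assign([0, 1, 2], ["a", "b"]))       # {'a': [0, 2], 'b': [1]}
```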

Key Technologies

Apache Kafka

Distributed event streaming platform — the industry standard.

Architecture:

Producers → Kafka Cluster (Brokers) → Consumers
              ↓
        Topics → Partitions → Segments (on disk)
              ↓
         ZooKeeper / KRaft (metadata management)

Key Concepts:

  • Topic — logical channel for events
  • Partition — ordered, immutable append-only log within a topic
  • Offset — position of a message within a partition
  • Consumer group — set of consumers that share partitions
  • Broker — Kafka server that stores partitions
  • Replication factor — copies of each partition (typically 3)

Why Kafka is special:

  • Persistent log — messages stored on disk, retained for days/weeks/forever
  • Replay — consumers can re-read from any offset
  • High throughput — millions of messages/sec (sequential disk I/O)
  • Ordering — guaranteed within a partition
  • Exactly-once semantics (with idempotent producer + transactional consumer)
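The persistent-log and replay properties follow from the data structure itself. A toy append-only partition log makes this concrete: records survive consumption, and any consumer can re-read from any offset:

```python
class PartitionLog:
    """Toy append-only log: records are kept after consumption and replayable."""

    def __init__(self):
        self.records = []

    def append(self, value):
        self.records.append(value)
        return len(self.records) - 1   # offset of the newly written record

    def read_from(self, offset):
        # Any consumer can (re)read from any offset — nothing is deleted.
        return self.records[offset:]

log = PartitionLog()
for v in ["a", "b", "c"]:
    log.append(v)

print(log.read_from(0))  # ['a', 'b', 'c']  — full replay from the start
print(log.read_from(2))  # ['c']            — resume from a committed offset
```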

When to use Kafka:

  • Event streaming, log aggregation, metrics pipelines
  • Event sourcing, CQRS
  • Real-time analytics
  • Data integration between systems
  • Change Data Capture (CDC)

RabbitMQ

Traditional message broker with rich routing.

Key Features:

  • AMQP protocol — standardized messaging
  • Exchange types: Direct, Fanout, Topic, Headers
  • Message acknowledgment — consumer confirms processing
  • Dead Letter Queue — failed messages routed to DLQ
  • Priority queues — process urgent messages first
  • Flexible routing — route by header, key, pattern
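Topic-exchange routing is worth understanding concretely. A sketch of AMQP-style pattern matching, where `*` matches exactly one word and `#` matches zero or more:

```python
def topic_match(pattern, routing_key):
    """Match AMQP-style topic patterns against a dotted routing key."""
    def match(pat, key):
        if not pat:
            return not key
        head, rest = pat[0], pat[1:]
        if head == "#":
            # '#' may swallow zero or more words of the key.
            return any(match(rest, key[i:]) for i in range(len(key) + 1))
        if key and (head == "*" or head == key[0]):
            return match(rest, key[1:])
        return False
    return match(pattern.split("."), routing_key.split("."))

print(topic_match("order.*", "order.created"))     # True  — '*' = one word
print(topic_match("order.#", "order.eu.created"))  # True  — '#' = any suffix
print(topic_match("order.*", "order.eu.created"))  # False — too many words
```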

When to use RabbitMQ:

  • Complex routing requirements
  • Task queues with priority
  • RPC over messaging
  • When you need per-message acknowledgment
  • Smaller scale than Kafka

Amazon SQS

Fully managed queue service.

  • Standard queue: at-least-once delivery, best-effort ordering, unlimited throughput
  • FIFO queue: exactly-once processing, strict ordering per message group, ~300 msg/s (3,000 msg/s with batching)
  • Dead Letter Queue — automatic move after N failures
  • Visibility timeout — message invisible to others while being processed
  • Long polling — reduce empty responses
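Visibility timeout is the mechanism that makes SQS at-least-once: a received message is hidden, not deleted, and reappears if the consumer never acknowledges it. A toy in-memory model of that behavior (timings chosen for illustration):

```python
import time

class VisibilityQueue:
    """Toy SQS-style queue: a received message becomes invisible for a
    timeout; if not deleted (acked) in time, it reappears for redelivery."""

    def __init__(self, visibility_timeout):
        self.timeout = visibility_timeout
        self.messages = {}           # msg_id -> (body, invisible_until)
        self.next_id = 0

    def send(self, body):
        self.messages[self.next_id] = (body, 0.0)
        self.next_id += 1

    def receive(self):
        now = time.monotonic()
        for msg_id, (body, invisible_until) in self.messages.items():
            if invisible_until <= now:
                self.messages[msg_id] = (body, now + self.timeout)
                return msg_id, body
        return None                  # nothing visible right now

    def delete(self, msg_id):        # the "ack": remove the message for good
        self.messages.pop(msg_id)

q = VisibilityQueue(visibility_timeout=0.05)
q.send("job-1")
msg_id, body = q.receive()
print(q.receive())                   # None — message is in flight, invisible
time.sleep(0.06)
print(q.receive() is not None)       # True — timeout expired, redelivered
```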

Comparison

Feature      Kafka                            RabbitMQ                   SQS
Model        Log-based streaming              Traditional broker         Managed queue
Throughput   Very high (millions/s)           Medium (tens of K/s)       High (near-unlimited, standard queues)
Ordering     Per partition                    Per queue                  FIFO or best-effort
Retention    Configurable (days to forever)   Until consumed             14 days max
Replay       Yes (seek to any offset)         No                         No
Routing      Partition key                    Complex exchange routing   Simple queue
Operations   Complex (brokers, ZK/KRaft)      Medium                     Zero (managed)
Use case     Streaming, pipelines             Task queues, routing       Simple async tasks

Message Delivery Guarantees

Guarantee       Meaning                          How                                             Trade-off
At-most-once    May be lost, never duplicated    Fire and forget, no ack                         Fastest, lossy
At-least-once   Never lost, may be duplicated    Ack after processing, retry on failure          Most common
Exactly-once    Processed exactly once           Idempotent processing + transactional commits   Slowest, most complex

Interview tip: "At-least-once delivery with idempotent consumers" is the standard answer. True exactly-once is expensive and rarely needed if consumers are idempotent.
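A sketch of that standard answer: the consumer tracks processed message IDs, so an at-least-once redelivery becomes a no-op (IDs and amounts are illustrative):

```python
processed_ids = set()
balance = 0

def handle(message):
    """Idempotent consumer: redelivered messages have no extra effect."""
    global balance
    if message["id"] in processed_ids:   # dedup on a stable message id
        return
    processed_ids.add(message["id"])
    balance += message["amount"]

deposit = {"id": "msg-7", "amount": 100}
handle(deposit)
handle(deposit)        # the broker redelivered the same message
print(balance)         # 100 — the duplicate was ignored
```

In production the dedup store would be a database or cache with a TTL, not an in-process set.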


Dead Letter Queues (DLQ)

Main Queue → Consumer → Fails 3 times → Dead Letter Queue
                                            ↓
                                     Manual inspection or
                                     automated retry later

Why DLQ matters:

  • Prevents poison messages from blocking the queue
  • Allows investigation of failures
  • Can set up alerts on DLQ growth
  • Enables manual or automated retry
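The retry-then-park flow above can be sketched as a simple loop; the `ok` flag simulates a poison message that always fails:

```python
from collections import defaultdict

main_queue = [{"id": 1, "ok": True}, {"id": 2, "ok": False}]
dead_letter_queue = []
MAX_ATTEMPTS = 3
attempts = defaultdict(int)

def process(msg):
    if not msg["ok"]:                      # simulated poison message
        raise ValueError("cannot process")

done = []
while main_queue:
    msg = main_queue.pop(0)
    try:
        process(msg)
        done.append(msg["id"])
    except ValueError:
        attempts[msg["id"]] += 1
        if attempts[msg["id"]] >= MAX_ATTEMPTS:
            dead_letter_queue.append(msg)  # park it; the queue keeps flowing
        else:
            main_queue.append(msg)         # requeue for a later retry

print(done, [m["id"] for m in dead_letter_queue])  # [1] [2]
```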

Event-Driven Architecture

Event Notification

Order Service publishes: "OrderCreated"
  → Payment Service: charges card
  → Inventory Service: reserves stock
  → Email Service: sends confirmation

Producers don't know about consumers. Add new consumers without changing producers.

Event Sourcing

Instead of storing current state, store all events:

Events:
  1. AccountCreated { id: "123", name: "Zineddine" }
  2. FundsDeposited { id: "123", amount: 1000 }
  3. FundsWithdrawn { id: "123", amount: 200 }

Current state (derived): balance = 800

Pros: complete audit trail, state can be rebuilt at any time, temporal queries
Cons: complex queries, eventual consistency, unbounded storage growth
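Deriving current state is a fold over the event history:

```python
events = [
    {"type": "AccountCreated", "id": "123", "name": "Zineddine"},
    {"type": "FundsDeposited", "id": "123", "amount": 1000},
    {"type": "FundsWithdrawn", "id": "123", "amount": 200},
]

def rebuild_state(events):
    """Derive current state by replaying the full event history."""
    state = {}
    for e in events:
        if e["type"] == "AccountCreated":
            state = {"id": e["id"], "name": e["name"], "balance": 0}
        elif e["type"] == "FundsDeposited":
            state["balance"] += e["amount"]
        elif e["type"] == "FundsWithdrawn":
            state["balance"] -= e["amount"]
    return state

print(rebuild_state(events)["balance"])  # 800
```

Replaying from scratch is exactly what makes audit trails and temporal queries cheap; snapshots are the usual answer to the storage-growth con.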

CQRS (Command Query Responsibility Segregation)

Commands (writes) → Write Model (normalized) → Events → Read Model (denormalized)
Queries (reads)   → Read Model (optimized for queries)

Why: optimize read and write models independently.
When: read and write patterns are very different (e.g., read-heavy with complex queries).


Backpressure

Problem

Producer is faster than consumer → queue grows unbounded → OOM

Solutions

  1. Bounded queue — reject/block when full
  2. Rate limiting — limit producer speed
  3. Consumer scaling — auto-scale consumers based on queue depth
  4. Sampling — drop or sample messages under load
  5. Kafka approach — consumer controls read speed (pull-based)
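Solution 1 can be sketched with a bounded `queue.Queue` and a reject-when-full policy; a blocking `put()` would be the block-when-full variant:

```python
import queue

# Bounded queue: putting into a full queue fails fast instead of growing
# without limit and eventually exhausting memory.
q = queue.Queue(maxsize=2)
accepted, rejected = [], []

for i in range(5):
    try:
        q.put_nowait(i)        # reject-when-full policy
        accepted.append(i)
    except queue.Full:
        rejected.append(i)     # caller can back off, drop, or retry later

print(accepted, rejected)      # [0, 1] [2, 3, 4]
```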

Interview Patterns

Async Processing

User → API → Queue → Worker → DB
           ↓
     Return 202 Accepted
     (provide status endpoint: GET /tasks/{id})
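A minimal sketch of the 202-plus-status-endpoint flow, with dicts standing in for the web framework, queue, and database (handler names are illustrative):

```python
import uuid

tasks = {}           # task_id -> status; stand-in for the status DB
work_queue = []      # stand-in for the message queue

def submit(payload):
    """POST handler sketch: enqueue and return 202 with a task id."""
    task_id = str(uuid.uuid4())
    tasks[task_id] = "pending"
    work_queue.append((task_id, payload))
    return 202, task_id

def worker_tick():
    """Worker sketch: process one queued task and record the result."""
    task_id, payload = work_queue.pop(0)
    tasks[task_id] = "done"

def get_status(task_id):
    """GET /tasks/{id} sketch."""
    return tasks.get(task_id, "not_found")

status, task_id = submit({"report": "monthly"})
print(status, get_status(task_id))   # 202 pending
worker_tick()
print(get_status(task_id))           # done
```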

Fan-Out

User posts → Fan-out service → Write to all followers' feeds
           (use queue to handle millions of writes)

Saga Pattern (distributed transactions)

Order → Payment → Inventory → Shipping
  ↓ (if any fails)
Compensating transactions roll back previous steps
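A sketch of orchestrated compensation: each step pairs a forward action with an undo, and a failure rolls back completed steps in reverse order (step names are illustrative; `ship` is forced to fail):

```python
log = []

def charge():  log.append("charged")
def refund():  log.append("refunded")
def reserve(): log.append("reserved")
def release(): log.append("released")
def ship():    raise RuntimeError("no courier available")

# Each saga step pairs a forward action with its compensating action.
saga = [(charge, refund), (reserve, release), (ship, None)]

def run_saga(steps):
    compensations = []
    for action, compensate in steps:
        try:
            action()
            if compensate:
                compensations.append(compensate)
        except RuntimeError:
            # Roll back completed steps in reverse order.
            for undo in reversed(compensations):
                undo()
            return False
    return True

print(run_saga(saga))  # False — shipping failed, saga rolled back
print(log)             # ['charged', 'reserved', 'released', 'refunded']
```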


Previous: 07 - Databases Deep Dive | Next: 09 - Consistent Hashing & Data Partitioning