Microservices Architecture
Why This Matters
Large tech companies run microservices at scale. In interviews and on the job you'll be expected to decompose systems into services, handle inter-service communication, and understand the trade-offs.
Monolith vs Microservices
Monolith
[Single deployable unit]
├── User module
├── Order module
├── Payment module
├── Notification module
└── Shared database
Pros: Simple deployment, no network overhead, easy debugging, ACID transactions
Cons: Scales as one unit, long deployment cycles, team bottlenecks, technology lock-in
Microservices
[User Service]   [Order Service]   [Payment Service]
   ↕ own DB         ↕ own DB           ↕ own DB
      ↕                ↕                  ↕
     [Message Bus / API Gateway / Service Mesh]
Pros: Independent deployment, team autonomy, technology diversity, targeted scaling
Cons: Network complexity, distributed transactions, operational overhead, debugging is harder
When to Start with Microservices?
- Don't. Start with a monolith and extract services once you have clear domain boundaries
- Extract when: team size grows, deployment conflicts increase, different scaling needs emerge
- "Monolith first" — Martin Fowler
Service Decomposition
Domain-Driven Design (DDD)
Decompose by bounded contexts — areas of the business with clear boundaries.
E-commerce:
Bounded Context: Catalog (products, categories, search)
Bounded Context: Orders (cart, checkout, order history)
Bounded Context: Payments (charges, refunds, invoices)
Bounded Context: Shipping (tracking, carriers, addresses)
Bounded Context: Users (auth, profiles, preferences)
Each bounded context becomes a service (or group of services).
Single Responsibility
Each service should:
- Own one business capability
- Own its data (no shared databases!)
- Be deployable independently
- Be owned by one team
Size Guidelines
- "Can be rewritten in 2 weeks" (Amazon guideline)
- "Two-pizza team" can own it (6-10 people)
- If it needs constant coordination with another service → maybe merge them
API Gateway
Client → API Gateway → User Service
                     → Order Service
                     → Payment Service
Responsibilities
| Feature | Description |
|---|---|
| Routing | Route /users/* to User Service, /orders/* to Order Service |
| Authentication | Verify JWT/OAuth tokens before forwarding |
| Rate limiting | Protect backends from abuse |
| Load balancing | Distribute across service instances |
| Response aggregation | Combine multiple service responses |
| Protocol translation | REST ↔ gRPC, HTTP ↔ WebSocket |
| Caching | Cache GET responses |
| Circuit breaking | Stop forwarding to failing services |
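The routing and rate-limiting rows above can be sketched in a few lines. Everything here (the route table, service URLs, the fixed-window limit) is a hypothetical illustration, not a real gateway implementation:

```python
import time

# Hypothetical route table: path prefix → backend base URL.
ROUTES = {
    "/users": "http://user-service:8080",
    "/orders": "http://order-service:8080",
    "/payments": "http://payment-service:8080",
}

class RateLimiter:
    """Fixed-window counter per client — deliberately the simplest scheme."""
    def __init__(self, limit, window_seconds=60):
        self.limit = limit
        self.window = window_seconds
        self.counts = {}  # client_id -> (window_start, count)

    def allow(self, client_id, now=None):
        now = time.time() if now is None else now
        start, count = self.counts.get(client_id, (now, 0))
        if now - start >= self.window:       # window expired: start fresh
            start, count = now, 0
        self.counts[client_id] = (start, count + 1)
        return count + 1 <= self.limit

def route(path):
    """Return the backend URL for a request path, or None if unmatched."""
    for prefix, backend in ROUTES.items():
        if path.startswith(prefix):
            return backend + path
    return None
```

A real gateway (Kong, Envoy) adds auth, TLS, retries, and health-aware load balancing on top of this same routing idea.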
BFF (Backend for Frontend)
Mobile App → Mobile BFF → Services (optimized payloads)
Web App → Web BFF → Services (different data needs)
Each frontend gets a dedicated API gateway optimized for its needs.
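A minimal sketch of two BFFs shaping the same upstream response for different clients. The product fields and trimming choices are invented for illustration:

```python
# Hypothetical upstream response from a product service.
PRODUCT = {
    "id": "p1",
    "name": "Mechanical Keyboard",
    "description": "A long marketing description ...",
    "price_cents": 9900,
    "images": ["thumb.jpg", "large-1.jpg", "large-2.jpg"],
}

def mobile_bff_view(product):
    """Mobile BFF: trimmed payload for small screens and slow networks."""
    return {
        "id": product["id"],
        "name": product["name"],
        "price_cents": product["price_cents"],
        "thumbnail": product["images"][0],
    }

def web_bff_view(product):
    """Web BFF: fuller payload for a richer page (skips the thumbnail)."""
    return {**product, "images": product["images"][1:]}
```

Same services underneath; each BFF owns only the payload shaping for its client.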
Technologies
- Kong, AWS API Gateway, Envoy, Traefik, Netflix Zuul, NGINX
Service Mesh
What Is It?
Infrastructure layer that handles service-to-service communication via sidecar proxies.
[Service A] ↔ [Sidecar Proxy A] ↔ [Sidecar Proxy B] ↔ [Service B]
                              ↕
                       [Control Plane]
What It Handles (So Your Code Doesn't Have To)
- Service discovery — find other services
- Load balancing — distribute traffic
- Encryption — mTLS between services
- Observability — metrics, traces, logs
- Circuit breaking — stop cascading failures
- Retry & timeout — automatic retry logic
- Traffic splitting — canary, A/B testing
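Circuit breaking is worth seeing concretely. This is the logic a sidecar proxy applies so your code doesn't have to; the thresholds are illustrative, not tuned values:

```python
import time

class CircuitBreaker:
    """Minimal circuit breaker: closed (normal) → open (fail fast) → half-open (probe)."""
    def __init__(self, failure_threshold=3, reset_timeout=30.0):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def allow_request(self, now=None):
        now = time.time() if now is None else now
        if self.opened_at is None:
            return True                               # closed: pass through
        if now - self.opened_at >= self.reset_timeout:
            return True                               # half-open: let one probe through
        return False                                  # open: don't hit the failing service

    def record_success(self):
        self.failures = 0
        self.opened_at = None                         # close the circuit again

    def record_failure(self, now=None):
        now = time.time() if now is None else now
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = now                      # trip open
```

With a mesh, this state machine lives in the proxy and is configured per route instead of being hand-rolled in every service.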
Technologies
- Istio (most popular, complex)
- Linkerd (simpler, lighter)
- Envoy (proxy used by both Istio and others)
- AWS App Mesh
When to Use Service Mesh
- Large number of services (50+)
- Need consistent observability/security across services
- Multiple teams, want to standardize communication patterns
- Don't use for small systems (overhead not worth it)
Service Discovery
Problem
Services are dynamic — IPs change, instances scale up/down. How does Service A find Service B?
Client-Side Discovery
Service A → Service Registry → "Service B is at 10.0.1.5:8080, 10.0.1.6:8080"
Service A → Load balance locally → 10.0.1.5:8080
- Client queries registry, does its own load balancing
- Used by: Netflix Eureka + Ribbon
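A sketch of the client-side flow, with an in-memory dict standing in for the registry (the service name and addresses are hypothetical):

```python
import random

# Stand-in for a registry like Eureka or Consul: name → healthy instances.
REGISTRY = {
    "payment-service": ["10.0.1.5:8080", "10.0.1.6:8080"],
}

def discover(service_name):
    """Client-side discovery: the caller queries the registry and
    load-balances locally across the instances it gets back."""
    instances = REGISTRY.get(service_name, [])
    if not instances:
        raise LookupError(f"no healthy instances of {service_name}")
    return random.choice(instances)  # client does its own load balancing
```

A real client would cache the instance list and refresh it periodically rather than hitting the registry on every call.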
Server-Side Discovery
Service A → Load Balancer → Service B (LB knows instances)
               (LB queries registry)
- Client sends to LB, LB routes to correct instance
- Used by: AWS ALB + ECS, Kubernetes Services
Service Registries
- etcd — distributed KV store (Kubernetes uses it)
- Consul — service discovery + health checks + KV
- ZooKeeper — coordination service (older)
- Kubernetes DNS — built-in service discovery via DNS
Inter-Service Communication Patterns
Synchronous
Order Service --HTTP/gRPC-→ Payment Service
(waits for response)
- Simple, intuitive
- Creates coupling and latency chain
- Use for: queries, simple request-response
Asynchronous (Event-Driven)
Order Service --event-→ Message Broker --event-→ Payment Service
(fire and forget) (processes when ready)
- Decoupled, resilient, scalable
- Harder to debug, eventual consistency
- Use for: commands, notifications, data sync
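The event-driven flow can be sketched with an in-memory broker standing in for Kafka/RabbitMQ. A real broker delivers asynchronously and durably; this dispatches inline only to keep the sketch short:

```python
from collections import defaultdict

class MessageBroker:
    """In-memory stand-in for a real message broker."""
    def __init__(self):
        self.subscribers = defaultdict(list)  # topic -> handlers

    def subscribe(self, topic, handler):
        self.subscribers[topic].append(handler)

    def publish(self, topic, event):
        # Fire and forget: the producer never waits on consumers.
        for handler in self.subscribers[topic]:
            handler(event)

broker = MessageBroker()
charges = []

# Payment service reacts to order events whenever they arrive.
broker.subscribe("order.created", lambda e: charges.append(e["order_id"]))

# Order service emits the event and moves on.
broker.publish("order.created", {"order_id": "o-42", "total_cents": 1999})
```

The producer knows only the topic, not who consumes it — that's the decoupling.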
Request-Reply over Messages
Order Service → Request Queue → Payment Service
Payment Service → Reply Queue → Order Service
- Async but with response
- Correlation ID links request to reply
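A sketch of the request-reply handshake using two in-process queues; in production the consumer runs in its own service and the queues live in a broker:

```python
import queue
import uuid

request_q, reply_q = queue.Queue(), queue.Queue()

def payment_service_step():
    """Drain one request and reply, echoing back the correlation ID."""
    msg = request_q.get_nowait()
    reply_q.put({"correlation_id": msg["correlation_id"], "status": "charged"})

def order_service_call(order_id):
    correlation_id = str(uuid.uuid4())  # links the reply back to this request
    request_q.put({"correlation_id": correlation_id, "order_id": order_id})
    payment_service_step()  # stands in for the remote consumer doing its work
    reply = reply_q.get_nowait()
    assert reply["correlation_id"] == correlation_id  # match reply to request
    return reply["status"]
```

With many requests in flight, the caller keeps a map of pending correlation IDs and resolves each reply against it.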
Data Management in Microservices
Database per Service (Critical!)
User Service → User DB (PostgreSQL)
Order Service → Order DB (MySQL)
Search Service → Search Index (Elasticsearch)
Analytics Service → Analytics DB (ClickHouse)
Never share databases between services: a shared schema couples their data models and forces coordinated deployments.
Data Consistency
- Use Saga pattern for distributed transactions (see 13 - Distributed Transactions)
- Accept eventual consistency where possible
- Use events to propagate changes between services
CQRS for Complex Reads
Commands → Write Service → Write DB → Events → Read Service → Read DB (denormalized)
Queries → Read Service → Read DB (optimized for queries)
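A minimal CQRS sketch: the command handler writes to the normalized store, then a projector updates a denormalized read model. In production the event would travel over a broker; here the projector is called inline, and all field names are illustrative:

```python
orders = {}        # write DB: normalized order records (hypothetical schema)
order_totals = {}  # read DB: denormalized total spent per customer

def handle_place_order(order_id, customer, amount_cents):
    """Command handler: write, then emit an event for the read side."""
    orders[order_id] = {"customer": customer, "amount_cents": amount_cents}
    apply_order_placed({"customer": customer, "amount_cents": amount_cents})

def apply_order_placed(event):
    """Read-side projector: keeps a query-optimized view up to date."""
    c = event["customer"]
    order_totals[c] = order_totals.get(c, 0) + event["amount_cents"]

def query_total_spent(customer):
    """Queries hit the read model only — no joins over the write DB."""
    return order_totals.get(customer, 0)
```

Because the read model is updated by events, it is eventually consistent with the write side.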
Common Anti-Patterns
| Anti-Pattern | Problem | Fix |
|---|---|---|
| Distributed monolith | Services tightly coupled, must deploy together | Proper bounded contexts, async communication |
| Shared database | Any service can read/write any table | Database per service |
| Chatty services | Too many inter-service calls per request | Aggregate APIs, BFF, caching |
| Mega service | One service does too much | Decompose by domain |
| Nano services | Too many tiny services | Merge related services |
| No API versioning | Breaking changes cascade | Version APIs, backwards compatibility |
Observability in Microservices
The Three Pillars
- Logs — structured logging with correlation IDs
- Metrics — request rate, error rate, latency (RED method)
- Traces — distributed tracing across services (Jaeger, Zipkin)
Correlation ID
Client → API Gateway (generates correlation-id: abc-123)
→ Service A (logs with correlation-id: abc-123)
→ Service B (logs with correlation-id: abc-123)
→ Service C (logs with correlation-id: abc-123)
One ID traces a request across all services.
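A sketch of carrying the ID through structured logs with Python's contextvars, so every log line in a request's scope gets it without manual plumbing (field names are illustrative):

```python
import contextvars
import json

# Holds the correlation ID for the current request's context.
correlation_id = contextvars.ContextVar("correlation_id", default="none")

def log(service, message):
    """Structured log line that always includes the correlation ID."""
    return json.dumps({
        "service": service,
        "correlation_id": correlation_id.get(),
        "message": message,
    })

def handle_request(incoming_id):
    correlation_id.set(incoming_id)  # gateway-assigned, e.g. "abc-123"
    lines = [log("service-a", "received order")]
    lines.append(log("service-b", "charged card"))  # same ID, no plumbing
    return lines
```

Across process boundaries the ID travels as a request header (e.g. `X-Correlation-ID`) and each service sets it into its own context on arrival.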
Resources
- 📖 "Building Microservices" by Sam Newman
- 📖 "Microservices Patterns" by Chris Richardson
- 📖 DDIA Chapter 4: Encoding and Evolution
- 🔗 microservices.io — patterns catalog
- 🎥 Martin Fowler — Microservices
- 🔗 Netflix Tech Blog — Microservices at Netflix
Previous: 15 - Scaling Strategies | Next: 17 - Rate Limiting & Throttling