Microservices Architecture
Why This Matters
Large tech companies run microservices at scale. In interviews and on the job you'll be expected to decompose systems into services, handle inter-service communication, and understand the trade-offs.
Monolith vs Microservices
Monolith
[Single deployable unit]
├── User module
├── Order module
├── Payment module
├── Notification module
└── Shared database
Pros: Simple deployment, no network overhead, easy debugging, ACID transactions
Cons: Scales as one unit, long deployment cycles, team bottlenecks, technology lock-in
Microservices
[User Service]   [Order Service]   [Payment Service]
   ↕ own DB         ↕ own DB           ↕ own DB
      ↕                ↕                  ↕
     [Message Bus / API Gateway / Service Mesh]
Pros: Independent deployment, team autonomy, technology diversity, targeted scaling
Cons: Network complexity, distributed transactions, operational overhead, debugging is harder
When to Start with Microservices?
- Don't. Start with a monolith and extract services once you have clear domain boundaries
- Extract when: team size grows, deployment conflicts increase, different scaling needs emerge
- "Monolith first" — Martin Fowler
Service Decomposition
Domain-Driven Design (DDD)
Decompose by bounded contexts — areas of the business with clear boundaries.
E-commerce:
Bounded Context: Catalog (products, categories, search)
Bounded Context: Orders (cart, checkout, order history)
Bounded Context: Payments (charges, refunds, invoices)
Bounded Context: Shipping (tracking, carriers, addresses)
Bounded Context: Users (auth, profiles, preferences)
Each bounded context becomes a service (or group of services).
Single Responsibility
Each service should:
- Own one business capability
- Own its data (no shared databases!)
- Be deployable independently
- Be owned by one team
Size Guidelines
- "Can be rewritten in 2 weeks" (Amazon guideline)
- "Two-pizza team" can own it (6-10 people)
- If it needs constant coordination with another service → maybe merge them
API Gateway
Client → API Gateway → User Service
                     → Order Service
                     → Payment Service
Responsibilities
| Feature | Description |
|---|---|
| Routing | Route /users/* to User Service, /orders/* to Order Service |
| Authentication | Verify JWT/OAuth tokens before forwarding |
| Rate limiting | Protect backends from abuse |
| Load balancing | Distribute across service instances |
| Response aggregation | Combine multiple service responses |
| Protocol translation | REST ↔ gRPC, HTTP ↔ WebSocket |
| Caching | Cache GET responses |
| Circuit breaking | Stop forwarding to failing services |
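The routing and rate-limiting rows above can be sketched in a few lines. Everything here (the route table, service URLs, the fixed-window limit) is a hypothetical illustration, not a real gateway implementation:

```python
import time

# Hypothetical route table: path prefix → backend base URL.
ROUTES = {
    "/users": "http://user-service:8080",
    "/orders": "http://order-service:8080",
    "/payments": "http://payment-service:8080",
}

class RateLimiter:
    """Fixed-window counter per client — deliberately the simplest scheme."""
    def __init__(self, limit, window_seconds=60):
        self.limit = limit
        self.window = window_seconds
        self.counts = {}  # client_id -> (window_start, count)

    def allow(self, client_id, now=None):
        now = time.time() if now is None else now
        start, count = self.counts.get(client_id, (now, 0))
        if now - start >= self.window:       # window expired: start fresh
            start, count = now, 0
        self.counts[client_id] = (start, count + 1)
        return count + 1 <= self.limit

def route(path):
    """Return the backend URL for a request path, or None if unmatched."""
    for prefix, backend in ROUTES.items():
        if path.startswith(prefix):
            return backend + path
    return None
```

A real gateway (Kong, Envoy) adds auth, TLS, retries, and health-aware load balancing on top of this same routing idea.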
BFF (Backend for Frontend)
Mobile App → Mobile BFF → Services (optimized payloads)
Web App → Web BFF → Services (different data needs)
Each frontend gets a dedicated API gateway optimized for its needs.
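A minimal sketch of two BFFs shaping the same upstream response for different clients. The product fields and trimming choices are invented for illustration:

```python
# Hypothetical upstream response from a product service.
PRODUCT = {
    "id": "p1",
    "name": "Mechanical Keyboard",
    "description": "A long marketing description ...",
    "price_cents": 9900,
    "images": ["thumb.jpg", "large-1.jpg", "large-2.jpg"],
}

def mobile_bff_view(product):
    """Mobile BFF: trimmed payload for small screens and slow networks."""
    return {
        "id": product["id"],
        "name": product["name"],
        "price_cents": product["price_cents"],
        "thumbnail": product["images"][0],
    }

def web_bff_view(product):
    """Web BFF: fuller payload for a richer page (skips the thumbnail)."""
    return {**product, "images": product["images"][1:]}
```

Same services underneath; each BFF owns only the payload shaping for its client.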
Technologies
- Kong, AWS API Gateway, Envoy, Traefik, Netflix Zuul, NGINX
Service Mesh
What Is It?
Infrastructure layer that handles service-to-service communication via sidecar proxies.
[Service A] ↔ [Sidecar Proxy A] ↔ [Sidecar Proxy B] ↔ [Service B]
                              ↕
                       [Control Plane]
What It Handles (So Your Code Doesn't Have To)
- Service discovery — find other services
- Load balancing — distribute traffic
- Encryption — mTLS between services
- Observability — metrics, traces, logs
- Circuit breaking — stop cascading failures
- Retry & timeout — automatic retry logic
- Traffic splitting — canary, A/B testing
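Circuit breaking is worth seeing concretely. This is the logic a sidecar proxy applies so your code doesn't have to; the thresholds are illustrative, not tuned values:

```python
import time

class CircuitBreaker:
    """Minimal circuit breaker: closed (normal) → open (fail fast) → half-open (probe)."""
    def __init__(self, failure_threshold=3, reset_timeout=30.0):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def allow_request(self, now=None):
        now = time.time() if now is None else now
        if self.opened_at is None:
            return True                               # closed: pass through
        if now - self.opened_at >= self.reset_timeout:
            return True                               # half-open: let one probe through
        return False                                  # open: don't hit the failing service

    def record_success(self):
        self.failures = 0
        self.opened_at = None                         # close the circuit again

    def record_failure(self, now=None):
        now = time.time() if now is None else now
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = now                      # trip open
```

With a mesh, this state machine lives in the proxy and is configured per route instead of being hand-rolled in every service.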
Technologies
- Istio (most popular, complex)
- Linkerd (simpler, lighter)
- Envoy (proxy used by both Istio and others)
- AWS App Mesh
When to Use Service Mesh
- Large number of services (50+)
- Need consistent observability/security across services
- Multiple teams, want to standardize communication patterns
- Don't use for small systems (overhead not worth it)
Service Discovery
Problem
Services are dynamic — IPs change, instances scale up/down. How does Service A find Service B?
Client-Side Discovery
Service A → Service Registry → "Service B is at 10.0.1.5:8080, 10.0.1.6:8080"
Service A → Load balance locally → 10.0.1.5:8080
- Client queries registry, does its own load balancing
- Used by: Netflix Eureka + Ribbon
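A sketch of the client-side flow, with an in-memory dict standing in for the registry (the service name and addresses are hypothetical):

```python
import random

# Stand-in for a registry like Eureka or Consul: name → healthy instances.
REGISTRY = {
    "payment-service": ["10.0.1.5:8080", "10.0.1.6:8080"],
}

def discover(service_name):
    """Client-side discovery: the caller queries the registry and
    load-balances locally across the instances it gets back."""
    instances = REGISTRY.get(service_name, [])
    if not instances:
        raise LookupError(f"no healthy instances of {service_name}")
    return random.choice(instances)  # client does its own load balancing
```

A real client would cache the instance list and refresh it periodically rather than hitting the registry on every call.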
Server-Side Discovery
Service A → Load Balancer → Service B (LB knows instances)
               (LB queries registry)
- Client sends to LB, LB routes to correct instance
- Used by: AWS ALB + ECS, Kubernetes Services
Service Registries
- etcd — distributed KV store (Kubernetes uses it)
- Consul — service discovery + health checks + KV
- ZooKeeper — coordination service (older)
- Kubernetes DNS — built-in service discovery via DNS
Inter-Service Communication Patterns
Synchronous
Order Service --HTTP/gRPC-→ Payment Service
(waits for response)
- Simple, intuitive
- Creates coupling and latency chain
- Use for: queries, simple request-response
Asynchronous (Event-Driven)
Order Service --event-→ Message Broker --event-→ Payment Service
(fire and forget) (processes when ready)
- Decoupled, resilient, scalable
- Harder to debug, eventual consistency
- Use for: commands, notifications, data sync
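The event-driven flow can be sketched with an in-memory broker standing in for Kafka/RabbitMQ. A real broker delivers asynchronously and durably; this dispatches inline only to keep the sketch short:

```python
from collections import defaultdict

class MessageBroker:
    """In-memory stand-in for a real message broker."""
    def __init__(self):
        self.subscribers = defaultdict(list)  # topic -> handlers

    def subscribe(self, topic, handler):
        self.subscribers[topic].append(handler)

    def publish(self, topic, event):
        # Fire and forget: the producer never waits on consumers.
        for handler in self.subscribers[topic]:
            handler(event)

broker = MessageBroker()
charges = []

# Payment service reacts to order events whenever they arrive.
broker.subscribe("order.created", lambda e: charges.append(e["order_id"]))

# Order service emits the event and moves on.
broker.publish("order.created", {"order_id": "o-42", "total_cents": 1999})
```

The producer knows only the topic, not who consumes it — that's the decoupling.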
Request-Reply over Messages
Order Service → Request Queue → Payment Service
Payment Service → Reply Queue → Order Service
- Async but with response
- Correlation ID links request to reply
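A sketch of the request-reply handshake using two in-process queues; in production the consumer runs in its own service and the queues live in a broker:

```python
import queue
import uuid

request_q, reply_q = queue.Queue(), queue.Queue()

def payment_service_step():
    """Drain one request and reply, echoing back the correlation ID."""
    msg = request_q.get_nowait()
    reply_q.put({"correlation_id": msg["correlation_id"], "status": "charged"})

def order_service_call(order_id):
    correlation_id = str(uuid.uuid4())  # links the reply back to this request
    request_q.put({"correlation_id": correlation_id, "order_id": order_id})
    payment_service_step()  # stands in for the remote consumer doing its work
    reply = reply_q.get_nowait()
    assert reply["correlation_id"] == correlation_id  # match reply to request
    return reply["status"]
```

With many requests in flight, the caller keeps a map of pending correlation IDs and resolves each reply against it.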
Data Management in Microservices
Database per Service (Critical!)
User Service → User DB (PostgreSQL)
Order Service → Order DB (MySQL)
Search Service → Search Index (Elasticsearch)
Analytics Service → Analytics DB (ClickHouse)
Never share databases between services: a shared schema couples their data models and forces coordinated deployments.
Data Consistency
- Use Saga pattern for distributed transactions (see 13 - Distributed Transactions)
- Accept eventual consistency where possible
- Use events to propagate changes between services
CQRS for Complex Reads
Commands → Write Service → Write DB → Events → Read Service → Read DB (denormalized)
Queries → Read Service → Read DB (optimized for queries)
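A minimal CQRS sketch: the command handler writes to the normalized store, then a projector updates a denormalized read model. In production the event would travel over a broker; here the projector is called inline, and all field names are illustrative:

```python
orders = {}        # write DB: normalized order records (hypothetical schema)
order_totals = {}  # read DB: denormalized total spent per customer

def handle_place_order(order_id, customer, amount_cents):
    """Command handler: write, then emit an event for the read side."""
    orders[order_id] = {"customer": customer, "amount_cents": amount_cents}
    apply_order_placed({"customer": customer, "amount_cents": amount_cents})

def apply_order_placed(event):
    """Read-side projector: keeps a query-optimized view up to date."""
    c = event["customer"]
    order_totals[c] = order_totals.get(c, 0) + event["amount_cents"]

def query_total_spent(customer):
    """Queries hit the read model only — no joins over the write DB."""
    return order_totals.get(customer, 0)
```

Because the read model is updated by events, it is eventually consistent with the write side.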
Common Anti-Patterns
| Anti-Pattern | Problem | Fix |
|---|---|---|
| Distributed monolith | Services tightly coupled, must deploy together | Proper bounded contexts, async communication |
| Shared database | Any service can read/write any table | Database per service |
| Chatty services | Too many inter-service calls per request | Aggregate APIs, BFF, caching |
| Mega service | One service does too much | Decompose by domain |
| Nano services | Too many tiny services | Merge related services |
| No API versioning | Breaking changes cascade | Version APIs, backwards compatibility |
Observability in Microservices
The Three Pillars
- Logs — structured logging with correlation IDs
- Metrics — request rate, error rate, latency (RED method)
- Traces — distributed tracing across services (Jaeger, Zipkin)
Correlation ID
Client → API Gateway (generates correlation-id: abc-123)
→ Service A (logs with correlation-id: abc-123)
→ Service B (logs with correlation-id: abc-123)
→ Service C (logs with correlation-id: abc-123)
One ID traces a request across all services.
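A sketch of carrying the ID through structured logs with Python's contextvars, so every log line in a request's scope gets it without manual plumbing (field names are illustrative):

```python
import contextvars
import json

# Holds the correlation ID for the current request's context.
correlation_id = contextvars.ContextVar("correlation_id", default="none")

def log(service, message):
    """Structured log line that always includes the correlation ID."""
    return json.dumps({
        "service": service,
        "correlation_id": correlation_id.get(),
        "message": message,
    })

def handle_request(incoming_id):
    correlation_id.set(incoming_id)  # gateway-assigned, e.g. "abc-123"
    lines = [log("service-a", "received order")]
    lines.append(log("service-b", "charged card"))  # same ID, no plumbing
    return lines
```

Across process boundaries the ID travels as a request header (e.g. `X-Correlation-ID`) and each service sets it into its own context on arrival.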
Resources
- 📖 "Building Microservices" by Sam Newman
- 📖 "Microservices Patterns" by Chris Richardson
- 📖 DDIA Chapter 4: Encoding and Evolution
- 🔗 microservices.io — patterns catalog
- 🎥 Martin Fowler — Microservices
- 🔗 Netflix Tech Blog — Microservices at Netflix
Previous: 15 - Scaling Strategies | Next: 17 - Rate Limiting & Throttling