Load Balancing & Reverse Proxies
Why This Matters
Load balancing appears in virtually every system design. It's the first line of defense for scalability and availability.
What is Load Balancing?
A load balancer distributes incoming traffic across multiple servers to:
- Increase throughput — handle more requests
- Improve availability — if one server dies, others serve traffic
- Reduce latency — route to least loaded or closest server
- Enable horizontal scaling — add/remove servers dynamically
Where Load Balancers Sit
Internet → DNS LB → L4/L7 Load Balancer → Application Servers
                                        → Application Servers
                                        → Application Servers
                                               ↓
                                        Database LB → DB Replicas
Multiple layers of load balancing:
- DNS level — GeoDNS returns different IPs per region
- L4 (Transport) — Routes based on IP/port, fast, no content inspection
- L7 (Application) — Routes based on HTTP headers, URL, cookies — smarter
L4 vs L7 Load Balancing
| Aspect | L4 (Transport) | L7 (Application) |
|---|---|---|
| Operates on | TCP/UDP packets | HTTP requests |
| Routing based on | IP, port | URL, headers, cookies, body |
| Speed | Faster (less processing) | Slower (more processing) |
| SSL termination | Typically no (pass-through) | Yes |
| Content awareness | None | Full HTTP awareness |
| Use case | High-throughput TCP | Smart HTTP routing |
| Examples | AWS NLB, IPVS | AWS ALB, Nginx, HAProxy |
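As a sketch of what L7 content awareness buys you, here is a minimal path-based router in Python (pool names and prefixes are illustrative, not from any real deployment):

```python
# L7 routing: the proxy inspects the HTTP path and picks a backend pool --
# something an L4 balancer, which only sees IPs and ports, cannot do.
BACKEND_POOLS = {
    "/api/": ["api-1:8080", "api-2:8080"],
    "/":     ["web-1:8080", "web-2:8080"],
}

def route_l7(path: str) -> list[str]:
    """Return the backend pool whose prefix matches the request path."""
    # Longest-prefix match so "/api/users" beats the catch-all "/".
    for prefix in sorted(BACKEND_POOLS, key=len, reverse=True):
        if path.startswith(prefix):
            return BACKEND_POOLS[prefix]
    raise LookupError(path)
```

This is the same idea Nginx expresses with `location` blocks and an `upstream` pool per block.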
Load Balancing Algorithms
Static Algorithms (No server state)
Round Robin:
- Requests go to servers in order: A → B → C → A → B → C
- Simple but ignores server load/capacity
- Good when all servers are identical
Weighted Round Robin:
- Assign weights: A(3), B(2), C(1)
- A gets 3x traffic of C
- Good for heterogeneous servers
IP Hash:
- Hash(client IP) → server
- Same client always hits same server (sticky sessions)
- Problem: adding/removing servers reshuffles most mappings (consistent hashing mitigates this)
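The three static algorithms above can be sketched in a few lines of Python (server names and weights are illustrative):

```python
import hashlib
from itertools import cycle

servers = ["A", "B", "C"]

# Round robin: an endless cycle over the pool.
rr = cycle(servers)

# Weighted round robin: expand the pool by weight, so A(3) gets 3x C(1)'s traffic.
weights = {"A": 3, "B": 2, "C": 1}
wrr = cycle([s for s, w in weights.items() for _ in range(w)])

def ip_hash(client_ip: str, pool: list[str]) -> str:
    """Same client IP always maps to the same server -- until the pool changes size."""
    digest = hashlib.md5(client_ip.encode()).hexdigest()
    return pool[int(digest, 16) % len(pool)]
```

The modulo in `ip_hash` is exactly why resizing the pool reshuffles mappings: `% len(pool)` changes for almost every key.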
Dynamic Algorithms (Consider server state)
Least Connections:
- Route to server with fewest active connections
- Good for long-lived connections (WebSocket, database)
- Most commonly used dynamic algorithm
Weighted Least Connections:
- Least connections with server weights
- Combines capacity awareness (weights) with live load awareness
Least Response Time:
- Route to server with fastest response + fewest connections
- Requires health monitoring
- Best for latency-sensitive applications
Random:
- Pick a random server
- Surprisingly effective at large scale
- Usually deployed as "power of two choices" — pick 2 servers at random, route to the less loaded one
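A minimal sketch of "power of two choices", assuming the balancer can read each server's active connection count:

```python
import random

def power_of_two_choices(connections: dict[str, int]) -> str:
    """Pick two servers uniformly at random; route to the less loaded one.

    This gets most of the benefit of least-connections without comparing
    every server, and it tolerates slightly stale load data well.
    """
    a, b = random.sample(list(connections), 2)
    return a if connections[a] <= connections[b] else b
```

With only two servers the choice is deterministic; at scale, the two-sample trick dramatically reduces the maximum load compared with pure random.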
Health Checks
Passive Health Checks
- Monitor responses from normal traffic
- Mark server unhealthy after N consecutive failures
- No extra traffic overhead
Active Health Checks
- LB sends periodic probes (HTTP GET /health)
- More proactive failure detection
- Adds slight overhead
Health Check Levels
Level 1: TCP connect (is the port open?)
Level 2: HTTP 200 (is the app responding?)
Level 3: Deep health (is the DB connected? Are dependencies healthy?)
Best practice: /health returns 200 with basic checks, /health/deep checks dependencies.
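Both styles can be sketched in Python: an active HTTP probe, plus the N-consecutive-failures rule that passive checks use (the probe URL and threshold are illustrative):

```python
import urllib.request

def probe(url: str, timeout: float = 2.0) -> bool:
    """Active health check: one HTTP probe; healthy iff it returns 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        return False

class HealthTracker:
    """Mark a server unhealthy after N consecutive failures.

    The consecutive-failure threshold adds hysteresis so one dropped
    request doesn't eject an otherwise healthy server from the pool.
    """
    def __init__(self, threshold: int = 3):
        self.threshold = threshold
        self.failures = 0

    def record(self, ok: bool) -> None:
        self.failures = 0 if ok else self.failures + 1

    @property
    def healthy(self) -> bool:
        return self.failures < self.threshold
```

An active checker would call `tracker.record(probe("http://api-1:8080/health"))` on a timer; a passive checker feeds in the outcome of real requests instead.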
Session Persistence (Sticky Sessions)
Problem
Stateful apps store session data in server memory. If the LB routes a request to a different server, the session is lost.
Solutions (Worst to Best)
- Sticky sessions — LB routes same client to same server (cookie/IP based)
- Con: uneven load, server failure loses sessions
- Session replication — copy session data across all servers
- Con: memory overhead, complexity
- Centralized session store — Redis/Memcached holds all sessions
- Pro: servers are stateless, any server can handle any request ✅
- Client-side sessions — JWT tokens contain session data
- Pro: truly stateless servers
- Con: can't revoke easily, payload size limits
Interview answer: "Prefer stateless servers with centralized session store (Redis)."
Reverse Proxy Features
Beyond load balancing, reverse proxies handle:
| Feature | Description |
|---|---|
| SSL/TLS termination | Decrypt HTTPS at proxy, forward HTTP internally |
| Compression | gzip/brotli response compression |
| Caching | Cache static assets and API responses |
| Rate limiting | Protect backends from traffic spikes |
| Request routing | Route /api/* to API servers, /* to web servers |
| Authentication | Verify JWT/OAuth before hitting backend |
| Circuit breaking | Stop sending traffic to failing backends |
| Request buffering | Absorb slow clients, release backend quickly |
Global Server Load Balancing (GSLB)
For multi-region deployments:
User in Paris → DNS → European LB → EU servers
User in NYC → DNS → US East LB → US servers
User in Tokyo → DNS → Asia LB → Asia servers
Strategies:
- GeoDNS — route based on client location
- Latency-based — route to lowest-latency region
- Failover — route to backup region if primary is down
- Weighted — send X% to region A, Y% to region B (blue-green/canary)
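The weighted strategy above reduces to a weighted random choice over regions at resolution time; a sketch (the region names and 90/10 canary split are illustrative):

```python
import random

# Illustrative 90/10 canary split between a stable and a canary region.
REGION_WEIGHTS = {"eu-west": 90, "eu-canary": 10}

def pick_region(rng: random.Random = random) -> str:
    """Weighted GSLB-style routing: ~90% of resolutions go to the stable region."""
    regions, weights = zip(*REGION_WEIGHTS.items())
    return rng.choices(regions, weights=weights, k=1)[0]
```

Shifting the weights gradually (90/10 → 50/50 → 0/100) is how DNS-level blue-green or canary rollouts are done in practice.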
Common Load Balancer Technologies
| Technology | Type | Notes |
|---|---|---|
| Nginx | L7 | Most popular reverse proxy/LB |
| HAProxy | L4/L7 | High performance, battle-tested |
| Envoy | L7 | Service mesh sidecar, gRPC-native |
| Traefik | L7 | Cloud-native, auto-discovery |
| AWS ALB | L7 | Managed, integrates with AWS |
| AWS NLB | L4 | Ultra-high performance TCP |
| AWS ELB Classic | L4/L7 | Legacy, avoid for new projects |
| Google Cloud LB | L4/L7 | Global anycast LB |
Redundancy of Load Balancers
The LB itself is a single point of failure!
Solutions:
- Active-passive pair — heartbeat between primary and standby, failover via virtual IP (VRRP)
- Active-active pair — both handle traffic, DNS returns both IPs
- Managed cloud LB — AWS ALB/NLB, GCP LB — provider handles redundancy
Interview Tips
- Always add a load balancer in your diagrams
- Specify L4 vs L7 and justify why
- Mention the algorithm (least connections for most cases)
- Address LB as potential SPOF — mention redundancy
- For global systems, discuss GSLB + GeoDNS
Resources
- 📖 "System Design Interview" by Alex Xu — Chapter on Load Balancer
- 🔗 Nginx docs — Load Balancing
- 🎥 Gaurav Sen — Load Balancing
- 🔗 HAProxy Documentation
- 🔗 AWS ELB comparison
Previous: 04 - Data Storage Fundamentals | Next: 06 - Caching