Load Balancing & Reverse Proxies

Why This Matters

Load balancing appears in virtually every system design. It's the first line of defense for scalability and availability.


What is Load Balancing?

Distributes incoming traffic across multiple servers to:

  • Increase throughput — handle more requests
  • Improve availability — if one server dies, others serve traffic
  • Reduce latency — route to least loaded or closest server
  • Enable horizontal scaling — add/remove servers dynamically

Where Load Balancers Sit

Internet → DNS LB → L4/L7 Load Balancer → Application Servers
                                          → Application Servers
                                          → Application Servers
                                                ↓
                                          Database LB → DB Replicas

Multiple layers of load balancing:

  1. DNS level — GeoDNS returns different IPs per region
  2. L4 (Transport) — Routes based on IP/port, fast, no content inspection
  3. L7 (Application) — Routes based on HTTP headers, URL, cookies — smarter

L4 vs L7 Load Balancing

Aspect            | L4 (Transport)           | L7 (Application)
------------------|--------------------------|-----------------------------
Operates on       | TCP/UDP packets          | HTTP requests
Routing based on  | IP, port                 | URL, headers, cookies, body
Speed             | Faster (less processing) | Slower (more processing)
SSL termination   | No (pass-through)        | Yes
Content awareness | None                     | Full HTTP awareness
Use case          | High-throughput TCP      | Smart HTTP routing
Examples          | AWS NLB, IPVS            | AWS ALB, Nginx, HAProxy

Load Balancing Algorithms

Static Algorithms (No server state)

Round Robin:

  • Requests go to servers in order: A → B → C → A → B → C
  • Simple but ignores server load/capacity
  • Good when all servers are identical

Weighted Round Robin:

  • Assign weights: A(3), B(2), C(1)
  • A gets 3x traffic of C
  • Good for heterogeneous servers

IP Hash:

  • Hash(client IP) → server
  • Same client always hits same server (sticky sessions)
  • Problem: adding/removing servers reshuffles all mappings
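The three static algorithms above can be sketched in a few lines of Python. The server names, weights, and `ip_hash` helper are illustrative, not any real load balancer's API:

```python
import hashlib
import itertools

# Hypothetical backend pool.
servers = ["A", "B", "C"]

# Round robin: requests cycle through servers in order.
rr = itertools.cycle(servers)
picks = [next(rr) for _ in range(6)]  # A, B, C, A, B, C

# Weighted round robin: expand each server by its weight,
# so A receives 3x the traffic of C.
weights = {"A": 3, "B": 2, "C": 1}
weighted_pool = [s for s, w in weights.items() for _ in range(w)]
wrr = itertools.cycle(weighted_pool)

def ip_hash(client_ip, pool):
    # Same client IP always maps to the same server, but adding or
    # removing a server changes len(pool) and reshuffles every mapping
    # (consistent hashing mitigates this).
    digest = hashlib.md5(client_ip.encode()).hexdigest()
    return pool[int(digest, 16) % len(pool)]
```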

Dynamic Algorithms (Consider server state)

Least Connections:

  • Route to server with fewest active connections
  • Good for long-lived connections (WebSocket, database)
  • Most commonly used dynamic algorithm

Weighted Least Connections:

  • Least connections, adjusted by server weights
  • Balances live load against heterogeneous server capacity

Least Response Time:

  • Route to server with fastest response + fewest connections
  • Requires health monitoring
  • Best for latency-sensitive applications

Random:

  • Pick a random server
  • Surprisingly effective at large scale, since randomness spreads load evenly in expectation
  • Often combined with "power of two choices": pick 2 random servers, route to the less loaded one
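The dynamic strategies above can be sketched in Python; the `active` connection counts are hypothetical stand-ins for state a real LB would track:

```python
import random

# Hypothetical count of active connections per server.
active = {"A": 12, "B": 3, "C": 7}

def least_connections(conns):
    # Route to the server with the fewest active connections.
    return min(conns, key=conns.get)

def power_of_two_choices(conns):
    # Sample two random servers and take the less loaded one:
    # near-optimal balance without scanning the whole pool.
    a, b = random.sample(list(conns), 2)
    return a if conns[a] <= conns[b] else b
```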

Health Checks

Passive Health Checks

  • Monitor responses from normal traffic
  • Mark server unhealthy after N consecutive failures
  • No extra traffic overhead

Active Health Checks

  • LB sends periodic probes (HTTP GET /health)
  • More proactive failure detection
  • Adds slight overhead

Health Check Levels

Level 1: TCP connect (is the port open?)
Level 2: HTTP 200 (is the app responding?)
Level 3: Deep health (is the DB connected? Are dependencies healthy?)

Best practice: /health returns 200 with basic checks, /health/deep checks dependencies.
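The passive rule above ("mark unhealthy after N consecutive failures") can be sketched as a small Python state machine; the class name and threshold are illustrative, not any real LB's API:

```python
class HealthTracker:
    """Mark a server unhealthy after N consecutive failed responses."""

    def __init__(self, failure_threshold=3):
        self.failure_threshold = failure_threshold
        self.failures = {}   # server -> consecutive failure count
        self.healthy = {}    # server -> bool

    def record(self, server, ok):
        if ok:
            # Any success resets the streak and restores the server.
            self.failures[server] = 0
            self.healthy[server] = True
        else:
            self.failures[server] = self.failures.get(server, 0) + 1
            if self.failures[server] >= self.failure_threshold:
                self.healthy[server] = False

    def is_healthy(self, server):
        # Servers are assumed healthy until proven otherwise.
        return self.healthy.get(server, True)
```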


Session Persistence (Sticky Sessions)

Problem

Stateful apps store session data in server memory. If LB routes request to a different server, session is lost.

Solutions (Worst to Best)

  1. Sticky sessions — LB routes same client to same server (cookie/IP based)
    • Con: uneven load, server failure loses sessions
  2. Session replication — copy session data across all servers
    • Con: memory overhead, complexity
  3. Centralized session store — Redis/Memcached holds all sessions
    • Pro: servers are stateless, any server can handle any request ✅
  4. Client-side sessions — JWT tokens contain session data
    • Pro: truly stateless servers
    • Con: can't revoke easily, payload size limits

Interview answer: "Prefer stateless servers with centralized session store (Redis)."
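A sketch of the centralized-store approach in Python, using a plain dict with TTLs where a real deployment would use Redis or Memcached; all names here are illustrative:

```python
import time

class SessionStore:
    """Shared session store so any app server can handle any request.
    A dict stands in for Redis here; Redis would give the same
    semantics via SET with an expiry, shared across servers."""

    def __init__(self):
        self._data = {}  # session_id -> (payload, expires_at)

    def set(self, session_id, payload, ttl_seconds=1800):
        self._data[session_id] = (payload, time.time() + ttl_seconds)

    def get(self, session_id):
        entry = self._data.get(session_id)
        if entry is None:
            return None
        payload, expires_at = entry
        if time.time() > expires_at:
            # Expired sessions are lazily evicted on read.
            del self._data[session_id]
            return None
        return payload
```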


Reverse Proxy Features

Beyond load balancing, reverse proxies handle:

Feature             | Description
--------------------|------------------------------------------------
SSL/TLS termination | Decrypt HTTPS at proxy, forward HTTP internally
Compression         | gzip/brotli response compression
Caching             | Cache static assets and API responses
Rate limiting       | Protect backends from traffic spikes
Request routing     | Route /api/* to API servers, /* to web servers
Authentication      | Verify JWT/OAuth before hitting backend
Circuit breaking    | Stop sending traffic to failing backends
Request buffering   | Absorb slow clients, release backend quickly
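As one example of these features, rate limiting is commonly implemented as a token bucket. A minimal Python sketch, illustrative rather than any particular proxy's implementation:

```python
import time

class TokenBucket:
    """Each client gets `capacity` tokens refilling at `rate` per
    second; a request is allowed only while tokens remain."""

    def __init__(self, capacity, rate):
        self.capacity = capacity
        self.rate = rate
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # over the limit: the proxy would return 429
```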

Global Server Load Balancing (GSLB)

For multi-region deployments:

User in Paris → DNS → European LB → EU servers
User in NYC   → DNS → US East LB  → US servers
User in Tokyo → DNS → Asia LB     → Asia servers

Strategies:

  • GeoDNS — route based on client location
  • Latency-based — route to lowest-latency region
  • Failover — route to backup region if primary is down
  • Weighted — send X% to region A, Y% to region B (blue-green/canary)
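The weighted strategy can be sketched with `random.choices`; the region names and percentages are hypothetical:

```python
import random

# Hypothetical canary split: 90% of users to the stable region,
# 10% to the region running the new release.
regions = {"us-east": 90, "eu-west": 10}

def pick_region(weighted_regions):
    names = list(weighted_regions)
    return random.choices(
        names, weights=[weighted_regions[n] for n in names]
    )[0]
```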

Common Load Balancer Technologies

Technology      | Type  | Notes
----------------|-------|-----------------------------------
Nginx           | L7    | Most popular reverse proxy/LB
HAProxy         | L4/L7 | High performance, battle-tested
Envoy           | L7    | Service mesh sidecar, gRPC-native
Traefik         | L7    | Cloud-native, auto-discovery
AWS ALB         | L7    | Managed, integrates with AWS
AWS NLB         | L4    | Ultra-high performance TCP
AWS ELB Classic | L4/L7 | Legacy, avoid for new projects
Google Cloud LB | L4/L7 | Global anycast LB

Redundancy of Load Balancers

The LB itself is a single point of failure!

Solutions:

  • Active-passive pair — heartbeat between primary and standby, failover via virtual IP (VRRP)
  • Active-active pair — both handle traffic, DNS returns both IPs
  • Managed cloud LB — AWS ALB/NLB, GCP LB — provider handles redundancy

Interview Tips

  • Always add a load balancer in your diagrams
  • Specify L4 vs L7 and justify why
  • Mention the algorithm (least connections for most cases)
  • Address LB as potential SPOF — mention redundancy
  • For global systems, discuss GSLB + GeoDNS

Previous: 04 - Data Storage Fundamentals | Next: 06 - Caching