Load Balancing & Reverse Proxies

Why This Matters

Load balancing appears in virtually every system design. It's the first line of defense for scalability and availability.


What is Load Balancing?

Distributes incoming traffic across multiple servers to:

  • Increase throughput — handle more requests
  • Improve availability — if one server dies, others serve traffic
  • Reduce latency — route to least loaded or closest server
  • Enable horizontal scaling — add/remove servers dynamically

Where Load Balancers Sit

Internet → DNS LB → L4/L7 Load Balancer → Application Servers
                                          → Application Servers
                                          → Application Servers
                                                ↓
                                          Database LB → DB Replicas

Multiple layers of load balancing:

  1. DNS level — GeoDNS returns different IPs per region
  2. L4 (Transport) — Routes based on IP/port, fast, no content inspection
  3. L7 (Application) — Routes based on HTTP headers, URL, cookies — smarter

L4 vs L7 Load Balancing

Aspect            | L4 (Transport)           | L7 (Application)
------------------|--------------------------|-----------------------------
Operates on       | TCP/UDP packets          | HTTP requests
Routing based on  | IP, port                 | URL, headers, cookies, body
Speed             | Faster (less processing) | Slower (more processing)
SSL termination   | No (pass-through)        | Yes
Content awareness | None                     | Full HTTP awareness
Use case          | High-throughput TCP      | Smart HTTP routing
Examples          | AWS NLB, IPVS            | AWS ALB, Nginx, HAProxy

Load Balancing Algorithms

Static Algorithms (No server state)

Round Robin:

  • Requests go to servers in order: A → B → C → A → B → C
  • Simple but ignores server load/capacity
  • Good when all servers are identical

Weighted Round Robin:

  • Assign weights: A(3), B(2), C(1)
  • A gets 3x traffic of C
  • Good for heterogeneous servers

IP Hash:

  • Hash(client IP) → server
  • Same client always hits same server (sticky sessions)
  • Problem: adding/removing servers reshuffles all mappings
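The three static algorithms above can be sketched in a few lines of Python. The server names, weights, and `ip_hash` helper are illustrative, not any real load balancer's API:

```python
import hashlib
import itertools

# Hypothetical backend pool.
servers = ["A", "B", "C"]

# Round robin: requests cycle through servers in order.
rr = itertools.cycle(servers)
picks = [next(rr) for _ in range(6)]  # A, B, C, A, B, C

# Weighted round robin: expand each server by its weight,
# so A receives 3x the traffic of C.
weights = {"A": 3, "B": 2, "C": 1}
weighted_pool = [s for s, w in weights.items() for _ in range(w)]
wrr = itertools.cycle(weighted_pool)

def ip_hash(client_ip, pool):
    # Same client IP always maps to the same server, but adding or
    # removing a server changes len(pool) and reshuffles every mapping
    # (consistent hashing mitigates this).
    digest = hashlib.md5(client_ip.encode()).hexdigest()
    return pool[int(digest, 16) % len(pool)]
```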

Dynamic Algorithms (Consider server state)

Least Connections:

  • Route to server with fewest active connections
  • Good for long-lived connections (WebSocket, database)
  • Most commonly used dynamic algorithm

Weighted Least Connections:

  • Least connections, adjusted by server weights
  • Balances live load against heterogeneous server capacity

Least Response Time:

  • Route to server with fastest response + fewest connections
  • Requires health monitoring
  • Best for latency-sensitive applications

Random:

  • Pick a random server
  • Surprisingly effective at large scale, since randomness spreads load evenly in expectation
  • Often combined with "power of two choices": pick 2 random servers, route to the less loaded one
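The dynamic strategies above can be sketched in Python; the `active` connection counts are hypothetical stand-ins for state a real LB would track:

```python
import random

# Hypothetical count of active connections per server.
active = {"A": 12, "B": 3, "C": 7}

def least_connections(conns):
    # Route to the server with the fewest active connections.
    return min(conns, key=conns.get)

def power_of_two_choices(conns):
    # Sample two random servers and take the less loaded one:
    # near-optimal balance without scanning the whole pool.
    a, b = random.sample(list(conns), 2)
    return a if conns[a] <= conns[b] else b
```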

Health Checks

Passive Health Checks

  • Monitor responses from normal traffic
  • Mark server unhealthy after N consecutive failures
  • No extra traffic overhead

Active Health Checks

  • LB sends periodic probes (HTTP GET /health)
  • More proactive failure detection
  • Adds slight overhead

Health Check Levels

Level 1: TCP connect (is the port open?)
Level 2: HTTP 200 (is the app responding?)
Level 3: Deep health (is the DB connected? Are dependencies healthy?)

Best practice: /health returns 200 with basic checks, /health/deep checks dependencies.
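The passive rule above ("mark unhealthy after N consecutive failures") can be sketched as a small Python state machine; the class name and threshold are illustrative, not any real LB's API:

```python
class HealthTracker:
    """Mark a server unhealthy after N consecutive failed responses."""

    def __init__(self, failure_threshold=3):
        self.failure_threshold = failure_threshold
        self.failures = {}   # server -> consecutive failure count
        self.healthy = {}    # server -> bool

    def record(self, server, ok):
        if ok:
            # Any success resets the streak and restores the server.
            self.failures[server] = 0
            self.healthy[server] = True
        else:
            self.failures[server] = self.failures.get(server, 0) + 1
            if self.failures[server] >= self.failure_threshold:
                self.healthy[server] = False

    def is_healthy(self, server):
        # Servers are assumed healthy until proven otherwise.
        return self.healthy.get(server, True)
```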


Session Persistence (Sticky Sessions)

Problem

Stateful apps store session data in server memory. If LB routes request to a different server, session is lost.

Solutions (Worst to Best)

  1. Sticky sessions — LB routes same client to same server (cookie/IP based)
    • Con: uneven load, server failure loses sessions
  2. Session replication — copy session data across all servers
    • Con: memory overhead, complexity
  3. Centralized session store — Redis/Memcached holds all sessions
    • Pro: servers are stateless, any server can handle any request ✅
  4. Client-side sessions — JWT tokens contain session data
    • Pro: truly stateless servers
    • Con: can't revoke easily, payload size limits

Interview answer: "Prefer stateless servers with centralized session store (Redis)."
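A sketch of the centralized-store approach in Python, using a plain dict with TTLs where a real deployment would use Redis or Memcached; all names here are illustrative:

```python
import time

class SessionStore:
    """Shared session store so any app server can handle any request.
    A dict stands in for Redis here; Redis would give the same
    semantics via SET with an expiry, shared across servers."""

    def __init__(self):
        self._data = {}  # session_id -> (payload, expires_at)

    def set(self, session_id, payload, ttl_seconds=1800):
        self._data[session_id] = (payload, time.time() + ttl_seconds)

    def get(self, session_id):
        entry = self._data.get(session_id)
        if entry is None:
            return None
        payload, expires_at = entry
        if time.time() > expires_at:
            # Expired sessions are lazily evicted on read.
            del self._data[session_id]
            return None
        return payload
```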


Reverse Proxy Features

Beyond load balancing, reverse proxies handle:

Feature             | Description
--------------------|------------------------------------------------
SSL/TLS termination | Decrypt HTTPS at proxy, forward HTTP internally
Compression         | gzip/brotli response compression
Caching             | Cache static assets and API responses
Rate limiting       | Protect backends from traffic spikes
Request routing     | Route /api/* to API servers, /* to web servers
Authentication      | Verify JWT/OAuth before hitting backend
Circuit breaking    | Stop sending traffic to failing backends
Request buffering   | Absorb slow clients, release backend quickly
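As one example of these features, rate limiting is commonly implemented as a token bucket. A minimal Python sketch, illustrative rather than any particular proxy's implementation:

```python
import time

class TokenBucket:
    """Each client gets `capacity` tokens refilling at `rate` per
    second; a request is allowed only while tokens remain."""

    def __init__(self, capacity, rate):
        self.capacity = capacity
        self.rate = rate
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # over the limit: the proxy would return 429
```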

Global Server Load Balancing (GSLB)

For multi-region deployments:

User in Paris → DNS → European LB → EU servers
User in NYC   → DNS → US East LB  → US servers
User in Tokyo → DNS → Asia LB     → Asia servers

Strategies:

  • GeoDNS — route based on client location
  • Latency-based — route to lowest-latency region
  • Failover — route to backup region if primary is down
  • Weighted — send X% to region A, Y% to region B (blue-green/canary)
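The weighted strategy can be sketched with `random.choices`; the region names and percentages are hypothetical:

```python
import random

# Hypothetical canary split: 90% of users to the stable region,
# 10% to the region running the new release.
regions = {"us-east": 90, "eu-west": 10}

def pick_region(weighted_regions):
    names = list(weighted_regions)
    return random.choices(
        names, weights=[weighted_regions[n] for n in names]
    )[0]
```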

Common Load Balancer Technologies

Technology      | Type  | Notes
----------------|-------|-----------------------------------
Nginx           | L7    | Most popular reverse proxy/LB
HAProxy         | L4/L7 | High performance, battle-tested
Envoy           | L7    | Service mesh sidecar, gRPC-native
Traefik         | L7    | Cloud-native, auto-discovery
AWS ALB         | L7    | Managed, integrates with AWS
AWS NLB         | L4    | Ultra-high performance TCP
AWS ELB Classic | L4/L7 | Legacy, avoid for new projects
Google Cloud LB | L4/L7 | Global anycast LB

Redundancy of Load Balancers

The LB itself is a single point of failure!

Solutions:

  • Active-passive pair — heartbeat between primary and standby, failover via virtual IP (VRRP)
  • Active-active pair — both handle traffic, DNS returns both IPs
  • Managed cloud LB — AWS ALB/NLB, GCP LB — provider handles redundancy

Interview Tips

  • Always add a load balancer in your diagrams
  • Specify L4 vs L7 and justify why
  • Mention the algorithm (least connections for most cases)
  • Address LB as potential SPOF — mention redundancy
  • For global systems, discuss GSLB + GeoDNS

Previous: 04 - Data Storage Fundamentals | Next: 06 - Caching