Previous: 44 - Gossip Protocols & Membership | Next: 46 - ML System Design Basics
1. Why Go Multi-Region?
| Driver | Explanation |
|---|---|
| Latency | Users in Tokyo shouldn't wait 200ms for a server in Virginia. Serve from the nearest region. |
| Availability | A single region outage (AWS us-east-1 goes down) shouldn't take the whole system offline. |
| Compliance | GDPR requires EU user data to stay in the EU. Data sovereignty laws vary by country. |
| Disaster recovery | Natural disasters, power outages, and fiber cuts can take out an entire region. |
| Scalability | Distribute load across regions rather than vertically scaling a single region. |
2. Active-Passive vs Active-Active
Active-Passive (Primary-Secondary)
```
                   Users
                 /   |   \
        US users  EU users  APAC users
             \       |       /
              v      v      v
          +-------------------+
          |     Global LB     |   routes all writes
          +---------+---------+   to the primary region
                    |
           +--------+--------+
           v                 v
  +--------------+    +--------------+
  |   US-EAST    |    |   EU-WEST    |
  |  (PRIMARY)   |    |  (STANDBY)   |
  | Read + Write |    |  Read only   |
  +-------+------+    +------+-------+
          |      async       ^
          +-- replication ---+
```
| Aspect | Detail |
|---|---|
| Writes | All go to the primary region |
| Reads | Can serve from any region |
| Failover | Promote standby to primary (minutes of downtime) |
| Complexity | Lower |
| Data loss risk | RPO > 0 (async replication lag) |
Active-Active (Multi-Primary)
```
                  Users
                /   |   \
       US users  EU users  APAC users
           |        |         |
           v        v         v
  +-----------+  +-----------+  +-----------+
  |  US-EAST  |  |  EU-WEST  |  | APAC-EAST |
  | Read+Write|  | Read+Write|  | Read+Write|
  +-----+-----+  +-----+-----+  +-----+-----+
        |              |              |
        +--------------+--------------+
              bi-directional async
                  replication
```
| Aspect | Detail |
|---|---|
| Writes | Any region can accept writes |
| Reads | Served locally (lowest latency) |
| Failover | Other regions absorb traffic (near-zero downtime) |
| Complexity | High (conflict resolution required) |
| Consistency | Eventual (conflicts possible) |
Comparison
| Criterion | Active-Passive | Active-Active |
|---|---|---|
| Write latency | High for remote users | Low (local writes) |
| Failover time | Minutes | Seconds |
| Conflict resolution | None needed | Required (CRDTs, LWW, app-level) |
| Cost | Lower (standby underutilized) | Higher (all regions active) |
| Data consistency | Easier (single writer) | Harder (multi-writer) |
Interview Tip
Most companies start active-passive and evolve to active-active only for specific high-traffic, latency-sensitive workloads. Full active-active is expensive and complex. Be ready to justify which approach fits the use case.
3. Data Replication Across Regions
Synchronous vs Asynchronous
```
Synchronous:
  US-EAST writes data
    |--- replicate to EU-WEST (wait for ACK) ---  80-150ms cross-region
    |--- replicate to APAC (wait for ACK) ------ 150-300ms cross-region
    |--- respond to client
  Total write latency: 150-300ms+ (bottleneck: the farthest region)

Asynchronous:
  US-EAST writes data
    |--- respond to client immediately
    |--- replicate to EU-WEST in background
    |--- replicate to APAC in background
  Total write latency: ~5ms (local only)
  Trade-off: data may be lost if the region fails before replication completes
```
| Approach | Write Latency | Consistency | Data Loss Risk |
|---|---|---|---|
| Synchronous | High (cross-region RTT) | Strong | None |
| Asynchronous | Low (local only) | Eventual | Window of replication lag |
| Semi-synchronous | Medium (1 remote ACK) | Bounded staleness | Reduced |
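The latency difference between the three modes can be sketched in a few lines of Python. This is a rough mental model, not a measurement; the RTT values are illustrative.

```python
# Rough model of cross-region write latency per replication mode.
# RTT values (ms) are illustrative, not measured.
REPLICA_RTTS_MS = {"eu-west": 100, "apac-east": 200}
LOCAL_WRITE_MS = 5

def write_latency_ms(mode: str) -> int:
    rtts = REPLICA_RTTS_MS.values()
    if mode == "sync":       # wait for ACKs from all replicas: slowest RTT dominates
        return LOCAL_WRITE_MS + max(rtts)
    if mode == "semi-sync":  # wait for one (the nearest) remote ACK
        return LOCAL_WRITE_MS + min(rtts)
    return LOCAL_WRITE_MS    # async: respond right after the local write

print(write_latency_ms("sync"))       # 205
print(write_latency_ms("semi-sync"))  # 105
print(write_latency_ms("async"))      # 5
```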
Cross-Region Latency Reference
| Route | Typical RTT |
|---|---|
| US-East <-> US-West | 60-80ms |
| US-East <-> EU-West | 80-120ms |
| US-East <-> APAC | 150-250ms |
| EU-West <-> APAC | 200-300ms |
| Same region (AZ to AZ) | 1-2ms |
4. Conflict Resolution for Multi-Region Writes
When two regions write to the same record concurrently:
```
US-EAST: UPDATE user SET name = "Alice"   (timestamp: t1)
EU-WEST: UPDATE user SET name = "Bob"     (timestamp: t2)
```

Both writes succeed locally. During replication, the conflict is detected.
| Strategy | How | Trade-off |
|---|---|---|
| Last-Writer-Wins (LWW) | Higher timestamp wins | Simple, but data loss (the losing write is silently discarded) |
| CRDTs | Merge-friendly data structures | No data loss; limited data type support (see 43 - CRDT & Conflict-Free Replication) |
| Application-level | App defines custom merge logic | Most flexible, most complex |
| Conflict-free by design | Partition data so each region owns a subset | No conflicts possible; limits flexibility |
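A minimal LWW merge sketch. The `Version` type and the region-name tiebreaker are illustrative; real systems typically use hybrid logical clocks rather than raw wall-clock timestamps.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Version:
    value: str
    ts: float      # wall-clock timestamp; real systems prefer hybrid logical clocks
    region: str

def lww_merge(a: Version, b: Version) -> Version:
    # Higher timestamp wins; ties break deterministically on region name
    # so every region converges to the same value.
    return max(a, b, key=lambda v: (v.ts, v.region))

us = Version("Alice", ts=1.0, region="us-east")
eu = Version("Bob",   ts=2.0, region="eu-west")
print(lww_merge(us, eu).value)  # Bob -- the "Alice" write is silently discarded
```

Note how the losing write vanishes without any error: that silence is exactly the data-loss risk the table calls out.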
Conflict-Free by Design (Recommended Starting Point)
```
Strategy: route writes to the "home region" of the data

User created in EU -> EU-WEST is the home region for this user's data
  All writes for this user go to EU-WEST
  Other regions serve reads from a local replica

Routing table:
  user_id % N -> home_region
  OR: user's registration country -> home_region
```
This avoids multi-writer conflicts entirely.
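Both routing-table options can be sketched as simple lookup functions. Region names, the country map, and the default region are illustrative assumptions.

```python
# Each record has exactly one home region; only that region accepts
# writes for it, so multi-writer conflicts cannot occur.
REGIONS = ["us-east", "eu-west", "apac-east"]
COUNTRY_TO_REGION = {"DE": "eu-west", "FR": "eu-west",
                     "US": "us-east", "JP": "apac-east"}

def home_region_by_hash(user_id: int) -> str:
    # Option 1: user_id % N -> home_region (even spread, ignores geography)
    return REGIONS[user_id % len(REGIONS)]

def home_region_by_country(country_code: str) -> str:
    # Option 2: registration country -> home_region (geography/compliance aware)
    return COUNTRY_TO_REGION.get(country_code, "us-east")  # assumed default

print(home_region_by_hash(7))        # eu-west (7 % 3 == 1)
print(home_region_by_country("DE"))  # eu-west
```

The hash option balances load evenly but may place a user far from where they live; the country option keeps data close to the user (and supports sovereignty rules) at the cost of uneven load.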
5. Global Traffic Routing
| Strategy | How | Routing Accuracy | Failover Speed |
|---|---|---|---|
| GeoDNS | DNS returns the IP of the nearest region based on client IP | Good | Slow (DNS TTL, 30-300s) |
| Anycast | Same IP announced from all regions; BGP routes to the nearest | Best | Fast (BGP convergence, seconds) |
| Global LB (L7) | Single entry point, intelligent routing (e.g., Cloudflare, AWS Global Accelerator) | Good | Fast (health-check based) |
| Client-side routing | Client SDK picks a region based on latency probes | Best (measured) | Immediate |
6. Data Placement & Sovereignty
GDPR and Data Sovereignty
GDPR Rule: EU personal data must be processed in compliance with EU law.
In practice: store EU user data in EU regions.
Implementation:

```
User registers from Germany
  -> user.home_region = EU-WEST
  -> PII stored only in EU-WEST
  -> Aggregated/anonymized data may be replicated globally
```

Data classification:
- PII (name, email, address) -> restricted to the home region
- Non-PII (product views, clicks) -> can replicate globally
- Aggregated metrics -> can replicate globally
| Data Type | Can Replicate Globally? | Notes |
|---|---|---|
| PII (personal data) | No (unless a compliant transfer mechanism exists) | GDPR, CCPA, LGPD |
| Financial data | Jurisdiction-dependent | PCI DSS, local finance laws |
| Healthcare data | No | HIPAA (US), local equivalents |
| Anonymized/aggregated | Yes | No longer personal data |
| Application config | Yes | Non-sensitive |
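One way to enforce this classification in a replication pipeline is to strip restricted fields before a record leaves its home region. A minimal sketch; the field names are illustrative.

```python
# Fields classified as PII stay in the home region; the remainder
# of the record may be copied to all regions.
PII_FIELDS = {"name", "email", "address"}

def globally_replicable_view(record: dict) -> dict:
    """Return a copy of the record with PII fields removed."""
    return {k: v for k, v in record.items() if k not in PII_FIELDS}

user = {"id": 42, "name": "Alice", "email": "a@example.com", "clicks": 17}
print(globally_replicable_view(user))  # {'id': 42, 'clicks': 17}
```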
7. Designing for Regional Failure
Region US-EAST goes down:
```
Before failure:
  US users   -> US-EAST
  EU users   -> EU-WEST
  APAC users -> APAC-EAST

After failure:
  US users   -> EU-WEST    (failover, +80ms latency)
  EU users   -> EU-WEST    (unchanged)
  APAC users -> APAC-EAST  (unchanged)
```
Requirements for successful failover:
1. Health checks detect US-EAST failure (seconds)
2. DNS/LB routes US traffic to EU-WEST (seconds to minutes)
3. EU-WEST has capacity to handle US + EU traffic
4. EU-WEST has sufficiently recent data replica
5. Write path redirected or queued
Capacity Planning for Failover
Rule of thumb: Each region should have 50% headroom
```
Normal load per region:  1000 QPS
Capacity per region:     1500 QPS (50% headroom)

When one of 3 regions fails:
  Surviving regions each absorb ~500 extra QPS
  New load: 1500 QPS each (at capacity, but functional)

Alternative: N+1 regions
  3 regions needed for normal traffic
  Deploy 4 regions (1 extra for failover capacity)
```
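The headroom rule generalizes: with N regions, each survivor must absorb 1/(N-1) of the total traffic after one region fails. A sketch of the arithmetic:

```python
def failover_capacity_per_region(total_qps: float, regions: int) -> float:
    # After one of N regions fails, the survivors split the total load:
    # each must handle total / (N - 1) instead of total / N.
    return total_qps / (regions - 1)

# 3 regions serving 3000 QPS total: 1000 QPS each normally, but each
# must be provisioned for 1500 QPS (50% headroom).
print(failover_capacity_per_region(3000, 3))  # 1500.0
# With 4 regions (N+1), required per-region capacity drops to 1000 QPS
# against a normal load of 750 QPS (~33% headroom).
print(failover_capacity_per_region(3000, 4))  # 1000.0
```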
8. Traffic Management During Failover
```
Failover timeline:
  t=0:    US-EAST health checks fail
  t=5s:   Health check confirms failure (3 consecutive failures)
  t=10s:  Global LB removes US-EAST from rotation
  t=10s:  US traffic routes to EU-WEST (nearest healthy region)
  t=10s+: EU-WEST auto-scales to handle the increased load
```
DNS-based failover:
- Slower: the DNS TTL must expire (30-300s)
- Mitigation: use a low TTL (30s) for critical services, or use anycast / a global LB instead of DNS

Client-side failover:
- Fastest: the client detects the failure and switches endpoints
- Requires: a client SDK with failover logic
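A minimal client-side failover sketch, assuming the SDK keeps its endpoint list pre-sorted by measured latency. The URLs and cooldown value are hypothetical.

```python
import time

# Endpoints ordered by measured latency, nearest first (hypothetical URLs).
ENDPOINTS = ["https://us-east.api.example.com", "https://eu-west.api.example.com"]
COOLDOWN_S = 30.0
_failed_until: dict[str, float] = {}  # endpoint -> monotonic time when retryable

def pick_endpoint() -> str:
    now = time.monotonic()
    for ep in ENDPOINTS:                      # nearest healthy endpoint wins
        if _failed_until.get(ep, 0.0) <= now:
            return ep
    return ENDPOINTS[0]                       # all cooling down: fall back to nearest

def mark_failed(ep: str) -> None:
    # Called when a request to ep times out or errors; skip it for a while.
    _failed_until[ep] = time.monotonic() + COOLDOWN_S

print(pick_endpoint())                            # nearest endpoint
mark_failed("https://us-east.api.example.com")
print(pick_endpoint())                            # fails over to eu-west at once
```

Because the decision happens in the client on the very next request, there is no TTL or BGP convergence to wait for; this is why the table above lists client-side routing as "Immediate".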
9. Consistency Trade-offs in Multi-Region
The spectrum:

```
Strong Consistency <--------------------------------> Eventual Consistency
        |                         |                            |
Spanner (sync cross-region)  Bounded staleness      DynamoDB Global Tables
 Slow writes, no conflicts       Tunable        Fast writes, conflicts possible
```
| Model | Write Latency | Read Freshness | Example |
|---|---|---|---|
| Strong (synchronous) | 100-300ms | Always fresh | Google Spanner |
| Bounded staleness | 50-100ms | Stale by at most X seconds | CockroachDB (follower reads) |
| Causal consistency | 10-50ms | Causally related reads are fresh | MongoDB sessions |
| Eventual | 1-10ms | May read stale data | DynamoDB Global Tables |
Interview Tip
Google Spanner achieves strong consistency across regions using TrueTime (GPS + atomic clocks for globally synchronized timestamps). This is hardware-dependent and unique to Google's infrastructure. CockroachDB approximates this with software-based uncertainty intervals. Most systems use eventual consistency because cross-region synchronous writes are too slow.
10. Cost Considerations
| Cost Factor | Impact |
|---|---|
| Data transfer | Cross-region bandwidth: $0.02-0.09/GB (AWS). Replication generates significant transfer costs. |
| Compute | Running the full application stack in 3+ regions: ~3x compute cost. |
Cost optimization strategies:
1. Replicate only what's needed
   - Full DB replica: expensive
   - Cache popular data in remote regions: cheaper
   - CDN for static content: cheapest
2. Tiered approach
   - Primary region: full stack
   - Secondary regions: read replicas + cache + CDN
   - Failover region: minimal standby, scale on demand
3. Reserved capacity in the primary region, spot/on-demand in secondaries
4. Compress replication traffic (50-80% reduction)
11. Real-World Examples
Google Spanner
- Globally distributed, strongly consistent
- Uses TrueTime (GPS + atomic clocks) for global ordering
- Synchronous replication across regions
- Write latency: 10-100ms (within continent), 100-300ms (cross-continent)
- Used for: Google Ads, Google Play
- Externally consistent: strongest possible guarantee
CockroachDB
- Open-source Spanner-inspired database
- Software-based uncertainty intervals (no special hardware)
- Serializable isolation across regions
- Survival goals: region-level failure tolerance
- Follower reads: bounded-staleness reads from any region
DynamoDB Global Tables
- Active-active multi-region
- Eventual consistency for cross-region reads
- LWW conflict resolution (last writer wins)
- Replication lag: typically < 1 second
- Simple setup: enable Global Tables, select regions
- Trade-off: no strong consistency across regions
Netflix
- Active-active across 3 AWS regions (us-east, us-west, eu-west)
- Each region serves its local traffic
- Regional failure: traffic drains to other regions
- Zuul (API gateway) handles routing
- EVCache (Memcached-based) replicated across regions
- Cassandra for multi-region data with eventual consistency
12. Architecture Pattern: Multi-Region with Regional Data Ownership
```
                    +-------------------+
                    |   Global DNS /    |
                    | GeoDNS (Route53)  |
                    +---------+---------+
                              |
            +-----------------+-----------------+
            |                 |                 |
    +-------v-------+  +------v------+  +-------v-------+
    |    US-EAST    |  |   EU-WEST   |  |   APAC-EAST   |
    +---------------+  +-------------+  +---------------+
    | API Gateway   |  | API Gateway |  | API Gateway   |
    | App Servers   |  | App Servers |  | App Servers   |
    | Cache (Redis) |  | Cache       |  | Cache         |
    | DB Primary    |  | DB Primary  |  | DB Primary    |
    |  (US users)   |  |  (EU users) |  | (APAC users)  |
    | DB Replica    |  | DB Replica  |  | DB Replica    |
    |  (EU, APAC)   |  |  (US, APAC) |  |  (US, EU)     |
    +-------+-------+  +------+------+  +-------+-------+
            |                 |                 |
            +-----------------+-----------------+
             Cross-region async replication
                (replicas are read-only)
```

- Write path: the user's home region (strong consistency)
- Read path: the local region's replica (eventual consistency)
- Failover: promote a replica to primary in a surviving region
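The write and read rules of this pattern condense into one small routing function. A sketch under the diagram's assumptions; the function and parameter names are illustrative.

```python
def route_request(op: str, local_region: str, record_home_region: str) -> str:
    # Writes always go to the record's home region (single writer per record),
    # so each record has one authoritative copy. Reads hit the caller's local
    # replica and may be slightly stale due to async replication.
    if op == "write":
        return record_home_region
    return local_region

print(route_request("write", "us-east", "eu-west"))  # eu-west
print(route_request("read",  "us-east", "eu-west"))  # us-east
```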
13. Key Trade-offs Discussion
| Decision | Option A | Option B |
|---|---|---|
| Topology | Active-passive (simpler) | Active-active (lower latency, costlier) |
| Replication | Sync (strong, slow) | Async (fast, risks data loss) |
| Conflict resolution | LWW (simple, data loss) | CRDTs or app-level (complex, no data loss) |
| Data placement | Full replication (simple) | Partitioned by region (compliant, complex routing) |