35 - Design Uber or Lyft
Series: System Design & Distributed Systems Previous: 34 - Design Google Search | Next: 36 - Design Twitter
1. Requirements
Functional Requirements
| Feature | Details |
|---|---|
| Request ride | Rider specifies pickup and destination |
| Driver matching | Match nearest available driver in real-time |
| Real-time tracking | Live location of driver on rider's map |
| Fare calculation | Estimate before trip, final fare after trip |
| Surge pricing | Dynamic pricing based on supply and demand |
| Trip history | Past trips for riders and drivers |
| Rating | Mutual rider/driver ratings after trip |
| Payments | Charge rider, pay driver |
Non-Functional Requirements
- Latency: Match driver within 10 seconds, location updates within 2 seconds
- Availability: 99.99% (stranded riders are a critical failure)
- Scale: 100M riders, 5M drivers, 20M trips/day
- Accuracy: GPS within 5-10 meters, ETA within 2 minutes
- Consistency: No double-booking of drivers
2. Capacity Estimation
Active drivers: 5M concurrent
Location updates: 5M drivers x 1 update/4s = 1.25M updates/sec
Rides/day: 20M
Ride requests/sec: ~230 QPS avg, ~700 peak
Trip tracking events: 20M rides x 300 updates/ride = 6B events/day
Trip data/day: 20M x 5KB = 100GB
| Resource | Estimate |
|---|---|
| Location update QPS | ~1.25M writes/sec |
| Matching QPS | ~700/sec peak |
| Trip events/day | ~6B |
| Location data/day | ~2TB (ephemeral, in-memory) |
3. High-Level Architecture
+--------+ +--------+
| Rider | | Driver |
| App | | App |
+---+----+ +---+----+
| |
v v
+---+-------------------+---+
| API Gateway |
| (Auth, Rate Limit) |
+---+------+------+---------+
| | |
v v v
+---+--+ +-+----+ +--+------+
| Ride | |Match | |Location |
| Svc | |Svc | |Service |
+--+---+ +--+---+ +--+------+
| | |
| | +----+-----+
| +--->| Geo Index |
| | (Redis |
| | Cluster) |
| +----+------+
| |
v v
+--+--------+ +----+----+
| Trip DB | | Kafka |
| (Postgres)| | (events)|
+-----------+ +----+----+
|
+------+-----+-----+------+
| | | |
+---+---+ +---+---+ +--+---+ +---+---+
|Tracking| | ETA | |Surge | |Analyt.|
|Service | |Service| |Svc | |Pipe |
+--------+ +-------+ +------+ +-------+
4. Geospatial Indexing
The core challenge: given a rider at (lat, lng), find the nearest available drivers among 5 million moving targets.
Option 1: Geohash
Geohash divides the world into a grid of cells encoded as strings.
Nearby locations share a common prefix.
(37.7749, -122.4194) --> "9q8yyk"
The eight neighbors of "9q8y" (center cell below):
+------+------+------+
| 9q8x | 9q8z | 9q9p |
+------+------+------+
| 9q8w | 9q8y | 9q9n |
+------+------+------+
| 9q8t | 9q8v | 9q9j |
+------+------+------+
Note the eastern neighbors cross into the "9q9" prefix -- shared prefixes usually mean proximity, but not at prefix boundaries (see "The Boundary Problem" below).
Precision levels:
4 chars: ~39km x 20km (city level)
5 chars: ~5km x 5km (neighborhood)
6 chars: ~1.2km x 0.6km (block level) <-- good for matching
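The encoding itself is simple enough to sketch. Below is a minimal pure-Python encoder (alternately bisect the longitude and latitude ranges, interleave the bits, base32-encode 5 bits per character) -- an illustrative sketch, not production code; real systems would use a geohash library:

```python
# Geohash base32 alphabet (note: no 'a', 'i', 'l', 'o').
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def geohash_encode(lat: float, lng: float, precision: int = 6) -> str:
    """Encode (lat, lng) into a geohash string of `precision` characters."""
    lat_range, lng_range = [-90.0, 90.0], [-180.0, 180.0]
    bits, even = [], True          # even bit positions encode longitude
    while len(bits) < precision * 5:
        rng, val = (lng_range, lng) if even else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if val >= mid:             # upper half -> 1, narrow range upward
            bits.append(1)
            rng[0] = mid
        else:                      # lower half -> 0, narrow range downward
            bits.append(0)
            rng[1] = mid
        even = not even
    chars = []
    for i in range(0, len(bits), 5):   # 5 bits per base32 character
        idx = 0
        for b in bits[i:i + 5]:
            idx = (idx << 1) | b
        chars.append(BASE32[idx])
    return "".join(chars)

print(geohash_encode(37.7749, -122.4194))  # -> 9q8yyk
```

Truncating the hash coarsens the cell: the 6-char hash "9q8yyk" has "9q8y" (city-district cell) as a prefix, which is exactly why prefix matching works for proximity.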
Redis native support:
GEOADD drivers:available <lng> <lat> driver_123
GEOSEARCH drivers:available FROMLONLAT <rider_lng> <rider_lat> BYRADIUS 3 km ASC COUNT 10
(GEORADIUS works too, but has been deprecated since Redis 6.2 in favor of GEOSEARCH.)
Option 2: Quadtree
Recursively divide space into 4 quadrants.
Stop when cell has < K drivers (e.g., K=100).
              World
          /   |   |   \
        NW   NE   SW   SE
        ...       ...   (dense quadrants keep splitting)
Pros: Adaptive (dense areas = smaller cells)
Cons: Harder to distribute, tree rebalancing on driver movement
Option 3: S2 Cells (Google) / H3 (Uber)
S2 projects Earth onto a cube, uses Hilbert curve for cell IDs.
Level 12: ~5 km^2 cells (city districts)
Level 14: ~0.3 km^2 cells (fine-grained)
Level 16: ~0.02 km^2 cells (building level)
H3 (Uber's variant): hexagonal cells
- Each hex has 6 equidistant neighbors (no "corner" problem)
- Hierarchical: each cell subdivides into 7 finer cells (approximately -- hexagons do not nest exactly)
- Resolution 9 --> ~0.1 km^2 (good for matching)
Comparison
| Method | Pros | Cons | Used By |
|---|---|---|---|
| Geohash | Simple, Redis-native GEORADIUS | Edge effects at cell boundaries | Lyft, many startups |
| Quadtree | Adapts to density | Complex to distribute | Early Uber |
| S2 Cells | Uniform, hierarchical, no edge effects | More complex | Google Maps |
| H3 | Hexagonal (6 equidistant neighbors) | Newer, less tooling | Uber |
The Boundary Problem
Rider R is at the edge of cell "9q8yyk":
+----------+----------+
|          |          |
|  9q8yyk  |  9q8yym  |
|       R* | *D       |  <-- Driver D is close but in adjacent cell
|----------|----------|
|  9q8yy5  |  9q8yyj  |
|          |          |
+----------+----------+
Solution: Always query the rider's cell + all 8 neighboring cells
neighbors("9q8yyk") = [9q8yyj, 9q8yym, 9q8yy5, 9q8yyh, ...]
Interview Tip: Geohash + Redis GEORADIUS is the simplest answer. Mention S2/H3 for depth. Always address the boundary problem.
5. Location Service
Driver Location Update Flow
Driver App Location Service Redis Geo Index
| | |
|-- GPS update --------->| |
| {lat, lng, heading, | |
| speed, timestamp} | |
| every 4 seconds | |
| |-- GEOADD to Redis ------->|
| | |
| |-- Publish to Kafka ------>|
| | (location-updates topic)|
| | |
| | Kafka consumers: |
| | - Trip Tracking |
| | - ETA Recalculation |
| | - Surge Computation |
| | - Analytics Pipeline |
Scale: 1.25M Writes/Sec
Solution: Redis Cluster geo-partitioned by city/region
redis-nyc: handles NYC drivers (~200K updates/sec)
redis-sf: handles SF drivers (~100K updates/sec)
redis-london: handles London drivers (~80K updates/sec)
...
Each cluster: 10-20 shards, each handling ~10K-20K writes/sec
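Routing an update to the right regional cluster can be as simple as a longest-prefix lookup on the driver's geohash. A sketch -- the prefix-to-cluster table and cluster names here are illustrative, not a real deployment topology:

```python
# Map geohash prefixes to regional Redis clusters (illustrative table:
# "9q8" covers the SF Bay Area, "dr5" NYC, "gcp" London).
REGION_SHARDS = {
    "9q8": "redis-sf",
    "dr5": "redis-nyc",
    "gcp": "redis-london",
}

def pick_cluster(geohash: str, default: str = "redis-global") -> str:
    """Longest-prefix match of a driver geohash against the region table."""
    for length in range(len(geohash), 0, -1):
        cluster = REGION_SHARDS.get(geohash[:length])
        if cluster:
            return cluster
    return default  # fall back for regions with no dedicated cluster

print(pick_cluster("9q8yyk"))  # -> redis-sf
```

Within each regional cluster, normal Redis Cluster key hashing spreads drivers across the 10-20 shards.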
Driver Location TTL
Each driver entry has a 30-second TTL in Redis.
If a driver stops sending updates (app crash, went offline):
- After 30s, entry expires automatically
- Driver disappears from matching pool
- No stale driver ghost problems
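The expiry behavior can be sketched in-process: record a last-seen timestamp per driver and sweep out anyone silent longer than the TTL. The class below is an illustrative model of the idea, not the Redis mechanism itself; the clock is injectable so the behavior is easy to test:

```python
import time

class DriverLiveness:
    """Track last-seen timestamps; drop drivers silent past the TTL."""

    def __init__(self, ttl_seconds: float = 30.0, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock          # injectable for testing
        self.last_seen = {}         # driver_id -> last update time

    def heartbeat(self, driver_id: str) -> None:
        """Called on every GPS update from the driver app."""
        self.last_seen[driver_id] = self.clock()

    def available(self) -> set:
        """Sweep expired drivers, then return the live matching pool."""
        now = self.clock()
        expired = [d for d, t in self.last_seen.items() if now - t > self.ttl]
        for d in expired:
            del self.last_seen[d]
        return set(self.last_seen)

# Demo with a fake clock instead of real time.
now = [0.0]
pool = DriverLiveness(ttl_seconds=30, clock=lambda: now[0])
pool.heartbeat("D123")
now[0] = 10.0
print(pool.available())  # -> {'D123'}
now[0] = 45.0
print(pool.available())  # -> set()
```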
6. Matching Service
Matching Flow
Rider Request Matching Service
| |
|-- Request ride (pickup, |
| destination, ride_type) ------>|
| |
| 1. GEORADIUS: 10 nearest available drivers
| within expanding radius (1km -> 3km -> 5km)
| |
| 2. For each candidate:
| - Calculate ETA to pickup (routing service)
| - Check driver rating > threshold
| - Check acceptance rate
| - Check vehicle type matches
| |
| 3. Rank: lowest ETA + highest rating
| |
| 4. Send ride offer to top driver
| (push notification via WebSocket)
| |
| 5. Wait 15 seconds for response
| |
| [ACCEPT] | [DECLINE/TIMEOUT]
| | | |
| v | v
| Create trip | Try next driver
| | (up to 3 attempts)
| |
|<-- Match confirmed, ETA 4min ----|
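Step 1's expanding-radius search is a small loop. A sketch, where `geo_query` is a stand-in for the Redis geo lookup (it is assumed to return driver IDs sorted by distance):

```python
def find_candidates(geo_query, pickup, radii_km=(1, 3, 5), want=10):
    """Widen the search ring until enough drivers are found.

    geo_query(center, radius_km, limit) stands in for the Redis geo
    call; it returns up to `limit` driver IDs sorted by distance.
    """
    for radius in radii_km:
        drivers = geo_query(pickup, radius, want)
        if len(drivers) >= want:
            return drivers
    return drivers  # best effort: whatever the widest ring found

# Example with a fake index that returns 4 drivers per km of radius.
fake = lambda center, r, n: ["D%d" % i for i in range(min(n, 4 * r))]
print(find_candidates(fake, (37.77, -122.41)))  # 10 candidates at 3 km
```

Capping the radius matters: past ~5 km the pickup ETA is usually worse than waiting for a closer driver to free up.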
Preventing Double-Booking
When driver accepts:
Redis transaction (WATCH + MULTI):
WATCH driver:D123:status
if status == "available":
MULTI
SET driver:D123:status "matched"
SET driver:D123:trip_id "T456"
EXEC
else:
DISCARD --> conflict, try next driver
This prevents two concurrent matches from assigning the same driver.
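The same check-then-set race can be shown in-process with a compare-and-set guarded by a lock (the lock plays the role that WATCH/MULTI/EXEC plays in Redis). A sketch, with illustrative names:

```python
import threading

class DriverRegistry:
    """Atomically claim a driver for a trip."""

    def __init__(self):
        self._status = {}   # driver_id -> "available" | "matched"
        self._trip = {}     # driver_id -> trip_id
        self._lock = threading.Lock()

    def add(self, driver_id: str) -> None:
        self._status[driver_id] = "available"

    def try_claim(self, driver_id: str, trip_id: str) -> bool:
        """Check-and-set under the lock; False means try the next driver."""
        with self._lock:
            if self._status.get(driver_id) != "available":
                return False
            self._status[driver_id] = "matched"
            self._trip[driver_id] = trip_id
            return True

reg = DriverRegistry()
reg.add("D123")
print(reg.try_claim("D123", "T456"))  # -> True  (first match wins)
print(reg.try_claim("D123", "T789"))  # -> False (already matched)
```

The key property is that the status check and the status write happen atomically -- two concurrent matchers can never both see "available".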
Advanced Matching: Batched Optimal Assignment
Instead of greedy matching (first-come-first-served):
Accumulate requests for 2-second windows:
Riders: [R1, R2, R3, R4, R5]
Drivers: [D1, D2, D3, D4, D5, D6]
Build cost matrix (ETA from each driver to each rider):
R1 R2 R3 R4 R5
D1 [ 3 7 2 9 5 ]
D2 [ 5 2 8 3 7 ]
D3 [ 8 4 6 1 3 ]
...
Solve assignment problem (Hungarian algorithm) to minimize total ETA.
Global optimum > local greedy.
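The gap between greedy and optimal is easy to demonstrate on a tiny instance. A sketch using brute force over permutations (fine for a 2-second batch window of a few riders; at scale you would use the Hungarian algorithm or an assignment solver). The ETA numbers are made up for illustration:

```python
from itertools import permutations

# eta[d][r] = minutes for driver d to reach rider r (illustrative numbers;
# square matrix assumed for simplicity).
eta = [
    [2, 3],   # D1
    [3, 9],   # D2
]

def greedy_total(eta):
    """Riders matched first-come-first-served to the nearest free driver."""
    free = set(range(len(eta)))
    total = 0
    for r in range(len(eta[0])):
        d = min(free, key=lambda d: eta[d][r])
        free.remove(d)
        total += eta[d][r]
    return total

def optimal_total(eta):
    """Minimum total ETA over all driver->rider assignments."""
    return min(
        sum(eta[d][r] for d, r in enumerate(assign))
        for assign in permutations(range(len(eta[0])))
    )

print(greedy_total(eta))   # -> 11 (R1 grabs D1, leaving R2 the 9-min D2)
print(optimal_total(eta))  # -> 6  (D2->R1, D1->R2: 3 + 3)
```

Greedy is locally sensible per rider but nearly doubles total ETA here; batching trades a couple seconds of matching latency for a better global outcome.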
7. Trip Lifecycle
State Machine
REQUESTED --> MATCHED --> DRIVER_EN_ROUTE --> ARRIVED --> IN_TRIP --> COMPLETED
| | | | |
v v v v v
CANCELLED CANCELLED CANCELLED CANCELLED CANCELLED
(by rider) (by rider (by rider or (by rider (emergency
or driver) driver, fee or driver) only)
may apply)
Trip Database Schema
trips:
    trip_id           UUID PRIMARY KEY
    rider_id          UUID REFERENCES users
    driver_id         UUID REFERENCES users
    status            ENUM('requested','matched','driver_en_route',
                           'arrived','in_trip','completed','cancelled')
    pickup_location   POINT
    dropoff_location  POINT
    actual_route      LINESTRING   -- GPS waypoints recorded during trip
    request_time      TIMESTAMP
    pickup_time       TIMESTAMP
    dropoff_time      TIMESTAMP
    distance_miles    DECIMAL(8,2)
    duration_minutes  DECIMAL(8,2)
    fare_cents        INT
    surge_multiplier  DECIMAL(3,2)
    payment_status    ENUM('pending','charged','refunded')
    rider_rating      SMALLINT     -- 1-5
    driver_rating     SMALLINT     -- 1-5
8. ETA Calculation
Approaches
| Method | Accuracy | Speed | Use Case |
|---|---|---|---|
| Haversine (straight-line) | Low | Instant | Rough estimate only |
| Road network + Dijkstra | Medium | ~100ms | Pre-trip estimate |
| Road network + real-time traffic | High | ~200ms | During-trip ETA |
| ML-based (historical patterns) | Highest | ~50ms (pre-computed) | Production systems |
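The first row of the table, haversine distance, is only a few lines. A sketch, with a rough ETA derived from an assumed average city speed (the 30 km/h figure is illustrative):

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance between two (lat, lng) points in km."""
    p1, p2 = radians(lat1), radians(lat2)
    dlat, dlng = radians(lat2 - lat1), radians(lng2 - lng1)
    a = sin(dlat / 2) ** 2 + cos(p1) * cos(p2) * sin(dlng / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

def rough_eta_minutes(lat1, lng1, lat2, lng2, avg_kmh=30.0):
    # Straight-line distance underestimates road distance, so this is
    # only good for the "rough estimate" row of the table above.
    return haversine_km(lat1, lng1, lat2, lng2) / avg_kmh * 60

d = haversine_km(37.7749, -122.4194, 34.0522, -118.2437)  # SF -> LA
print(round(d), "km straight-line")  # roughly 560 km
```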
Road Network Routing
Road Network as Weighted Graph:
Intersection A --[3 min at 8am, 1 min at 2am]--> Intersection B
Intersection B --[5 min at 8am, 2 min at 2am]--> Intersection C
Algorithms:
Dijkstra: O(V log V + E) -- exact, slower
A*: Faster with heuristic (haversine to destination)
Contraction Hierarchies: O(log V) queries (pre-computed shortcuts)
In Practice
- Use a routing engine (OSRM, Valhalla) or Google Maps Directions API
- Pre-compute cell-to-cell travel times for quick estimates
- Adjust with real-time traffic from driver GPS streams
- Uber's ETA system uses ML trained on billions of historical trips
Interview Tip: Don't design a routing engine. Say "we use OSRM or Google Maps API" and focus on the ride system architecture.
9. Surge Pricing
Supply-Demand Model
For each geographic cell (geohash level 6, ~1 km^2):
demand = ride requests in last 5 minutes
supply = available drivers in cell
surge_ratio = demand / supply
Pricing curve:
ratio <= 1.0 --> 1.0x (no surge)
ratio = 1.5 --> 1.3x
ratio = 2.0 --> 1.8x
ratio = 3.0 --> 2.5x
ratio >= 4.0 --> 3.0x (capped)
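The curve above is piecewise linear between the listed points; a sketch that interpolates it (the breakpoint table is taken directly from the curve above):

```python
# (demand/supply ratio, multiplier) breakpoints from the pricing curve.
SURGE_CURVE = [(1.0, 1.0), (1.5, 1.3), (2.0, 1.8), (3.0, 2.5), (4.0, 3.0)]

def surge_multiplier(demand: int, supply: int) -> float:
    if supply == 0:
        return SURGE_CURVE[-1][1]      # no drivers at all: cap the surge
    ratio = demand / supply
    if ratio <= SURGE_CURVE[0][0]:
        return 1.0                     # supply meets demand: no surge
    if ratio >= SURGE_CURVE[-1][0]:
        return SURGE_CURVE[-1][1]      # capped at 3.0x
    for (r0, m0), (r1, m1) in zip(SURGE_CURVE, SURGE_CURVE[1:]):
        if r0 <= ratio <= r1:
            # Linear interpolation between neighboring breakpoints.
            return m0 + (m1 - m0) * (ratio - r0) / (r1 - r0)

print(surge_multiplier(20, 10))   # ratio 2.0 -> 1.8x (up to float rounding)
print(surge_multiplier(50, 10))   # ratio 5.0 -> 3.0x (capped)
```

A continuous curve avoids the cliff effects of fixed tiers -- a rider at ratio 1.99 vs 2.01 sees nearly the same price.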
Surge Map
+------+------+------+------+
| 1.0x | 1.0x | 1.5x | 2.5x | <-- Stadium event ending
+------+------+------+------+
| 1.0x | 1.0x | 1.3x | 1.8x |
+------+------+------+------+
| 1.0x | 1.0x | 1.0x | 1.0x | <-- Normal demand
+------+------+------+------+
Effects
- Incentivizes drivers to move toward high-demand areas
- Reduces demand through price elasticity
- Computed every 1-2 minutes per cell
- Rider sees surge multiplier before confirming ride
10. Real-Time Tracking
Architecture
Driver App Kafka Tracking Service Rider App
| | | |
|-- GPS update --->| | |
| |-- consume event ---->| |
| | (for active trips) | |
| | |-- push via WS ----->|
| | | {lat, lng, ETA} |
| | | |
|-- GPS update --->| | |
| |-- consume event ---->| |
| | |-- push via WS ----->|
Client-Side Smoothing
Server sends update at t=0: driver at (40.712, -74.006)
Server sends update at t=4s: driver at (40.713, -74.005)
Client interpolates frames between updates:
t=0s: (40.712, -74.006)
t=1s: (40.7123, -74.0058) -- interpolated
t=2s: (40.7125, -74.0055) -- interpolated
t=3s: (40.7128, -74.0053) -- interpolated
t=4s: (40.713, -74.005) -- actual update
Result: smooth car animation on map instead of 4-second jumps
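The smoothing is plain linear blending between the last two fixes. A minimal sketch that reproduces the intermediate frames above (real clients also blend heading and speed for dead reckoning):

```python
def lerp(a: float, b: float, t: float) -> float:
    """Linear blend: t=0 -> a, t=1 -> b."""
    return a + (b - a) * t

def interpolate_position(prev, curr, elapsed_s, interval_s=4.0):
    """Estimated (lat, lng) elapsed_s seconds after the `prev` fix,
    assuming fixes arrive every interval_s seconds."""
    t = min(elapsed_s / interval_s, 1.0)   # clamp past the latest fix
    return (lerp(prev[0], curr[0], t), lerp(prev[1], curr[1], t))

prev, curr = (40.712, -74.006), (40.713, -74.005)
lat, lng = interpolate_position(prev, curr, elapsed_s=2.0)
print(lat, lng)  # midway: ~(40.7125, -74.0055)
```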
11. Payment Service
Flow
Trip Start --> Pre-authorize estimated fare on rider's card
|
Trip In Progress --> Track actual distance and time
|
Trip End --> Calculate final fare
|
v
+------+------+
| Charge rider |
| (capture) |
+------+------+
|
+------+------+
| Split: |
| Platform: 25%|
| Driver: 75% |
+------+------+
|
+------+------+
| Driver payout|
| (weekly |
| batch or |
| instant) |
+--------------+
Double-Entry Ledger
trip_completed (trip T456):
DEBIT rider_wallet $27.60
CREDIT platform_revenue $6.90 (25% commission)
CREDIT driver_earnings $20.70 (75% to driver)
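A sketch of the split in integer cents (floats and money don't mix), checking the core ledger invariant that every trip's postings balance. The 25% commission matches the flow above; the function and account names are illustrative:

```python
def split_fare(fare_cents: int, commission: float = 0.25):
    """Return balanced double-entry postings for a completed trip."""
    platform = round(fare_cents * commission)
    driver = fare_cents - platform   # remainder, so cents always balance
    return [
        ("DEBIT",  "rider_wallet",     fare_cents),
        ("CREDIT", "platform_revenue", platform),
        ("CREDIT", "driver_earnings",  driver),
    ]

entries = split_fare(2760)           # the $27.60 trip above
for side, account, cents in entries:
    print(side, account, cents)

# Invariant: debits equal credits for every trip.
debits = sum(c for s, _, c in entries if s == "DEBIT")
credits = sum(c for s, _, c in entries if s == "CREDIT")
assert debits == credits
```

Computing the driver's share as the remainder (rather than rounding both shares) guarantees the entries sum to zero even for fares that don't split evenly.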
12. Complete System Diagram
+--------+ +--------+
| Rider | WebSocket (tracking) | Driver |
| App |<------------------------------------->| App |
+---+----+ +---+----+
| |
| HTTP (ride requests) GPS every 4 sec |
| |
+---+------------------------------------------------+---+
| API Gateway |
| (Auth, Rate Limit, Route) |
+---+--------+--------+--------+---------+--------+------+
| | | | | |
v v v v v v
+---+--+ +--+---+ +--+----+ +-+-----+ +-+-----+ +--+----+
| Ride | |Match | |Locat. | |Trip | |Fare | |Payment|
| Req | |Svc | |Svc | |Svc | |Calc | |Svc |
| Svc | | | | | | | | | | |
+--+---+ +--+---+ +--+----+ +--+----+ +--+----+ +--+----+
| | | | | |
| | +----+----+ | | +----+------+
| +--->| Redis | | | | Payment |
| | Cluster | | | | Gateway |
| | (Geo | | | | (Stripe) |
| | Index) | | | +-----------+
| +----+----+ | |
| | | |
v v v v
+--+------------------+---------+---------+--+
| Kafka |
| (event streaming) |
+--+----------+----------+---------+---------+
| | | |
v v v v
+--+----+ +--+-----+ +--+----+ +--+------+
|Tracking| |Surge | |ETA | |Analytics|
|Service | |Pricing | |Service| |Pipeline |
|(WS push)| |Engine | |(OSRM) | |(Spark) |
+--------+ +--------+ +-------+ +---------+
+-------------------+ +-------------------+
| Trip DB | | Data Warehouse |
| (PostgreSQL) | | (BigQuery / S3) |
+-------------------+ +-------------------+
13. Trade-Off Summary
| Decision | Option A | Option B | Recommendation |
|---|---|---|---|
| Geo indexing | Geohash (Redis) | S2 / H3 cells | Geohash for MVP, H3 at Uber-scale |
| Location store | Redis only | Redis + Kafka | Redis + Kafka (real-time + downstream) |
| Matching | Greedy (nearest) | Batched optimal | Greedy with batch for high-demand zones |
| Tracking | HTTP polling | WebSocket push | WebSocket (lower latency, lower overhead) |
| ETA | Haversine | Road network routing | Road network (far more accurate) |
| Surge | Fixed tiers | Continuous curve | Continuous with caps (smoother UX) |
| Payments | Pay-per-trip | Pre-authorize + capture | Pre-authorize (handles fare changes) |
14. Interview Tips
- Geospatial indexing is THE key topic: Explain geohash clearly, mention S2/H3 for depth
- Location updates are the hottest path: 1.25M writes/sec needs Redis cluster sharded by region
- Matching is not just nearest driver: Mention ETA, driver rating, acceptance rate, vehicle type
- Double-booking prevention: Show you understand distributed locking / optimistic concurrency
- Surge pricing: Simple concept, explain supply-demand per geographic cell
- Trip state machine: Walk through the full lifecycle -- shows you think about edge cases
- Boundary problem: Always mention that geohash queries must include adjacent cells
- Don't over-design routing: Say "use OSRM or Google Maps API" and move on
15. Resources
- Alex Xu - System Design Interview Vol. 2, Chapter 1: Proximity Service
- Uber Engineering Blog: "Uber's Big Data Platform" and "Geospatial Indexing"
- Uber H3: Hexagonal hierarchical geospatial indexing system (h3geo.org)
- Lyft Engineering: "How Lyft Discovers Drivers for Riders"
- Google S2 Geometry Library documentation
- OSRM (Open Source Routing Machine): project-osrm.org
- "Scaling Uber's Real-time Market Platform" (InfoQ talk)
- Martin Kleppmann - Designing Data-Intensive Applications, Chapter 2 (Data Models)