35 - Design Uber or Lyft
Series: System Design & Distributed Systems Previous: 34 - Design Google Search | Next: 36 - Design Twitter
1. Requirements
Functional Requirements
| Feature | Details |
|---|---|
| Request ride | Rider specifies pickup and destination |
| Driver matching | Match nearest available driver in real-time |
| Real-time tracking | Live location of driver on rider's map |
| Fare calculation | Estimate before trip, final fare after trip |
| Surge pricing | Dynamic pricing based on supply and demand |
| Trip history | Past trips for riders and drivers |
| Rating | Mutual rider/driver ratings after trip |
| Payments | Charge rider, pay driver |
Non-Functional Requirements
- Latency: Match driver within 10 seconds, location updates within 2 seconds
- Availability: 99.99% (stranded riders are a critical failure)
- Scale: 100M riders, 5M drivers, 20M trips/day
- Accuracy: GPS within 5-10 meters, ETA within 2 minutes
- Consistency: No double-booking of drivers
2. Capacity Estimation
Active drivers: 5M concurrent
Location updates: 5M drivers x 1 update/4s = 1.25M updates/sec
Rides/day: 20M
Ride requests/sec: ~230 QPS avg, ~700 peak
Trip tracking events: 20M rides x 300 updates/ride = 6B events/day
Trip data/day: 20M x 5KB = 100GB
| Resource | Estimate |
|---|---|
| Location update QPS | ~1.25M writes/sec |
| Matching QPS | ~700/sec peak |
| Trip events/day | ~6B |
| Location data/day | ~2TB (ephemeral, in-memory) |
3. High-Level Architecture
+--------+ +--------+
| Rider | | Driver |
| App | | App |
+---+----+ +---+----+
| |
v v
+---+-------------------+---+
| API Gateway |
| (Auth, Rate Limit) |
+---+------+------+---------+
| | |
v v v
+---+--+ +-+----+ +--+------+
| Ride | |Match | |Location |
| Svc | |Svc | |Service |
+--+---+ +--+---+ +--+------+
| | |
| | +----+-----+
| +--->| Geo Index |
| | (Redis |
| | Cluster) |
| +----+------+
| |
v v
+--+--------+ +----+----+
| Trip DB | | Kafka |
| (Postgres)| | (events)|
+-----------+ +----+----+
|
+------+-----+-----+------+
| | | |
+---+---+ +---+---+ +--+---+ +---+---+
|Tracking| | ETA | |Surge | |Analyt.|
|Service | |Service| |Svc | |Pipe |
+--------+ +-------+ +------+ +-------+
4. Geospatial Indexing
The core challenge: given a rider at (lat, lng), find the nearest available drivers among 5 million moving targets.
Option 1: Geohash
Geohash divides the world into a grid of cells encoded as strings.
Nearby locations share a common prefix.
(37.7749, -122.4194) --> "9q8yyk"
The eight neighbors of "9q8y" (center cell below):
+------+------+------+
| 9q8x | 9q8z | 9q9p |
+------+------+------+
| 9q8w | 9q8y | 9q9n |
+------+------+------+
| 9q8t | 9q8v | 9q9j |
+------+------+------+
Note the eastern neighbors cross into the "9q9" prefix -- shared prefixes usually mean proximity, but not at prefix boundaries (see "The Boundary Problem" below).
Precision levels:
4 chars: ~39km x 20km (city level)
5 chars: ~5km x 5km (neighborhood)
6 chars: ~1.2km x 0.6km (block level) <-- good for matching
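The encoding itself is simple enough to sketch. Below is a minimal pure-Python encoder (alternately bisect the longitude and latitude ranges, interleave the bits, base32-encode 5 bits per character) -- an illustrative sketch, not production code; real systems would use a geohash library:

```python
# Geohash base32 alphabet (note: no 'a', 'i', 'l', 'o').
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def geohash_encode(lat: float, lng: float, precision: int = 6) -> str:
    """Encode (lat, lng) into a geohash string of `precision` characters."""
    lat_range, lng_range = [-90.0, 90.0], [-180.0, 180.0]
    bits, even = [], True          # even bit positions encode longitude
    while len(bits) < precision * 5:
        rng, val = (lng_range, lng) if even else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if val >= mid:             # upper half -> 1, narrow range upward
            bits.append(1)
            rng[0] = mid
        else:                      # lower half -> 0, narrow range downward
            bits.append(0)
            rng[1] = mid
        even = not even
    chars = []
    for i in range(0, len(bits), 5):   # 5 bits per base32 character
        idx = 0
        for b in bits[i:i + 5]:
            idx = (idx << 1) | b
        chars.append(BASE32[idx])
    return "".join(chars)

print(geohash_encode(37.7749, -122.4194))  # -> 9q8yyk
```

Truncating the hash coarsens the cell: the 6-char hash "9q8yyk" has "9q8y" (city-district cell) as a prefix, which is exactly why prefix matching works for proximity.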
Redis native support:
GEOADD drivers:available <lng> <lat> driver_123
GEOSEARCH drivers:available FROMLONLAT <rider_lng> <rider_lat> BYRADIUS 3 km ASC COUNT 10
(GEORADIUS works too, but has been deprecated since Redis 6.2 in favor of GEOSEARCH.)
Option 2: Quadtree
Recursively divide space into 4 quadrants.
Stop when cell has < K drivers (e.g., K=100).
              World
          /   |   |   \
        NW   NE   SW   SE
        ...       ...   (dense quadrants keep splitting)
Pros: Adaptive (dense areas = smaller cells)
Cons: Harder to distribute, tree rebalancing on driver movement
Option 3: S2 Cells (Google) / H3 (Uber)
S2 projects Earth onto a cube, uses Hilbert curve for cell IDs.
Level 12: ~5 km^2 cells (city districts)
Level 14: ~0.3 km^2 cells (fine-grained)
Level 16: ~0.02 km^2 cells (building level)
H3 (Uber's variant): hexagonal cells
- Each hex has 6 equidistant neighbors (no "corner" problem)
- Hierarchical: each cell subdivides into 7 finer cells (approximately -- hexagons do not nest exactly)
- Resolution 9 --> ~0.1 km^2 (good for matching)
Comparison
| Method | Pros | Cons | Used By |
|---|---|---|---|
| Geohash | Simple, Redis-native GEORADIUS | Edge effects at cell boundaries | Lyft, many startups |
| Quadtree | Adapts to density | Complex to distribute | Early Uber |
| S2 Cells | Uniform, hierarchical, no edge effects | More complex | Google Maps |
| H3 | Hexagonal (6 equidistant neighbors) | Newer, less tooling | Uber |
The Boundary Problem
Rider R is at the edge of cell "9q8yyk":
+----------+----------+
|          |          |
|  9q8yyk  |  9q8yym  |
|       R* | *D       |  <-- Driver D is close but in adjacent cell
|----------|----------|
|  9q8yy5  |  9q8yyj  |
|          |          |
+----------+----------+
Solution: Always query the rider's cell + all 8 neighboring cells
neighbors("9q8yyk") = [9q8yyj, 9q8yym, 9q8yy5, 9q8yyh, ...]
Interview Tip: Geohash + Redis GEORADIUS is the simplest answer. Mention S2/H3 for depth. Always address the boundary problem.
5. Location Service
Driver Location Update Flow
Driver App Location Service Redis Geo Index
| | |
|-- GPS update --------->| |
| {lat, lng, heading, | |
| speed, timestamp} | |
| every 4 seconds | |
| |-- GEOADD to Redis ------->|
| | |
| |-- Publish to Kafka ------>|
| | (location-updates topic)|
| | |
| | Kafka consumers: |
| | - Trip Tracking |
| | - ETA Recalculation |
| | - Surge Computation |
| | - Analytics Pipeline |
Scale: 1.25M Writes/Sec
Solution: Redis Cluster geo-partitioned by city/region
redis-nyc: handles NYC drivers (~200K updates/sec)
redis-sf: handles SF drivers (~100K updates/sec)
redis-london: handles London drivers (~80K updates/sec)
...
Each cluster: 10-20 shards, each handling ~10K-20K writes/sec
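Routing an update to the right regional cluster can be as simple as a longest-prefix lookup on the driver's geohash. A sketch -- the prefix-to-cluster table and cluster names here are illustrative, not a real deployment topology:

```python
# Map geohash prefixes to regional Redis clusters (illustrative table:
# "9q8" covers the SF Bay Area, "dr5" NYC, "gcp" London).
REGION_SHARDS = {
    "9q8": "redis-sf",
    "dr5": "redis-nyc",
    "gcp": "redis-london",
}

def pick_cluster(geohash: str, default: str = "redis-global") -> str:
    """Longest-prefix match of a driver geohash against the region table."""
    for length in range(len(geohash), 0, -1):
        cluster = REGION_SHARDS.get(geohash[:length])
        if cluster:
            return cluster
    return default  # fall back for regions with no dedicated cluster

print(pick_cluster("9q8yyk"))  # -> redis-sf
```

Within each regional cluster, normal Redis Cluster key hashing spreads drivers across the 10-20 shards.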
Driver Location TTL
Each driver entry has a 30-second TTL in Redis.
If a driver stops sending updates (app crash, went offline):
- After 30s, entry expires automatically
- Driver disappears from matching pool
- No stale driver ghost problems
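The expiry behavior can be sketched in-process: record a last-seen timestamp per driver and sweep out anyone silent longer than the TTL. The class below is an illustrative model of the idea, not the Redis mechanism itself; the clock is injectable so the behavior is easy to test:

```python
import time

class DriverLiveness:
    """Track last-seen timestamps; drop drivers silent past the TTL."""

    def __init__(self, ttl_seconds: float = 30.0, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock          # injectable for testing
        self.last_seen = {}         # driver_id -> last update time

    def heartbeat(self, driver_id: str) -> None:
        """Called on every GPS update from the driver app."""
        self.last_seen[driver_id] = self.clock()

    def available(self) -> set:
        """Sweep expired drivers, then return the live matching pool."""
        now = self.clock()
        expired = [d for d, t in self.last_seen.items() if now - t > self.ttl]
        for d in expired:
            del self.last_seen[d]
        return set(self.last_seen)

# Demo with a fake clock instead of real time.
now = [0.0]
pool = DriverLiveness(ttl_seconds=30, clock=lambda: now[0])
pool.heartbeat("D123")
now[0] = 10.0
print(pool.available())  # -> {'D123'}
now[0] = 45.0
print(pool.available())  # -> set()
```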
6. Matching Service
Matching Flow
Rider Request Matching Service
| |
|-- Request ride (pickup, |
| destination, ride_type) ------>|
| |
| 1. GEORADIUS: 10 nearest available drivers
| within expanding radius (1km -> 3km -> 5km)
| |
| 2. For each candidate:
| - Calculate ETA to pickup (routing service)
| - Check driver rating > threshold
| - Check acceptance rate
| - Check vehicle type matches
| |
| 3. Rank: lowest ETA + highest rating
| |
| 4. Send ride offer to top driver
| (push notification via WebSocket)
| |
| 5. Wait 15 seconds for response
| |
| [ACCEPT] | [DECLINE/TIMEOUT]
| | | |
| v | v
| Create trip | Try next driver
| | (up to 3 attempts)
| |
|<-- Match confirmed, ETA 4min ----|
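Step 1's expanding-radius search is a small loop. A sketch, where `geo_query` is a stand-in for the Redis geo lookup (it is assumed to return driver IDs sorted by distance):

```python
def find_candidates(geo_query, pickup, radii_km=(1, 3, 5), want=10):
    """Widen the search ring until enough drivers are found.

    geo_query(center, radius_km, limit) stands in for the Redis geo
    call; it returns up to `limit` driver IDs sorted by distance.
    """
    for radius in radii_km:
        drivers = geo_query(pickup, radius, want)
        if len(drivers) >= want:
            return drivers
    return drivers  # best effort: whatever the widest ring found

# Example with a fake index that returns 4 drivers per km of radius.
fake = lambda center, r, n: ["D%d" % i for i in range(min(n, 4 * r))]
print(find_candidates(fake, (37.77, -122.41)))  # 10 candidates at 3 km
```

Capping the radius matters: past ~5 km the pickup ETA is usually worse than waiting for a closer driver to free up.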
Preventing Double-Booking
When driver accepts:
Redis transaction (WATCH + MULTI):
WATCH driver:D123:status
if status == "available":
MULTI
SET driver:D123:status "matched"
SET driver:D123:trip_id "T456"
EXEC
else:
DISCARD --> conflict, try next driver
This prevents two concurrent matches from assigning the same driver.
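The same check-then-set race can be shown in-process with a compare-and-set guarded by a lock (the lock plays the role that WATCH/MULTI/EXEC plays in Redis). A sketch, with illustrative names:

```python
import threading

class DriverRegistry:
    """Atomically claim a driver for a trip."""

    def __init__(self):
        self._status = {}   # driver_id -> "available" | "matched"
        self._trip = {}     # driver_id -> trip_id
        self._lock = threading.Lock()

    def add(self, driver_id: str) -> None:
        self._status[driver_id] = "available"

    def try_claim(self, driver_id: str, trip_id: str) -> bool:
        """Check-and-set under the lock; False means try the next driver."""
        with self._lock:
            if self._status.get(driver_id) != "available":
                return False
            self._status[driver_id] = "matched"
            self._trip[driver_id] = trip_id
            return True

reg = DriverRegistry()
reg.add("D123")
print(reg.try_claim("D123", "T456"))  # -> True  (first match wins)
print(reg.try_claim("D123", "T789"))  # -> False (already matched)
```

The key property is that the status check and the status write happen atomically -- two concurrent matchers can never both see "available".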
Advanced Matching: Batched Optimal Assignment
Instead of greedy matching (first-come-first-served):
Accumulate requests for 2-second windows:
Riders: [R1, R2, R3, R4, R5]
Drivers: [D1, D2, D3, D4, D5, D6]
Build cost matrix (ETA from each driver to each rider):
R1 R2 R3 R4 R5
D1 [ 3 7 2 9 5 ]
D2 [ 5 2 8 3 7 ]
D3 [ 8 4 6 1 3 ]
...
Solve assignment problem (Hungarian algorithm) to minimize total ETA.
Global optimum > local greedy.
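The gap between greedy and optimal is easy to demonstrate on a tiny instance. A sketch using brute force over permutations (fine for a 2-second batch window of a few riders; at scale you would use the Hungarian algorithm or an assignment solver). The ETA numbers are made up for illustration:

```python
from itertools import permutations

# eta[d][r] = minutes for driver d to reach rider r (illustrative numbers;
# square matrix assumed for simplicity).
eta = [
    [2, 3],   # D1
    [3, 9],   # D2
]

def greedy_total(eta):
    """Riders matched first-come-first-served to the nearest free driver."""
    free = set(range(len(eta)))
    total = 0
    for r in range(len(eta[0])):
        d = min(free, key=lambda d: eta[d][r])
        free.remove(d)
        total += eta[d][r]
    return total

def optimal_total(eta):
    """Minimum total ETA over all driver->rider assignments."""
    return min(
        sum(eta[d][r] for d, r in enumerate(assign))
        for assign in permutations(range(len(eta[0])))
    )

print(greedy_total(eta))   # -> 11 (R1 grabs D1, leaving R2 the 9-min D2)
print(optimal_total(eta))  # -> 6  (D2->R1, D1->R2: 3 + 3)
```

Greedy is locally sensible per rider but nearly doubles total ETA here; batching trades a couple seconds of matching latency for a better global outcome.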
7. Trip Lifecycle
State Machine
REQUESTED --> MATCHED --> DRIVER_EN_ROUTE --> ARRIVED --> IN_TRIP --> COMPLETED
| | | | |
v v v v v
CANCELLED CANCELLED CANCELLED CANCELLED CANCELLED
(by rider) (by rider (by rider or (by rider (emergency
or driver) driver, fee or driver) only)
may apply)
Trip Database Schema
trips:
    trip_id           UUID PRIMARY KEY
    rider_id          UUID REFERENCES users
    driver_id         UUID REFERENCES users
    status            ENUM('requested','matched','driver_en_route',
                           'arrived','in_trip','completed','cancelled')
    pickup_location   POINT
    dropoff_location  POINT
    actual_route      LINESTRING   -- GPS waypoints recorded during trip
    request_time      TIMESTAMP
    pickup_time       TIMESTAMP
    dropoff_time      TIMESTAMP
    distance_miles    DECIMAL(8,2)
    duration_minutes  DECIMAL(8,2)
    fare_cents        INT
    surge_multiplier  DECIMAL(3,2)
    payment_status    ENUM('pending','charged','refunded')
    rider_rating      SMALLINT     -- 1-5
    driver_rating     SMALLINT     -- 1-5
8. ETA Calculation
Approaches
| Method | Accuracy | Speed | Use Case |
|---|---|---|---|
| Haversine (straight-line) | Low | Instant | Rough estimate only |
| Road network + Dijkstra | Medium | ~100ms | Pre-trip estimate |
| Road network + real-time traffic | High | ~200ms | During-trip ETA |
| ML-based (historical patterns) | Highest | ~50ms (pre-computed) | Production systems |
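The first row of the table, haversine distance, is only a few lines. A sketch, with a rough ETA derived from an assumed average city speed (the 30 km/h figure is illustrative):

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance between two (lat, lng) points in km."""
    p1, p2 = radians(lat1), radians(lat2)
    dlat, dlng = radians(lat2 - lat1), radians(lng2 - lng1)
    a = sin(dlat / 2) ** 2 + cos(p1) * cos(p2) * sin(dlng / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

def rough_eta_minutes(lat1, lng1, lat2, lng2, avg_kmh=30.0):
    # Straight-line distance underestimates road distance, so this is
    # only good for the "rough estimate" row of the table above.
    return haversine_km(lat1, lng1, lat2, lng2) / avg_kmh * 60

d = haversine_km(37.7749, -122.4194, 34.0522, -118.2437)  # SF -> LA
print(round(d), "km straight-line")  # roughly 560 km
```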
Road Network Routing
Road Network as Weighted Graph:
Intersection A --[3 min at 8am, 1 min at 2am]--> Intersection B
Intersection B --[5 min at 8am, 2 min at 2am]--> Intersection C
Algorithms:
Dijkstra: O(V log V + E) -- exact, slower
A*: Faster with heuristic (haversine to destination)
Contraction Hierarchies: O(log V) queries (pre-computed shortcuts)
In Practice
- Use a routing engine (OSRM, Valhalla) or Google Maps Directions API
- Pre-compute cell-to-cell travel times for quick estimates
- Adjust with real-time traffic from driver GPS streams
- Uber's ETA system uses ML trained on billions of historical trips
Interview Tip: Don't design a routing engine. Say "we use OSRM or Google Maps API" and focus on the ride system architecture.
9. Surge Pricing
Supply-Demand Model
For each geographic cell (geohash level 6, ~1 km^2):
demand = ride requests in last 5 minutes
supply = available drivers in cell
surge_ratio = demand / supply
Pricing curve:
ratio <= 1.0 --> 1.0x (no surge)
ratio = 1.5 --> 1.3x
ratio = 2.0 --> 1.8x
ratio = 3.0 --> 2.5x
ratio >= 4.0 --> 3.0x (capped)
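The curve above is piecewise linear between the listed points; a sketch that interpolates it (the breakpoint table is taken directly from the curve above):

```python
# (demand/supply ratio, multiplier) breakpoints from the pricing curve.
SURGE_CURVE = [(1.0, 1.0), (1.5, 1.3), (2.0, 1.8), (3.0, 2.5), (4.0, 3.0)]

def surge_multiplier(demand: int, supply: int) -> float:
    if supply == 0:
        return SURGE_CURVE[-1][1]      # no drivers at all: cap the surge
    ratio = demand / supply
    if ratio <= SURGE_CURVE[0][0]:
        return 1.0                     # supply meets demand: no surge
    if ratio >= SURGE_CURVE[-1][0]:
        return SURGE_CURVE[-1][1]      # capped at 3.0x
    for (r0, m0), (r1, m1) in zip(SURGE_CURVE, SURGE_CURVE[1:]):
        if r0 <= ratio <= r1:
            # Linear interpolation between neighboring breakpoints.
            return m0 + (m1 - m0) * (ratio - r0) / (r1 - r0)

print(surge_multiplier(20, 10))   # ratio 2.0 -> 1.8x (up to float rounding)
print(surge_multiplier(50, 10))   # ratio 5.0 -> 3.0x (capped)
```

A continuous curve avoids the cliff effects of fixed tiers -- a rider at ratio 1.99 vs 2.01 sees nearly the same price.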
Surge Map
+------+------+------+------+
| 1.0x | 1.0x | 1.5x | 2.5x | <-- Stadium event ending
+------+------+------+------+
| 1.0x | 1.0x | 1.3x | 1.8x |
+------+------+------+------+
| 1.0x | 1.0x | 1.0x | 1.0x | <-- Normal demand
+------+------+------+------+
Effects
- Incentivizes drivers to move toward high-demand areas
- Reduces demand through price elasticity
- Computed every 1-2 minutes per cell
- Rider sees surge multiplier before confirming ride
10. Real-Time Tracking
Architecture
Driver App Kafka Tracking Service Rider App
| | | |
|-- GPS update --->| | |
| |-- consume event ---->| |
| | (for active trips) | |
| | |-- push via WS ----->|
| | | {lat, lng, ETA} |
| | | |
|-- GPS update --->| | |
| |-- consume event ---->| |
| | |-- push via WS ----->|
Client-Side Smoothing
Server sends update at t=0: driver at (40.712, -74.006)
Server sends update at t=4s: driver at (40.713, -74.005)
Client interpolates frames between updates:
t=0s: (40.712, -74.006)
t=1s: (40.7123, -74.0058) -- interpolated
t=2s: (40.7125, -74.0055) -- interpolated
t=3s: (40.7128, -74.0053) -- interpolated
t=4s: (40.713, -74.005) -- actual update
Result: smooth car animation on map instead of 4-second jumps
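The smoothing is plain linear blending between the last two fixes. A minimal sketch that reproduces the intermediate frames above (real clients also blend heading and speed for dead reckoning):

```python
def lerp(a: float, b: float, t: float) -> float:
    """Linear blend: t=0 -> a, t=1 -> b."""
    return a + (b - a) * t

def interpolate_position(prev, curr, elapsed_s, interval_s=4.0):
    """Estimated (lat, lng) elapsed_s seconds after the `prev` fix,
    assuming fixes arrive every interval_s seconds."""
    t = min(elapsed_s / interval_s, 1.0)   # clamp past the latest fix
    return (lerp(prev[0], curr[0], t), lerp(prev[1], curr[1], t))

prev, curr = (40.712, -74.006), (40.713, -74.005)
lat, lng = interpolate_position(prev, curr, elapsed_s=2.0)
print(lat, lng)  # midway: ~(40.7125, -74.0055)
```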
11. Payment Service
Flow
Trip Start --> Pre-authorize estimated fare on rider's card
|
Trip In Progress --> Track actual distance and time
|
Trip End --> Calculate final fare
|
v
+------+------+
| Charge rider |
| (capture) |
+------+------+
|
+------+------+
| Split: |
| Platform: 25%|
| Driver: 75% |
+------+------+
|
+------+------+
| Driver payout|
| (weekly |
| batch or |
| instant) |
+--------------+
Double-Entry Ledger
trip_completed (trip T456):
DEBIT rider_wallet $27.60
CREDIT platform_revenue $6.90 (25% commission)
CREDIT driver_earnings $20.70 (75% to driver)
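A sketch of the split in integer cents (floats and money don't mix), checking the core ledger invariant that every trip's postings balance. The 25% commission matches the flow above; the function and account names are illustrative:

```python
def split_fare(fare_cents: int, commission: float = 0.25):
    """Return balanced double-entry postings for a completed trip."""
    platform = round(fare_cents * commission)
    driver = fare_cents - platform   # remainder, so cents always balance
    return [
        ("DEBIT",  "rider_wallet",     fare_cents),
        ("CREDIT", "platform_revenue", platform),
        ("CREDIT", "driver_earnings",  driver),
    ]

entries = split_fare(2760)           # the $27.60 trip above
for side, account, cents in entries:
    print(side, account, cents)

# Invariant: debits equal credits for every trip.
debits = sum(c for s, _, c in entries if s == "DEBIT")
credits = sum(c for s, _, c in entries if s == "CREDIT")
assert debits == credits
```

Computing the driver's share as the remainder (rather than rounding both shares) guarantees the entries sum to zero even for fares that don't split evenly.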
12. Complete System Diagram
+--------+ +--------+
| Rider | WebSocket (tracking) | Driver |
| App |<------------------------------------->| App |
+---+----+ +---+----+
| |
| HTTP (ride requests) GPS every 4 sec |
| |
+---+------------------------------------------------+---+
| API Gateway |
| (Auth, Rate Limit, Route) |
+---+--------+--------+--------+---------+--------+------+
| | | | | |
v v v v v v
+---+--+ +--+---+ +--+----+ +-+-----+ +-+-----+ +--+----+
| Ride | |Match | |Locat. | |Trip | |Fare | |Payment|
| Req | |Svc | |Svc | |Svc | |Calc | |Svc |
| Svc | | | | | | | | | | |
+--+---+ +--+---+ +--+----+ +--+----+ +--+----+ +--+----+
| | | | | |
| | +----+----+ | | +----+------+
| +--->| Redis | | | | Payment |
| | Cluster | | | | Gateway |
| | (Geo | | | | (Stripe) |
| | Index) | | | +-----------+
| +----+----+ | |
| | | |
v v v v
+--+------------------+---------+---------+--+
| Kafka |
| (event streaming) |
+--+----------+----------+---------+---------+
| | | |
v v v v
+--+----+ +--+-----+ +--+----+ +--+------+
|Tracking| |Surge | |ETA | |Analytics|
|Service | |Pricing | |Service| |Pipeline |
|(WS push)| |Engine | |(OSRM) | |(Spark) |
+--------+ +--------+ +-------+ +---------+
+-------------------+ +-------------------+
| Trip DB | | Data Warehouse |
| (PostgreSQL) | | (BigQuery / S3) |
+-------------------+ +-------------------+
13. Trade-Off Summary
| Decision | Option A | Option B | Recommendation |
|---|---|---|---|
| Geo indexing | Geohash (Redis) | S2 / H3 cells | Geohash for MVP, H3 at Uber-scale |
| Location store | Redis only | Redis + Kafka | Redis + Kafka (real-time + downstream) |
| Matching | Greedy (nearest) | Batched optimal | Greedy with batch for high-demand zones |
| Tracking | HTTP polling | WebSocket push | WebSocket (lower latency, lower overhead) |
| ETA | Haversine | Road network routing | Road network (far more accurate) |
| Surge | Fixed tiers | Continuous curve | Continuous with caps (smoother UX) |
| Payments | Pay-per-trip | Pre-authorize + capture | Pre-authorize (handles fare changes) |
14. Interview Tips
- Geospatial indexing is THE key topic: Explain geohash clearly, mention S2/H3 for depth
- Location updates are the hottest path: 1.25M writes/sec needs Redis cluster sharded by region
- Matching is not just nearest driver: Mention ETA, driver rating, acceptance rate, vehicle type
- Double-booking prevention: Show you understand distributed locking / optimistic concurrency
- Surge pricing: Simple concept, explain supply-demand per geographic cell
- Trip state machine: Walk through the full lifecycle -- shows you think about edge cases
- Boundary problem: Always mention that geohash queries must include adjacent cells
- Don't over-design routing: Say "use OSRM or Google Maps API" and move on
15. Resources
- Alex Xu - System Design Interview Vol. 2, Chapter 1: Proximity Service
- Uber Engineering Blog: "Uber's Big Data Platform" and "Geospatial Indexing"
- Uber H3: Hexagonal hierarchical geospatial indexing system (h3geo.org)
- Lyft Engineering: "How Lyft Discovers Drivers for Riders"
- Google S2 Geometry Library documentation
- OSRM (Open Source Routing Machine): project-osrm.org
- "Scaling Uber's Real-time Market Platform" (InfoQ talk)
- Martin Kleppmann - Designing Data-Intensive Applications, Chapter 2 (Data Models)