35 - Design Uber or Lyft

Series: System Design & Distributed Systems | Previous: 34 - Design Google Search | Next: 36 - Design Twitter


1. Requirements

Functional Requirements

Feature             Details
-------             -------
Request ride        Rider specifies pickup and destination
Driver matching     Match nearest available driver in real-time
Real-time tracking  Live location of driver on rider's map
Fare calculation    Estimate before trip, final fare after trip
Surge pricing       Dynamic pricing based on supply and demand
Trip history        Past trips for riders and drivers
Rating              Mutual rider/driver ratings after trip
Payments            Charge rider, pay driver

Non-Functional Requirements

  • Latency: Match driver within 10 seconds, location updates within 2 seconds
  • Availability: 99.99% (a stranded rider is a critical failure)
  • Scale: 100M riders, 5M drivers, 20M trips/day
  • Accuracy: GPS within 5-10 meters, ETA within 2 minutes
  • Consistency: No double-booking of drivers

2. Capacity Estimation

Active drivers:         5M concurrent
Location updates:       5M drivers x 1 update/4s = 1.25M updates/sec
Rides/day:              20M
Ride requests/sec:      ~230 QPS avg, ~700 peak
Trip tracking events:   20M rides x 300 updates/ride = 6B events/day
Trip data/day:          20M x 5KB = 100GB
Resource             Estimate
--------             --------
Location update QPS  ~1.25M writes/sec
Matching QPS         ~700/sec peak
Trip events/day      ~6B
Location data/day    ~2TB (ephemeral, in-memory)
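
The back-of-envelope numbers above are easy to sanity-check with a few lines of arithmetic:

```python
drivers = 5_000_000
rides_per_day = 20_000_000

location_qps = drivers / 4            # one GPS update per driver every 4s -> 1.25M/sec
ride_qps = rides_per_day / 86_400     # average ride requests per second  -> ~231
trip_events = rides_per_day * 300     # ~300 GPS waypoints per trip       -> 6B/day
trip_bytes = rides_per_day * 5_000    # ~5 KB of trip metadata per ride   -> 100 GB/day
```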

3. High-Level Architecture

+--------+         +--------+
| Rider  |         | Driver |
| App    |         | App    |
+---+----+         +---+----+
    |                   |
    v                   v
+---+-------------------+---+
|       API Gateway         |
|   (Auth, Rate Limit)      |
+---+------+------+---------+
    |      |      |
    v      v      v
+---+--+ +-+----+ +--+------+
| Ride | |Match | |Location |
| Svc  | |Svc   | |Service  |
+--+---+ +--+---+ +--+------+
   |        |         |
   |        |    +----+-----+
   |        +--->| Geo Index |
   |             | (Redis    |
   |             |  Cluster) |
   |             +----+------+
   |                  |
   v                  v
+--+--------+    +----+----+
| Trip DB   |    | Kafka   |
| (Postgres)|    | (events)|
+-----------+    +----+----+
                      |
         +------+-----+-----+------+
         |      |            |      |
     +--------+ +-------+ +------+ +-------+
     |Tracking| |  ETA  | |Surge | |Analyt.|
     |Service | |Service| |Svc   | |Pipe   |
     +--------+ +-------+ +------+ +-------+

4. Geospatial Indexing

The core challenge: given a rider at (lat, lng), find the nearest available drivers among 5 million moving targets.

Option 1: Geohash

Geohash divides the world into a grid of cells encoded as strings.
Nearby locations share a common prefix.

(37.7749, -122.4194) --> "9q8yyk"

+------+------+------+
| 9q8r | 9q8x | 9q8z |
+------+------+------+
| 9q8q | 9q8w | 9q8y |
+------+------+------+
| 9q8m | 9q8t | 9q8v |
+------+------+------+

Precision levels:
  4 chars: ~39km x 20km  (city level)
  5 chars: ~5km x 5km    (neighborhood)
  6 chars: ~1.2km x 0.6km (block level) <-- good for matching

Redis native support:
  GEOADD drivers:available <lng> <lat> driver_123
  GEOSEARCH drivers:available FROMLONLAT <rider_lng> <rider_lat> BYRADIUS 3 km ASC COUNT 10
  (on Redis < 6.2, the equivalent is the now-deprecated GEORADIUS command)
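
The encoding itself fits in a few lines. A dependency-free sketch of the standard algorithm, which interleaves longitude and latitude bits and emits one base32 character per 5 bits:

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)

def geohash_encode(lat: float, lng: float, precision: int = 6) -> str:
    """Encode (lat, lng) as a geohash string by binary-searching each axis."""
    lat_lo, lat_hi = -90.0, 90.0
    lng_lo, lng_hi = -180.0, 180.0
    result = []
    bits, ch, even = 0, 0, True          # even bit -> longitude, odd -> latitude
    while len(result) < precision:
        if even:
            mid = (lng_lo + lng_hi) / 2
            if lng >= mid:
                ch = (ch << 1) | 1       # point is in the upper half
                lng_lo = mid
            else:
                ch <<= 1
                lng_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                ch = (ch << 1) | 1
                lat_lo = mid
            else:
                ch <<= 1
                lat_hi = mid
        even = not even
        bits += 1
        if bits == 5:                    # 5 bits -> one base32 character
            result.append(BASE32[ch])
            bits, ch = 0, 0
    return "".join(result)

geohash_encode(37.7749, -122.4194)       # "9q8yyk" -- downtown San Francisco
```

Because nearby points share a prefix, truncating the hash coarsens the cell: the same coordinates at precision 4 give "9q8y".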

Option 2: Quadtree

Recursively divide space into 4 quadrants.
Stop when cell has < K drivers (e.g., K=100).

             World
          /  |  |  \
        NW  NE  SW  SE
       /|\  /|\ /|\ /|\
      ...            ...

Pros: Adaptive (dense areas = smaller cells)
Cons: Harder to distribute, tree rebalancing on driver movement

Option 3: S2 Cells (Google) / H3 (Uber)

S2 projects Earth onto a cube, uses Hilbert curve for cell IDs.

Level 12: ~3.3 km^2 cells  (city blocks)
Level 14: ~0.8 km^2 cells  (fine-grained)
Level 16: ~0.05 km^2 cells (building level)

H3 (Uber's variant): hexagonal cells
  - Each hex has 6 equidistant neighbors (no "corner" problem)
  - Approximately hierarchical: each hex has 7 children at the next resolution (nesting is not exact, unlike S2)
  - Resolution 9 --> ~0.1 km^2 (good for matching)

Comparison

Method    Pros                                    Cons                             Used By
------    ----                                    ----                             -------
Geohash   Simple, Redis-native GEORADIUS          Edge effects at cell boundaries  Lyft, many startups
Quadtree  Adapts to density                       Complex to distribute            Early Uber
S2 Cells  Uniform, hierarchical, no edge effects  More complex                     Google Maps
H3        Hexagonal (6 equidistant neighbors)     Newer, less tooling              Uber

The Boundary Problem

Rider R is at the edge of cell "9q8yyk":

+----------+----------+
| 9q8yyj   | 9q8yym   |
|          |   *D     | <-- Driver D is close but in the adjacent cell
|----------|----------|
| 9q8yyh   | 9q8yyk   |
|          | R*       |
+----------+----------+

A query limited to cell "9q8yyk" would miss D entirely.

Solution: always query the rider's cell plus all 8 neighboring cells
  neighbors("9q8yyk") = [9q8yyj, 9q8yym, 9q8yyh, 9q8yy7, ...]

Interview Tip: Geohash + Redis GEORADIUS is the simplest answer. Mention S2/H3 for depth. Always address the boundary problem.


5. Location Service

Driver Location Update Flow

Driver App              Location Service            Redis Geo Index
    |                        |                           |
    |-- GPS update --------->|                           |
    |   {lat, lng, heading,  |                           |
    |    speed, timestamp}   |                           |
    |   every 4 seconds      |                           |
    |                        |-- GEOADD to Redis ------->|
    |                        |                           |
    |                        |-- Publish to Kafka ------>|
    |                        |   (location-updates topic)|
    |                        |                           |
    |                        |   Kafka consumers:        |
    |                        |   - Trip Tracking         |
    |                        |   - ETA Recalculation     |
    |                        |   - Surge Computation     |
    |                        |   - Analytics Pipeline    |

Scale: 1.25M Writes/Sec

Solution: Redis Cluster geo-partitioned by city/region

  redis-nyc:    handles NYC drivers (~200K updates/sec)
  redis-sf:     handles SF drivers (~100K updates/sec)
  redis-london: handles London drivers (~80K updates/sec)
  ...

Each cluster: 10-20 shards, each handling ~10K-20K writes/sec

Driver Location TTL

Redis expires whole keys, not individual members of a geo set, so driver
liveness is tracked with a per-driver heartbeat key refreshed on every update:

  SET driver:D123:alive 1 EX 30

If a driver stops sending updates (app crash, went offline):
  - After 30s, the heartbeat key expires automatically
  - A periodic sweeper (or the matcher itself) removes drivers with no live
    heartbeat from the geo set (ZREM)
  - Driver disappears from the matching pool
  - No stale driver ghost problems
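
The heartbeat bookkeeping in a single-process sketch (a plain dict stands in for the Redis keys; `now` is passed explicitly so the example is deterministic):

```python
class DriverLiveness:
    """Track last-seen timestamps; drivers go stale after ttl_seconds."""

    def __init__(self, ttl_seconds: float = 30.0):
        self.ttl = ttl_seconds
        self.last_seen = {}

    def heartbeat(self, driver_id: str, now: float) -> None:
        # Equivalent of: SET driver:<id>:alive 1 EX 30
        self.last_seen[driver_id] = now

    def alive(self, now: float) -> set:
        # Only drivers heard from within the TTL window are matchable.
        return {d for d, t in self.last_seen.items() if now - t <= self.ttl}

pool = DriverLiveness(ttl_seconds=30)
pool.heartbeat("D1", now=100.0)
pool.heartbeat("D2", now=125.0)
pool.alive(now=135.0)   # {"D2"} -- D1 last reported 35s ago, considered gone
```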

6. Matching Service

Matching Flow

Rider Request                     Matching Service
     |                                  |
     |-- Request ride (pickup,          |
     |   destination, ride_type) ------>|
     |                                  |
     |                    1. GEORADIUS: 10 nearest available drivers
     |                       within expanding radius (1km -> 3km -> 5km)
     |                                  |
     |                    2. For each candidate:
     |                       - Calculate ETA to pickup (routing service)
     |                       - Check driver rating > threshold
     |                       - Check acceptance rate
     |                       - Check vehicle type matches
     |                                  |
     |                    3. Rank: lowest ETA + highest rating
     |                                  |
     |                    4. Send ride offer to top driver
     |                       (push notification via WebSocket)
     |                                  |
     |                    5. Wait 15 seconds for response
     |                                  |
     |                     [ACCEPT]     |     [DECLINE/TIMEOUT]
     |                        |         |          |
     |                        v         |          v
     |                   Create trip    |    Try next driver
     |                                  |    (up to 3 attempts)
     |                                  |
     |<-- Match confirmed, ETA 4min ----|
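
The expanding-radius search in step 1 can be sketched as a small loop. Here `search_fn` is a hypothetical callback standing in for the Redis geo query:

```python
def find_candidates(search_fn, pickup, radii_km=(1.0, 3.0, 5.0), limit=10):
    """Try successively wider radii until at least one driver is found.

    search_fn(pickup, radius_km) -> list of (driver_id, eta_minutes),
    a stand-in for the GEOSEARCH call against the Redis geo index.
    """
    for radius in radii_km:
        drivers = search_fn(pickup, radius)
        if drivers:
            # Rank by ETA to pickup; the real ranker also weighs driver
            # rating, acceptance rate, and vehicle type.
            return sorted(drivers, key=lambda d: d[1])[:limit]
    return []  # no drivers nearby -> surface "no cars available"

# Fake index: no one within 1 km, two drivers appear at the 3 km radius.
def fake_search(pickup, radius_km):
    return [("D7", 6), ("D2", 4)] if radius_km >= 3 else []

find_candidates(fake_search, (37.77, -122.42))  # [("D2", 4), ("D7", 6)]
```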

Preventing Double-Booking

When driver accepts:

  Redis optimistic lock (WATCH + MULTI/EXEC):
    WATCH driver:D123:status
    status = GET driver:D123:status
    if status == "available":
      MULTI
        SET driver:D123:status "matched"
        SET driver:D123:trip_id "T456"
      EXEC     -- returns nil if the watched key changed: conflict, try next driver
    else:
      UNWATCH  -- driver already taken, try next driver

This prevents two concurrent matches from assigning the same driver. (A single
Lua script via EVAL achieves the same check-and-set atomically in one round trip.)
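
The same check-then-set logic in a single-process sketch (a Python dict stands in for Redis; the real system gets its atomicity from WATCH/MULTI/EXEC or a Lua script):

```python
driver_status = {"D123": "available"}

def try_assign(driver_id: str, trip_id: str) -> bool:
    """Claim a driver if, and only if, they are still available.

    A plain compare-and-set: the first match wins, every concurrent
    competitor observes the changed status and moves on.
    """
    if driver_status.get(driver_id) != "available":
        return False                      # already matched -> try next driver
    driver_status[driver_id] = ("matched", trip_id)
    return True

try_assign("D123", "T456")   # True  -- first match claims the driver
try_assign("D123", "T789")   # False -- second concurrent match loses
```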

Advanced Matching: Batched Optimal Assignment

Instead of greedy matching (first-come-first-served):

Accumulate requests for 2-second windows:
  Riders:  [R1, R2, R3, R4, R5]
  Drivers: [D1, D2, D3, D4, D5, D6]

Build cost matrix (ETA from each driver to each rider):
         R1   R2   R3   R4   R5
  D1  [  3    7    2    9    5  ]
  D2  [  5    2    8    3    7  ]
  D3  [  8    4    6    1    3  ]
  ...

Solve assignment problem (Hungarian algorithm) to minimize total ETA.
Global optimum > local greedy.
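
For tiny batches the optimal assignment can even be brute-forced over permutations; production systems use the Hungarian algorithm (O(n^3)) instead. A sketch using a 3x3 slice of the cost matrix above:

```python
from itertools import permutations

def best_assignment(cost):
    """Exhaustively minimize total ETA for an n x n cost matrix.

    cost[d][r] = ETA (minutes) from driver d to rider r.
    Returns (min_total, assignment) where assignment[d] is a rider index.
    Exponential time -- fine for small windows, use Hungarian at scale.
    """
    n = len(cost)
    best = (float("inf"), None)
    for perm in permutations(range(n)):
        total = sum(cost[d][perm[d]] for d in range(n))
        if total < best[0]:
            best = (total, perm)
    return best

cost = [
    [3, 7, 2],   # D1 -> R1, R2, R3
    [5, 2, 8],   # D2
    [8, 4, 6],   # D3
]
best_assignment(cost)   # (11, (0, 1, 2)): D1->R1, D2->R2, D3->R3
```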

7. Trip Lifecycle

State Machine

  REQUESTED --> MATCHED --> DRIVER_EN_ROUTE --> ARRIVED --> IN_TRIP --> COMPLETED
      |            |              |                |          |
      v            v              v                v          v
   CANCELLED    CANCELLED     CANCELLED        CANCELLED   CANCELLED
   (by rider)  (by rider    (by rider or     (by rider   (emergency
                or driver)   driver, fee       or driver)  only)
                              may apply)
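
The transitions can be encoded as a simple lookup table, with anything not listed rejected (a sketch; states as strings for brevity):

```python
# Legal transitions per state; everything else is rejected.
TRANSITIONS = {
    "REQUESTED":       {"MATCHED", "CANCELLED"},
    "MATCHED":         {"DRIVER_EN_ROUTE", "CANCELLED"},
    "DRIVER_EN_ROUTE": {"ARRIVED", "CANCELLED"},
    "ARRIVED":         {"IN_TRIP", "CANCELLED"},
    "IN_TRIP":         {"COMPLETED", "CANCELLED"},   # cancel = emergency only
    "COMPLETED":       set(),                        # terminal
    "CANCELLED":       set(),                        # terminal
}

def transition(current: str, new: str) -> str:
    if new not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current} -> {new}")
    return new

state = "REQUESTED"
state = transition(state, "MATCHED")           # ok
state = transition(state, "DRIVER_EN_ROUTE")   # ok
# transition(state, "COMPLETED") raises: must pass through ARRIVED and IN_TRIP
```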

Trip Database Schema

trips:
  trip_id           UUID PRIMARY KEY
  rider_id          UUID REFERENCES users
  driver_id         UUID REFERENCES users
  status            ENUM('requested','matched','driver_en_route',
                         'arrived','in_trip','completed','cancelled')
  pickup_location   POINT
  dropoff_location  POINT
  actual_route      LINESTRING   -- GPS waypoints recorded during trip
  request_time      TIMESTAMP
  pickup_time       TIMESTAMP
  dropoff_time      TIMESTAMP
  distance_miles    DECIMAL(8,2)
  duration_minutes  DECIMAL(8,2)
  fare_cents        INT
  surge_multiplier  DECIMAL(3,2)
  payment_status    ENUM('pending','charged','refunded')
  rider_rating      SMALLINT     -- 1-5
  driver_rating     SMALLINT     -- 1-5

8. ETA Calculation

Approaches

Method                            Accuracy  Speed                 Use Case
------                            --------  -----                 --------
Haversine (straight-line)         Low       Instant               Rough estimate only
Road network + Dijkstra           Medium    ~100ms                Pre-trip estimate
Road network + real-time traffic  High      ~200ms                During-trip ETA
ML-based (historical patterns)    Highest   ~50ms (pre-computed)  Production systems
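
The straight-line baseline is the haversine great-circle distance:

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance between two (lat, lng) points, in km."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlam = radians(lng2 - lng1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

haversine_km(37.7749, -122.4194, 34.0522, -118.2437)   # ~559 km (SF -> LA)
```

It ignores the road network entirely, which is why it is only a lower bound on travel distance.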

Road Network Routing

Road Network as Weighted Graph:
  Intersection A --[3 min at 8am, 1 min at 2am]--> Intersection B
  Intersection B --[5 min at 8am, 2 min at 2am]--> Intersection C

Algorithms:
  Dijkstra:                O(V log V + E)  -- exact, slower
  A*:                      Faster with heuristic (haversine to destination)
  Contraction Hierarchies: O(log V) queries (pre-computed shortcuts)

In Practice

  • Use a routing engine (OSRM, Valhalla) or Google Maps Directions API
  • Pre-compute cell-to-cell travel times for quick estimates
  • Adjust with real-time traffic from driver GPS streams
  • Uber's ETA system uses ML trained on billions of historical trips

Interview Tip: Don't design a routing engine. Say "we use OSRM or Google Maps API" and focus on the ride system architecture.


9. Surge Pricing

Supply-Demand Model

For each geographic cell (geohash level 6, ~1 km^2):

  demand = ride requests in last 5 minutes
  supply = available drivers in cell

  surge_ratio = demand / supply

  Pricing curve:
    ratio <= 1.0  -->  1.0x (no surge)
    ratio  = 1.5  -->  1.3x
    ratio  = 2.0  -->  1.8x
    ratio  = 3.0  -->  2.5x
    ratio >= 4.0  -->  3.0x (capped)
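
The curve above is piecewise linear between breakpoints, with a 1.0x floor and a 3.0x cap (a sketch; a production system adds smoothing and hysteresis so prices don't oscillate):

```python
# (demand/supply ratio, multiplier) breakpoints from the curve above
BREAKPOINTS = [(1.0, 1.0), (1.5, 1.3), (2.0, 1.8), (3.0, 2.5), (4.0, 3.0)]

def surge_multiplier(ratio: float) -> float:
    if ratio <= BREAKPOINTS[0][0]:
        return BREAKPOINTS[0][1]          # no surge
    if ratio >= BREAKPOINTS[-1][0]:
        return BREAKPOINTS[-1][1]         # capped
    for (r0, m0), (r1, m1) in zip(BREAKPOINTS, BREAKPOINTS[1:]):
        if r0 <= ratio <= r1:             # linear interpolation within segment
            return m0 + (m1 - m0) * (ratio - r0) / (r1 - r0)

surge_multiplier(0.8)   # 1.0 (floor)
surge_multiplier(2.5)   # ~2.15 (halfway through the 1.8x-2.5x segment)
surge_multiplier(5.0)   # 3.0 (cap)
```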

Surge Map

+------+------+------+------+
| 1.0x | 1.0x | 1.5x | 2.5x |  <-- Stadium event ending
+------+------+------+------+
| 1.0x | 1.0x | 1.3x | 1.8x |
+------+------+------+------+
| 1.0x | 1.0x | 1.0x | 1.0x |  <-- Normal demand
+------+------+------+------+

Effects

  • Incentivizes drivers to move toward high-demand areas
  • Reduces demand through price elasticity
  • Computed every 1-2 minutes per cell
  • Rider sees surge multiplier before confirming ride

10. Real-Time Tracking

Architecture

Driver App           Kafka              Tracking Service        Rider App
    |                  |                      |                     |
    |-- GPS update --->|                      |                     |
    |                  |-- consume event ---->|                     |
    |                  |   (for active trips) |                     |
    |                  |                      |-- push via WS ----->|
    |                  |                      |   {lat, lng, ETA}   |
    |                  |                      |                     |
    |-- GPS update --->|                      |                     |
    |                  |-- consume event ---->|                     |
    |                  |                      |-- push via WS ----->|

Client-Side Smoothing

Server sends update at t=0:  driver at (40.712, -74.006)
Server sends update at t=4s: driver at (40.713, -74.005)

Client interpolates frames between updates:
  t=0s: (40.712, -74.006)
  t=1s: (40.7123, -74.0058)  -- interpolated
  t=2s: (40.7125, -74.0055)  -- interpolated
  t=3s: (40.7128, -74.0053)  -- interpolated
  t=4s: (40.713, -74.005)    -- actual update

Result: smooth car animation on map instead of 4-second jumps
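
Linear interpolation between the last two server updates is enough for smooth animation (a sketch; real clients also use heading and snap positions to the road network):

```python
def interpolate(p0, p1, t0, t1, t):
    """Linearly interpolate a (lat, lng) position between updates at t0 and t1."""
    if t <= t0:
        return p0
    if t >= t1:
        return p1                         # clamp: never extrapolate past the data
    f = (t - t0) / (t1 - t0)              # fraction of the way through the gap
    return (p0[0] + f * (p1[0] - p0[0]),
            p0[1] + f * (p1[1] - p0[1]))

p0, p1 = (40.712, -74.006), (40.713, -74.005)
interpolate(p0, p1, 0.0, 4.0, 2.0)        # midpoint, ~(40.7125, -74.0055)
```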

11. Payment Service

Flow

Trip Start --> Pre-authorize estimated fare on rider's card
                |
Trip In Progress --> Track actual distance and time
                |
Trip End --> Calculate final fare
                |
                v
         +----------------+
         | Charge rider   |
         | (capture)      |
         +-------+--------+
                 |
         +-------+--------+
         | Split:         |
         |  Platform: 25% |
         |  Driver:   75% |
         +-------+--------+
                 |
         +-------+--------+
         | Driver payout  |
         | (weekly batch  |
         |  or instant)   |
         +----------------+

Double-Entry Ledger

trip_completed (trip T456):
  DEBIT   rider_wallet      $27.60
  CREDIT  platform_revenue   $6.90  (25% commission)
  CREDIT  driver_earnings   $20.70  (75% to driver)
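
A posting is valid only if debits equal credits. A minimal sketch of that invariant, with amounts in integer cents to avoid float rounding:

```python
def post(entries):
    """Record a balanced double-entry posting.

    entries: list of (account, debit_cents, credit_cents).
    Raises if the posting does not balance.
    """
    debits = sum(d for _, d, _ in entries)
    credits = sum(c for _, _, c in entries)
    if debits != credits:
        raise ValueError(f"unbalanced posting: {debits} != {credits}")
    return entries   # in production: append atomically to the ledger table

post([
    ("rider_wallet",     2760,    0),   # DEBIT  rider           $27.60
    ("platform_revenue",    0,  690),   # CREDIT 25% commission   $6.90
    ("driver_earnings",     0, 2070),   # CREDIT 75% to driver   $20.70
])
```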

12. Complete System Diagram

+--------+                                      +--------+
| Rider  |          WebSocket (tracking)         | Driver |
| App    |<------------------------------------->| App    |
+---+----+                                      +---+----+
    |                                                |
    |    HTTP (ride requests)     GPS every 4 sec    |
    |                                                |
+---+------------------------------------------------+---+
|                    API Gateway                          |
|             (Auth, Rate Limit, Route)                   |
+---+--------+--------+--------+---------+--------+------+
    |        |        |        |         |        |
    v        v        v        v         v        v
+---+--+ +--+---+ +--+----+ +-+-----+ +-+-----+ +--+----+
| Ride | |Match | |Locat. | |Trip   | |Fare   | |Payment|
| Req  | |Svc   | |Svc    | |Svc    | |Calc   | |Svc    |
| Svc  | |      | |       | |       | |       | |       |
+--+---+ +--+---+ +--+----+ +--+----+ +--+----+ +--+----+
   |        |         |         |         |         |
   |        |    +----+----+    |         |    +----+------+
   |        +--->| Redis   |    |         |    | Payment   |
   |             | Cluster |    |         |    | Gateway   |
   |             | (Geo    |    |         |    | (Stripe)  |
   |             |  Index) |    |         |    +-----------+
   |             +----+----+    |         |
   |                  |         |         |
   v                  v         v         v
+--+------------------+---------+---------+--+
|                    Kafka                    |
|              (event streaming)              |
+--+----------+----------+---------+---------+
   |          |          |         |
   v          v          v         v
+---------+ +--------+ +-------+ +---------+
|Tracking | |Surge   | |ETA    | |Analytics|
|Service  | |Pricing | |Service| |Pipeline |
|(WS push)| |Engine  | |(OSRM) | |(Spark)  |
+---------+ +--------+ +-------+ +---------+

+-------------------+  +-------------------+
| Trip DB           |  | Data Warehouse    |
| (PostgreSQL)      |  | (BigQuery / S3)   |
+-------------------+  +-------------------+

13. Trade-Off Summary

Decision        Option A          Option B                 Recommendation
--------        --------          --------                 --------------
Geo indexing    Geohash (Redis)   S2 / H3 cells            Geohash for MVP, H3 at Uber scale
Location store  Redis only        Redis + Kafka            Redis + Kafka (real-time + downstream)
Matching        Greedy (nearest)  Batched optimal          Greedy, with batching in high-demand zones
Tracking        HTTP polling      WebSocket push           WebSocket (lower latency and overhead)
ETA             Haversine         Road network routing     Road network (far more accurate)
Surge           Fixed tiers       Continuous curve         Continuous with caps (smoother UX)
Payments        Pay-per-trip      Pre-authorize + capture  Pre-authorize (handles fare changes)

14. Interview Tips

  1. Geospatial indexing is THE key topic: Explain geohash clearly, mention S2/H3 for depth
  2. Location updates are the hottest path: 1.25M writes/sec needs Redis cluster sharded by region
  3. Matching is not just nearest driver: Mention ETA, driver rating, acceptance rate, vehicle type
  4. Double-booking prevention: Show you understand distributed locking / optimistic concurrency
  5. Surge pricing: Simple concept, explain supply-demand per geographic cell
  6. Trip state machine: Walk through the full lifecycle -- shows you think about edge cases
  7. Boundary problem: Always mention that geohash queries must include adjacent cells
  8. Don't over-design routing: Say "use OSRM or Google Maps API" and move on

15. Resources

  • Alex Xu - System Design Interview Vol. 2, Chapter 1: Proximity Service
  • Uber Engineering Blog: "Uber's Big Data Platform" and "Geospatial Indexing"
  • Uber H3: Hexagonal hierarchical geospatial indexing system (h3geo.org)
  • Lyft Engineering: "How Lyft Discovers Drivers for Riders"
  • Google S2 Geometry Library documentation
  • OSRM (Open Source Routing Machine): project-osrm.org
  • "Scaling Uber's Real-time Market Platform" (InfoQ talk)
  • Martin Kleppmann - Designing Data-Intensive Applications, Chapter 2 (Data Models)
