41 - Design Ticket Booking System
Previous: 40 - Design Distributed Cache | Next: 42 - Design Payment System
1. Problem Statement
Design a ticket/seat booking system for events (concerts, flights, movies). The core challenge: prevent double bookings under extreme concurrency while maintaining a responsive user experience. Think Ticketmaster, BookMyShow, or airline reservation systems.
2. Requirements
Functional
| Requirement | Detail |
|---|---|
| Browse events | Search, filter, view event details and venue map |
| View seat availability | Real-time (or near-real-time) seat map with status |
| Select & hold seats | Temporary reservation while user completes payment |
| Payment & confirmation | Integrate with payment gateway, issue ticket/confirmation |
| Cancellation & refund | User can cancel within policy window |
| Waiting queue | Queue for sold-out popular events |
Non-Functional
| Requirement | Target |
|---|---|
| Availability | 99.99% |
| Consistency | Strong for booking (no double-sell), eventual for availability view |
| Latency | < 500ms for seat selection, < 2s for booking confirmation |
| Scale | Handle 10K+ concurrent users for a single popular event (flash sale) |
3. The Concurrency Challenge
The fundamental problem: two users click the same seat at the same instant.
Timeline:
t=0 User A reads seat S1 -> status: AVAILABLE
t=0 User B reads seat S1 -> status: AVAILABLE
t=1 User A books seat S1 -> SUCCESS
t=1 User B books seat S1 -> SUCCESS ??? <-- DOUBLE BOOKING!
This is a classic lost update / write-write conflict problem.
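The timeline can be replayed deterministically. The sketch below is a minimal in-memory illustration, not a real booking path; `seats` and `naive_book` are hypothetical names. It shows why a check based on an earlier read lets both writes through.

```python
# Deterministic replay of the timeline: both users read before either writes.
seats = {"S1": {"status": "AVAILABLE", "owner": None}}

def naive_book(seat_id: str, user: str, observed_status: str) -> bool:
    """Check-then-act: trusts a status that was read earlier, so it may be stale."""
    if observed_status != "AVAILABLE":
        return False
    seats[seat_id] = {"status": "BOOKED", "owner": user}
    return True

# t=0: both users read the same state
seen_by_a = seats["S1"]["status"]
seen_by_b = seats["S1"]["status"]

# t=1: both writes succeed because each check used a stale read
a_ok = naive_book("S1", "user_A", seen_by_a)
b_ok = naive_book("S1", "user_B", seen_by_b)

print(a_ok, b_ok, seats["S1"]["owner"])  # True True user_B
```

User A's booking is silently overwritten. Every strategy in the next section works by making the check and the write a single atomic step.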
4. Seat Selection Strategies
Strategy 1: Pessimistic Locking (Database Row Lock)
```sql
BEGIN TRANSACTION;

-- Lock the row; other transactions that want this seat block here
SELECT * FROM seats
WHERE seat_id = 'S1' AND event_id = 'E1'
FOR UPDATE;

UPDATE seats
SET status = 'HELD',
    held_by = 'user_A',
    held_until = NOW() + INTERVAL '10 min'
WHERE seat_id = 'S1'
  AND event_id = 'E1'
  AND status = 'AVAILABLE';
-- 0 rows updated means the seat was already HELD or BOOKED

COMMIT;
```
| Pros | Cons |
|---|---|
| Simple, strong guarantee | Blocks concurrent requests (latency spikes) |
| Database handles correctness | Deadlock risk with multi-seat selection |
| Battle-tested | Poor throughput under contention |
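The deadlock risk with multi-seat selection comes from two transactions locking the same seats in opposite order. A common mitigation is to always lock in one global order (in SQL, for example, by adding `ORDER BY seat_id` to the `SELECT ... FOR UPDATE`). The sketch below illustrates the idea with in-process locks standing in for row locks; `seat_locks`, `seat_owner`, and `hold_seats` are hypothetical names.

```python
import threading

# Hypothetical in-process stand-in for database row locks: one lock per seat.
seat_locks = {seat_id: threading.Lock() for seat_id in ("S1", "S2", "S3")}
seat_owner = {seat_id: None for seat_id in seat_locks}

def hold_seats(user, seat_ids):
    """Lock all requested seats in sorted order, then claim all of them or none."""
    ordered = sorted(set(seat_ids))  # one global order prevents lock cycles
    for seat_id in ordered:
        seat_locks[seat_id].acquire()
    try:
        if any(seat_owner[s] is not None for s in ordered):
            return False  # at least one seat is already taken
        for s in ordered:
            seat_owner[s] = user
        return True
    finally:
        for seat_id in reversed(ordered):
            seat_locks[seat_id].release()

# Two users request the same pair of seats in opposite order.
results = {}
threads = [
    threading.Thread(target=lambda: results.update(A=hold_seats("user_A", ["S1", "S2"]))),
    threading.Thread(target=lambda: results.update(B=hold_seats("user_B", ["S2", "S1"]))),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Exactly one of the two requests wins both seats, and neither thread can end up waiting on a lock the other holds.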
Strategy 2: Optimistic Locking (Version-Based)
```sql
-- Read
SELECT seat_id, status, version
FROM seats
WHERE seat_id = 'S1' AND event_id = 'E1';
-- version = 5, status = AVAILABLE

-- Write (only succeeds if version unchanged)
UPDATE seats
SET status = 'HELD',
    held_by = 'user_A',
    version = version + 1,
    held_until = NOW() + INTERVAL '10 min'
WHERE seat_id = 'S1'
  AND event_id = 'E1'
  AND version = 5
  AND status = 'AVAILABLE';
-- If affected_rows = 0, someone else got it first -> retry or show "unavailable"
```
| Pros | Cons |
|---|---|
| No blocking (non-locking) | Retry storms under high contention |
| Higher throughput for low-contention | User experience: "someone took your seat" |
| Simple implementation | Needs retry logic in application |
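The version check can be exercised end to end with an in-memory SQLite database. This is a sketch under simplifying assumptions: the table is reduced to a few columns, and `read_version` and `try_hold` are hypothetical helpers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE seats (seat_id TEXT PRIMARY KEY, status TEXT, held_by TEXT, version INTEGER)"
)
conn.execute("INSERT INTO seats VALUES ('S1', 'AVAILABLE', NULL, 5)")

def read_version(seat_id):
    """Read step: remember the version seen at read time."""
    row = conn.execute("SELECT version FROM seats WHERE seat_id = ?", (seat_id,)).fetchone()
    return row[0]

def try_hold(seat_id, user, expected_version):
    """Write step: succeeds only if nobody changed the row since it was read."""
    cur = conn.execute(
        "UPDATE seats SET status = 'HELD', held_by = ?, version = version + 1 "
        "WHERE seat_id = ? AND version = ? AND status = 'AVAILABLE'",
        (user, seat_id, expected_version),
    )
    conn.commit()
    return cur.rowcount == 1  # 0 affected rows means someone else got there first

# Both users read version 5, then both try to write.
v_a = read_version("S1")
v_b = read_version("S1")
a_ok = try_hold("S1", "user_A", v_a)
b_ok = try_hold("S1", "user_B", v_b)

print(a_ok, b_ok)  # True False
```

User B's update matches zero rows because the version moved from 5 to 6, so the application can show "unavailable" instead of overwriting user A's hold.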
Strategy 3: Redis-Based Distributed Lock
-- Atomic SET-if-not-exists with TTL
SET seat:E1:S1 user_A NX EX 600
-- Returns OK if set (seat claimed), nil if already taken
-- Release on payment failure or cancellation; check that the key still holds user_A first (the TTL covers timeouts)
DEL seat:E1:S1
| Pros | Cons |
|---|---|
| Sub-millisecond latency | Redis failure = lock system failure |
| TTL handles abandoned holds | Requires careful Redis HA setup |
| No DB contention | Extra infrastructure component |
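One subtlety with the release step: an unconditional `DEL` can remove a hold that now belongs to someone else if the original hold already expired. The sketch below models the semantics with a small in-memory class. `FakeRedis` and its methods are hypothetical and are not a real client API. Against a real Redis, the compare-and-delete is usually done in a Lua script so that the check and the delete run atomically.

```python
import time

class FakeRedis:
    """Tiny in-memory stand-in for the two behaviours used here (not a real client)."""

    def __init__(self, clock=time.monotonic):
        self._data = {}  # key -> (value, expires_at)
        self._clock = clock

    def _live(self, key):
        entry = self._data.get(key)
        if entry and entry[1] <= self._clock():
            del self._data[key]  # lazily expire, like a TTL
            return None
        return entry

    def set_nx_ex(self, key, value, ttl_seconds):
        """Mimics SET key value NX EX ttl: succeeds only if the key is absent."""
        if self._live(key):
            return False
        self._data[key] = (value, self._clock() + ttl_seconds)
        return True

    def delete_if_owner(self, key, value):
        """Compare-and-delete: release only a hold that still belongs to `value`."""
        entry = self._live(key)
        if entry and entry[0] == value:
            del self._data[key]
            return True
        return False

now = [0.0]  # controllable clock for the demo
r = FakeRedis(clock=lambda: now[0])

a = r.set_nx_ex("seat:E1:S1", "user_A", 600)        # True: seat claimed
b = r.set_nx_ex("seat:E1:S1", "user_B", 600)        # False: already held
now[0] = 601                                         # user_A's hold expires
c = r.set_nx_ex("seat:E1:S1", "user_B", 600)        # True: user_B claims it
stale = r.delete_if_owner("seat:E1:S1", "user_A")   # False: not user_A's hold any more
```

The late release from user_A is rejected, so user_B keeps the seat.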
Comparison
| Criteria | Pessimistic | Optimistic | Redis Lock |
|---|---|---|---|
| Throughput | Low | Medium | High |
| Complexity | Low | Medium | Medium |
| Consistency | Strong | Strong (with retry) | Strong on a single node (with NX); a hold can be lost on failover because Redis replication is asynchronous |
| Best for | Low concurrency | Medium concurrency | Flash sales, high concurrency |
5. Reservation with TTL (Hold-Then-Pay)
State Machine for a Seat:
- AVAILABLE --[user selects]--> HELD (TTL: 10 min)
- HELD --[payment OK]--> BOOKED
- HELD --[payment fail]--> AVAILABLE
- HELD --[TTL expires]--> AVAILABLE
TTL Hold Mechanism
1. User selects seat -> SET seat status = HELD, held_until = now + 10min
2. User proceeds to payment page
3. If payment succeeds within 10min -> status = BOOKED
4. If payment fails or user abandons:
- Background job sweeps expired holds every 30s
- OR: Redis key expires automatically (if using Redis locks)
- Seat returns to AVAILABLE
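The steps above can be sketched as a small in-memory state machine with a sweeper. This is an illustration only: `Seat`, `hold`, `confirm`, `release`, and `sweep_expired` are hypothetical names, and time is passed in explicitly (in seconds) to keep the example deterministic.

```python
from dataclasses import dataclass
from typing import List, Optional

HOLD_SECONDS = 600  # the 10-minute hold window

@dataclass
class Seat:
    seat_id: str
    status: str = "AVAILABLE"
    held_by: Optional[str] = None
    held_until: Optional[float] = None

def release(seat: Seat) -> None:
    """HELD -> AVAILABLE."""
    seat.status, seat.held_by, seat.held_until = "AVAILABLE", None, None

def hold(seat: Seat, user: str, now: float) -> bool:
    """AVAILABLE -> HELD; an expired hold is treated as available."""
    if seat.status == "HELD" and seat.held_until <= now:
        release(seat)
    if seat.status != "AVAILABLE":
        return False
    seat.status, seat.held_by, seat.held_until = "HELD", user, now + HOLD_SECONDS
    return True

def confirm(seat: Seat, user: str, now: float) -> bool:
    """HELD -> BOOKED, only for the holder and only before the hold expires."""
    if seat.status == "HELD" and seat.held_by == user and now < seat.held_until:
        seat.status, seat.held_until = "BOOKED", None
        return True
    return False

def sweep_expired(seats: List[Seat], now: float) -> int:
    """Background sweep: return expired holds to AVAILABLE, report how many."""
    expired = [s for s in seats if s.status == "HELD" and s.held_until <= now]
    for s in expired:
        release(s)
    return len(expired)

s1, s2 = Seat("S1"), Seat("S2")
hold(s1, "user_A", now=0)
hold(s2, "user_B", now=0)
paid = confirm(s1, "user_A", now=300)        # user_A pays within the window
released = sweep_expired([s1, s2], now=601)  # user_B abandoned the checkout
late = confirm(s2, "user_B", now=602)        # too late: the hold is gone
```

After the sweep, S1 is BOOKED and S2 is back to AVAILABLE, matching the state machine.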
Interview Tip
The 10-minute hold window is a business decision, not a technical one. Commonly cited figures are roughly 8 minutes for Ticketmaster and roughly 20 minutes for airlines; treat these as ballpark values. A shorter hold means more inventory turnover; a longer hold means a better user experience.
6. Full System Architecture
```
                      +--------------------+
                      | CDN (static        |
                      |  assets, seat map) |
                      +---------+----------+
                                |
                      +---------v----------+
                      |   Load Balancer    |
                      +---------+----------+
                                |
            +-------------------+-------------------+
            |                   |                   |
   +--------v------+   +--------v------+   +--------v------+
   | API Gateway   |   | API Gateway   |   | API Gateway   |
   | (rate limit,  |   |               |   |               |
   |  auth, queue) |   |               |   |               |
   +--------+------+   +--------+------+   +--------+------+
            |                   |                   |
            +-------------------+-------------------+
                                |
            +-------------------+-------------------+
            |                   |                   |
   +--------v------+   +--------v------+   +--------v------+
   | Event Service |   | Booking Svc   |   | Payment Svc   |
   | (catalog,     |   | (seat lock,   |   | (PSP integ,   |
   |  search)      |   |  reservation) |   |  webhook)     |
   +--------+------+   +--------+------+   +--------+------+
            |                   |                   |
   +--------v------+   +--------v------+   +--------v------+
   | Event DB      |   | Redis Cluster |   | Payment PSP   |
   | (PostgreSQL)  |   | (seat locks,  |   | (Stripe)      |
   +---------------+   |  hold TTLs)   |   +---------------+
                       +--------+------+
                                |
                       +--------v------+
                       | Booking DB    |
                       | (PostgreSQL)  |
                       | - reservations|
                       | - tickets     |
                       +---------------+

   +------------------+     +------------------+
   | Notification Svc |     | Queue Service    |
   | (email, SMS,     |     | (waiting list    |
   |  push)           |     |  for sold-out)   |
   +------------------+     +------------------+
```
7. Payment Integration Flow
```
User clicks "Pay"
     |
     v
+----+----+
| Booking |  1. Validate hold is still active
| Service |  2. Create payment intent
+----+----+
     |
     v
+----+----+
| Payment |  3. Call PSP (Stripe) to authorize
| Service |  4. On success: mark seat BOOKED, generate ticket
+----+----+  5. On failure: release hold, notify user
     |
     v
+----+----+
|   PSP   |  Stripe/PayPal processes card
| (Stripe)|  Returns: success/failure/pending
+----+----+
     |
     v
Webhook callback to Payment Service
(confirms final status)
```
Handling Failures Mid-Booking
| Failure Point | Recovery |
|---|---|
| Payment times out | Poll PSP for status, retry once, then release hold |
| PSP returns "pending" | Hold seat, wait for webhook confirmation |
| Network failure after PSP charge | Idempotency key ensures no double charge; poll PSP |
| Booking DB write fails after payment | Retry DB write; worst case: refund and release seat |
| User closes browser mid-payment | Hold TTL expires, seat released; if PSP charged, auto-refund |
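The "network failure after PSP charge" row relies on idempotency keys. The sketch below shows the idea with a fake provider that records the charge and then loses the first response. `FakePSP` and `charge_with_retry` are hypothetical and do not mirror any real PSP SDK; the assumption is only that the provider deduplicates requests by key.

```python
class FakePSP:
    """Hypothetical payment provider that deduplicates charges by idempotency key."""

    def __init__(self, drop_first_response=False):
        self._results = {}       # idempotency_key -> stored result
        self.charges_made = 0    # how many times money actually moved
        self._drop_next = drop_first_response

    def charge(self, idempotency_key, amount_cents):
        if idempotency_key not in self._results:
            self.charges_made += 1
            self._results[idempotency_key] = {
                "charge_id": f"ch_{self.charges_made}",
                "amount_cents": amount_cents,
                "status": "succeeded",
            }
        if self._drop_next:
            self._drop_next = False
            raise TimeoutError("response lost after the charge was recorded")
        return self._results[idempotency_key]

def charge_with_retry(psp, booking_id, amount_cents, max_attempts=3):
    """Retry with the same key, so a repeated request cannot charge twice."""
    key = f"booking-{booking_id}"  # stable per booking, never per attempt
    for _ in range(max_attempts):
        try:
            return psp.charge(key, amount_cents)
        except TimeoutError:
            continue
    raise RuntimeError("payment status unknown; poll the PSP before releasing the hold")

psp = FakePSP(drop_first_response=True)
result = charge_with_retry(psp, "b-42", 15000)
```

The first attempt times out on the client side, the retry returns the stored result, and the card is charged once.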
8. Waiting Queue for Popular Events
For sold-out events or flash sales, a virtual queue prevents system overload.
```
User arrives
     |
     v
+----+--------+
| Queue Gate  |  Assign queue position, estimated wait time
| (Redis      |  Token-bucket: admit N users/min to booking flow
|  sorted set)|
+----+--------+
     |
     |  when position reached
     v
+----+--------+
| Booking     |  User has limited time window (e.g., 5 min)
| Flow        |  to select seats and complete booking
+-------------+
```
Queue implementation:
ZADD queue:event_123 <timestamp> <user_id>
ZRANK queue:event_123 <user_id> -- position in queue
ZPOPMIN queue:event_123 -- admit next user
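The three commands map onto join, position lookup, and admission. The in-memory sketch below mirrors that behaviour for illustration; `WaitingQueue` is a hypothetical class, not a Redis client. One assumption goes beyond the commands shown: a repeat join keeps the original position, which in Redis corresponds to `ZADD` with the `NX` option (a plain `ZADD` would overwrite the score and move the user to the back).

```python
import bisect

class WaitingQueue:
    """In-memory analogue of the sorted-set queue, ordered by arrival timestamp."""

    def __init__(self):
        self._entries = []  # sorted list of (timestamp, user_id)
        self._joined = {}   # user_id -> timestamp

    def join(self, user_id, timestamp):
        """Like ZADD ... NX: a repeat join keeps the original position."""
        if user_id in self._joined:
            return
        self._joined[user_id] = timestamp
        bisect.insort(self._entries, (timestamp, user_id))

    def position(self, user_id):
        """Like ZRANK: 0-based position, or None if the user is not queued."""
        ts = self._joined.get(user_id)
        if ts is None:
            return None
        return bisect.bisect_left(self._entries, (ts, user_id))

    def admit(self, batch_size):
        """Like ZPOPMIN with a count: admit the longest-waiting users."""
        batch = self._entries[:batch_size]
        self._entries = self._entries[batch_size:]
        for _, user_id in batch:
            del self._joined[user_id]
        return [user_id for _, user_id in batch]

q = WaitingQueue()
q.join("u1", 100)
q.join("u2", 101)
q.join("u3", 102)
q.join("u1", 999)            # page refresh: u1 keeps the original slot

before = q.position("u3")    # 2
admitted = q.admit(2)        # ["u1", "u2"]
after = q.position("u3")     # 0
```

Calling `admit` on a fixed schedule with a fixed batch size gives the "admit N users/min" behaviour described at the queue gate.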
Interview Tip
Mention Ticketmaster's "Smart Queue" or Cloudflare's Waiting Room as real-world precedents. The queue protects the booking system from thundering herd during on-sale moments.
9. Seat Map Rendering
Seat Map Data Model:
```
Venue:
  sections: [Section]
Section:
  section_id, name, price_tier
  rows: [Row]
Row:
  row_id, label ("A", "B", ...)
  seats: [Seat]
Seat:
  seat_id, number, status (AVAILABLE | HELD | BOOKED | BLOCKED)
  price, accessibility_flag
```
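One way to express this model in code is with nested dataclasses. The sketch below follows the field names above; storing the price as integer cents (`price_cents`) and the `available_by_section` helper are assumptions added for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List

class SeatStatus(Enum):
    AVAILABLE = "AVAILABLE"
    HELD = "HELD"
    BOOKED = "BOOKED"
    BLOCKED = "BLOCKED"

@dataclass
class Seat:
    seat_id: str
    number: int
    price_cents: int  # assumption: integer cents avoids float rounding
    status: SeatStatus = SeatStatus.AVAILABLE
    accessibility_flag: bool = False

@dataclass
class Row:
    row_id: str
    label: str
    seats: List[Seat] = field(default_factory=list)

@dataclass
class Section:
    section_id: str
    name: str
    price_tier: str
    rows: List[Row] = field(default_factory=list)

@dataclass
class Venue:
    sections: List[Section] = field(default_factory=list)

def available_by_section(venue: Venue) -> Dict[str, int]:
    """Count AVAILABLE seats per section, e.g. for a summary badge on the map."""
    return {
        section.name: sum(
            seat.status is SeatStatus.AVAILABLE
            for row in section.rows
            for seat in row.seats
        )
        for section in venue.sections
    }

row_a = Row("r1", "A", [
    Seat("S1", 1, 12000),
    Seat("S2", 2, 12000, SeatStatus.HELD),
    Seat("S3", 3, 12000, SeatStatus.BOOKED),
    Seat("S4", 4, 12000, accessibility_flag=True),
])
venue = Venue([Section("sec1", "Floor", "premium", [row_a])])
counts = available_by_section(venue)  # {"Floor": 2}
```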
Availability Broadcast
- Option 1: Polling. The client polls GET /events/{id}/seats every 5-10 seconds. Simple but wasteful.
- Option 2: Server-Sent Events (SSE). The server pushes seat status changes to connected clients. Efficient, one-directional.
- Option 3: WebSocket. Full-duplex, real-time seat map updates. Best UX but highest server resource cost.

Recommendation: SSE for the seat map, since updates only flow server -> client.
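With SSE, the server only needs to send the seats whose status changed. A minimal sketch of that delta computation follows. It assumes a snapshot is a plain `seat_id -> status` mapping; the `seat-update` event name and both function names are hypothetical. The `event:` / `data:` lines and the terminating blank line follow the SSE wire format.

```python
import json

def seat_delta(previous, current):
    """Return only the seats whose status differs between two snapshots."""
    return {
        seat_id: status
        for seat_id, status in current.items()
        if previous.get(seat_id) != status
    }

def to_sse_event(delta, event_name="seat-update"):
    """Serialize a delta as one SSE message (terminated by a blank line)."""
    return f"event: {event_name}\ndata: {json.dumps(delta, sort_keys=True)}\n\n"

previous = {"S1": "AVAILABLE", "S2": "AVAILABLE", "S3": "HELD"}
current = {"S1": "HELD", "S2": "AVAILABLE", "S3": "BOOKED"}

delta = seat_delta(previous, current)  # {"S1": "HELD", "S3": "BOOKED"}
message = to_sse_event(delta)
```

Only two of the three seats are sent, which keeps each push small even for a venue with tens of thousands of seats.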
10. Scaling for Flash Sales
When Taylor Swift tickets go on sale, 10K+ users target the same event simultaneously.
| Technique | How |
|---|---|
| Virtual queue | Gate admission to booking flow (see section 8) |
| Redis cluster | Seat locks in Redis, horizontally scaled |
| Read replicas | Serve seat availability from replicas (eventual consistency OK for display) |
| Pre-compute seat map | Cache full seat map in CDN, update via SSE delta |
| Shard by section | Different booking servers handle different venue sections |
| Rate limiting | Per-user rate limits to prevent bots |
| Bot detection | CAPTCHA, browser fingerprinting, behavioral analysis |
| Overprovisioning | Auto-scale booking service before announced on-sale time |
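Per-user rate limiting is often implemented as a token bucket. The sketch below is one possible in-memory version with hypothetical names and parameters; time is passed in explicitly (in seconds) so the behaviour is deterministic. A production limiter would typically keep this state in a shared store such as Redis.

```python
class TokenBucket:
    """Allows short bursts up to `capacity`, then a steady `refill_per_second`."""

    def __init__(self, capacity, refill_per_second, now=0.0):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.tokens = float(capacity)
        self.last = now

    def allow(self, now):
        """Refill for the elapsed time, then spend one token if available."""
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self.last = max(self.last, now)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

# One bucket per user: a burst of 3 requests, refilled at 1 request/second.
bucket = TokenBucket(capacity=3, refill_per_second=1.0)
burst = [bucket.allow(now=0.0) for _ in range(5)]  # [True, True, True, False, False]
later = bucket.allow(now=2.0)                      # True: two tokens refilled
```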
11. Database Schema (Simplified)
```sql
-- Events
CREATE TABLE events (
    event_id   UUID PRIMARY KEY,
    name       TEXT NOT NULL,
    venue_id   UUID REFERENCES venues(venue_id),  -- venues table not shown
    event_date TIMESTAMPTZ,
    status     TEXT DEFAULT 'UPCOMING'  -- UPCOMING, ON_SALE, SOLD_OUT, COMPLETED
);

-- Seats (per event, denormalized for performance)
CREATE TABLE event_seats (
    event_id    UUID REFERENCES events(event_id),
    seat_id     UUID,
    section     TEXT,
    row_label   TEXT,
    seat_number INT,
    price       DECIMAL(10,2),
    status      TEXT DEFAULT 'AVAILABLE',  -- AVAILABLE, HELD, BOOKED
    held_by     UUID,
    held_until  TIMESTAMPTZ,
    version     INT DEFAULT 0,  -- for optimistic locking
    PRIMARY KEY (event_id, seat_id)
);

-- Bookings
CREATE TABLE bookings (
    booking_id   UUID PRIMARY KEY,
    user_id      UUID NOT NULL,
    event_id     UUID NOT NULL,
    total_amount DECIMAL(10,2),
    status       TEXT DEFAULT 'PENDING',  -- PENDING, CONFIRMED, CANCELLED, REFUNDED
    payment_id   TEXT,
    created_at   TIMESTAMPTZ DEFAULT NOW()
);

-- Booking items (seats in a booking)
CREATE TABLE booking_items (
    booking_id UUID REFERENCES bookings(booking_id),
    seat_id    UUID,
    event_id   UUID,
    price      DECIMAL(10,2),
    PRIMARY KEY (booking_id, seat_id)
);
```
12. Eventual Consistency for Seat Availability View
Strong Consistency Path
User selects seat --> Redis lock (NX) --> DB update --> BOOKED
- If the lock is acquired, the seat is guaranteed yours.

Eventual Consistency Path
Seat map display <-- Read replica <-- async replication <-- Primary DB
- The map may show a seat as AVAILABLE for a few seconds after it was actually booked (acceptable for display).
- The user gets a "seat taken" error on the selection attempt.
Interview Tip
Clearly separate the consistency requirements: writes (booking) need strong consistency, reads (seat map display) can be eventually consistent. This duality is key to scaling the system.
13. Key Trade-offs Discussion
| Decision | Option A | Option B |
|---|---|---|
| Locking | Pessimistic DB lock (simple, low throughput) | Redis NX lock (fast, extra infra) |
| Seat map updates | Polling (simple) | WebSocket/SSE (real-time, complex) |
| Hold TTL | Short 5min (more turnover) | Long 15min (better UX) |
| Queue | Virtual queue (fair, predictable) | No queue (faster for low traffic) |
| Availability | Strong consistency (slower) | Eventual for display (scalable) |
| Payment | Sync (simple flow) | Async with webhooks (resilient) |
14. Capacity Estimation
Assumptions:
- Popular event: 50,000 seats
- On-sale moment: 100K users in queue, 10K concurrent in booking flow
- Average booking: 2.5 seats per transaction
Seat lock operations:
- 10K users * 2.5 seats = 25K lock attempts
- Redis handles 100K+ ops/sec -> single Redis cluster sufficient
Database writes:
- ~20K bookings over first 30 minutes
- ~50K seat status updates
- PostgreSQL on good hardware: 10K+ TPS -> sufficient
API requests:
- Seat map polling: 100K users * 1 req/5sec = 20K QPS (serve from cache/CDN)
- Booking flow: 10K users * ~1 req/2sec (assumed) = ~5K QPS (serve from API)
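The arithmetic behind these estimates can be checked in a few lines. All inputs come from the assumptions above, except the per-user request rate in the booking flow: roughly one request every 2 seconds is an assumption chosen to match the ~5K QPS figure.

```python
# Inputs from the assumptions above
total_seats = 50_000
queued_users = 100_000
concurrent_bookers = 10_000
seats_per_booking = 2.5

# Seat lock operations
lock_attempts = concurrent_bookers * seats_per_booking  # 25,000

# Database writes
total_bookings = total_seats / seats_per_booking        # 20,000 to sell out
sale_window_s = 30 * 60
avg_bookings_per_s = total_bookings / sale_window_s     # about 11 per second

# API requests
poll_interval_s = 5
polling_qps = queued_users / poll_interval_s            # 20,000
seconds_between_requests = 2                            # assumption, not from the source
booking_qps = concurrent_bookers / seconds_between_requests  # 5,000

print(lock_attempts, total_bookings, polling_qps, booking_qps)
```

The average booking write rate is tiny compared with the quoted database capacity; the real pressure is the burst of lock attempts and the polling traffic in the first seconds of the sale.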
15. Interview Checklist
- Identified the core challenge: double-booking prevention under concurrency
- Compared locking strategies (pessimistic vs optimistic vs Redis)
- Designed hold-then-pay flow with TTL and state machine
- Payment integration with failure handling and idempotency
- Virtual queue for flash sale scenarios
- Separated strong consistency (booking) from eventual consistency (display)
- Seat map rendering and real-time update strategy
- Scaling techniques for high-concurrency events
- Bot prevention and rate limiting