33 - Design YouTube or Netflix
Series: System Design & Distributed Systems Previous: 32 - Design News Feed | Next: 34 - Design Google Search
1. Requirements
Functional Requirements
| Feature | Details |
|---|---|
| Upload video | Creators upload videos of varying length/quality |
| Stream video | Users watch videos with adaptive quality |
| Search | Find videos by title, description, tags |
| Recommendations | Personalized video suggestions |
| Comments & likes | Engagement features |
| View count | Accurate at-scale counting |
| Thumbnails | Auto-generated and custom thumbnails |
Non-Functional Requirements
- Availability: 99.99% for streaming, 99.9% for upload
- Latency: Video playback starts in < 2s
- Scale: 2B MAU, 500 hours of video uploaded per minute (YouTube scale)
- Durability: Zero data loss for uploaded content
- Global: Low-latency streaming worldwide via CDN
2. Capacity Estimation
DAU: 800M (viewing), 5M (uploading)
Videos watched/day: 5B views/day
Average video length: 5 min
Average video size: 500MB (after transcoding to multiple resolutions)
Upload volume: 500 hrs/min x 1,440 min/day = 720K hrs/day
Storage for uploads: 720K hrs x 60 min/hr x 500MB per 5 min = ~4.3PB/day (multi-resolution)
Streaming bandwidth: 5B views x 5 min x 5 Mbps (~190MB/view) = ~1EB/day outbound
| Resource | Estimate |
|---|---|
| Storage growth/day | ~4.3PB (multi-resolution) |
| Peak view starts | ~120K/s (2x the ~58K/s average); tens of millions of concurrent streams |
| CDN egress | ~1EB/day |
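The arithmetic behind these estimates can be checked in a few lines. All inputs are the assumptions stated in this section (500 hrs uploaded/min, 500MB per 5 min of multi-resolution output, 5B views/day at 5 min and 5 Mbps average):

```python
# Back-of-envelope capacity check for the estimates above.
UPLOAD_HRS_PER_MIN = 500
MB_PER_5_MIN = 500                  # multi-resolution output size
VIEWS_PER_DAY = 5_000_000_000
AVG_VIEW_MIN = 5
AVG_BITRATE_MBPS = 5

# Upload volume: 500 hrs/min sustained all day.
upload_hrs_per_day = UPLOAD_HRS_PER_MIN * 60 * 24              # 720,000 hrs/day

# Storage: 500MB per 5 minutes = 100 MB/min of video.
storage_pb_per_day = upload_hrs_per_day * 60 * (MB_PER_5_MIN / 5) / 1e9
# ~4.3 PB/day

# Egress: 5 min at 5 Mbps = 187.5 MB delivered per view.
mb_per_view = AVG_VIEW_MIN * 60 * AVG_BITRATE_MBPS / 8
egress_pb_per_day = VIEWS_PER_DAY * mb_per_view / 1e9          # ~940 PB ≈ 1 EB

# Concurrency: ~58K views start per second, each lasting ~5 minutes.
view_starts_per_sec = VIEWS_PER_DAY / 86_400
concurrent_streams = view_starts_per_sec * AVG_VIEW_MIN * 60   # ~17M average
```

Treating the 5 Mbps average as every view watched to completion makes the egress figure an upper bound; a realistic resolution mix and early abandonment pull it down, but the order of magnitude (high hundreds of PB/day) is what matters in an interview.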
3. High-Level Architecture
+----------+ +----------+ +----------+
| Creator | | Viewer | | Viewer |
| (Upload) | | (Stream) | | (Search) |
+----+-----+ +----+-----+ +----+-----+
| | |
v v v
+---------+ +------+------+ +----+------+
| Upload | | CDN | | API |
| Service | | (CloudFront | | Gateway |
| | | Akamai) | | |
+----+----+ +------+------+ +---+---+---+
| | | | |
v | +----+ | +----+
+----+--------+ | | | |
| Transcoding | | +----+---+ +--+-----+ +--+------+
| Pipeline | | |Metadata| |Search | |Recommend|
| (Workers) | | |Service | |Service | |Service |
+----+--------+ | +---+----+ +---+----+ +---------+
| | | |
v | +----+----+ +---+--------+
+----+--------+ | |Video | |Elasticsearch|
| Object |<------+ |Meta DB | +-------------+
| Storage | |(MySQL) |
| (S3/GCS) | +--------+
+-------------+
4. Video Upload Pipeline
Creator Upload Service Transcoding Pipeline
| | |
|-- Upload video -------->| |
| (resumable, chunked) | |
| |-- Store original ----->|
| | to S3 (raw/) |
| | |
| |-- Enqueue job -------->|
| | to Kafka |
|<-- Upload ACK ----------| |
| (processing status) | |
| | +------+------+
| | | Split video |
| | | into segments|
| | +------+------+
| | |
| | +------+------+
| | | Transcode |
| | | each segment |
| | | to multiple |
| | | resolutions |
| | +------+------+
| | |
| | +------+------+
| | | Generate |
| | | thumbnails |
| | +------+------+
| | |
| | +------+------+
| | | Store to S3 |
| | | (processed/) |
| | +------+------+
| | |
| | +------+------+
| | | Update DB: |
| | | status=ready |
| | +-------------+
| |
|<-- Notification: video is live! ----------|
Resumable Upload
- Use tus protocol or Google's resumable upload API
- Split large files into 5MB chunks
- Track upload progress server-side
- Resume from last successful chunk on failure
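The resume loop above can be sketched in a few lines. This is illustrative, not the tus wire protocol: `put_chunk` and `get_committed_offset` are hypothetical stand-ins for the HTTP calls (tus uses `PATCH` with an `Upload-Offset` header and `HEAD` to query the committed offset):

```python
# Sketch of a resumable chunked upload client. On any failure, the client
# re-queries the server's committed offset and resumes from there.
CHUNK_SIZE = 5 * 1024 * 1024  # 5MB chunks, as above

def upload_resumable(data: bytes, put_chunk, get_committed_offset) -> int:
    """Upload `data` chunk by chunk; returns total bytes uploaded."""
    offset = get_committed_offset()          # ask server where we left off
    while offset < len(data):
        chunk = data[offset:offset + CHUNK_SIZE]
        try:
            put_chunk(offset, chunk)         # one HTTP request per chunk
            offset += len(chunk)
        except IOError:
            offset = get_committed_offset()  # re-sync and retry from there
    return offset
```

Because the server is the source of truth for the committed offset, a client crash mid-chunk costs at most one chunk of re-upload, not the whole file.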
5. Video Transcoding
Why Transcode?
- Different devices need different resolutions and codecs
- Bandwidth varies: mobile 3G vs desktop fiber
- Storage optimization: efficient codecs reduce CDN costs
Transcoding Output Matrix
| Resolution | Bitrate | Codec | Use Case |
|---|---|---|---|
| 240p | 400 kbps | H.264 | Slow mobile connections |
| 360p | 800 kbps | H.264 | Mobile data saver |
| 480p | 1.5 Mbps | H.264 | Standard mobile |
| 720p | 3 Mbps | H.264/H.265 | Tablet, laptop |
| 1080p | 6 Mbps | H.265 | Desktop, smart TV |
| 4K | 15 Mbps | H.265/AV1 | Large screens |
Adaptive Bitrate Streaming (ABR)
Master Playlist (HLS .m3u8)
|
+-- 240p playlist --> segment_001.ts, segment_002.ts, ...
+-- 480p playlist --> segment_001.ts, segment_002.ts, ...
+-- 720p playlist --> segment_001.ts, segment_002.ts, ...
+-- 1080p playlist -> segment_001.ts, segment_002.ts, ...
- HLS (HTTP Live Streaming): Apple standard, widely supported
- DASH (Dynamic Adaptive Streaming over HTTP): Open standard
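A master playlist like the one diagrammed above is just text. This sketch generates one from the rendition ladder (resolutions/bitrates are the example values from this section; the `#EXT-X-STREAM-INF` `BANDWIDTH` and `RESOLUTION` attributes are standard HLS):

```python
# Generate an HLS master playlist pointing at one media playlist per
# rendition. (width, height, bandwidth in bits/sec)
RENDITIONS = [
    (426, 240, 400_000),
    (854, 480, 1_500_000),
    (1280, 720, 3_000_000),
    (1920, 1080, 6_000_000),
]

def master_playlist(renditions=RENDITIONS) -> str:
    lines = ["#EXTM3U", "#EXT-X-VERSION:3"]
    for width, height, bandwidth in renditions:
        lines.append(
            f"#EXT-X-STREAM-INF:BANDWIDTH={bandwidth},RESOLUTION={width}x{height}"
        )
        lines.append(f"{height}p/playlist.m3u8")  # relative path per rendition
    return "\n".join(lines) + "\n"
```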
How ABR Works
Client monitors bandwidth
|
v
Bandwidth = 5 Mbps --> Request 720p segments
|
Bandwidth drops to 1 Mbps
|
v
Switch to 360p segments (no rebuffer)
|
Bandwidth recovers to 8 Mbps
|
v
Switch to 1080p segments
Interview Tip: Mention that videos are split into 2-10 second segments. The client can switch quality between segments, enabling seamless quality adaptation.
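The switching logic above can be sketched as a simple rate-selection rule, evaluated between segments. The 0.8 safety margin is an illustrative choice (real players also smooth bandwidth estimates and consider buffer level):

```python
# Sketch of client-side ABR rendition selection: pick the highest-bitrate
# rendition that fits within measured bandwidth times a safety margin.
LADDER_KBPS = {"240p": 400, "360p": 800, "480p": 1500,
               "720p": 3000, "1080p": 6000}

def pick_rendition(bandwidth_kbps: float, safety: float = 0.8) -> str:
    budget = bandwidth_kbps * safety
    best = "240p"                    # always fall back to the lowest rung
    for name, kbps in sorted(LADDER_KBPS.items(), key=lambda kv: kv[1]):
        if kbps <= budget:
            best = name              # highest rendition still within budget
    return best
```

This reproduces the scenario in the diagram: 5 Mbps selects 720p, a drop to 1 Mbps falls back to 360p, and recovery to 8 Mbps steps up to 1080p.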
6. CDN for Video Delivery
Multi-Tier CDN Architecture
Origin (S3)
|
v
+---+----------+
| Origin Shield | (intermediate cache, reduces origin load)
+---+----------+
|
+------------------+------------------+
| | |
+---+---+ +---+---+ +---+---+
| PoP | | PoP | | PoP |
| US | | EU | | Asia |
+---+---+ +---+---+ +---+---+
| | |
Users Users Users
CDN Strategy
| Content Type | CDN Strategy |
|---|---|
| Popular videos (top 20%) | Pre-pushed to edge PoPs |
| Long-tail videos | Pull-through caching |
| Live streams | Edge ingest + relay |
| Thumbnails | Aggressive edge caching, long TTL |
Cache Hit Optimization
- Popular videos: 95%+ cache hit rate at edge
- Segment-level caching (not whole video)
- Geographic routing: serve from nearest PoP
- Netflix Open Connect: custom CDN appliances at ISPs
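Pull-through caching at a PoP amounts to cache-on-miss with eviction, keyed per segment rather than per video. A minimal sketch (an `OrderedDict` standing in for a real edge cache, LRU eviction assumed):

```python
from collections import OrderedDict

# Sketch of a pull-through segment cache at an edge PoP: fetch from origin
# on miss, cache the segment, evict the least-recently-used one when full.
class EdgeSegmentCache:
    def __init__(self, capacity: int, fetch_from_origin):
        self.capacity = capacity
        self.fetch = fetch_from_origin
        self.cache = OrderedDict()   # segment key -> bytes
        self.hits = self.misses = 0

    def get(self, key: str) -> bytes:
        if key in self.cache:
            self.cache.move_to_end(key)      # mark as recently used
            self.hits += 1
            return self.cache[key]
        self.misses += 1
        data = self.fetch(key)               # pull through to origin/shield
        self.cache[key] = data
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)   # evict LRU segment
        return data
```

Caching at segment granularity is what makes this work for long-tail video: a PoP can hold the first segments of many videos (fast startup) without storing any of them in full.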
7. Video Metadata Service
Schema
```sql
CREATE TABLE videos (
    video_id      UUID PRIMARY KEY,
    title         VARCHAR(500),
    description   TEXT,
    creator_id    UUID REFERENCES users,
    upload_time   TIMESTAMP,
    duration_sec  INT,
    status        ENUM('processing','ready','failed','removed'),
    view_count    BIGINT,
    like_count    BIGINT,
    thumbnail_url VARCHAR(500),
    tags          TEXT[],
    category      VARCHAR(100)
);

CREATE TABLE video_formats (
    video_id     UUID,
    resolution   VARCHAR(10),
    codec        VARCHAR(20),
    bitrate_kbps INT,
    s3_path      VARCHAR(500),
    PRIMARY KEY (video_id, resolution)
);
```
Read Path
Client --> API GW --> Metadata Service --> MySQL (read replica)
|
Cache (Redis)
- Cache video metadata in Redis (TTL 1 hour)
- Read replicas for high read throughput
- Shard by video_id for horizontal scaling
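The read path above is the classic cache-aside pattern. A minimal sketch, with a plain dict standing in for Redis and a callable standing in for the read-replica query:

```python
import time

CACHE_TTL_SEC = 3600  # 1 hour, as above

# Sketch of cache-aside reads: try the cache first, fall back to a MySQL
# read replica on a miss, then populate the cache with a TTL.
def get_video_metadata(video_id: str, cache: dict, db_query) -> dict:
    entry = cache.get(video_id)
    if entry and entry["expires_at"] > time.time():
        return entry["value"]                    # cache hit
    row = db_query(video_id)                     # read-replica lookup
    cache[video_id] = {"value": row,
                       "expires_at": time.time() + CACHE_TTL_SEC}
    return row
```

With a 1-hour TTL, a popular video's metadata hits the database roughly once per hour per cache node regardless of view volume; writes (e.g. title edits) should also invalidate or overwrite the cache entry.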
8. Thumbnail Generation
Transcoding Pipeline
|
v
Extract frames at: 25%, 50%, 75% of video
|
v
Run ML model to select "most interesting" frame
|
v
Generate multiple sizes:
- 120x90 (small, search results)
- 320x180 (medium, mobile)
- 480x360 (large, desktop)
|
v
Store in S3, serve via CDN
- Creators can also upload custom thumbnails
- A/B test thumbnails for click-through rate optimization
9. Recommendation System (High-Level)
+-------------------+
| User Activity |
| (watch history, |
| likes, search) |
+--------+----------+
|
v
+--------+----------+
| Candidate |
| Generation |
| (collaborative |
| filtering, |
| content-based) |
+--------+----------+
|
v (1000s of candidates)
+--------+----------+
| Ranking Model |
| (deep learning, |
| predict P(watch),|
| P(complete), |
| P(like)) |
+--------+----------+
|
v (top 50)
+--------+----------+
| Re-ranking |
| (diversity, |
| freshness, |
| business rules) |
+--------+----------+
|
v
Final recommendations
Two-Tower Model (YouTube)
User Tower Video Tower
(user features) (video features)
| |
v v
Embedding Embedding
| |
+------> Dot Product <--+
|
v
Relevance Score
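The scoring step at the bottom of the diagram is just a dot product between the two embeddings, which is what makes candidate retrieval fast (approximate nearest neighbor over precomputed video embeddings). A toy sketch; the real towers are deep networks, and these fixed vectors are placeholders:

```python
# Toy two-tower scoring: relevance = dot(user_embedding, video_embedding).
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def score(user_emb, video_emb) -> float:
    return dot(user_emb, video_emb)

def top_k(user_emb, video_embs: dict, k: int = 2):
    """Rank candidate videos by dot-product relevance, highest first."""
    ranked = sorted(video_embs,
                    key=lambda vid: score(user_emb, video_embs[vid]),
                    reverse=True)
    return ranked[:k]
```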
10. Video Search
Architecture
Video Upload --> Metadata extraction --> Elasticsearch Index
|
User Query --> Query Parser --> ES Query --> Ranked Results
Index Fields
| Field | Weight | Type |
|---|---|---|
| title | High | text, analyzed |
| description | Medium | text, analyzed |
| tags | Medium | keyword |
| creator name | Low | text |
| auto-captions | Low | text, analyzed |
Search Ranking Signals
- Text relevance (BM25 score)
- View count (popularity)
- Recency (freshness boost)
- Creator authority
- User personalization
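The field weights from the table map naturally onto an Elasticsearch `multi_match` query with per-field boosts (the `^n` suffix). This builds the query body as a plain dict; the field names (`creator_name`, `auto_captions`) are assumed index names for illustration:

```python
# Sketch of the weighted multi-field search query; boosts mirror the
# High/Medium/Low weights in the index-fields table.
def build_search_query(text: str) -> dict:
    return {
        "query": {
            "multi_match": {
                "query": text,
                "fields": ["title^3", "description^2", "tags^2",
                           "creator_name", "auto_captions"],
            }
        }
    }
```

In practice the BM25 text score from this query is only one input; popularity, freshness, and personalization signals are folded in by a rescoring stage.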
11. View Counting at Scale
Challenge
5B views/day. Simple UPDATE count = count + 1 per view won't scale.
Solution: Multi-Stage Aggregation
Client View Event
|
v
Edge Counter (in-memory, per server)
|
Flush every 5 seconds
|
v
Kafka (view events stream)
|
v
Stream Processor (Flink/Spark Streaming)
- Deduplicate (user_id + video_id + time window)
- Aggregate counts
|
v
Counter DB (Redis for real-time, Cassandra for persistent)
|
v
Video Metadata (periodic batch update of view_count)
Deduplication
- Same user watching same video within 30 seconds = 1 view
- Use sliding window with user_id + video_id hash
- Bloom filter for approximate dedup at edge
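The sliding-window dedup can be sketched with a last-seen map keyed on (user_id, video_id); in a real stream processor this state lives in Flink state or Redis rather than process memory:

```python
DEDUP_WINDOW_SEC = 30  # same user + video within 30s counts once

# Sketch of deduplicated view counting in the stream processor.
class ViewCounter:
    def __init__(self):
        self.last_seen = {}   # (user_id, video_id) -> last event timestamp
        self.counts = {}      # video_id -> deduplicated view count

    def record(self, user_id: str, video_id: str, ts: float) -> bool:
        """Returns True if the event counted as a new view."""
        key = (user_id, video_id)
        prev = self.last_seen.get(key)
        self.last_seen[key] = ts                  # slide the window forward
        if prev is not None and ts - prev < DEDUP_WINDOW_SEC:
            return False                          # duplicate: not counted
        self.counts[video_id] = self.counts.get(video_id, 0) + 1
        return True
```

The last-seen map grows with distinct (user, video) pairs, which is why production systems expire this state after the window closes or replace it with a Bloom filter at the edge for an approximate first pass.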
Interview Tip: Mention the trade-off between accuracy and speed. YouTube shows approximate counts ("1.2M views") updated every few hours, not real-time exact counts.
12. Copyright Detection (Content ID)
Uploaded Video --> Audio/Video Fingerprinting
|
v
Compare against reference DB
(millions of copyrighted works)
|
+------+------+
| |
Match No Match
| |
+------+------+ v
| Policy | Publish
| Lookup |
+------+------+
|
+---------+---------+
| | |
Block Monetize Track
video (ads to (analytics
rights for rights
holder) holder)
Fingerprinting Techniques
- Audio: Spectral analysis, similar to Shazam
- Video: Perceptual hashing of keyframes
- Reference database: Rights holders upload reference content
- Matching: Compare fingerprints with approximate nearest neighbor search
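The perceptual-hashing idea can be shown with a toy difference hash over a grayscale keyframe. Real Content ID fingerprints are far richer, but the matching principle is the same: compact hashes compared by Hamming distance, tolerating small differences (re-encoding, minor edits):

```python
# Toy perceptual hash (dHash) of a keyframe given as rows of grayscale
# values: each hash bit records whether a pixel is brighter than its
# right-hand neighbor.
def dhash(pixels) -> int:
    bits = 0
    for row in pixels:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits

def hamming(a: int, b: int) -> int:
    return bin(a ^ b).count("1")

def is_match(h1: int, h2: int, max_distance: int = 4) -> bool:
    """Near-duplicate if the hashes differ in at most a few bits."""
    return hamming(h1, h2) <= max_distance
```

Because brightness gradients survive re-encoding while exact pixel values do not, a cryptographic hash would miss these matches; that is why fingerprinting uses perceptual hashes plus approximate nearest neighbor search.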
13. Live Streaming Basics
Broadcaster Origin Edge Servers Viewers
| | | |
|-- RTMP/SRT stream --->| | |
| |-- Transcode ------->| |
| | (real-time ABR) | |
| | | |
| |-- HLS/DASH segments>| |
| | (2-6 sec chunks) | |
| | |-- Serve ------>|
| | | to viewers |
Key Differences from VOD
| Aspect | VOD | Live |
|---|---|---|
| Latency tolerance | Minutes | 2-30 seconds |
| Transcoding | Offline, parallel | Real-time, sequential |
| CDN caching | Heavily cached | Short-lived segments |
| Error handling | Retry from any point | Move forward |
Low-Latency Live
- LL-HLS / LL-DASH: Sub-3-second latency
- WebRTC: Sub-1-second (for interactive, < 1000 viewers)
- Chunked transfer encoding for partial segments
14. Complete System Diagram
+----------+ +----------+
| Creator | | Viewer |
| (Upload) | | (Watch) |
+----+-----+ +----+-----+
| |
v v
+----+----------+ +---------+--------+
| Upload Service| | CDN Network |
| (resumable, | | (CloudFront/ |
| chunked) | | Akamai/OC) |
+----+----------+ +---------+--------+
| ^
v |
+----+----------+ +---------+--------+
| Object Store +------------------->| Origin Shield |
| (S3/GCS) | processed video +------------------+
| raw/ + proc/ |
+----+----------+
^
|
+----+------------------+
| Transcoding Pipeline |
| (Kubernetes workers) |
| - Split into segments |
| - Multi-resolution |
| - Thumbnail gen |
| - Content ID check |
+----+------------------+
|
| job queue
+----+----------+
| Kafka |
+----+----------+
|
+----+----------+-----+--------+--------+
| | | | |
v v v v v
+------+ +-----+-+ +--+---+ +-+-----+ +---+------+
|Meta | |View | |Search| |Reco | |Notif |
|Svc | |Counter| |Svc | |Svc | |Svc |
+--+---+ +--+----+ +--+---+ +--+----+ +----------+
| | | |
+--+---+ +--+-----+ +-+------+ +--+----+
|MySQL | |Redis/ | |Elastic | |ML |
|+Redis| |Cassand.| |Search | |Models |
+------+ +--------+ +--------+ +-------+
15. Interview Tips
- Split upload and stream: These are fundamentally different subsystems
- Transcoding is the bottleneck: Explain the pipeline clearly
- ABR is essential: Show you understand adaptive bitrate with segments
- CDN is where the magic happens: 95%+ of traffic is served from edge
- View counting: Never suggest a single counter with UPDATE count = count + 1
- Content ID: Mention copyright detection even briefly -- it shows awareness
- Don't design the recommendation ML model: Just show the pipeline stages
- Storage costs dominate: Mention that YouTube stores exabytes
16. Resources
- Alex Xu - System Design Interview Vol. 1, Chapter 14: Design YouTube
- Netflix Tech Blog: "Completing the Netflix Cloud Migration"
- YouTube Engineering Blog: "YouTube's Video Transcoding at Scale"
- Netflix Open Connect: CDN architecture whitepaper
- "How Video Streaming Works" - Mux engineering blog
- DASH Industry Forum: dash.akamaized.net
- Martin Kleppmann - Designing Data-Intensive Applications, Chapter 10 (Batch Processing)