33 - Design YouTube or Netflix
Series: System Design & Distributed Systems Previous: 32 - Design News Feed | Next: 34 - Design Google Search
1. Requirements
Functional Requirements
| Feature | Details |
|---|---|
| Upload video | Creators upload videos of varying length/quality |
| Stream video | Users watch videos with adaptive quality |
| Search | Find videos by title, description, tags |
| Recommendations | Personalized video suggestions |
| Comments & likes | Engagement features |
| View count | Accurate at-scale counting |
| Thumbnails | Auto-generated and custom thumbnails |
Non-Functional Requirements
- Availability: 99.99% for streaming, 99.9% for upload
- Latency: Video playback starts in < 2s
- Scale: 2B MAU, 500 hours of video uploaded per minute (YouTube scale)
- Durability: Zero data loss for uploaded content
- Global: Low-latency streaming worldwide via CDN
2. Capacity Estimation
DAU: 800M (viewing), 5M (uploading)
Videos watched/day: 5B views/day
Average video length: 5 min
Average video size: 500MB (after transcoding to multiple resolutions)
Upload volume: 500 hrs/min x 1,440 min/day = 720K hrs/day
Storage for uploads: 720K hrs x 60 min/hr x 500MB per 5 min = ~4.3PB/day (multi-resolution)
Streaming bandwidth: 5B views x 5 min x 5 Mbps (~190MB/view) = ~1EB/day outbound
| Resource | Estimate |
|---|---|
| Storage growth/day | ~4.3PB (multi-resolution) |
| Peak view starts | ~120K/s (2x the ~58K/s average); tens of millions of concurrent streams |
| CDN egress | ~1EB/day |
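The arithmetic behind these estimates can be checked in a few lines. All inputs are the assumptions stated in this section (500 hrs uploaded/min, 500MB per 5 min of multi-resolution output, 5B views/day at 5 min and 5 Mbps average):

```python
# Back-of-envelope capacity check for the estimates above.
UPLOAD_HRS_PER_MIN = 500
MB_PER_5_MIN = 500                  # multi-resolution output size
VIEWS_PER_DAY = 5_000_000_000
AVG_VIEW_MIN = 5
AVG_BITRATE_MBPS = 5

# Upload volume: 500 hrs/min sustained all day.
upload_hrs_per_day = UPLOAD_HRS_PER_MIN * 60 * 24              # 720,000 hrs/day

# Storage: 500MB per 5 minutes = 100 MB/min of video.
storage_pb_per_day = upload_hrs_per_day * 60 * (MB_PER_5_MIN / 5) / 1e9
# ~4.3 PB/day

# Egress: 5 min at 5 Mbps = 187.5 MB delivered per view.
mb_per_view = AVG_VIEW_MIN * 60 * AVG_BITRATE_MBPS / 8
egress_pb_per_day = VIEWS_PER_DAY * mb_per_view / 1e9          # ~940 PB ≈ 1 EB

# Concurrency: ~58K views start per second, each lasting ~5 minutes.
view_starts_per_sec = VIEWS_PER_DAY / 86_400
concurrent_streams = view_starts_per_sec * AVG_VIEW_MIN * 60   # ~17M average
```

Treating the 5 Mbps average as every view watched to completion makes the egress figure an upper bound; a realistic resolution mix and early abandonment pull it down, but the order of magnitude (high hundreds of PB/day) is what matters in an interview.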
3. High-Level Architecture
+----------+ +----------+ +----------+
| Creator | | Viewer | | Viewer |
| (Upload) | | (Stream) | | (Search) |
+----+-----+ +----+-----+ +----+-----+
| | |
v v v
+---------+ +------+------+ +----+------+
| Upload | | CDN | | API |
| Service | | (CloudFront | | Gateway |
| | | Akamai) | | |
+----+----+ +------+------+ +---+---+---+
| | | | |
v | +----+ | +----+
+----+--------+ | | | |
| Transcoding | | +----+---+ +--+-----+ +--+------+
| Pipeline | | |Metadata| |Search | |Recommend|
| (Workers) | | |Service | |Service | |Service |
+----+--------+ | +---+----+ +---+----+ +---------+
| | | |
v | +----+----+ +---+--------+
+----+--------+ | |Video | |Elasticsearch|
| Object |<------+ |Meta DB | +-------------+
| Storage | |(MySQL) |
| (S3/GCS) | +--------+
+-------------+
4. Video Upload Pipeline
Creator Upload Service Transcoding Pipeline
| | |
|-- Upload video -------->| |
| (resumable, chunked) | |
| |-- Store original ----->|
| | to S3 (raw/) |
| | |
| |-- Enqueue job -------->|
| | to Kafka |
|<-- Upload ACK ----------| |
| (processing status) | |
| | +------+------+
| | | Split video |
| | | into segments|
| | +------+------+
| | |
| | +------+------+
| | | Transcode |
| | | each segment |
| | | to multiple |
| | | resolutions |
| | +------+------+
| | |
| | +------+------+
| | | Generate |
| | | thumbnails |
| | +------+------+
| | |
| | +------+------+
| | | Store to S3 |
| | | (processed/) |
| | +------+------+
| | |
| | +------+------+
| | | Update DB: |
| | | status=ready |
| | +-------------+
| |
|<-- Notification: video is live! ----------|
Resumable Upload
- Use tus protocol or Google's resumable upload API
- Split large files into 5MB chunks
- Track upload progress server-side
- Resume from last successful chunk on failure
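The resume loop above can be sketched in a few lines. This is illustrative, not the tus wire protocol: `put_chunk` and `get_committed_offset` are hypothetical stand-ins for the HTTP calls (tus uses `PATCH` with an `Upload-Offset` header and `HEAD` to query the committed offset):

```python
# Sketch of a resumable chunked upload client. On any failure, the client
# re-queries the server's committed offset and resumes from there.
CHUNK_SIZE = 5 * 1024 * 1024  # 5MB chunks, as above

def upload_resumable(data: bytes, put_chunk, get_committed_offset) -> int:
    """Upload `data` chunk by chunk; returns total bytes uploaded."""
    offset = get_committed_offset()          # ask server where we left off
    while offset < len(data):
        chunk = data[offset:offset + CHUNK_SIZE]
        try:
            put_chunk(offset, chunk)         # one HTTP request per chunk
            offset += len(chunk)
        except IOError:
            offset = get_committed_offset()  # re-sync and retry from there
    return offset
```

Because the server is the source of truth for the committed offset, a client crash mid-chunk costs at most one chunk of re-upload, not the whole file.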
5. Video Transcoding
Why Transcode?
- Different devices need different resolutions and codecs
- Bandwidth varies: mobile 3G vs desktop fiber
- Storage optimization: efficient codecs reduce CDN costs
Transcoding Output Matrix
| Resolution | Bitrate | Codec | Use Case |
|---|---|---|---|
| 240p | 400 kbps | H.264 | Slow mobile connections |
| 360p | 800 kbps | H.264 | Mobile data saver |
| 480p | 1.5 Mbps | H.264 | Standard mobile |
| 720p | 3 Mbps | H.264/H.265 | Tablet, laptop |
| 1080p | 6 Mbps | H.265 | Desktop, smart TV |
| 4K | 15 Mbps | H.265/AV1 | Large screens |
Adaptive Bitrate Streaming (ABR)
Master Playlist (HLS .m3u8)
|
+-- 240p playlist --> segment_001.ts, segment_002.ts, ...
+-- 480p playlist --> segment_001.ts, segment_002.ts, ...
+-- 720p playlist --> segment_001.ts, segment_002.ts, ...
+-- 1080p playlist -> segment_001.ts, segment_002.ts, ...
- HLS (HTTP Live Streaming): Apple standard, widely supported
- DASH (Dynamic Adaptive Streaming over HTTP): Open standard
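A master playlist like the one diagrammed above is just text. This sketch generates one from the rendition ladder (resolutions/bitrates are the example values from this section; the `#EXT-X-STREAM-INF` `BANDWIDTH` and `RESOLUTION` attributes are standard HLS):

```python
# Generate an HLS master playlist pointing at one media playlist per
# rendition. (width, height, bandwidth in bits/sec)
RENDITIONS = [
    (426, 240, 400_000),
    (854, 480, 1_500_000),
    (1280, 720, 3_000_000),
    (1920, 1080, 6_000_000),
]

def master_playlist(renditions=RENDITIONS) -> str:
    lines = ["#EXTM3U", "#EXT-X-VERSION:3"]
    for width, height, bandwidth in renditions:
        lines.append(
            f"#EXT-X-STREAM-INF:BANDWIDTH={bandwidth},RESOLUTION={width}x{height}"
        )
        lines.append(f"{height}p/playlist.m3u8")  # relative path per rendition
    return "\n".join(lines) + "\n"
```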
How ABR Works
Client monitors bandwidth
|
v
Bandwidth = 5 Mbps --> Request 720p segments
|
Bandwidth drops to 1 Mbps
|
v
Switch to 360p segments (no rebuffer)
|
Bandwidth recovers to 8 Mbps
|
v
Switch to 1080p segments
Interview Tip: Mention that videos are split into 2-10 second segments. The client can switch quality between segments, enabling seamless quality adaptation.
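The switching logic above can be sketched as a simple rate-selection rule, evaluated between segments. The 0.8 safety margin is an illustrative choice (real players also smooth bandwidth estimates and consider buffer level):

```python
# Sketch of client-side ABR rendition selection: pick the highest-bitrate
# rendition that fits within measured bandwidth times a safety margin.
LADDER_KBPS = {"240p": 400, "360p": 800, "480p": 1500,
               "720p": 3000, "1080p": 6000}

def pick_rendition(bandwidth_kbps: float, safety: float = 0.8) -> str:
    budget = bandwidth_kbps * safety
    best = "240p"                    # always fall back to the lowest rung
    for name, kbps in sorted(LADDER_KBPS.items(), key=lambda kv: kv[1]):
        if kbps <= budget:
            best = name              # highest rendition still within budget
    return best
```

This reproduces the scenario in the diagram: 5 Mbps selects 720p, a drop to 1 Mbps falls back to 360p, and recovery to 8 Mbps steps up to 1080p.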
6. CDN for Video Delivery
Multi-Tier CDN Architecture
Origin (S3)
|
v
+---+----------+
| Origin Shield | (intermediate cache, reduces origin load)
+---+----------+
|
+------------------+------------------+
| | |
+---+---+ +---+---+ +---+---+
| PoP | | PoP | | PoP |
| US | | EU | | Asia |
+---+---+ +---+---+ +---+---+
| | |
Users Users Users
CDN Strategy
| Content Type | CDN Strategy |
|---|---|
| Popular videos (top 20%) | Pre-pushed to edge PoPs |
| Long-tail videos | Pull-through caching |
| Live streams | Edge ingest + relay |
| Thumbnails | Aggressive edge caching, long TTL |
Cache Hit Optimization
- Popular videos: 95%+ cache hit rate at edge
- Segment-level caching (not whole video)
- Geographic routing: serve from nearest PoP
- Netflix Open Connect: custom CDN appliances at ISPs
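Pull-through caching at a PoP amounts to cache-on-miss with eviction, keyed per segment rather than per video. A minimal sketch (an `OrderedDict` standing in for a real edge cache, LRU eviction assumed):

```python
from collections import OrderedDict

# Sketch of a pull-through segment cache at an edge PoP: fetch from origin
# on miss, cache the segment, evict the least-recently-used one when full.
class EdgeSegmentCache:
    def __init__(self, capacity: int, fetch_from_origin):
        self.capacity = capacity
        self.fetch = fetch_from_origin
        self.cache = OrderedDict()   # segment key -> bytes
        self.hits = self.misses = 0

    def get(self, key: str) -> bytes:
        if key in self.cache:
            self.cache.move_to_end(key)      # mark as recently used
            self.hits += 1
            return self.cache[key]
        self.misses += 1
        data = self.fetch(key)               # pull through to origin/shield
        self.cache[key] = data
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)   # evict LRU segment
        return data
```

Caching at segment granularity is what makes this work for long-tail video: a PoP can hold the first segments of many videos (fast startup) without storing any of them in full.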
7. Video Metadata Service
Schema
```sql
CREATE TABLE videos (
    video_id      UUID PRIMARY KEY,
    title         VARCHAR(500),
    description   TEXT,
    creator_id    UUID REFERENCES users,
    upload_time   TIMESTAMP,
    duration_sec  INT,
    status        ENUM('processing','ready','failed','removed'),
    view_count    BIGINT,
    like_count    BIGINT,
    thumbnail_url VARCHAR(500),
    tags          TEXT[],
    category      VARCHAR(100)
);

CREATE TABLE video_formats (
    video_id     UUID,
    resolution   VARCHAR(10),
    codec        VARCHAR(20),
    bitrate_kbps INT,
    s3_path      VARCHAR(500),
    PRIMARY KEY (video_id, resolution)
);
```
Read Path
Client --> API GW --> Metadata Service --> MySQL (read replica)
|
Cache (Redis)
- Cache video metadata in Redis (TTL 1 hour)
- Read replicas for high read throughput
- Shard by video_id for horizontal scaling
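The read path above is the classic cache-aside pattern. A minimal sketch, with a plain dict standing in for Redis and a callable standing in for the read-replica query:

```python
import time

CACHE_TTL_SEC = 3600  # 1 hour, as above

# Sketch of cache-aside reads: try the cache first, fall back to a MySQL
# read replica on a miss, then populate the cache with a TTL.
def get_video_metadata(video_id: str, cache: dict, db_query) -> dict:
    entry = cache.get(video_id)
    if entry and entry["expires_at"] > time.time():
        return entry["value"]                    # cache hit
    row = db_query(video_id)                     # read-replica lookup
    cache[video_id] = {"value": row,
                       "expires_at": time.time() + CACHE_TTL_SEC}
    return row
```

With a 1-hour TTL, a popular video's metadata hits the database roughly once per hour per cache node regardless of view volume; writes (e.g. title edits) should also invalidate or overwrite the cache entry.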
8. Thumbnail Generation
Transcoding Pipeline
|
v
Extract frames at: 25%, 50%, 75% of video
|
v
Run ML model to select "most interesting" frame
|
v
Generate multiple sizes:
- 120x90 (small, search results)
- 320x180 (medium, mobile)
- 480x360 (large, desktop)
|
v
Store in S3, serve via CDN
- Creators can also upload custom thumbnails
- A/B test thumbnails for click-through rate optimization
9. Recommendation System (High-Level)
+-------------------+
| User Activity |
| (watch history, |
| likes, search) |
+--------+----------+
|
v
+--------+----------+
| Candidate |
| Generation |
| (collaborative |
| filtering, |
| content-based) |
+--------+----------+
|
v (1000s of candidates)
+--------+----------+
| Ranking Model |
| (deep learning, |
| predict P(watch),|
| P(complete), |
| P(like)) |
+--------+----------+
|
v (top 50)
+--------+----------+
| Re-ranking |
| (diversity, |
| freshness, |
| business rules) |
+--------+----------+
|
v
Final recommendations
Two-Tower Model (YouTube)
User Tower Video Tower
(user features) (video features)
| |
v v
Embedding Embedding
| |
+------> Dot Product <--+
|
v
Relevance Score
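The scoring step at the bottom of the diagram is just a dot product between the two embeddings, which is what makes candidate retrieval fast (approximate nearest neighbor over precomputed video embeddings). A toy sketch; the real towers are deep networks, and these fixed vectors are placeholders:

```python
# Toy two-tower scoring: relevance = dot(user_embedding, video_embedding).
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def score(user_emb, video_emb) -> float:
    return dot(user_emb, video_emb)

def top_k(user_emb, video_embs: dict, k: int = 2):
    """Rank candidate videos by dot-product relevance, highest first."""
    ranked = sorted(video_embs,
                    key=lambda vid: score(user_emb, video_embs[vid]),
                    reverse=True)
    return ranked[:k]
```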
10. Video Search
Architecture
Video Upload --> Metadata extraction --> Elasticsearch Index
|
User Query --> Query Parser --> ES Query --> Ranked Results
Index Fields
| Field | Weight | Type |
|---|---|---|
| title | High | text, analyzed |
| description | Medium | text, analyzed |
| tags | Medium | keyword |
| creator name | Low | text |
| auto-captions | Low | text, analyzed |
Search Ranking Signals
- Text relevance (BM25 score)
- View count (popularity)
- Recency (freshness boost)
- Creator authority
- User personalization
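The field weights from the table map naturally onto an Elasticsearch `multi_match` query with per-field boosts (the `^n` suffix). This builds the query body as a plain dict; the field names (`creator_name`, `auto_captions`) are assumed index names for illustration:

```python
# Sketch of the weighted multi-field search query; boosts mirror the
# High/Medium/Low weights in the index-fields table.
def build_search_query(text: str) -> dict:
    return {
        "query": {
            "multi_match": {
                "query": text,
                "fields": ["title^3", "description^2", "tags^2",
                           "creator_name", "auto_captions"],
            }
        }
    }
```

In practice the BM25 text score from this query is only one input; popularity, freshness, and personalization signals are folded in by a rescoring stage.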
11. View Counting at Scale
Challenge
5B views/day. Simple UPDATE count = count + 1 per view won't scale.
Solution: Multi-Stage Aggregation
Client View Event
|
v
Edge Counter (in-memory, per server)
|
Flush every 5 seconds
|
v
Kafka (view events stream)
|
v
Stream Processor (Flink/Spark Streaming)
- Deduplicate (user_id + video_id + time window)
- Aggregate counts
|
v
Counter DB (Redis for real-time, Cassandra for persistent)
|
v
Video Metadata (periodic batch update of view_count)
Deduplication
- Same user watching same video within 30 seconds = 1 view
- Use sliding window with user_id + video_id hash
- Bloom filter for approximate dedup at edge
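The sliding-window dedup can be sketched with a last-seen map keyed on (user_id, video_id); in a real stream processor this state lives in Flink state or Redis rather than process memory:

```python
DEDUP_WINDOW_SEC = 30  # same user + video within 30s counts once

# Sketch of deduplicated view counting in the stream processor.
class ViewCounter:
    def __init__(self):
        self.last_seen = {}   # (user_id, video_id) -> last event timestamp
        self.counts = {}      # video_id -> deduplicated view count

    def record(self, user_id: str, video_id: str, ts: float) -> bool:
        """Returns True if the event counted as a new view."""
        key = (user_id, video_id)
        prev = self.last_seen.get(key)
        self.last_seen[key] = ts                  # slide the window forward
        if prev is not None and ts - prev < DEDUP_WINDOW_SEC:
            return False                          # duplicate: not counted
        self.counts[video_id] = self.counts.get(video_id, 0) + 1
        return True
```

The last-seen map grows with distinct (user, video) pairs, which is why production systems expire this state after the window closes or replace it with a Bloom filter at the edge for an approximate first pass.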
Interview Tip: Mention the trade-off between accuracy and speed. YouTube shows approximate counts ("1.2M views") updated every few hours, not real-time exact counts.
12. Copyright Detection (Content ID)
Uploaded Video --> Audio/Video Fingerprinting
|
v
Compare against reference DB
(millions of copyrighted works)
|
+------+------+
| |
Match No Match
| |
+------+------+ v
| Policy | Publish
| Lookup |
+------+------+
|
+---------+---------+
| | |
Block Monetize Track
video (ads to (analytics
rights for rights
holder) holder)
Fingerprinting Techniques
- Audio: Spectral analysis, similar to Shazam
- Video: Perceptual hashing of keyframes
- Reference database: Rights holders upload reference content
- Matching: Compare fingerprints with approximate nearest neighbor search
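The perceptual-hashing idea can be shown with a toy difference hash over a grayscale keyframe. Real Content ID fingerprints are far richer, but the matching principle is the same: compact hashes compared by Hamming distance, tolerating small differences (re-encoding, minor edits):

```python
# Toy perceptual hash (dHash) of a keyframe given as rows of grayscale
# values: each hash bit records whether a pixel is brighter than its
# right-hand neighbor.
def dhash(pixels) -> int:
    bits = 0
    for row in pixels:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits

def hamming(a: int, b: int) -> int:
    return bin(a ^ b).count("1")

def is_match(h1: int, h2: int, max_distance: int = 4) -> bool:
    """Near-duplicate if the hashes differ in at most a few bits."""
    return hamming(h1, h2) <= max_distance
```

Because brightness gradients survive re-encoding while exact pixel values do not, a cryptographic hash would miss these matches; that is why fingerprinting uses perceptual hashes plus approximate nearest neighbor search.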
13. Live Streaming Basics
Broadcaster Origin Edge Servers Viewers
| | | |
|-- RTMP/SRT stream --->| | |
| |-- Transcode ------->| |
| | (real-time ABR) | |
| | | |
| |-- HLS/DASH segments>| |
| | (2-6 sec chunks) | |
| | |-- Serve ------>|
| | | to viewers |
Key Differences from VOD
| Aspect | VOD | Live |
|---|---|---|
| Latency tolerance | Minutes | 2-30 seconds |
| Transcoding | Offline, parallel | Real-time, sequential |
| CDN caching | Heavily cached | Short-lived segments |
| Error handling | Retry from any point | Move forward |
Low-Latency Live
- LL-HLS / LL-DASH: Sub-3-second latency
- WebRTC: Sub-1-second (for interactive, < 1000 viewers)
- Chunked transfer encoding for partial segments
14. Complete System Diagram
+----------+ +----------+
| Creator | | Viewer |
| (Upload) | | (Watch) |
+----+-----+ +----+-----+
| |
v v
+----+----------+ +---------+--------+
| Upload Service| | CDN Network |
| (resumable, | | (CloudFront/ |
| chunked) | | Akamai/OC) |
+----+----------+ +---------+--------+
| ^
v |
+----+----------+ +---------+--------+
| Object Store +------------------->| Origin Shield |
| (S3/GCS) | processed video +------------------+
| raw/ + proc/ |
+----+----------+
^
|
+----+------------------+
| Transcoding Pipeline |
| (Kubernetes workers) |
| - Split into segments |
| - Multi-resolution |
| - Thumbnail gen |
| - Content ID check |
+----+------------------+
|
| job queue
+----+----------+
| Kafka |
+----+----------+
|
+----+----------+-----+--------+--------+
| | | | |
v v v v v
+------+ +-----+-+ +--+---+ +-+-----+ +---+------+
|Meta | |View | |Search| |Reco | |Notif |
|Svc | |Counter| |Svc | |Svc | |Svc |
+--+---+ +--+----+ +--+---+ +--+----+ +----------+
| | | |
+--+---+ +--+-----+ +-+------+ +--+----+
|MySQL | |Redis/ | |Elastic | |ML |
|+Redis| |Cassand.| |Search | |Models |
+------+ +--------+ +--------+ +-------+
15. Interview Tips
- Split upload and stream: These are fundamentally different subsystems
- Transcoding is the bottleneck: Explain the pipeline clearly
- ABR is essential: Show you understand adaptive bitrate with segments
- CDN is where the magic happens: 95%+ of traffic is served from edge
- View counting: Never suggest a single counter with UPDATE count = count + 1
- Content ID: Mention copyright detection even briefly -- it shows awareness
- Don't design the recommendation ML model: Just show the pipeline stages
- Storage costs dominate: Mention that YouTube stores exabytes
16. Resources
- Alex Xu - System Design Interview Vol. 1, Chapter 14: Design YouTube
- Netflix Tech Blog: "Completing the Netflix Cloud Migration"
- YouTube Engineering Blog: "YouTube's Video Transcoding at Scale"
- Netflix Open Connect: CDN architecture whitepaper
- "How Video Streaming Works" - Mux engineering blog
- DASH Industry Forum: dash.akamaized.net
- Martin Kleppmann - Designing Data-Intensive Applications, Chapter 10 (Batch Processing)