33 - Design YouTube or Netflix

Series: System Design & Distributed Systems | Previous: 32 - Design News Feed | Next: 34 - Design Google Search


1. Requirements

Functional Requirements

| Feature | Details |
| --- | --- |
| Upload video | Creators upload videos of varying length/quality |
| Stream video | Users watch videos with adaptive quality |
| Search | Find videos by title, description, tags |
| Recommendations | Personalized video suggestions |
| Comments & likes | Engagement features |
| View count | Accurate at-scale counting |
| Thumbnails | Auto-generated and custom thumbnails |

Non-Functional Requirements

  • Availability: 99.99% for streaming, 99.9% for upload
  • Latency: Video playback starts in < 2s
  • Scale: 2B MAU, 500 hours of video uploaded per minute (YouTube scale)
  • Durability: Zero data loss for uploaded content
  • Global: Low-latency streaming worldwide via CDN

2. Capacity Estimation

DAU:                    800M (viewing), 5M (uploading)
Videos watched/day:     5B views/day
Average video length:   5 min
Average video size:     500MB (after transcoding to multiple resolutions)
Upload volume:          500 hrs/min = 30K hrs/hour = 720K hrs/day
Storage for uploads:    720K hrs x 60 min/hr x (500MB / 5 min) = ~4.3PB/day
Streaming bandwidth:    5B views x 5 min x 5Mbps (37.5MB/min) = ~940PB/day outbound

| Resource | Estimate |
| --- | --- |
| Storage growth/day | ~4.3PB (multi-resolution) |
| View starts | ~58K/s average (5B / 86,400s), higher at peak |
| Concurrent streams | ~17M average (5B views x 300s / 86,400s) |
| CDN egress | ~940PB/day (upper bound; assumes every view watched in full at 5Mbps) |
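The arithmetic above can be re-derived with a short back-of-envelope script (decimal units; the 5 Mbps average delivered bitrate is an assumption, and the egress figure is an upper bound since many views are partial):

```python
# Back-of-envelope check of the capacity estimates above.
# Decimal units (1 MB = 10^6 bytes), as is conventional for capacity estimates.

views_per_day = 5 * 10**9
avg_len_min = 5
avg_size_mb = 500            # all transcoded resolutions combined
upload_hrs_per_min = 500
bitrate_mbps = 5             # assumed average delivered bitrate

# Upload volume: 500 hrs uploaded per minute -> hrs per day
upload_hrs_per_day = upload_hrs_per_min * 60 * 24           # 720,000

# Storage: minutes uploaded per day x MB per minute of video
storage_mb_per_day = upload_hrs_per_day * 60 * (avg_size_mb / avg_len_min)
storage_pb_per_day = storage_mb_per_day / 10**9             # ~4.3 PB

# Streaming egress: watch-minutes per day x MB per minute at 5 Mbps
mb_per_min = bitrate_mbps / 8 * 60                          # 37.5 MB/min
egress_pb_per_day = views_per_day * avg_len_min * mb_per_min / 10**9

print(f"{upload_hrs_per_day:,} hrs/day uploaded, "
      f"{storage_pb_per_day:.2f} PB/day stored, "
      f"~{egress_pb_per_day:.0f} PB/day egress")
```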

3. High-Level Architecture

+----------+     +----------+     +----------+
| Creator  |     | Viewer   |     | Viewer   |
| (Upload) |     | (Stream) |     | (Search) |
+----+-----+     +----+-----+     +----+-----+
     |                |                |
     v                v                v
+---------+    +------+------+   +----+------+
| Upload  |    |   CDN       |   | API       |
| Service |    | (CloudFront |   | Gateway   |
|         |    |  Akamai)    |   |           |
+----+----+    +------+------+   +---+---+---+
     |                |             |   |   |
     v                |        +----+   |   +----+
+----+--------+       |        |        |        |
| Transcoding |       |   +----+---+ +--+-----+ +--+------+
| Pipeline    |       |   |Metadata| |Search  | |Recommend|
| (Workers)   |       |   |Service | |Service | |Service  |
+----+--------+       |   +---+----+ +---+----+ +---------+
     |                |       |          |
     v                |  +----+----+ +---+--------+
+----+--------+       |  |Video   | |Elasticsearch|
| Object      |<------+  |Meta DB | +-------------+
| Storage     |           |(MySQL) |
| (S3/GCS)    |           +--------+
+-------------+

4. Video Upload Pipeline

Creator                Upload Service          Transcoding Pipeline
  |                         |                        |
  |-- Upload video -------->|                        |
  |   (resumable, chunked)  |                        |
  |                         |-- Store original ----->|
  |                         |   to S3 (raw/)         |
  |                         |                        |
  |                         |-- Enqueue job -------->|
  |                         |   to Kafka             |
  |<-- Upload ACK ----------|                        |
  |   (processing status)   |                        |
  |                         |                 +------+------+
  |                         |                 | Split video  |
  |                         |                 | into segments|
  |                         |                 +------+------+
  |                         |                        |
  |                         |                 +------+------+
  |                         |                 | Transcode    |
  |                         |                 | each segment |
  |                         |                 | to multiple  |
  |                         |                 | resolutions  |
  |                         |                 +------+------+
  |                         |                        |
  |                         |                 +------+------+
  |                         |                 | Generate     |
  |                         |                 | thumbnails   |
  |                         |                 +------+------+
  |                         |                        |
  |                         |                 +------+------+
  |                         |                 | Store to S3  |
  |                         |                 | (processed/) |
  |                         |                 +------+------+
  |                         |                        |
  |                         |                 +------+------+
  |                         |                 | Update DB:   |
  |                         |                 | status=ready |
  |                         |                 +-------------+
  |                         |
  |<-- Notification: video is live! ----------|

Resumable Upload

  • Use tus protocol or Google's resumable upload API
  • Split large files into 5MB chunks
  • Track upload progress server-side
  • Resume from last successful chunk on failure
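The steps above can be sketched as offset bookkeeping, tus-style: the server persists the last byte received, and the client always resumes from the server-reported offset. All class and method names here are hypothetical; real tus uses HEAD to read the offset and PATCH to append.

```python
# Sketch of tus-style resumable upload bookkeeping (names hypothetical).

CHUNK_SIZE = 5 * 1024 * 1024  # 5MB chunks, as above

class UploadSession:
    """Server-side state for one resumable upload."""
    def __init__(self, total_size: int):
        self.total_size = total_size
        self.offset = 0          # bytes successfully received so far
        self.chunks = []

    def append(self, data: bytes, offset: int) -> int:
        # Reject out-of-order chunks so a retried request can't corrupt state.
        if offset != self.offset:
            raise ValueError(f"expected offset {self.offset}, got {offset}")
        self.chunks.append(data)
        self.offset += len(data)
        return self.offset

def upload(session: UploadSession, blob: bytes) -> None:
    """Client loop: always resume from the server-reported offset."""
    while session.offset < len(blob):
        start = session.offset                 # HEAD request in real tus
        chunk = blob[start:start + CHUNK_SIZE]
        session.append(chunk, start)           # PATCH request in real tus
```

On failure, the client simply re-enters the loop: the first iteration re-reads `session.offset` and continues from the last acknowledged chunk.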

5. Video Transcoding

Why Transcode?

  • Different devices need different resolutions and codecs
  • Bandwidth varies: mobile 3G vs desktop fiber
  • Storage optimization: efficient codecs reduce CDN costs

Transcoding Output Matrix

| Resolution | Bitrate | Codec | Use Case |
| --- | --- | --- | --- |
| 240p | 400 kbps | H.264 | Slow mobile connections |
| 360p | 800 kbps | H.264 | Mobile data saver |
| 480p | 1.5 Mbps | H.264 | Standard mobile |
| 720p | 3 Mbps | H.264/H.265 | Tablet, laptop |
| 1080p | 6 Mbps | H.265 | Desktop, smart TV |
| 4K | 15 Mbps | H.265/AV1 | Large screens |
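In practice the output matrix is a config table that drives one transcode job per rendition. A minimal sketch building ffmpeg invocations from it; the ffmpeg flags are standard, but the exact codec/bitrate pairings here are illustrative:

```python
# Build one ffmpeg command per target rendition from a bitrate-ladder table.
# (Subset of the matrix above; codec choices illustrative.)

LADDER = [  # (name, height, video kbps, codec)
    ("240p",  240,  400,  "libx264"),
    ("360p",  360,  800,  "libx264"),
    ("480p",  480,  1500, "libx264"),
    ("720p",  720,  3000, "libx264"),
    ("1080p", 1080, 6000, "libx265"),
]

def transcode_cmds(src: str, out_dir: str) -> list:
    cmds = []
    for name, height, kbps, codec in LADDER:
        cmds.append([
            "ffmpeg", "-i", src,
            "-c:v", codec,
            "-b:v", f"{kbps}k",
            "-vf", f"scale=-2:{height}",  # keep aspect ratio, force even width
            "-c:a", "aac", "-b:a", "128k",
            f"{out_dir}/{name}.mp4",
        ])
    return cmds
```

Each command is independent, which is why the pipeline parallelizes well: segments x renditions fan out across worker fleets.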

Adaptive Bitrate Streaming (ABR)

Master Playlist (HLS .m3u8)
  |
  +-- 240p playlist --> segment_001.ts, segment_002.ts, ...
  +-- 480p playlist --> segment_001.ts, segment_002.ts, ...
  +-- 720p playlist --> segment_001.ts, segment_002.ts, ...
  +-- 1080p playlist -> segment_001.ts, segment_002.ts, ...

  • HLS (HTTP Live Streaming): Apple standard, widely supported
  • DASH (Dynamic Adaptive Streaming over HTTP): Open standard
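The master playlist above is just text that lists one variant playlist per rendition. A sketch emitting it; the tag syntax follows the HLS spec (RFC 8216), while the paths and bandwidth values are illustrative:

```python
# Emit an HLS master playlist for a rendition ladder.
# Tag names per RFC 8216; paths and values illustrative.

RENDITIONS = [  # (peak bandwidth in bits/s, WxH, variant playlist URI)
    (400_000,   "426x240",   "240p/playlist.m3u8"),
    (1_500_000, "854x480",   "480p/playlist.m3u8"),
    (3_000_000, "1280x720",  "720p/playlist.m3u8"),
    (6_000_000, "1920x1080", "1080p/playlist.m3u8"),
]

def master_playlist(renditions=RENDITIONS) -> str:
    lines = ["#EXTM3U"]
    for bandwidth, resolution, uri in renditions:
        lines.append(
            f"#EXT-X-STREAM-INF:BANDWIDTH={bandwidth},RESOLUTION={resolution}"
        )
        lines.append(uri)
    return "\n".join(lines) + "\n"
```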

How ABR Works

Client monitors bandwidth
       |
       v
  Bandwidth = 5 Mbps --> Request 720p segments
       |
  Bandwidth drops to 1 Mbps
       |
       v
  Switch to 360p segments (no rebuffer)
       |
  Bandwidth recovers to 8 Mbps
       |
       v
  Switch to 1080p segments

Interview Tip: Mention that videos are split into 2-10 second segments. The client can switch quality between segments, enabling seamless quality adaptation.
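The switching logic above is, at its simplest, "pick the highest rendition that fits within a safety fraction of measured throughput." A minimal sketch; the ladder values come from the matrix earlier, the 20% headroom is an illustrative choice (real players also factor in buffer level):

```python
# Minimal ABR rendition picker: highest bitrate within a safety budget.

LADDER_KBPS = {"240p": 400, "360p": 800, "480p": 1500, "720p": 3000, "1080p": 6000}
SAFETY = 0.8  # leave 20% headroom to avoid rebuffering

def pick_rendition(measured_kbps: float) -> str:
    budget = measured_kbps * SAFETY
    best = "240p"  # floor: always serve something
    for name, kbps in sorted(LADDER_KBPS.items(), key=lambda kv: kv[1]):
        if kbps <= budget:
            best = name
    return best
```

This reproduces the walkthrough above: 5 Mbps measured picks 720p, a drop to 1 Mbps falls back to 360p, and recovery to 8 Mbps steps up to 1080p, with each decision applied at the next segment boundary.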


6. CDN for Video Delivery

Multi-Tier CDN Architecture

Origin (S3)
    |
    v
+---+----------+
| Origin Shield |  (intermediate cache, reduces origin load)
+---+----------+
    |
    +------------------+------------------+
    |                  |                  |
+---+---+         +---+---+         +---+---+
|  PoP  |         |  PoP  |         |  PoP  |
|  US   |         |  EU   |         |  Asia |
+---+---+         +---+---+         +---+---+
    |                  |                  |
  Users              Users             Users

CDN Strategy

| Content Type | CDN Strategy |
| --- | --- |
| Popular videos (top 20%) | Pre-pushed to edge PoPs |
| Long-tail videos | Pull-through caching |
| Live streams | Edge ingest + relay |
| Thumbnails | Aggressive edge caching, long TTL |

Cache Hit Optimization

  • Popular videos: 95%+ cache hit rate at edge
  • Segment-level caching (not whole video)
  • Geographic routing: serve from nearest PoP
  • Netflix Open Connect: custom CDN appliances at ISPs

7. Video Metadata Service

Schema

```sql
CREATE TABLE videos (
  video_id      UUID PRIMARY KEY,
  title         VARCHAR(500),
  description   TEXT,
  creator_id    UUID REFERENCES users,
  upload_time   TIMESTAMP,
  duration_sec  INT,
  status        ENUM('processing','ready','failed','removed'),
  view_count    BIGINT,
  like_count    BIGINT,
  thumbnail_url VARCHAR(500),
  tags          TEXT[],
  category      VARCHAR(100)
);

CREATE TABLE video_formats (
  video_id     UUID,
  resolution   VARCHAR(10),
  codec        VARCHAR(20),
  bitrate_kbps INT,
  s3_path      VARCHAR(500),
  PRIMARY KEY (video_id, resolution)
);
```

Read Path

Client --> API GW --> Metadata Service --> MySQL (read replica)
                                              |
                                         Cache (Redis)
  • Cache video metadata in Redis (TTL 1 hour)
  • Read replicas for high read throughput
  • Shard by video_id for horizontal scaling
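The read path above is the classic cache-aside pattern. A minimal sketch with plain dicts standing in for Redis and the MySQL read replica (names and the 1-hour TTL mirror the notes above; everything else is illustrative):

```python
# Cache-aside read for video metadata.
import time
from typing import Optional

CACHE_TTL_SEC = 3600  # 1 hour, as above

class MetadataService:
    def __init__(self, db: dict):
        self.db = db        # stand-in for the MySQL read replica
        self.cache = {}     # stand-in for Redis: video_id -> (expires_at, row)

    def get_video(self, video_id: str) -> Optional[dict]:
        entry = self.cache.get(video_id)
        if entry and entry[0] > time.monotonic():
            return entry[1]                     # cache hit
        row = self.db.get(video_id)             # miss: read from DB
        if row is not None:                     # populate cache on the way back
            self.cache[video_id] = (time.monotonic() + CACHE_TTL_SEC, row)
        return row
```

Video metadata is read-heavy and tolerates staleness well (a title rarely changes), which is why a 1-hour TTL is acceptable here where it wouldn't be for, say, the view counter.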

8. Thumbnail Generation

Transcoding Pipeline
       |
       v
Extract frames at: 25%, 50%, 75% of video
       |
       v
Run ML model to select "most interesting" frame
       |
       v
Generate multiple sizes:
  - 120x90   (small, search results)
  - 320x180  (medium, mobile)
  - 480x360  (large, desktop)
       |
       v
Store in S3, serve via CDN
  • Creators can also upload custom thumbnails
  • A/B test thumbnails for click-through rate optimization
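The pipeline above fans out into one job per (timestamp, size) pair. A sketch of the job fan-out; the S3 key layout and field names are hypothetical:

```python
# Fan out thumbnail jobs: frames at 25/50/75% of runtime x three output sizes.

def thumbnail_jobs(video_id: str, duration_sec: int) -> list:
    sizes = [(120, 90), (320, 180), (480, 360)]   # from the list above
    jobs = []
    for frac in (0.25, 0.50, 0.75):
        ts = int(duration_sec * frac)
        for w, h in sizes:
            jobs.append({
                "video_id": video_id,
                "timestamp_sec": ts,
                "size": f"{w}x{h}",
                "s3_key": f"thumbs/{video_id}/{ts}_{w}x{h}.jpg",  # layout hypothetical
            })
    return jobs
```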

9. Recommendation System (High-Level)

+-------------------+
| User Activity     |
| (watch history,   |
|  likes, search)   |
+--------+----------+
         |
         v
+--------+----------+
| Candidate         |
| Generation        |
| (collaborative    |
|  filtering,       |
|  content-based)   |
+--------+----------+
         |
         v (1000s of candidates)
+--------+----------+
| Ranking Model     |
| (deep learning,   |
|  predict P(watch),|
|  P(complete),     |
|  P(like))         |
+--------+----------+
         |
         v (top 50)
+--------+----------+
| Re-ranking        |
| (diversity,       |
|  freshness,       |
|  business rules)  |
+--------+----------+
         |
         v
  Final recommendations

Two-Tower Model (YouTube)

User Tower              Video Tower
(user features)         (video features)
     |                       |
     v                       v
  Embedding              Embedding
     |                       |
     +------> Dot Product <--+
                  |
                  v
            Relevance Score
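The scoring step at the bottom of the diagram is just a dot product between the two tower outputs. A toy sketch; real systems learn the embeddings, and candidate lookup uses approximate nearest-neighbor search rather than scoring every video:

```python
# Two-tower scoring: user and video embeddings compared by dot product.
# Embedding values here are toy numbers, not learned.

def dot(u, v) -> float:
    return sum(a * b for a, b in zip(u, v))

def rank_videos(user_emb, video_embs: dict) -> list:
    scored = [(dot(user_emb, emb), vid) for vid, emb in video_embs.items()]
    return [vid for _, vid in sorted(scored, reverse=True)]
```

The appeal of this shape is that video embeddings can be precomputed offline and indexed, so serving only needs one user-tower forward pass plus a fast vector lookup.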

10. Video Search

Architecture

Video Upload --> Metadata extraction --> Elasticsearch Index
                                              |
User Query --> Query Parser --> ES Query --> Ranked Results

Index Fields

| Field | Weight | Type |
| --- | --- | --- |
| title | High | text, analyzed |
| description | Medium | text, analyzed |
| tags | Medium | keyword |
| creator name | Low | text |
| auto-captions | Low | text, analyzed |

Search Ranking Signals

  • Text relevance (BM25 score)
  • View count (popularity)
  • Recency (freshness boost)
  • Creator authority
  • User personalization
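One simple way to combine these signals is a weighted sum over transformed features. The weights, log-dampening, and 30-day decay constant below are illustrative; production rankers learn such parameters from click and watch data:

```python
# Blend search ranking signals into a single score (weights illustrative).
import math

def search_score(bm25: float, views: int, age_days: float, authority: float) -> float:
    popularity = math.log10(views + 1)       # dampen huge view counts
    freshness = math.exp(-age_days / 30.0)   # smooth decay, ~30-day time constant
    return 1.0 * bm25 + 0.3 * popularity + 0.5 * freshness + 0.2 * authority
```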

11. View Counting at Scale

Challenge

5B views/day. A naive UPDATE videos SET view_count = view_count + 1 per view won't scale: popular videos become hot rows, and every increment serializes on the same row lock.

Solution: Multi-Stage Aggregation

Client View Event
       |
       v
Edge Counter (in-memory, per server)
       |
  Flush every 5 seconds
       |
       v
Kafka (view events stream)
       |
       v
Stream Processor (Flink/Spark Streaming)
  - Deduplicate (user_id + video_id + time window)
  - Aggregate counts
       |
       v
Counter DB (Redis for real-time, Cassandra for persistent)
       |
       v
Video Metadata (periodic batch update of view_count)

Deduplication

  • Same user watching same video within 30 seconds = 1 view
  • Use sliding window with user_id + video_id hash
  • Bloom filter for approximate dedup at edge
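The 30-second rule above amounts to keeping a last-counted timestamp per (user, video) key. A minimal sketch of the exact version; in the real pipeline this state lives in the stream processor (keyed by the user_id + video_id hash), with a Bloom filter giving cheap approximate filtering at the edge first:

```python
# Sliding-window view dedup: same (user, video) within 30s counts once.

WINDOW_SEC = 30

class ViewDeduper:
    def __init__(self):
        self.last_counted = {}  # (user_id, video_id) -> last counted timestamp

    def count_view(self, user_id: str, video_id: str, ts: float) -> bool:
        key = (user_id, video_id)
        last = self.last_counted.get(key)
        if last is not None and ts - last < WINDOW_SEC:
            return False          # duplicate within window: don't count
        self.last_counted[key] = ts
        return True               # counted; window restarts from here
```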

Interview Tip: Mention the trade-off between accuracy and speed. YouTube shows approximate counts ("1.2M views") updated every few hours, not real-time exact counts.


12. Copyright Detection (Content ID)

Uploaded Video --> Audio/Video Fingerprinting
                         |
                         v
               Compare against reference DB
               (millions of copyrighted works)
                         |
                  +------+------+
                  |             |
               Match         No Match
                  |             |
           +------+------+     v
           | Policy      |   Publish
           | Lookup      |
           +------+------+
                  |
        +---------+---------+
        |         |         |
      Block    Monetize   Track
      video    (ads to    (analytics
               rights     for rights
               holder)    holder)

Fingerprinting Techniques

  • Audio: Spectral analysis, similar to Shazam
  • Video: Perceptual hashing of keyframes
  • Reference database: Rights holders upload reference content
  • Matching: Compare fingerprints with approximate nearest neighbor search
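A concrete instance of perceptual hashing is average hash (aHash): downscale a keyframe to 8x8 grayscale, threshold each pixel at the mean, pack the bits into a 64-bit fingerprint, and compare fingerprints by Hamming distance. This is one of the simplest such schemes, shown here for illustration (production systems use more robust fingerprints):

```python
# Average-hash (aHash) sketch for keyframe fingerprinting.

def ahash(pixels) -> int:
    """pixels: 64 grayscale values (a flattened 8x8 downscaled keyframe)."""
    mean = sum(pixels) / len(pixels)
    bits = 0
    for p in pixels:
        bits = (bits << 1) | (1 if p >= mean else 0)
    return bits

def hamming(a: int, b: int) -> int:
    """Number of differing bits: small distance = likely the same content."""
    return bin(a ^ b).count("1")
```

Because the hash survives re-encoding and mild edits, matching reduces to nearest-neighbor search over fingerprints rather than comparing raw video.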

13. Live Streaming Basics

Broadcaster               Origin             Edge Servers        Viewers
    |                       |                     |                |
    |-- RTMP/SRT stream --->|                     |                |
    |                       |-- Transcode ------->|                |
    |                       |   (real-time ABR)   |                |
    |                       |                     |                |
    |                       |-- HLS/DASH segments>|                |
    |                       |   (2-6 sec chunks)  |                |
    |                       |                     |-- Serve ------>|
    |                       |                     |   to viewers   |

Key Differences from VOD

| Aspect | VOD | Live |
| --- | --- | --- |
| Latency tolerance | Minutes | 2-30 seconds |
| Transcoding | Offline, parallel | Real-time, sequential |
| CDN caching | Heavily cached | Short-lived segments |
| Error handling | Retry from any point | Move forward |

Low-Latency Live

  • LL-HLS / LL-DASH: Sub-3-second latency
  • WebRTC: Sub-1-second (for interactive, < 1000 viewers)
  • Chunked transfer encoding for partial segments

14. Complete System Diagram

+----------+                              +----------+
| Creator  |                              | Viewer   |
| (Upload) |                              | (Watch)  |
+----+-----+                              +----+-----+
     |                                         |
     v                                         v
+----+----------+                    +---------+--------+
| Upload Service|                    |   CDN Network    |
| (resumable,   |                    |  (CloudFront/    |
|  chunked)     |                    |   Akamai/OC)     |
+----+----------+                    +---------+--------+
     |                                         ^
     v                                         |
+----+----------+                    +---------+--------+
| Object Store  +------------------->| Origin Shield    |
| (S3/GCS)      |  processed video   +------------------+
| raw/ + proc/  |
+----+----------+
     ^
     |
+----+------------------+
| Transcoding Pipeline  |
| (Kubernetes workers)  |
| - Split into segments |
| - Multi-resolution    |
| - Thumbnail gen       |
| - Content ID check    |
+----+------------------+
     |
     | job queue
+----+----------+
| Kafka         |
+----+----------+
     |
+----+----------+-----+--------+--------+
|               |      |        |        |
v               v      v        v        v
+------+  +-----+-+ +--+---+ +-+-----+ +---+------+
|Meta  |  |View   | |Search| |Reco   | |Notif     |
|Svc   |  |Counter| |Svc   | |Svc    | |Svc       |
+--+---+  +--+----+ +--+---+ +--+----+ +----------+
   |         |          |        |
+--+---+  +--+-----+ +-+------+ +--+----+
|MySQL |  |Redis/  | |Elastic | |ML     |
|+Redis|  |Cassand.| |Search  | |Models |
+------+  +--------+ +--------+ +-------+

15. Interview Tips

  1. Split upload and stream: These are fundamentally different subsystems
  2. Transcoding is the bottleneck: Explain the pipeline clearly
  3. ABR is essential: Show you understand adaptive bitrate with segments
  4. CDN is where the magic happens: 95%+ of traffic is served from edge
  5. View counting: Never suggest a single counter with UPDATE +1
  6. Content ID: Mention copyright detection even briefly -- it shows awareness
  7. Don't design the recommendation ML model: Just show the pipeline stages
  8. Storage costs dominate: Mention that YouTube stores exabytes

16. Resources

  • Alex Xu - System Design Interview Vol. 1, Chapter 14: Design YouTube
  • Netflix Tech Blog: "Completing the Netflix Cloud Migration"
  • YouTube Engineering Blog: "YouTube's Video Transcoding at Scale"
  • Netflix Open Connect: CDN architecture whitepaper
  • "How Video Streaming Works" - Mux engineering blog
  • DASH Industry Forum: dash.akamaized.net
  • Martin Kleppmann - Designing Data-Intensive Applications, Chapter 10 (Batch Processing)
