Fundamentals of System Design
Why System Design Matters at FAANG
- Senior-level interviews dedicate 1-2 full rounds to system design
- Tests your ability to think at scale, make trade-offs, and communicate
- No single "correct" answer; interviewers evaluate your thought process
The Interview Framework (4 Steps)
Step 1 — Understand the Problem & Define Scope (5 min)
- Ask clarifying questions — never jump into solutions
- Define functional requirements (what the system does)
- Define non-functional requirements (scale, latency, availability, consistency)
- Identify the users and use cases
- Estimate scale: DAU, QPS, storage, bandwidth
Example questions to ask:
- How many users? Read-heavy or write-heavy?
- What's the expected latency?
- Do we need strong consistency or is eventual OK?
- What's more important: availability or consistency?
- Any geographic distribution requirements?
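The scale estimate from Step 1 (DAU, QPS, storage, bandwidth) can be sketched as a few lines of arithmetic. This is a minimal sketch, not a standard formula set: the function `estimate_scale`, its parameters, and the example workload (10M DAU, 20 requests per user, 10% writes of 1 KB) are all hypothetical values you would confirm with the interviewer.

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 86_400


@dataclass
class ScaleEstimate:
    avg_qps: float
    peak_qps: float
    storage_bytes_per_day: float
    write_bandwidth_bytes_per_sec: float


def estimate_scale(dau, requests_per_user_per_day, write_fraction,
                   bytes_per_write, peak_factor=3.0):
    """Rough scale estimate; every input is an assumption, not a measured value."""
    requests_per_day = dau * requests_per_user_per_day
    avg_qps = requests_per_day / SECONDS_PER_DAY
    # Only writes add to storage; reads are ignored in this sketch.
    storage_per_day = requests_per_day * write_fraction * bytes_per_write
    return ScaleEstimate(
        avg_qps=avg_qps,
        peak_qps=avg_qps * peak_factor,
        storage_bytes_per_day=storage_per_day,
        write_bandwidth_bytes_per_sec=storage_per_day / SECONDS_PER_DAY,
    )


# Hypothetical workload: 10M DAU, 20 requests/user/day, 10% writes of 1 KB each.
est = estimate_scale(dau=10_000_000, requests_per_user_per_day=20,
                     write_fraction=0.1, bytes_per_write=1_000)
print(f"average QPS:     {est.avg_qps:,.0f}")
print(f"peak QPS:        {est.peak_qps:,.0f}")
print(f"storage per day: {est.storage_bytes_per_day / 1e9:,.1f} GB")
print(f"write bandwidth: {est.write_bandwidth_bytes_per_sec / 1e3:,.0f} KB/s")
```

In an interview you would round aggressively (about 2,300 average QPS and 20 GB/day here); the point is the order of magnitude, not the digits.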
Step 2 — High-Level Design (10-15 min)
- Draw the main components (clients, servers, databases, caches, queues)
- Show the data flow for primary use cases
- Identify the APIs (endpoints, request/response)
- Choose storage strategy (SQL, NoSQL, blob store)
Step 3 — Deep Dive (10-15 min)
- Pick 2-3 components the interviewer cares about
- Discuss trade-offs for each design choice
- Handle edge cases and failure scenarios
- Show scaling strategies
Step 4 — Wrap Up & Address Bottlenecks (5 min)
- Identify single points of failure
- Discuss monitoring and alerting
- Mention future improvements
- Summarize key trade-offs you made
Key Concepts You'll Use Everywhere
Back-of-the-Envelope Estimation
Powers of 2 you must know:
| Power | Exact | Approx |
|---|---|---|
| 2^10 | 1,024 | ~1 Thousand (1 KB) |
| 2^20 | 1,048,576 | ~1 Million (1 MB) |
| 2^30 | 1,073,741,824 | ~1 Billion (1 GB) |
| 2^40 | 1,099,511,627,776 | ~1 Trillion (1 TB) |
Latency numbers every engineer should know:
| Operation | Latency |
|---|---|
| L1 cache reference | 0.5 ns |
| L2 cache reference | 7 ns |
| Main memory reference | 100 ns |
| SSD random read | 150 μs |
| HDD seek | 10 ms |
| Send packet CA → Netherlands → CA | 150 ms |
| Read 1 MB sequentially from memory | 250 μs |
| Read 1 MB sequentially from SSD | 1 ms |
| Read 1 MB sequentially from HDD | 20 ms |
| Round trip within same datacenter | 0.5 ms |
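To show how these numbers get used, here is a small sketch that scales the table's 1 MB sequential-read figures up to 1 GB and compares a datacenter round trip with a main memory reference. It assumes read time grows linearly with size, which is a simplification; the dictionary keys and the helper `sequential_read_seconds` are illustrative names.

```python
# Latencies copied from the table above, expressed in nanoseconds.
NS = 1
US = 1_000 * NS
MS = 1_000 * US

LATENCY_NS = {
    "main_memory_reference": 100 * NS,
    "datacenter_round_trip": 500 * US,  # 0.5 ms
    "read_1mb_memory": 250 * US,
    "read_1mb_ssd": 1 * MS,
    "read_1mb_hdd": 20 * MS,
}


def sequential_read_seconds(megabytes, medium):
    """Scale the table's 1 MB sequential read figure linearly (an assumption)."""
    return megabytes * LATENCY_NS[f"read_1mb_{medium}"] / 1e9


for medium in ("memory", "ssd", "hdd"):
    seconds = sequential_read_seconds(1_000, medium)
    print(f"read 1 GB from {medium}: {seconds:.2f} s")

# How many main memory references fit into one datacenter round trip?
refs_per_round_trip = (LATENCY_NS["datacenter_round_trip"]
                       // LATENCY_NS["main_memory_reference"])
print(f"memory references per datacenter round trip: {refs_per_round_trip:,}")
```

With the table's values this gives roughly 0.25 s, 1 s, and 20 s for 1 GB from memory, SSD, and HDD, and 5,000 memory references per datacenter round trip.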
Quick math shortcuts:
- 1 day = 86,400 seconds ≈ 10^5 seconds
- 1 million requests/day ≈ 12 QPS
- 100 million requests/day ≈ 1,200 QPS
- Peak QPS ≈ average QPS × 2-5 (account for spikes)
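The shortcuts above can be checked with a few lines of arithmetic. This sketch compares the exact conversion (86,400 seconds per day) with the rounded 10^5 version and applies the 2-5x peak multiplier; the function names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_qps(requests_per_day, seconds_per_day=SECONDS_PER_DAY):
    """Average queries per second for a daily request volume."""
    return requests_per_day / seconds_per_day


def peak_qps_range(avg_qps, low=2, high=5):
    """Peak QPS range using the 2-5x rule of thumb from the notes."""
    return avg_qps * low, avg_qps * high


for per_day in (1_000_000, 100_000_000):
    exact = average_qps(per_day)
    rough = average_qps(per_day, seconds_per_day=10**5)  # mental-math version
    peak_low, peak_high = peak_qps_range(exact)
    print(f"{per_day:>11,} req/day: exact {exact:,.1f} QPS, "
          f"rough {rough:,.0f} QPS, peak {peak_low:,.0f}-{peak_high:,.0f} QPS")
```

Dividing by 10^5 instead of 86,400 underestimates QPS by about 14%, which is well within the precision a back-of-the-envelope estimate needs.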
Availability & Reliability
| Availability | Downtime/year | Downtime/month |
|---|---|---|
| 99% (two 9s) | 3.65 days | 7.3 hours |
| 99.9% (three 9s) | 8.76 hours | 43.8 min |
| 99.99% (four 9s) | 52.6 min | 4.38 min |
| 99.999% (five 9s) | 5.26 min | 26.3 sec |
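The downtime figures in the table follow directly from the availability percentage. A minimal sketch, assuming a 365-day year and a month of 365/12 days (the helper name `downtime_seconds` is illustrative):

```python
SECONDS_PER_DAY = 86_400


def downtime_seconds(availability_percent, period_days):
    """Allowed downtime in seconds over a period at a given availability."""
    unavailable_fraction = 1 - availability_percent / 100
    return unavailable_fraction * period_days * SECONDS_PER_DAY


for pct in (99.0, 99.9, 99.99, 99.999):
    per_year = downtime_seconds(pct, 365)
    per_month = downtime_seconds(pct, 365 / 12)
    print(f"{pct}%: {per_year / 3600:.2f} h/year, {per_month / 60:.2f} min/month")
```

For example, 99.9% works out to about 8.76 hours per year and 43.8 minutes per month, matching the table.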
Scalability Dimensions
- Read scalability — caching, replicas, CDN
- Write scalability — sharding, async processing, queues
- Storage scalability — partitioning, tiered storage, compression
- Compute scalability — horizontal scaling, load balancing
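As one concrete illustration of the sharding mentioned under write scalability, the sketch below routes a key to a shard with a stable hash and a modulo. This is only one possible scheme, chosen here for brevity; the function `shard_for` and the key format are hypothetical.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index using a stable hash (hash-mod sharding)."""
    # hashlib gives the same digest on every process and machine, unlike
    # Python's built-in hash(), which is randomized per process for strings.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# Hypothetical user IDs spread across 4 shards.
for user_id in ("user:1", "user:2", "user:3", "user:4"):
    print(user_id, "-> shard", shard_for(user_id, num_shards=4))
```

The trade-off worth mentioning in an interview: with plain modulo, changing `num_shards` remaps most keys, which is why consistent hashing is a common alternative when shards are added or removed often.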
Trade-Offs You'll Always Discuss
- Consistency vs Availability during a network partition (CAP)
- Latency vs Throughput
- Read optimization vs Write optimization
- Simplicity vs Flexibility
- Cost vs Performance
- Accuracy vs Speed (approximate answers)
Common Mistakes in Interviews
- Jumping into a solution without asking questions
- Over-engineering — don't add complexity you can't justify
- Not discussing trade-offs — every choice has pros/cons
- Ignoring non-functional requirements — scale, latency, availability
- Monologuing — it's a conversation, not a lecture
- Not estimating — always do back-of-envelope math
- Treating it as coding — this is architecture, not implementation
Resources
- 📖 "System Design Interview" by Alex Xu, Chapters 1-3
- 📖 DDIA Chapter 1: Reliable, Scalable, Maintainable Applications
- 🔗 system-design-primer — How to approach
- 🎥 Gaurav Sen — System Design Basics
- 🔗 ByteByteGo — A framework for SD interviews