# Caching

## Why This Matters
Caching is one of the most impactful performance optimizations. Every FAANG interview expects you to discuss caching strategy, placement, and invalidation.
## Cache at Every Layer

```
Client (Browser Cache)
  → CDN (Edge Cache)
  → API Gateway Cache
  → Application Cache (Redis/Memcached)
  → Database Cache (Query Cache, Buffer Pool)
  → Disk (OS Page Cache)
```
## Caching Strategies

### 1. Cache-Aside (Lazy Loading) — Most Common

**Read:**
1. Check cache → HIT? Return cached data
2. MISS? Read from DB
3. Write result to cache
4. Return data

**Write:**
1. Write to DB
2. Invalidate (delete) the cache entry

**Pros:** Only caches what's actually requested; a cache failure doesn't break the system
**Cons:** A cache miss costs three network calls (cache check + DB read + cache write); stale data is possible
**Use when:** Read-heavy workloads where data changes infrequently
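The read and write paths above can be sketched in a few lines. This is a minimal illustration, with plain dicts standing in for a real database and a real cache such as Redis; the key names are made up.

```python
# Cache-aside sketch: the APPLICATION owns both lookups and invalidation.
db = {"user:1": "Alice"}   # stand-in for the database
cache = {}                 # stand-in for Redis/Memcached

def read(key):
    if key in cache:               # 1. check cache
        return cache[key]          #    HIT: return cached data
    value = db.get(key)            # 2. MISS: read from DB
    if value is not None:
        cache[key] = value         # 3. write result to cache
    return value                   # 4. return data

def write(key, value):
    db[key] = value                # 1. write to DB first
    cache.pop(key, None)           # 2. invalidate (delete) the cache entry

print(read("user:1"))   # miss → DB → Alice
print(read("user:1"))   # hit → Alice
write("user:1", "Bob")  # DB updated, cache entry deleted
print(read("user:1"))   # miss → Bob (cache repopulated)
```

Note the write path deletes the cache entry rather than updating it; the reason is the race condition discussed under Cache Invalidation below.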
### 2. Write-Through

**Write:**
1. Write to cache AND DB together
2. Cache always has the latest data

**Read:**
1. Read from cache (usually fresh; fall back to the DB on a miss or eviction)

**Pros:** Cache is always consistent with the DB
**Cons:** Write latency increases (two writes per update); caches data that may never be read
**Use when:** Data must be fresh and you can tolerate the extra write latency
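A minimal sketch of the write path, again with dicts standing in for the cache and DB. In a real system the two writes need careful ordering or a transaction; here they are just sequential, and the key name is illustrative.

```python
# Write-through sketch: every update goes to cache AND DB together.
db, cache = {}, {}

def write_through(key, value):
    cache[key] = value     # write to the cache...
    db[key] = value        # ...and to the DB as part of the same operation

def read(key):
    if key in cache:
        return cache[key]  # normally fresh, since writes go through the cache
    return db.get(key)     # fallback if the entry was evicted

write_through("price:42", 9.99)
print(read("price:42"), db["price:42"])  # 9.99 9.99 — cache and DB agree
```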
### 3. Write-Behind (Write-Back)

**Write:**
1. Write to cache only (return immediately)
2. Asynchronously flush to the DB in batches

**Read:**
1. Always read from cache

**Pros:** Very fast writes; batched DB writes reduce load
**Cons:** Risk of data loss if the cache crashes before a flush; more complex to implement
**Use when:** Write-heavy workloads that can tolerate a small data-loss risk (e.g., analytics, counters)
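The batching behaviour can be sketched as below. A real implementation would run the flush on a background thread or timer; here it is called explicitly, and dicts stand in for the cache and DB.

```python
from collections import deque

# Write-behind sketch: writes touch only the cache and a "dirty" queue;
# flush() later applies them to the DB in one batch.
db, cache = {}, {}
dirty = deque()                    # keys waiting to be flushed

def write(key, value):
    cache[key] = value             # fast path: cache only, return immediately
    dirty.append(key)              # remember the key for the async flush

def flush(batch_size=100):
    batch = {}
    while dirty and len(batch) < batch_size:
        key = dirty.popleft()
        batch[key] = cache[key]    # coalesce repeated writes to the same key
    db.update(batch)               # one batched DB write instead of many

write("views:9", 1)
write("views:9", 2)
flush()
print(db)  # {'views:9': 2} — two cache writes became a single DB write
```

Note the data-loss risk is visible here: anything still in `dirty` when the process dies never reaches the DB.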
### 4. Read-Through

**Read:**
1. Application reads from the cache
2. On a miss, the cache itself fetches from the DB
3. The cache populates itself

**Pros:** Application code is simpler (the cache handles loading)
**Cons:** First request is always slow; the cache library must support DB loading
**Use when:** Using a cache library with read-through support (e.g., Caffeine, Guava)
### 5. Refresh-Ahead

The cache proactively refreshes entries BEFORE they expire, predicting which entries will be needed soon.

**Pros:** Reduces cache-miss latency
**Cons:** Wastes resources on wrong predictions
**Use when:** Access patterns are highly predictable
## Cache Eviction Policies
| Policy | How It Works | Best For |
|---|---|---|
| LRU (Least Recently Used) | Evict least recently accessed item | General purpose — most common |
| LFU (Least Frequently Used) | Evict least frequently accessed item | When frequency matters more than recency |
| FIFO | Evict oldest item | Simple, queue-like |
| TTL (Time to Live) | Evict after fixed time | Time-sensitive data |
| Random | Evict random item | Surprisingly good in some workloads |
**Interview default:** Use LRU with TTL. Mention LFU if the access pattern has "hot" items.
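The interview default above (LRU with TTL) fits in a short class. This is a sketch, not a production cache: it is not thread-safe, and it evicts expired entries lazily on read. `OrderedDict` keeps access order, so the least recently used entry is always at the front.

```python
import time
from collections import OrderedDict

class LRUTTLCache:
    """Minimal LRU cache with a per-entry TTL (illustrative, single-threaded)."""

    def __init__(self, capacity, ttl_seconds):
        self.capacity, self.ttl = capacity, ttl_seconds
        self.data = OrderedDict()             # key -> (value, expires_at)

    def get(self, key, now=None):
        now = time.monotonic() if now is None else now
        if key not in self.data:
            return None
        value, expires_at = self.data[key]
        if now >= expires_at:                 # TTL elapsed: evict lazily
            del self.data[key]
            return None
        self.data.move_to_end(key)            # mark as most recently used
        return value

    def put(self, key, value, now=None):
        now = time.monotonic() if now is None else now
        self.data[key] = (value, now + self.ttl)
        self.data.move_to_end(key)
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)     # evict the LRU entry

cache = LRUTTLCache(capacity=2, ttl_seconds=60)
cache.put("a", 1, now=0)
cache.put("b", 2, now=0)
cache.get("a", now=1)          # touch "a", so "b" becomes the LRU entry
cache.put("c", 3, now=1)       # over capacity → "b" evicted
print(cache.get("b", now=1))   # None (evicted by LRU)
print(cache.get("a", now=61))  # None (expired by TTL: 0 + 60 <= 61)
```

The explicit `now` parameter exists only to make the example deterministic; a real cache would always use the clock.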
## Cache Invalidation

> "There are only two hard things in Computer Science: cache invalidation and naming things." — Phil Karlton

### Strategies
1. **TTL-based expiration**
   - Set a TTL when writing to the cache
   - Data expires automatically
   - Simple, but allows stale reads until the TTL elapses
2. **Event-based invalidation**
   - When data changes, publish an event → invalidate the cache
   - More consistent but more complex
   - Use with message queues or CDC (Change Data Capture)
3. **Write-through invalidation**
   - On write: update the DB → delete the cache key
   - The next read repopulates the cache
   - Why delete, not update? A race condition: a stale write can overwrite fresh data
### The Race Condition Problem

```
Thread A reads DB (value = old)
Thread B writes DB (value = new)
Thread B deletes cache
Thread A writes cache (value = old) ← STALE!
```
**Solutions:**
- Use distributed locks (expensive)
- Double-delete with a delay
- Accept brief staleness (most systems do)
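The interleaving above can be replayed deterministically. The two "threads" here are just sequential statements executed in exactly the problematic order, with dicts standing in for the DB and cache:

```python
# Deterministic replay of the stale-cache race: A's read is interleaved
# with B's write + cache delete, and A populates the cache last.
db, cache = {"k": "old"}, {}

value_read_by_A = db["k"]         # Thread A reads DB   (value = old)
db["k"] = "new"                   # Thread B writes DB  (value = new)
cache.pop("k", None)              # Thread B deletes the cache entry
cache["k"] = value_read_by_A      # Thread A writes cache (value = old) ← STALE

print(cache["k"], db["k"])  # old new — cache and DB now disagree
```

This is why cache-aside deletes rather than updates on write: a delete can at worst cause an extra miss, while an update in this ordering pins stale data until the TTL expires.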
## Distributed Caching

### Redis
- In-memory key-value store with rich data structures
- Supports: strings, hashes, lists, sets, sorted sets, streams, bitmaps
- Persistence: RDB snapshots + AOF (append-only file)
- Replication: master-replica with automatic failover (Redis Sentinel)
- Clustering: Redis Cluster for horizontal scaling (hash slots)
- Pub/Sub: built-in message broker
- Lua scripting: atomic operations
- Use cases: caching, sessions, leaderboards, rate limiting, queues
### Memcached

- In-memory key-value store, simpler than Redis
- Multithreaded (Redis executes commands on a single thread per node)
- No data structures beyond strings
- No persistence, no replication (built for pure caching)
- Use cases: simple caching where you don't need Redis features
### Redis vs Memcached
| Feature | Redis | Memcached |
|---|---|---|
| Data structures | Rich (lists, sets, hashes, etc.) | Strings only |
| Persistence | Yes (RDB + AOF) | No |
| Replication | Yes (master-replica) | No |
| Clustering | Yes (Redis Cluster) | Client-side sharding |
| Pub/Sub | Yes | No |
| Lua scripting | Yes | No |
| Multi-threaded | No (single-thread per shard) | Yes |
| Memory efficiency | Less (overhead per key) | More (slab allocator) |
**Interview default:** "I'd use Redis" — it handles almost every caching use case.
## CDN Caching

### Cache-Control Headers

```
Cache-Control: public, max-age=31536000, immutable
→ CDN + browser can cache for 1 year

Cache-Control: private, no-cache
→ Only the browser can cache; it must revalidate with the origin each time

Cache-Control: no-store
→ Never cache (sensitive data)
```
### CDN Invalidation

- Purge by URL — invalidate a specific resource
- Purge by tag — invalidate all resources sharing a tag
- Purge all — the nuclear option
- Versioned URLs — `style.abc123.css` (best: no invalidation ever needed)
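The versioned-URL scheme can be sketched by embedding a content hash in the filename, so changed content gets a new URL and old cached copies simply stop being referenced. The function name and 8-character digest length here are illustrative choices, not a standard:

```python
import hashlib

def versioned_name(filename, content: bytes, digest_len=8):
    """Build a cache-busting filename like style.<hash>.css from file content."""
    stem, _, ext = filename.rpartition(".")
    digest = hashlib.sha256(content).hexdigest()[:digest_len]
    return f"{stem}.{digest}.{ext}"

print(versioned_name("style.css", b"body { color: red }"))
# style.<8 hex chars>.css — any change to the bytes produces a new name
```

Asset bundlers (webpack, Vite, etc.) do exactly this at build time, which is why versioned URLs pair so well with `max-age=31536000, immutable`.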
## Cache Stampede (Thundering Herd)

### Problem

A popular cache key expires → 1,000 requests simultaneously hit the DB → the DB is overwhelmed

### Solutions
- Locking — first request acquires lock, fetches data, others wait
- Refresh-ahead — refresh before expiration
- Probabilistic early expiration — each request has small chance of refreshing before TTL
- Stale-while-revalidate — serve stale data while refreshing in background
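Probabilistic early expiration, sometimes called "XFetch", has a compact core: each read refreshes early with a probability that rises as expiry approaches, so one request refreshes a hot key instead of a herd. This is a sketch of the decision function only; `delta` (recompute cost) and `beta` (eagerness) are the usual tuning knobs, and the explicit `now` parameter is just for determinism.

```python
import math
import random
import time

def should_refresh(expires_at, delta, beta=1.0, now=None):
    """Decide whether this read should refresh the entry before its TTL.

    delta: how long a recompute takes; beta: >1 refreshes more eagerly.
    """
    now = time.monotonic() if now is None else now
    # -log(u) for u in (0, 1] is a non-negative random "head start";
    # the closer `now` is to `expires_at`, the likelier a refresh.
    u = 1.0 - random.random()          # in (0, 1], avoids log(0)
    return now - delta * beta * math.log(u) >= expires_at
```

A caller checks `should_refresh(...)` on every cache hit and, when it returns True, recomputes the value in the background while still serving the cached copy.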
## Cache Sizing & Hit Ratio

### Cache Hit Ratio

```
Hit Ratio = cache hits / (cache hits + cache misses)
```

- **> 95%** — excellent
- **80-95%** — good
- **< 80%** — review your caching strategy
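The formula above as a quick sanity check, including the effect of hit ratio on DB load (the function names are illustrative):

```python
def hit_ratio(hits, misses):
    """Fraction of lookups served from the cache."""
    return hits / (hits + misses)

def db_qps(incoming_qps, ratio):
    """Queries that reach the DB: only the misses get through."""
    return incoming_qps * (1 - ratio)

print(hit_ratio(9500, 500))          # 0.95
print(round(db_qps(10_000, 0.95)))   # 500 — a 20x reduction in DB load
```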
### Sizing Rules of Thumb
- Cache the hot working set — often 20% of data serves 80% of reads (Pareto)
- Start with 10-20% of dataset size
- Monitor hit ratio and adjust
- Consider memory cost vs DB load reduction
## Multi-Level Caching

```
L1: Local in-process cache (HashMap, Caffeine, Guava)
    → ~1μs access, limited by JVM heap
L2: Distributed cache (Redis)
    → ~1ms access, shared across all app servers
L3: CDN edge cache
    → ~10ms access, geographically distributed
L4: Database buffer pool
    → ~10ms per query round trip, served from memory inside the DB server
```

**Pattern:** Check L1 → miss → L2 → miss → DB → populate L2 → populate L1
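The lookup pattern above can be sketched with two levels, where dicts stand in for the in-process cache, Redis, and the DB:

```python
# Two-level lookup: L1 in-process, L2 "distributed", then the DB.
l1, l2, db = {}, {}, {"k": "v"}

def get(key):
    if key in l1:
        return l1[key]         # L1 hit: local memory, ~1μs in a real system
    if key in l2:
        l1[key] = l2[key]      # L2 hit: populate L1 on the way back
        return l1[key]
    value = db[key]            # miss everywhere: hit the DB
    l2[key] = value            # populate L2...
    l1[key] = value            # ...then L1
    return value

print(get("k"))                  # cold read: DB → L2 → L1 → v
print("k" in l1, "k" in l2)      # True True — both levels now populated
```

The main design choice this hides is invalidation: L1 copies on other app servers do not see an L2 delete, so per-entry TTLs on L1 are usually kept short.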
## Interview Tips

- Always add caching to your design (especially for read-heavy systems)
- Specify WHERE the cache sits (client, CDN, app, DB)
- Discuss the caching strategy (cache-aside is a safe default)
- Mention the TTL and invalidation approach
- Address cache stampede for popular items
- Do the math: at 10K incoming QPS, a 95% cache hit ratio means only 500 QPS reach the DB
## Resources
- 📖 DDIA Chapter 3: Storage and Retrieval (buffer pools)
- 📖 "System Design Interview" by Alex Xu — Chapter on Cache
- 🔗 Redis documentation
- 🎥 ByteByteGo — Caching Strategies
- 🔗 Facebook's Memcache paper
- 🔗 AWS ElastiCache Best Practices
Previous: 05 - Load Balancing & Reverse Proxies | Next: 07 - Databases Deep Dive