Consistent Hashing & Data Partitioning
Why This Matters
Consistent hashing is the backbone of distributed caching (Memcached), distributed databases (Cassandra, DynamoDB), and CDN routing. It comes up directly as an interview question and indirectly in many FAANG-style system designs.
The Problem with Simple Hashing
Naive Approach
`server = hash(key) % N` (N = number of servers)
Works fine until you add or remove a server:
- N changes → almost all keys remap → massive cache misses
- Example: 4 servers → 5 servers → ~80% of keys move
This is unacceptable for distributed caches or databases at scale.
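The remap cost is easy to verify empirically. A minimal sketch, using the integer key itself as a stand-in for its hash value:

```python
# Naive modulo placement: server = hash(key) % N.
# The key itself stands in for its hash here.
def server_for(key: int, num_servers: int) -> int:
    return key % num_servers

keys = range(100_000)
moved = sum(1 for k in keys if server_for(k, 4) != server_for(k, 5))
print(f"{moved / len(keys):.0%} of keys moved")  # prints "80% of keys moved"
```

A key stays put only when `k % 4 == k % 5`, which happens for just 4 out of every 20 consecutive keys, hence the 80% churn.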
Consistent Hashing
Concept
Imagine a ring covering the hash space 0 to 2^32 - 1 (the ends wrap around):

```
              Server A (hash = 100)
                    |
        0 ----------------------- 2^32
        |                           |
        |            Ring           |
        |                           |
        -----------------------------
            /                 \
    Server C                   Server B
  (hash = 700)                (hash = 400)
```
Placement rules:
- Hash each server to a position on the ring
- Hash each key to a position on the ring
- Walk clockwise from key's position → first server you hit owns that key
Adding a server: Only keys between the new server and its predecessor move. ~K/N keys affected (where K = total keys, N = servers).
Removing a server: Only that server's keys move to its clockwise successor.
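These rules are a few lines of code with a sorted list plus binary search. A minimal sketch (server names and the MD5-based hash are illustrative; with a single position per server, arc sizes are uneven, so the exact moved fraction varies):

```python
import bisect
import hashlib

def ring_hash(value: str) -> int:
    # Position on the 32-bit ring; MD5 is just an evenly-mixing hash here
    return int(hashlib.md5(value.encode()).hexdigest(), 16) % (2**32)

class HashRing:
    def __init__(self, servers):
        # Sorted (position, server) pairs form the ring
        self._ring = sorted((ring_hash(s), s) for s in servers)

    def lookup(self, key: str) -> str:
        # Clockwise walk: first server at or after the key's position,
        # wrapping back to the start of the list at the end of the ring
        i = bisect.bisect(self._ring, (ring_hash(key), ""))
        return self._ring[i % len(self._ring)][1]

old = HashRing(["A", "B", "C", "D", "E"])
new = HashRing(["A", "B", "C", "D", "E", "F"])
keys = [str(k) for k in range(10_000)]
moved = [k for k in keys if old.lookup(k) != new.lookup(k)]
# Every moved key lands on the new server F; nothing else reshuffles
assert all(new.lookup(k) == "F" for k in moved)
print(f"{len(moved) / len(keys):.0%} of keys moved")
```

The invariant to notice: adding F only pulls keys from F's clockwise predecessor's arc; all other assignments are untouched.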
Why This Is Better
| Scenario | Simple Hash | Consistent Hash |
|---|---|---|
| Add 1 server (5→6) | ~83% keys move | ~17% keys move |
| Remove 1 server (5→4) | ~80% keys move | ~20% keys move |
Virtual Nodes (VNodes)
Problem with Basic Consistent Hashing
With few physical servers, data distribution can be very uneven. One server might own a huge arc of the ring.
Solution: Virtual Nodes
Each physical server gets multiple positions on the ring:
Physical Server A → Virtual nodes: A1, A2, A3, A4, A5, ...
Physical Server B → Virtual nodes: B1, B2, B3, B4, B5, ...
Benefits:
- More even distribution of keys
- Heterogeneous servers: a more powerful server gets proportionally more vnodes
- When a server goes down, its load spreads across many servers (not just one)
Typical: 100-200 virtual nodes per physical server
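A sketch of the effect (server names, hash function, and vnode counts are illustrative): each physical server is hashed at many positions, and per-server load tightens as the vnode count grows.

```python
import bisect
import hashlib
from collections import Counter

def ring_hash(value: str) -> int:
    return int(hashlib.md5(value.encode()).hexdigest(), 16) % (2**32)

def build_ring(servers, vnodes: int):
    # Each physical server appears at `vnodes` positions: "A#0", "A#1", ...
    return sorted((ring_hash(f"{s}#{i}"), s) for s in servers for i in range(vnodes))

def owner(ring, key: str) -> str:
    i = bisect.bisect(ring, (ring_hash(key), ""))
    return ring[i % len(ring)][1]

keys = [str(k) for k in range(20_000)]
for vnodes in (1, 100):
    ring = build_ring("ABCDE", vnodes)
    load = Counter(owner(ring, k) for k in keys)
    print(f"{vnodes:>3} vnodes/server -> per-server key counts: {sorted(load.values())}")
```

With 1 vnode per server the counts can differ by several-fold; with 100 they cluster tightly around the fair share of ~4,000 keys each.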
Consistent Hashing in Real Systems
Amazon DynamoDB
- Consistent hashing for partition key distribution
- Virtual nodes for even distribution
- Replication to N clockwise successors on the ring
Apache Cassandra
- Token ring with consistent hashing
- Each node owns a range of tokens
- Virtual nodes (vnodes) enabled by default (historically 256 per node; the default dropped to 16 in Cassandra 4.0)
- Data replicated to next N-1 nodes on ring
Memcached (Client-Side)
- Client libraries (libketama) implement consistent hashing
- Client decides which server to hit
- Adding/removing servers: minimal cache invalidation
Content Delivery Networks
- Hash request URL → determine which edge server caches it
- Consistent hashing ensures same URL hits same edge node
Data Partitioning Strategies
1. Key-Range Partitioning
Partition 1: keys A-G
Partition 2: keys H-N
Partition 3: keys O-Z
- Efficient range scans
- Risk of hot partitions (e.g., all new users start with 'Z')
- Used by: HBase, Google Bigtable
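Range routing is an ordered search over partition boundaries. A minimal sketch, with illustrative boundaries matching the layout above:

```python
import bisect

# Inclusive upper bound of each partition's key range: A-G, H-N, O-Z
UPPER_BOUNDS = ["G", "N", "Z"]

def partition_for(key: str) -> int:
    # First partition whose upper bound is >= the key's first letter;
    # a range scan like "H".."K" touches only partition 1
    return bisect.bisect_left(UPPER_BOUNDS, key[0].upper())

print(partition_for("grace"), partition_for("henry"), partition_for("oscar"))
# prints "0 1 2"
```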
2. Hash Partitioning
`Partition = hash(key) % num_partitions`
- Even distribution
- No efficient range scans
- Used by: Cassandra, DynamoDB, Redis Cluster
3. Compound Partitioning
Partition key: hash(user_id) → determines partition
Sort key: timestamp → ordering within partition
- Combines hash distribution with range queries within a partition
- Used by: DynamoDB, Cassandra (partition key + clustering key)
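A toy in-memory sketch of the compound scheme (the table layout and function names are illustrative, not DynamoDB's API): the hash of the partition key picks the partition, and rows stay sorted by the sort key within it, so a time-range query never crosses partitions.

```python
import hashlib
from bisect import insort
from collections import defaultdict

NUM_PARTITIONS = 8  # illustrative

def partition_of(partition_key: str) -> int:
    h = int(hashlib.md5(partition_key.encode()).hexdigest(), 16)
    return h % NUM_PARTITIONS

# partition -> list of (sort_key, partition_key, payload), kept sorted
store = defaultdict(list)

def put(user_id: str, timestamp: int, payload: str):
    insort(store[partition_of(user_id)], (timestamp, user_id, payload))

def recent(user_id: str, since: int):
    # Range query stays inside one partition: scan that partition's
    # sorted rows for this user with timestamp >= since
    return [(ts, p) for ts, uid, p in store[partition_of(user_id)]
            if uid == user_id and ts >= since]

put("alice", 100, "login")
put("alice", 200, "purchase")
put("bob", 150, "login")
print(recent("alice", 150))  # prints [(200, 'purchase')]
```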
Rebalancing Strategies
Fixed Number of Partitions
- Create way more partitions than nodes (e.g., 1000 partitions, 10 nodes)
- Each node owns ~100 partitions
- Adding a node: steal partitions from existing nodes
- Used by: Riak, Elasticsearch, CockroachDB
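A sketch of the "steal partitions" rebalance (the assignment policy is illustrative): the key→partition mapping is fixed forever, and only the partition→node mapping changes, one whole partition at a time.

```python
from collections import Counter

NUM_PARTITIONS = 1_000
nodes = [f"n{i}" for i in range(10)]

# partition -> node; keys map to partitions via hash(key) % NUM_PARTITIONS,
# and that key mapping never changes when nodes come and go
assignment = {p: nodes[p % len(nodes)] for p in range(NUM_PARTITIONS)}

def add_node(assignment, new_node):
    new = dict(assignment)
    share = NUM_PARTITIONS // (len(set(assignment.values())) + 1)
    load = Counter(assignment.values())
    stolen = 0
    for p, owner in assignment.items():
        if stolen == share:
            break
        if load[owner] > share:  # steal only from over-target nodes
            new[p] = new_node
            load[owner] -= 1
            stolen += 1
    return new

after = add_node(assignment, "n10")
moved = sum(assignment[p] != after[p] for p in assignment)
print(moved, "of", NUM_PARTITIONS, "partitions reassigned")  # 90 of 1000
```

Contrast this with re-running `nodes[p % len(nodes)]` for 11 nodes, which would reshuffle most partitions, the same failure mode as naive modulo hashing of keys.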
Dynamic Partitioning
- Start with few partitions
- Split when partition grows too large
- Merge when partition shrinks
- Used by: HBase, MongoDB
Consistent Hashing with VNodes
- Virtual nodes automatically distribute as described above
- Used by: Cassandra, DynamoDB
Request Routing
How does a client know which partition (node) holds a key?
1. Client-Side Routing
Client → Partition Map (cached) → Correct Node
- Client has partition metadata
- No extra hop, but client must be aware of topology
- Used by: Cassandra drivers, Redis Cluster
2. Routing Tier (Proxy)
Client → Router/Proxy → Correct Node
- Centralized routing logic
- Client is simple but extra network hop
- Used by: MongoDB (mongos), many managed services
3. Any-Node Routing (Gossip)
Client → Any Node → Forwards to Correct Node
- Any node can accept request and forward
- Used by: Cassandra (gossip protocol for topology)
Hotspot Mitigation
Even with consistent hashing, hotspots happen (celebrity posts, viral content).
Strategies:
- Add a random suffix to the hot key — spread it across multiple partitions: `key = "celebrity_123" + "_" + random(0, 9)` → reads must query all 10 keys and merge
- Application-level caching — cache hot items in Redis/Memcached
- Read replicas — replicate hot partitions
- Split hot partition — break into sub-partitions
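The suffix strategy in code (key name and shard count are illustrative): writes fan out across suffixed sub-keys, and reads gather and merge.

```python
import random
from collections import defaultdict

SHARDS = 10
store = defaultdict(int)  # stand-in for a cache/DB; values are counters

def incr_hot(key: str, delta: int = 1):
    # Each write lands on one of SHARDS sub-keys, so no single
    # partition absorbs all the traffic for the hot key
    store[f"{key}_{random.randrange(SHARDS)}"] += delta

def read_hot(key: str) -> int:
    # Reads must query every sub-key and merge the results
    return sum(store[f"{key}_{i}"] for i in range(SHARDS))

for _ in range(1_000):
    incr_hot("celebrity_123")
print(read_hot("celebrity_123"))  # prints 1000
```

The trade-off is explicit: write throughput improves by ~SHARDS×, while every read pays SHARDS lookups plus a merge.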
Resources
- 📖 DDIA Chapter 6: Partitioning
- 📖 "System Design Interview" by Alex Xu — Consistent Hashing chapter
- 🔗 Consistent Hashing — original paper by Karger et al.
- 🔗 DynamoDB paper
- 🎥 Gaurav Sen — Consistent Hashing
Previous: 08 - Message Queues & Event Streaming | Next: 10 - CAP Theorem & Consistency Models