Consistent Hashing & Data Partitioning

Why This Matters

Consistent hashing is the backbone of distributed caching (Memcached), distributed databases (Cassandra, DynamoDB), and CDN routing. It comes up as a direct interview question and appears indirectly in many FAANG-style designs.


The Problem with Simple Hashing

Naive Approach

server = hash(key) % N    (N = number of servers)

Works fine until you add or remove a server:

  • N changes → almost all keys remap → massive cache misses
  • Example: 4 servers → 5 servers → ~80% of keys move

This is unacceptable for distributed caches or databases at scale.
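
To see the churn concretely, here is a minimal sketch in Python; the 100,000-key sample size is arbitrary:

# Count how many keys land on a different server when a 4-server
# cluster grows to 5. Python's built-in hash() is randomized per
# process but stable within a single run, which is all we need here.
def owner(key, n_servers):
    return hash(key) % n_servers

keys = [f"key-{i}" for i in range(100_000)]
moved = sum(1 for k in keys if owner(k, 4) != owner(k, 5))
print(f"{moved / len(keys):.0%} of keys moved")   # ~80%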


Consistent Hashing

Concept

Imagine a ring (hash space 0 to 2^32-1):

                0 / 2^32 (wraps around)
                       |
       Server C -------+------- Server A
       (hash = 700)             (hash = 100)
               \               /
                \             /
                  Server B
                 (hash = 400)

Placement rules:

  1. Hash each server to a position on the ring
  2. Hash each key to a position on the ring
  3. Walk clockwise from key's position → first server you hit owns that key

Adding a server: Only keys between the new server and its predecessor move. ~K/N keys affected (where K = total keys, N = servers).

Removing a server: Only that server's keys move to its clockwise successor.
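
A minimal sketch of the ring in Python, using hashlib for stable positions and bisect for the clockwise walk; the class and method names are illustrative, not from any particular library:

import bisect
import hashlib

def ring_hash(value):
    # Stable 32-bit position on the ring (built-in hash() varies per process).
    return int(hashlib.md5(value.encode()).hexdigest(), 16) % (2**32)

class ConsistentHashRing:
    def __init__(self, servers=()):
        self._ring = []                      # sorted list of (position, server)
        for server in servers:
            self.add_server(server)

    def add_server(self, server):
        bisect.insort(self._ring, (ring_hash(server), server))

    def remove_server(self, server):
        self._ring.remove((ring_hash(server), server))

    def get_server(self, key):
        # Walk clockwise: first server at or past the key's position,
        # wrapping back to the start of the ring if we fall off the end.
        i = bisect.bisect_left(self._ring, (ring_hash(key), ""))
        return self._ring[i % len(self._ring)][1]

ring = ConsistentHashRing(["server-A", "server-B", "server-C"])
print(ring.get_server("user:42"))            # same key always hits the same server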

Why This Is Better

Scenario                 Simple Hash        Consistent Hash
Add 1 server (5→6)       ~83% keys move     ~17% keys move
Remove 1 server (5→4)    ~80% keys move     ~20% keys move

Virtual Nodes (VNodes)

Problem with Basic Consistent Hashing

With few physical servers, data distribution can be very uneven. One server might own a huge arc of the ring.

Solution: Virtual Nodes

Each physical server gets multiple positions on the ring:

Physical Server A → Virtual nodes: A1, A2, A3, A4, A5, ...
Physical Server B → Virtual nodes: B1, B2, B3, B4, B5, ...

Benefits:

  • More even distribution of keys
  • Heterogeneous servers: powerful server gets more vnodes
  • When a server goes down, its load spreads across many servers (not just one)

Typical: 100-200 virtual nodes per physical server
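
The only change from the basic ring sketch above is hashing each server to many positions; a quick measurement shows the distribution evening out (the 150-replica default and the key sample are illustrative):

import bisect
import hashlib
from collections import Counter

def ring_hash(value):
    return int(hashlib.md5(value.encode()).hexdigest(), 16) % (2**32)

class VNodeRing:
    def __init__(self, servers, replicas=150):
        # Each physical server appears `replicas` times on the ring.
        self._ring = sorted(
            (ring_hash(f"{server}#vnode{i}"), server)
            for server in servers
            for i in range(replicas)
        )

    def get_server(self, key):
        i = bisect.bisect_left(self._ring, (ring_hash(key), ""))
        return self._ring[i % len(self._ring)][1]

ring = VNodeRing(["A", "B", "C"])
load = Counter(ring.get_server(f"key-{i}") for i in range(30_000))
print(load)   # each server owns roughly a third of the keys

Weighting falls out naturally: hand a more powerful server a larger replica count and it owns a proportionally larger share of the ring.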


Consistent Hashing in Real Systems

Amazon DynamoDB

  • Consistent hashing for partition key distribution
  • Virtual nodes for even distribution
  • Replication to N clockwise successors on the ring

Apache Cassandra

  • Token ring with consistent hashing
  • Each node owns a range of tokens
  • Virtual nodes (vnodes) enabled by default (256 per node)
  • Data replicated to next N-1 nodes on ring

Memcached (Client-Side)

  • Client libraries (e.g., libketama) implement consistent hashing
  • Client decides which server to hit
  • Adding/removing servers: minimal cache invalidation

Content Delivery Networks

  • Hash request URL → determine which edge server caches it
  • Consistent hashing ensures same URL hits same edge node

Data Partitioning Strategies

1. Key-Range Partitioning

Partition 1: keys A-G
Partition 2: keys H-N
Partition 3: keys O-Z
  • Efficient range scans
  • Risk of hot partitions (e.g., timestamp-ordered keys send every new write to the last partition)
  • Used by: HBase, Google Bigtable
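
Routing a key to its range reduces to a sorted-boundary lookup; a minimal sketch with bisect, reusing the boundaries from the example above:

import bisect

# Exclusive upper bounds of each partition's key range; illustrative values.
boundaries = ["H", "O"]          # partition 1: A-G, 2: H-N, 3: O-Z
partitions = ["partition-1", "partition-2", "partition-3"]

def partition_for(key):
    # Count how many boundaries the key is at or past.
    return partitions[bisect.bisect_right(boundaries, key)]

print(partition_for("Greg"))     # partition-1
print(partition_for("Zoe"))      # partition-3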

2. Hash Partitioning

Partition = hash(key) % num_partitions
  • Even distribution
  • No efficient range scans
  • Used by: Cassandra, DynamoDB, Redis Cluster

3. Compound Partitioning

Partition key: hash(user_id)     → determines partition
Sort key: timestamp              → ordering within partition
  • Combines hash distribution with range queries within a partition
  • Used by: DynamoDB, Cassandra (partition key + clustering key)
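
A minimal sketch of the idea, modeling each partition as a per-user event list kept sorted by timestamp; all names and the partition count are illustrative:

import bisect
import hashlib

NUM_PARTITIONS = 8                                 # illustrative
partitions = [{} for _ in range(NUM_PARTITIONS)]   # user_id -> sorted [(ts, event)]

def partition_of(user_id):
    # The hashed partition key alone decides placement.
    h = int(hashlib.md5(user_id.encode()).hexdigest(), 16)
    return partitions[h % NUM_PARTITIONS]

def put(user_id, ts, event):
    rows = partition_of(user_id).setdefault(user_id, [])
    bisect.insort(rows, (ts, event))               # kept ordered by the sort key

def events_since(user_id, since_ts):
    # Range query served entirely by one partition: no scatter-gather.
    rows = partition_of(user_id).get(user_id, [])
    return rows[bisect.bisect_left(rows, (since_ts,)):]

put("alice", 100, "login"); put("alice", 250, "purchase")
print(events_since("alice", 200))                  # [(250, 'purchase')]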

Rebalancing Strategies

Fixed Number of Partitions

  • Create way more partitions than nodes (e.g., 1000 partitions, 10 nodes)
  • Each node owns ~100 partitions
  • Adding a node: steal partitions from existing nodes
  • Used by: Riak, Elasticsearch
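
A sketch of the bookkeeping, under the assumption that the partition count is fixed up front and only the partition-to-node assignment ever changes:

from collections import defaultdict

NUM_PARTITIONS = 1000                    # fixed for the cluster's lifetime

def add_node(assignment, new_node):
    # The new node steals partitions until everyone owns roughly
    # NUM_PARTITIONS / num_nodes; the keys themselves never re-hash.
    nodes = set(assignment.values()) | {new_node}
    target = NUM_PARTITIONS // len(nodes)
    owned = defaultdict(list)
    for p, node in assignment.items():
        owned[node].append(p)
    stolen = 0
    for parts in owned.values():
        while len(parts) > target and stolen < target:
            assignment[parts.pop()] = new_node
            stolen += 1

assignment = {p: f"node-{p % 10}" for p in range(NUM_PARTITIONS)}   # 10 nodes, ~100 each
add_node(assignment, "node-10")          # only ~90 partitions change owner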

Dynamic Partitioning

  • Start with few partitions
  • Split when partition grows too large
  • Merge when partition shrinks
  • Used by: HBase, MongoDB, CockroachDB

Consistent Hashing with VNodes

  • Virtual nodes automatically distribute as described above
  • Used by: Cassandra, DynamoDB

Request Routing

How does a client know which partition (node) holds a key?

1. Client-Side Routing

Client → Partition Map (cached) → Correct Node
  • Client has partition metadata
  • No extra hop, but client must be aware of topology
  • Used by: Cassandra drivers, Redis Cluster
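
A client-side sketch, assuming the driver has already fetched and cached the partition map; the addresses and map shape are made up:

import hashlib

partition_map = {0: "10.0.0.1", 1: "10.0.0.2", 2: "10.0.0.3"}   # cached from the cluster

def node_for(key):
    # The client routes locally: a stable hash shared with the cluster,
    # then a map lookup. No proxy hop on the request path.
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return partition_map[h % len(partition_map)]

print(node_for("user:42"))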

2. Routing Tier (Proxy)

Client → Router/Proxy → Correct Node
  • Centralized routing logic
  • Client stays simple, but every request pays an extra network hop
  • Used by: MongoDB (mongos), many managed services

3. Any-Node Routing (Gossip)

Client → Any Node → Forwards to Correct Node
  • Any node can accept request and forward
  • Used by: Cassandra (gossip protocol for topology)

Hotspot Mitigation

Even with consistent hashing, hotspots happen (celebrity posts, viral content).

Strategies:

  1. Add a random suffix to the hot key: spreads it across multiple partitions
    key = "celebrity_123" + "_" + random(0, 9)
    → reads must query all 10 sub-keys and merge (see the sketch after this list)

  2. Application-level caching: cache hot items in Redis/Memcached
  3. Read replicas: replicate hot partitions
  4. Split the hot partition: break it into sub-partitions
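
A minimal sketch of suffix sharding against a generic cache client; `cache`, the key names, and the shard count are assumptions:

import random

SHARDS = 10   # number of sub-keys a hot key is spread over; illustrative

def write_key(key):
    # Each write picks one sub-key at random, spreading load.
    return f"{key}_{random.randrange(SHARDS)}"

def read_keys(key):
    # Reads fan out to every sub-key and merge the results.
    return [f"{key}_{i}" for i in range(SHARDS)]

# cache.incr(write_key("celebrity_123:likes"))
# total = sum(cache.get(k) or 0 for k in read_keys("celebrity_123:likes"))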

Previous: 08 - Message Queues & Event Streaming | Next: 10 - CAP Theorem & Consistency Models