Consistent Hashing & Data Partitioning

Why This Matters

Consistent hashing is the backbone of distributed caching (Memcached), distributed databases (Cassandra, DynamoDB), and CDN routing. It comes up as a direct interview question and appears indirectly in many FAANG-style designs.


The Problem with Simple Hashing

Naive Approach

server = hash(key) % N    (N = number of servers)

Works fine until you add or remove a server:

  • N changes → almost all keys remap → massive cache misses
  • Example: 4 servers → 5 servers → ~80% of keys move

This is unacceptable for distributed caches or databases at scale.
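
To see the churn concretely, here is a minimal sketch in Python; the 100,000-key sample size is arbitrary:

# Count how many keys land on a different server when a 4-server
# cluster grows to 5. Python's built-in hash() is randomized per
# process but stable within a single run, which is all we need here.
def owner(key, n_servers):
    return hash(key) % n_servers

keys = [f"key-{i}" for i in range(100_000)]
moved = sum(1 for k in keys if owner(k, 4) != owner(k, 5))
print(f"{moved / len(keys):.0%} of keys moved")   # ~80%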


Consistent Hashing

Concept

Imagine a ring (hash space 0 to 2^32-1):

                0 / 2^32 (wraps around)
                       |
       Server C -------+------- Server A
       (hash = 700)             (hash = 100)
               \               /
                \             /
                  Server B
                 (hash = 400)

Placement rules:

  1. Hash each server to a position on the ring
  2. Hash each key to a position on the ring
  3. Walk clockwise from key's position → first server you hit owns that key

Adding a server: Only keys between the new server and its predecessor move. ~K/N keys affected (where K = total keys, N = servers).

Removing a server: Only that server's keys move to its clockwise successor.
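
A minimal sketch of the ring in Python, using hashlib for stable positions and bisect for the clockwise walk; the class and method names are illustrative, not from any particular library:

import bisect
import hashlib

def ring_hash(value):
    # Stable 32-bit position on the ring (built-in hash() varies per process).
    return int(hashlib.md5(value.encode()).hexdigest(), 16) % (2**32)

class ConsistentHashRing:
    def __init__(self, servers=()):
        self._ring = []                      # sorted list of (position, server)
        for server in servers:
            self.add_server(server)

    def add_server(self, server):
        bisect.insort(self._ring, (ring_hash(server), server))

    def remove_server(self, server):
        self._ring.remove((ring_hash(server), server))

    def get_server(self, key):
        # Walk clockwise: first server at or past the key's position,
        # wrapping back to the start of the ring if we fall off the end.
        i = bisect.bisect_left(self._ring, (ring_hash(key), ""))
        return self._ring[i % len(self._ring)][1]

ring = ConsistentHashRing(["server-A", "server-B", "server-C"])
print(ring.get_server("user:42"))            # same key always hits the same server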

Why This Is Better

Scenario                 Simple Hash        Consistent Hash
Add 1 server (5→6)       ~83% keys move     ~17% keys move
Remove 1 server (5→4)    ~80% keys move     ~20% keys move

Virtual Nodes (VNodes)

Problem with Basic Consistent Hashing

With few physical servers, data distribution can be very uneven. One server might own a huge arc of the ring.

Solution: Virtual Nodes

Each physical server gets multiple positions on the ring:

Physical Server A → Virtual nodes: A1, A2, A3, A4, A5, ...
Physical Server B → Virtual nodes: B1, B2, B3, B4, B5, ...

Benefits:

  • More even distribution of keys
  • Heterogeneous servers: powerful server gets more vnodes
  • When a server goes down, its load spreads across many servers (not just one)

Typical: 100-200 virtual nodes per physical server
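
The only change from the basic ring sketch above is hashing each server to many positions; a quick measurement shows the distribution evening out (the 150-replica default and the key sample are illustrative):

import bisect
import hashlib
from collections import Counter

def ring_hash(value):
    return int(hashlib.md5(value.encode()).hexdigest(), 16) % (2**32)

class VNodeRing:
    def __init__(self, servers, replicas=150):
        # Each physical server appears `replicas` times on the ring.
        self._ring = sorted(
            (ring_hash(f"{server}#vnode{i}"), server)
            for server in servers
            for i in range(replicas)
        )

    def get_server(self, key):
        i = bisect.bisect_left(self._ring, (ring_hash(key), ""))
        return self._ring[i % len(self._ring)][1]

ring = VNodeRing(["A", "B", "C"])
load = Counter(ring.get_server(f"key-{i}") for i in range(30_000))
print(load)   # each server owns roughly a third of the keys

Weighting falls out naturally: hand a more powerful server a larger replica count and it owns a proportionally larger share of the ring.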


Consistent Hashing in Real Systems

Amazon DynamoDB

  • Consistent hashing for partition key distribution
  • Virtual nodes for even distribution
  • Replication to N clockwise successors on the ring

Apache Cassandra

  • Token ring with consistent hashing
  • Each node owns a range of tokens
  • Virtual nodes (vnodes) enabled by default (256 per node)
  • Data replicated to next N-1 nodes on ring

Memcached (Client-Side)

  • Client libraries (e.g., libketama) implement consistent hashing
  • Client decides which server to hit
  • Adding/removing servers: minimal cache invalidation

Content Delivery Networks

  • Hash request URL → determine which edge server caches it
  • Consistent hashing ensures same URL hits same edge node

Data Partitioning Strategies

1. Key-Range Partitioning

Partition 1: keys A-G
Partition 2: keys H-N
Partition 3: keys O-Z
  • Efficient range scans
  • Risk of hot partitions (e.g., timestamp-ordered keys send every new write to the last partition)
  • Used by: HBase, Google Bigtable
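
Routing a key to its range reduces to a sorted-boundary lookup; a minimal sketch with bisect, reusing the boundaries from the example above:

import bisect

# Exclusive upper bounds of each partition's key range; illustrative values.
boundaries = ["H", "O"]          # partition 1: A-G, 2: H-N, 3: O-Z
partitions = ["partition-1", "partition-2", "partition-3"]

def partition_for(key):
    # Count how many boundaries the key is at or past.
    return partitions[bisect.bisect_right(boundaries, key)]

print(partition_for("Greg"))     # partition-1
print(partition_for("Zoe"))      # partition-3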

2. Hash Partitioning

Partition = hash(key) % num_partitions
  • Even distribution
  • No efficient range scans
  • Used by: Cassandra, DynamoDB, Redis Cluster

3. Compound Partitioning

Partition key: hash(user_id)     → determines partition
Sort key: timestamp              → ordering within partition
  • Combines hash distribution with range queries within a partition
  • Used by: DynamoDB, Cassandra (partition key + clustering key)
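
A minimal sketch of the idea, modeling each partition as a per-user event list kept sorted by timestamp; all names and the partition count are illustrative:

import bisect
import hashlib

NUM_PARTITIONS = 8                                 # illustrative
partitions = [{} for _ in range(NUM_PARTITIONS)]   # user_id -> sorted [(ts, event)]

def partition_of(user_id):
    # The hashed partition key alone decides placement.
    h = int(hashlib.md5(user_id.encode()).hexdigest(), 16)
    return partitions[h % NUM_PARTITIONS]

def put(user_id, ts, event):
    rows = partition_of(user_id).setdefault(user_id, [])
    bisect.insort(rows, (ts, event))               # kept ordered by the sort key

def events_since(user_id, since_ts):
    # Range query served entirely by one partition: no scatter-gather.
    rows = partition_of(user_id).get(user_id, [])
    return rows[bisect.bisect_left(rows, (since_ts,)):]

put("alice", 100, "login"); put("alice", 250, "purchase")
print(events_since("alice", 200))                  # [(250, 'purchase')]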

Rebalancing Strategies

Fixed Number of Partitions

  • Create way more partitions than nodes (e.g., 1000 partitions, 10 nodes)
  • Each node owns ~100 partitions
  • Adding a node: steal partitions from existing nodes
  • Used by: Riak, Elasticsearch
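
A sketch of the bookkeeping, under the assumption that the partition count is fixed up front and only the partition-to-node assignment ever changes:

from collections import defaultdict

NUM_PARTITIONS = 1000                    # fixed for the cluster's lifetime

def add_node(assignment, new_node):
    # The new node steals partitions until everyone owns roughly
    # NUM_PARTITIONS / num_nodes; the keys themselves never re-hash.
    nodes = set(assignment.values()) | {new_node}
    target = NUM_PARTITIONS // len(nodes)
    owned = defaultdict(list)
    for p, node in assignment.items():
        owned[node].append(p)
    stolen = 0
    for parts in owned.values():
        while len(parts) > target and stolen < target:
            assignment[parts.pop()] = new_node
            stolen += 1

assignment = {p: f"node-{p % 10}" for p in range(NUM_PARTITIONS)}   # 10 nodes, ~100 each
add_node(assignment, "node-10")          # only ~90 partitions change owner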

Dynamic Partitioning

  • Start with few partitions
  • Split when partition grows too large
  • Merge when partition shrinks
  • Used by: HBase, MongoDB, CockroachDB

Consistent Hashing with VNodes

  • Virtual nodes automatically distribute as described above
  • Used by: Cassandra, DynamoDB

Request Routing

How does a client know which partition (node) holds a key?

1. Client-Side Routing

Client → Partition Map (cached) → Correct Node
  • Client has partition metadata
  • No extra hop, but client must be aware of topology
  • Used by: Cassandra drivers, Redis Cluster
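
A client-side sketch, assuming the driver has already fetched and cached the partition map; the addresses and map shape are made up:

import hashlib

partition_map = {0: "10.0.0.1", 1: "10.0.0.2", 2: "10.0.0.3"}   # cached from the cluster

def node_for(key):
    # The client routes locally: a stable hash shared with the cluster,
    # then a map lookup. No proxy hop on the request path.
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return partition_map[h % len(partition_map)]

print(node_for("user:42"))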

2. Routing Tier (Proxy)

Client → Router/Proxy → Correct Node
  • Centralized routing logic
  • Client stays simple, but every request pays an extra network hop
  • Used by: MongoDB (mongos), many managed services

3. Any-Node Routing (Gossip)

Client → Any Node → Forwards to Correct Node
  • Any node can accept request and forward
  • Used by: Cassandra (gossip protocol for topology)

Hotspot Mitigation

Even with consistent hashing, hotspots happen (celebrity posts, viral content).

Strategies:

  1. Add a random suffix to the hot key: spreads it across multiple partitions
    key = "celebrity_123" + "_" + random(0, 9)
    → reads must query all 10 sub-keys and merge (see the sketch after this list)

  2. Application-level caching: cache hot items in Redis/Memcached
  3. Read replicas: replicate hot partitions
  4. Split the hot partition: break it into sub-partitions
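
A minimal sketch of suffix sharding against a generic cache client; `cache`, the key names, and the shard count are assumptions:

import random

SHARDS = 10   # number of sub-keys a hot key is spread over; illustrative

def write_key(key):
    # Each write picks one sub-key at random, spreading load.
    return f"{key}_{random.randrange(SHARDS)}"

def read_keys(key):
    # Reads fan out to every sub-key and merge the results.
    return [f"{key}_{i}" for i in range(SHARDS)]

# cache.incr(write_key("celebrity_123:likes"))
# total = sum(cache.get(k) or 0 for k in read_keys("celebrity_123:likes"))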

Previous: 08 - Message Queues & Event Streaming | Next: 10 - CAP Theorem & Consistency Models