43 - CRDT & Conflict-Free Replication

Previous: 42 - Design Payment System | Next: 44 - Gossip Protocols & Membership

1. Why CRDTs Matter

In distributed systems, we often want multiple nodes (or clients) to update shared state independently, without coordination, and later merge those updates into a consistent result. Traditional approaches (locking, consensus) add latency and coordination overhead. CRDTs solve this by designing data structures that merge automatically without conflicts.

Key Use Cases

Use Case	Example
Collaborative editing	Google Docs, Figma, Notion
Offline-first apps	Mobile apps syncing after reconnect
Multi-leader replication	Multi-region databases (DynamoDB, Riak)
Edge computing	CDN nodes updating counters, caches
Chat/messaging	Message ordering across devices

2. What Are CRDTs?

Conflict-free Replicated Data Types are data structures where:

Every replica can be updated independently (no coordination)
Replicas always converge to the same state after exchanging updates
Merging is associative, commutative, and idempotent

Replica A        Replica B        Replica C
  |                 |                 |
  | add("x")        | add("y")       | add("z")
  |                 |                 |
  |---sync---->     |                 |
  |     <----sync---|                 |
  |                 |---sync---->     |
  |                 |     <----sync---|
  |                 |                 |
  v                 v                 v
{x, y, z}       {x, y, z}       {x, y, z}
                ALL CONVERGE

3. Mathematical Foundation: Join Semilattice

A CRDT's merge operation forms a join semilattice: a partially ordered set where every pair of elements has a least upper bound (join/merge).

Properties of the merge function (join):

  Commutative:  merge(A, B) = merge(B, A)
  Associative:  merge(A, merge(B, C)) = merge(merge(A, B), C)
  Idempotent:   merge(A, A) = A

These guarantee:
  - Order of receiving updates doesn't matter (commutative)
  - Grouping of updates doesn't matter (associative)
  - Duplicate delivery is harmless (idempotent)

4. State-Based vs Operation-Based CRDTs

Aspect	State-Based (CvRDT)	Operation-Based (CmRDT)
What is synced	Full state snapshot	Individual operations
Merge	merge(local_state, remote_state)	apply(operation)
Network requirement	Can lose/duplicate messages (idempotent)	Exactly-once, causal delivery required
Bandwidth	Higher (send full state)	Lower (send only operations)
Implementation	Simpler	More complex (delivery guarantees)
Example	G-Counter, OR-Set	Append operation, increment operation

Interview Tip

State-based is easier to reason about and more forgiving of network issues. Operation-based is more bandwidth-efficient but requires causal delivery guarantees (harder to implement correctly).

5. CRDT Types

G-Counter (Grow-Only Counter)

Each node maintains its own counter. The global value is the sum.

Node A: [A:3, B:0, C:0]    local_value = 3
Node B: [A:0, B:5, C:0]    local_value = 5
Node C: [A:0, B:0, C:2]    local_value = 2

Merge: take max per node
Result: [A:3, B:5, C:2]    global_value = 3 + 5 + 2 = 10

increment(node_id):
  counters[node_id] += 1

merge(other):
  for each node_id:
    counters[node_id] = max(counters[node_id], other.counters[node_id])

value():
  return sum(counters.values())

Limitation: Can only grow, never decrease.

PN-Counter (Positive-Negative Counter)

Two G-Counters: one for increments, one for decrements.

P (positive): [A:5, B:3, C:1]  -> sum = 9  (total increments)
N (negative): [A:1, B:2, C:0]  -> sum = 3  (total decrements)

value() = sum(P) - sum(N) = 9 - 3 = 6

increment(node):  P[node] += 1
decrement(node):  N[node] += 1

merge(other):
  P = merge_g_counter(P, other.P)
  N = merge_g_counter(N, other.N)

G-Set (Grow-Only Set)

Elements can only be added, never removed. Merge = union.

Node A: {apple, banana}
Node B: {banana, cherry}

merge(A, B) = {apple, banana, cherry}

OR-Set (Observed-Remove Set)

Supports both add and remove. Each element has a unique tag.

add("x"):
  generate unique tag t
  store (x, t) in add_set

remove("x"):
  move all (x, *) entries from add_set to remove_set

lookup("x"):
  return exists in add_set AND not in remove_set

merge(other):
  add_set = union of both add_sets
  remove_set = union of both remove_sets
  result = add_set - remove_set

Concurrent operations:
  Node A: add("x") with tag t1
  Node B: remove("x") -- removes all known tags of "x"
  Node A: still has (x, t1) which B hasn't seen
  After merge: "x" is present (add wins over concurrent remove)

LWW-Register (Last-Writer-Wins Register)

Each write carries a timestamp. Highest timestamp wins.

Node A writes: ("hello", timestamp=100)
Node B writes: ("world", timestamp=105)

merge: value = "world" (timestamp 105 > 100)

Danger: Data loss if concurrent writes to same key
         (the "losing" write is silently discarded)

LWW-Map

A map where each key is an LWW-Register.

Node A: {"name": ("Alice", t=100), "age": ("30", t=100)}
Node B: {"name": ("Bob", t=105),   "city": ("NYC", t=102)}

merge:
  "name" -> "Bob" (t=105 > t=100, B wins)
  "age"  -> "30"  (only A has it)
  "city" -> "NYC" (only B has it)

Result: {"name": "Bob", "age": "30", "city": "NYC"}

6. Summary Table of CRDT Types

CRDT	Operations	Merge Strategy	Use Case
G-Counter	increment	max per node, sum for value	Page views, likes
PN-Counter	increment, decrement	Two G-Counters	Shopping cart quantity
G-Set	add	Union	Tag collection
OR-Set	add, remove	Tagged elements, add-wins	User's favorite list
LWW-Register	write	Highest timestamp wins	Profile fields
LWW-Map	set key, delete key	LWW per key	User preferences
MV-Register	write	Keep all concurrent values	Show conflict to user

7. Practical Examples

Google Docs / Collaborative Editing

Approach: Operation-based CRDT or OT (Google uses OT historically)

User A types "Hello" at position 0
User B types "World" at position 0 (concurrently)

CRDT approach:
  Each character has a unique position ID (fractional indexing)
  A's "Hello": positions 0.1, 0.2, 0.3, 0.4, 0.5
  B's "World": positions 0.01, 0.02, 0.03, 0.04, 0.05
  Merge: sort by position -> "WorldHello" (deterministic)

Real implementations: Yjs, Automerge (both CRDT-based)

Figma

Figma uses CRDTs for:
  - Object properties (position, size, color) -> LWW-Register per property
  - Object creation/deletion -> OR-Set of objects
  - Layer ordering -> sequence CRDT

Each property change on an object is an LWW update.
Last writer wins per-property, not per-object (fine-grained).

Redis CRDT (Redis Enterprise)

Redis Enterprise supports CRDT-enabled data types for active-active geo-replication:
  - Counter: PN-Counter
  - Set: OR-Set
  - String: LWW-Register
  - Sorted Set: custom CRDT merge

Multi-region Redis clusters converge automatically.

8. CRDTs vs Operational Transformation (OT)

Aspect	CRDT	OT
Central server	Not required	Usually requires central server
Peer-to-peer	Natural fit	Difficult
Offline support	Excellent	Limited
Complexity	Data structure design	Transform function pairs
Memory overhead	Higher (metadata per element)	Lower
Convergence guarantee	Mathematical	Correctness proofs are notoriously hard
Adoption	Yjs, Automerge, Figma	Google Docs, many legacy systems
Undo/redo	Complex	More natural

Interview Tip

If asked "how would you build collaborative editing?", mentioning CRDTs shows depth. Note that Google Docs historically uses OT, but newer systems (Figma, many startups) use CRDTs. Yjs and Automerge are the two major open-source CRDT libraries.

9. Trade-offs and Challenges

Challenge	Detail
Memory overhead	CRDTs store metadata (tags, timestamps, tombstones) per element. An OR-Set's tombstone set can grow unbounded.
Garbage collection	Tombstones must eventually be pruned, but safe pruning requires knowing all replicas have seen the delete.
Convergence delay	Replicas converge only after syncing. During partition, replicas diverge.
Semantic conflicts	CRDTs resolve syntactic conflicts (merge is well-defined), but semantic conflicts may still occur (e.g., two users edit the same sentence).
Limited operations	Not all data structures have known CRDT designs. Complex application logic may not map cleanly to CRDTs.
Debugging	Merged state can be surprising. "Why did my delete not take effect?" (concurrent add-wins in OR-Set).

Mitigations

Memory: Use causal stability to garbage-collect tombstones
        (an event is causally stable when all replicas have seen it)

Semantic: Surface conflicts to the user (like Git merge conflicts)
          or use domain-specific merge strategies

Debugging: Log all operations with vector clocks for replay/audit

10. When to Use CRDTs

Good fit:
  [x] Multiple writers need to update independently
  [x] Network partitions or offline operation expected
  [x] Eventual consistency is acceptable
  [x] Merge semantics can be defined for the data type
  [x] Low coordination overhead is a priority

Poor fit:
  [ ] Strong consistency required (use consensus instead)
  [ ] Complex transactional logic across multiple objects
  [ ] Data types don't have natural merge semantics
  [ ] Memory overhead is unacceptable

11. Implementing a Simple G-Counter (Pseudocode)

class GCounter:
  node_id: string
  counts: Map<string, int>  # node_id -> count

  constructor(node_id):
    this.node_id = node_id
    this.counts = {}

  increment():
    this.counts[this.node_id] = this.counts.get(this.node_id, 0) + 1

  value():
    return sum(this.counts.values())

  merge(other: GCounter):
    all_nodes = union(this.counts.keys(), other.counts.keys())
    merged = new GCounter(this.node_id)
    for node in all_nodes:
      merged.counts[node] = max(
        this.counts.get(node, 0),
        other.counts.get(node, 0)
      )
    return merged

12. Interview Checklist

Explained why CRDTs exist (coordination-free updates, offline-first, multi-leader)
Defined CRDTs: data structures that merge automatically without conflicts
Described mathematical foundation (join semilattice: commutative, associative, idempotent)
Distinguished state-based vs operation-based
Walked through at least 3 CRDT types with examples (G-Counter, OR-Set, LWW-Register)
Gave practical examples (Google Docs, Figma, Redis CRDT)
Discussed trade-offs (memory, tombstones, semantic conflicts)
Compared CRDTs vs OT

13. Resources

Paper: "A Comprehensive Study of CRDTs" -- Shapiro et al., INRIA (2011)
Paper: "Conflict-Free Replicated Data Types" -- Shapiro et al. (original CRDT paper)
Martin Kleppmann talks -- "CRDTs: The Hard Parts" (Hydra Conference), "Automerge" series
Book: "Designing Data-Intensive Applications" (Kleppmann) -- Chapter 5, Leaderless Replication
Yjs documentation -- yjs.dev (production CRDT library)
Automerge documentation -- automerge.org
YouTube: Martin Kleppmann -- "CRDTs and the Quest for Distributed Consistency"
Blog: Lars Hupel -- "An Introduction to CRDTs"

Previous: 42 - Design Payment System | Next: 44 - Gossip Protocols & Membership