Clocks & Ordering in Distributed Systems

Why This Matters

In a distributed system, there is no global clock. "What happened first?" is surprisingly hard to answer. Understanding clocks is crucial for conflict resolution, consistency, and debugging.


The Problem with Physical Clocks

Clock Skew

  • Different machines have slightly different times
  • Even with NTP synchronization, drift of 1-10ms is common
  • Across datacenters, skew can be 100ms+

NTP (Network Time Protocol)

  • Synchronizes clocks with time servers
  • Accuracy: ~1ms within LAN, ~10-100ms across internet
  • Not reliable enough for ordering events in distributed systems

Why Physical Time Fails for Ordering

Machine A (clock slightly ahead): Event at T=100.001
Machine B (clock slightly behind): Event at T=100.000

Physical time says B happened first, but A might have actually happened first!

Last Write Wins (LWW) with physical timestamps can silently drop writes.


Logical Clocks

Lamport Clocks

Algorithm:

Each node maintains a counter C

On local event:
  C = C + 1

On sending message:
  C = C + 1
  Attach C to message

On receiving message with timestamp T:
  C = max(C, T) + 1

Properties:

  • If event A happened before B → L(A) < L(B) ✅
  • If L(A) < L(B) → A MIGHT have happened before B (or they're concurrent) ⚠️
  • Cannot distinguish concurrent events from ordered events

Use: Simple ordering where causality tracking isn't needed.
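The three rules above fit in a few lines; a minimal Python sketch:

```python
class LamportClock:
    """Minimal Lamport clock implementing the three rules above."""
    def __init__(self):
        self.c = 0

    def local_event(self):
        self.c += 1
        return self.c

    def send(self):
        self.c += 1
        return self.c            # timestamp attached to the outgoing message

    def receive(self, t):
        self.c = max(self.c, t) + 1
        return self.c

# A's send pushes B's clock past A's timestamp:
a, b = LamportClock(), LamportClock()
a.local_event()                  # a.c == 1
t = a.send()                     # t == 2
b.receive(t)                     # b.c == max(0, 2) + 1 == 3
```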

Vector Clocks

Algorithm:

N nodes, each maintains a vector of N counters: [C1, C2, ..., Cn]

Node i on local event:
  V[i] = V[i] + 1

Node i on sending message:
  V[i] = V[i] + 1
  Attach V to message

Node i on receiving message with vector T:
  For each j: V[j] = max(V[j], T[j])
  V[i] = V[i] + 1

Comparing vectors:

V1 < V2  (V1 happened before V2) if: all V1[i] ≤ V2[i] AND at least one V1[i] < V2[i]
V1 ‖ V2 (concurrent) if: neither V1 < V2 nor V2 < V1

Example:

Node A: [2, 0, 0]  Event a2
Node B: [1, 1, 0]  Event b1 (after receiving from A)
Node C: [0, 0, 1]  Event c1

[2,0,0] vs [0,0,1] → neither ≤ → CONCURRENT (correct!)
[1,1,0] vs [2,0,0] → not ≤ either way → CONCURRENT

Properties:

  • If A → B, then V(A) < V(B) ✅
  • If V(A) < V(B), then A → B ✅ (stronger than Lamport!)
  • Can detect concurrent events ✅

Downside: Vector size grows with number of nodes. For thousands of nodes, use dotted version vectors or interval tree clocks.
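The algorithm and the three-node example above can be reproduced with a short sketch (vectors as plain lists, one index per node):

```python
class VectorClock:
    """Vector clock for node i of n, following the rules above."""
    def __init__(self, n, i):
        self.v, self.i = [0] * n, i

    def local_event(self):
        self.v[self.i] += 1

    def send(self):
        self.v[self.i] += 1
        return list(self.v)          # copy of the vector goes on the message

    def receive(self, t):
        self.v = [max(x, y) for x, y in zip(self.v, t)]
        self.v[self.i] += 1

def happened_before(v1, v2):
    # all components <= AND at least one strictly <
    return all(x <= y for x, y in zip(v1, v2)) and v1 != v2

def concurrent(v1, v2):
    return not happened_before(v1, v2) and not happened_before(v2, v1)

# Reproduce the example: a1 is sent to B, then a2, b1, c1 happen
A, B, C = VectorClock(3, 0), VectorClock(3, 1), VectorClock(3, 2)
msg = A.send()          # a1: msg == [1, 0, 0]
A.local_event()         # a2: A.v == [2, 0, 0]
B.receive(msg)          # b1: B.v == [1, 1, 0]
C.local_event()         # c1: C.v == [0, 0, 1]

assert concurrent(A.v, C.v)          # [2,0,0] vs [0,0,1]
assert concurrent(B.v, A.v)          # [1,1,0] vs [2,0,0]
assert happened_before(msg, B.v)     # the send causally precedes b1
```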

Used by: Amazon's Dynamo (the 2007 paper; DynamoDB itself simplifies this away), Riak


Hybrid Logical Clocks (HLC)

Combines physical time and logical counters:

HLC = (physical_time, logical_counter, node_id)

Properties:

  • Close to physical time (useful for humans)
  • Preserves causality (like logical clocks)
  • Bounded divergence from real time
  • Fixed size (unlike vector clocks)

Used by: CockroachDB, YugabyteDB, MongoDB
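A simplified sketch of the HLC update rules (based on the Kulkarni et al. formulation; the physical clock is injectable here so the counter behavior is visible, and a real implementation also bounds the counter and guards against clock jumps):

```python
import time

class HLC:
    """Sketch of a hybrid logical clock: (l, c) where l tracks the max
    physical time observed and c orders events that share the same l."""
    def __init__(self, now=time.time_ns):
        self.now = now   # physical clock source, injectable for testing
        self.l = 0
        self.c = 0

    def tick(self):
        """Advance on a local or send event; returns the new timestamp."""
        pt = self.now()
        if pt > self.l:
            self.l, self.c = pt, 0
        else:
            self.c += 1
        return (self.l, self.c)

    def update(self, ml, mc):
        """Advance on receiving a message stamped (ml, mc)."""
        pt = self.now()
        if pt > self.l and pt > ml:
            self.l, self.c = pt, 0          # physical time dominates
        elif ml > self.l:
            self.l, self.c = ml, mc + 1     # remote clock is ahead
        elif self.l > ml:
            self.c += 1                     # our logical time is ahead
        else:                               # self.l == ml: merge counters
            self.c = max(self.c, mc) + 1
        return (self.l, self.c)

# With a frozen physical clock, the counter does the ordering:
clock = {"t": 100}
h = HLC(now=lambda: clock["t"])
h.tick()                 # (100, 0)
h.tick()                 # (100, 1)
h.update(105, 3)         # (105, 4): remote l is ahead, counter follows it
```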


Happens-Before Relationship

Definition (Lamport, 1978)

Event A happens before event B (written A → B) if:

  1. A and B are in the same process, and A comes before B
  2. A is a send event and B is the corresponding receive
  3. Transitivity: if A → B and B → C, then A → C

If neither A → B nor B → A, events are concurrent (A ‖ B).

Causal Ordering

  • Events with causal relationship MUST be ordered
  • Concurrent events can be ordered arbitrarily
  • Causal ordering is weaker than total ordering but often sufficient

Total vs Partial Ordering

Total Order

Every pair of events is ordered:

e1 < e2 < e3 < e4 (no ties, no incomparable events)

Required for: linearizability, consensus, single-leader replication

Partial Order

Some events are incomparable (concurrent):

e1 < e3
e2 < e3
e1 ‖ e2 (concurrent, no order between them)

Sufficient for: causal consistency, CRDTs

Total Order Broadcast

All nodes deliver messages in the same total order.

  • Equivalent to consensus (Raft, Paxos)
  • Used for replicated state machines
  • Every node applies same operations in same order → same state

TrueTime (Google Spanner)

Innovation

Google's Spanner uses GPS and atomic clocks to bound clock uncertainty:

TrueTime.now() returns interval [earliest, latest]
"The real time is somewhere in this interval"
Typical uncertainty: 1-7ms

Commit Wait

Spanner commits:
1. Assign commit timestamp = TrueTime.now().latest
2. WAIT until TrueTime.now().earliest > commit timestamp
3. Now safe to commit (no future transaction can get earlier timestamp)

This gives external consistency (linearizability) across globally distributed nodes.
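A toy simulation of commit wait, assuming a hypothetical `tt_now()` that returns an uncertainty interval with a fixed bound (real TrueTime derives the bound from GPS/atomic-clock hardware):

```python
import time

EPSILON = 0.004  # assumed 4 ms uncertainty bound (Spanner reports ~1-7 ms)

def tt_now():
    """Hypothetical TrueTime-style API: (earliest, latest) interval."""
    t = time.time()
    return (t - EPSILON, t + EPSILON)

def commit_wait():
    _, latest = tt_now()
    commit_ts = latest                   # step 1: pessimistic timestamp
    while tt_now()[0] <= commit_ts:      # step 2: wait until earliest > ts
        time.sleep(0.001)
    return commit_ts                     # step 3: safe to acknowledge commit
```

After `commit_wait()` returns, every node's real clock is provably past the commit timestamp, so no later transaction can be assigned an earlier one. Note the cost: commit latency grows with the uncertainty bound, which is why Spanner invests in keeping epsilon small.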

Why This Is Special

  • Requires GPS receivers and atomic clocks in every datacenter, which very few operators deploy
  • Enables globally consistent reads without cross-datacenter coordination
  • CockroachDB approximates this with HLC (without special hardware)

Practical Applications

Conflict Resolution with Timestamps

Write 1: {key: "x", value: "A", timestamp: 10}
Write 2: {key: "x", value: "B", timestamp: 12}

LWW → value = "B" (higher timestamp wins)
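A sketch of the LWW rule (note that real systems also need a deterministic tie-breaker, such as node id, when timestamps are equal):

```python
def lww_merge(w1, w2):
    """Last-write-wins: keep the write with the higher timestamp.
    The losing write is silently discarded — the failure mode noted above."""
    return w1 if w1["timestamp"] >= w2["timestamp"] else w2

w1 = {"key": "x", "value": "A", "timestamp": 10}
w2 = {"key": "x", "value": "B", "timestamp": 12}
winner = lww_merge(w1, w2)   # value "B" wins; "A" is lost with no trace
```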

Version Vectors for Shopping Cart (Amazon Dynamo)

User adds item via US server:  cart: [shirt], vector: [US:1, EU:0]
User adds item via EU server:  cart: [pants], vector: [US:0, EU:1]

Conflict detected! Vectors are concurrent.
Dynamo returns BOTH versions → application merges:
  cart: [shirt, pants], vector: [US:1, EU:1]
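The detect-and-merge flow can be sketched with dict-based version vectors (hypothetical helper names):

```python
def dominates(v1, v2):
    """True if v1 >= v2 componentwise (missing entries count as 0)."""
    keys = set(v1) | set(v2)
    return all(v1.get(k, 0) >= v2.get(k, 0) for k in keys)

def vv_concurrent(v1, v2):
    return not dominates(v1, v2) and not dominates(v2, v1)

def merge_carts(c1, v1, c2, v2):
    """Application-level merge: union the items, pointwise-max the vectors."""
    merged_cart = sorted(set(c1) | set(c2))
    merged_vv = {k: max(v1.get(k, 0), v2.get(k, 0)) for k in set(v1) | set(v2)}
    return merged_cart, merged_vv

us = (["shirt"], {"US": 1, "EU": 0})
eu = (["pants"], {"US": 0, "EU": 1})
assert vv_concurrent(us[1], eu[1])        # neither version dominates: conflict
cart, vv = merge_carts(*us, *eu)          # ["pants", "shirt"], {US:1, EU:1}
```

Union-merge works for an add-only cart; removing items needs more machinery (e.g. tombstones or an OR-Set CRDT), which is exactly the subtlety the Dynamo paper flags.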

Snapshot Isolation

  • Assign each transaction a timestamp
  • Transaction sees all writes committed before its start time
  • Consistent point-in-time view of data
  • Used by PostgreSQL MVCC, MySQL InnoDB
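The visibility rule above can be sketched against a toy version store (hypothetical layout: each version is a (value, commit_ts) pair, with None meaning not yet committed):

```python
def visible(commit_ts, txn_start_ts):
    """A version is visible iff it committed strictly before the txn started."""
    return commit_ts is not None and commit_ts < txn_start_ts

# Versions of key "x" in commit order; "v3" is from an in-flight transaction
versions = [("v1", 5), ("v2", 9), ("v3", None)]

txn_start = 10
snapshot = [val for val, ts in versions if visible(ts, txn_start)]
latest = snapshot[-1]   # the txn reads "v2"; "v3" stays invisible
```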

Interview Tips

  • Know the difference between Lamport and vector clocks
  • Understand why physical timestamps are insufficient
  • Be able to explain happens-before relationship
  • Know that LWW can silently lose data
  • Mention HLC as practical compromise
  • Google Spanner's TrueTime is great for bonus points

Previous: 13 - Distributed Transactions | Next: 15 - Scaling Strategies