Clocks & Ordering in Distributed Systems
Why This Matters
In a distributed system, there is no global clock. "What happened first?" is surprisingly hard to answer. Understanding clocks is crucial for conflict resolution, consistency, and debugging.
The Problem with Physical Clocks
Clock Skew
- Different machines have slightly different times
- Even with NTP synchronization, residual skew of 1-10ms is common
- Across datacenters, skew can be 100ms+
NTP (Network Time Protocol)
- Synchronizes clocks with time servers
- Accuracy: ~1ms within LAN, ~10-100ms across internet
- Not reliable enough for ordering events in distributed systems
Why Physical Time Fails for Ordering
Machine A (clock slightly ahead): Event at T=100.001
Machine B (clock slightly behind): Event at T=100.000
Physical time says B happened first, but A might have actually happened first!
Last Write Wins (LWW) with physical timestamps can silently drop writes.
Logical Clocks
Lamport Clocks
Algorithm:
Each node maintains a counter C
On local event:
C = C + 1
On sending message:
C = C + 1
Attach C to message
On receiving message with timestamp T:
C = max(C, T) + 1
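The update rules above can be sketched in a few lines of Python (the class and method names are mine, for illustration):

```python
# Minimal Lamport clock sketch; illustrative, not a library API.
class LamportClock:
    def __init__(self):
        self.c = 0

    def local_event(self):
        self.c += 1          # tick on every local event
        return self.c

    def send(self):
        self.c += 1          # same rule as a local event;
        return self.c        # the returned value travels with the message

    def receive(self, t):
        self.c = max(self.c, t) + 1   # merge sender's timestamp, then tick
        return self.c

a, b = LamportClock(), LamportClock()
t = a.send()        # a.c = 1, message carries timestamp 1
b.local_event()     # b.c = 1
b.receive(t)        # b.c = max(1, 1) + 1 = 2
```

Note that after this exchange b's counter (2) exceeds a's (1), reflecting that b's receive causally follows a's send.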
Properties:
- If event A happened before B → L(A) < L(B) ✅
- If L(A) < L(B) → A MIGHT have happened before B (or they're concurrent) ⚠️
- Cannot distinguish concurrent events from ordered events
Use: Simple ordering where causality tracking isn't needed.
Vector Clocks
Algorithm:
N nodes, each maintains a vector of N counters: [C1, C2, ..., CN]
Node i on local event:
V[i] = V[i] + 1
Node i on sending message:
V[i] = V[i] + 1
Attach V to message
Node i on receiving message with vector T:
For each j: V[j] = max(V[j], T[j])
V[i] = V[i] + 1
Comparing vectors:
V1 < V2 (V1 happened before V2) if: all V1[i] ≤ V2[i] AND at least one V1[i] < V2[i]
V1 || V2 (concurrent) if: neither V1 < V2 nor V2 < V1
Example:
Node A: [2, 0, 0] Event a2
Node B: [1, 1, 0] Event b1 (after receiving from A)
Node C: [0, 0, 1] Event c1
[2,0,0] vs [0,0,1] → neither ≤ → CONCURRENT (correct!)
[1,1,0] vs [2,0,0] → not ≤ either way → CONCURRENT
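The comparison rules can be sketched directly from the definitions above (function names are mine):

```python
# Vector-clock comparison sketch; vectors are plain lists, one slot per node.
def happened_before(v1, v2):
    """True iff v1 < v2: every entry <=, and at least one strictly <."""
    return (all(x <= y for x, y in zip(v1, v2))
            and any(x < y for x, y in zip(v1, v2)))

def concurrent(v1, v2):
    """True iff neither vector dominates the other."""
    return not happened_before(v1, v2) and not happened_before(v2, v1)

# Replaying the example:
assert concurrent([2, 0, 0], [0, 0, 1])        # a2 || c1
assert concurrent([1, 1, 0], [2, 0, 0])        # b1 || a2
assert happened_before([1, 0, 0], [1, 1, 0])   # a1 -> b1
```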
Properties:
- If A → B, then V(A) < V(B) ✅
- If V(A) < V(B), then A → B ✅ (stronger than Lamport!)
- Can detect concurrent events ✅
Downside: Vector size grows with number of nodes. For thousands of nodes, use dotted version vectors or interval tree clocks.
Used by: Amazon's Dynamo (the paper behind DynamoDB), Riak
Hybrid Logical Clocks (HLC)
Combines physical time and logical counters:
HLC = (physical_time, logical_counter, node_id)
Properties:
- Close to physical time (useful for humans)
- Preserves causality (like logical clocks)
- Bounded divergence from real time
- Fixed size (unlike vector clocks)
Used by: CockroachDB, YugabyteDB, MongoDB
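The HLC update rules are not spelled out above; the sketch below follows the rules from the Kulkarni et al. HLC paper, with `pt()` standing in for the node's physical clock (all names are mine):

```python
# Hedged HLC sketch: l tracks the max physical time seen,
# c breaks ties among events sharing the same l.
import time

def pt():
    return time.time_ns() // 1_000_000   # physical time in ms

class HLC:
    def __init__(self):
        self.l = 0
        self.c = 0

    def send_or_local(self):
        l_old = self.l
        self.l = max(l_old, pt())
        self.c = self.c + 1 if self.l == l_old else 0
        return (self.l, self.c)

    def receive(self, l_msg, c_msg):
        l_old = self.l
        self.l = max(l_old, l_msg, pt())
        if self.l == l_old == l_msg:
            self.c = max(self.c, c_msg) + 1   # tie on both sides
        elif self.l == l_old:
            self.c += 1                        # our clock is ahead
        elif self.l == l_msg:
            self.c = c_msg + 1                 # sender's clock is ahead
        else:
            self.c = 0                         # physical time moved past both
        return (self.l, self.c)
```

Timestamps are tuples, so ordinary tuple comparison gives a causality-respecting order that stays close to wall-clock time.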
Happens-Before Relationship
Definition (Lamport, 1978)
Event A happens before event B (written A → B) if:
- A and B are in the same process, and A comes before B
- A is a send event and B is the corresponding receive
- Transitivity: if A → B and B → C, then A → C
If neither A → B nor B → A, events are concurrent (A ‖ B).
Causal Ordering
- Events with causal relationship MUST be ordered
- Concurrent events can be ordered arbitrarily
- Causal ordering is weaker than total ordering but often sufficient
Total vs Partial Ordering
Total Order
Every pair of events is ordered:
e1 < e2 < e3 < e4 (no ties, no incomparable events)
Required for: linearizability, consensus, single-leader replication
Partial Order
Some events are incomparable (concurrent):
e1 < e3
e2 < e3
e1 ‖ e2 (concurrent, no order between them)
Sufficient for: causal consistency, CRDTs
Total Order Broadcast
All nodes deliver messages in the same total order.
- Equivalent to consensus (Raft, Paxos)
- Used for replicated state machines
- Every node applies same operations in same order → same state
TrueTime (Google Spanner)
Innovation
Google's Spanner uses GPS and atomic clocks to bound clock uncertainty:
TrueTime.now() returns interval [earliest, latest]
"The real time is somewhere in this interval"
Typical uncertainty: 1-7ms
Commit Wait
Spanner commits:
1. Assign commit timestamp = TrueTime.now().latest
2. WAIT until TrueTime.now().earliest > commit timestamp
3. Now safe to acknowledge the commit (any transaction starting afterwards is guaranteed a later timestamp)
This gives external consistency (linearizability) across globally distributed nodes.
Why This Is Special
- Requires GPS receivers and atomic clocks in every datacenter, so few operators besides Google deploy it
- Enables globally consistent reads without cross-datacenter coordination
- CockroachDB approximates this with HLC (without special hardware)
Practical Applications
Conflict Resolution with Timestamps
Write 1: {key: "x", value: "A", timestamp: 10}
Write 2: {key: "x", value: "B", timestamp: 12}
LWW → value = "B" (higher timestamp wins)
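A one-line merge function makes the failure mode concrete (names are mine):

```python
# LWW merge sketch: the losing write vanishes with no trace.
def lww_merge(w1, w2):
    # Higher timestamp wins; real systems also need a tiebreaker
    # (e.g. node id) for equal timestamps.
    return w1 if w1["timestamp"] > w2["timestamp"] else w2

w1 = {"key": "x", "value": "A", "timestamp": 10}
w2 = {"key": "x", "value": "B", "timestamp": 12}
winner = lww_merge(w1, w2)   # value "B"; "A" is silently dropped
```

If the two writes were actually concurrent and the timestamps came from skewed physical clocks, "A" may have been the write the user cared about.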
Version Vectors for Shopping Cart (Amazon Dynamo)
User adds item via US server: cart: [shirt], vector: [US:1, EU:0]
User adds item via EU server: cart: [pants], vector: [US:0, EU:1]
Conflict detected! Vectors are concurrent.
Dynamo returns BOTH versions → application merges:
cart: [shirt, pants], vector: [US:1, EU:1]
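The detect-and-merge step can be sketched as below, using dicts keyed by replica name (all function names are mine):

```python
# Sketch of conflict detection + application-level merge for carts.
def dominates(v1, v2):
    """True iff v1 >= v2 in every component."""
    keys = v1.keys() | v2.keys()
    return all(v1.get(k, 0) >= v2.get(k, 0) for k in keys)

def merge_carts(c1, v1, c2, v2):
    if dominates(v1, v2):
        return c1, v1            # v1 strictly newer or equal
    if dominates(v2, v1):
        return c2, v2
    # Concurrent: union the items, take the elementwise max of the vectors.
    merged_vec = {k: max(v1.get(k, 0), v2.get(k, 0))
                  for k in v1.keys() | v2.keys()}
    return sorted(set(c1) | set(c2)), merged_vec

cart, vec = merge_carts(["shirt"], {"US": 1, "EU": 0},
                        ["pants"], {"US": 0, "EU": 1})
# cart contains both items; vec == {"US": 1, "EU": 1}
```

Union works for add-only carts; the Dynamo paper notes this merge can resurrect deleted items, which is what CRDTs later solved properly.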
Snapshot Isolation
- Assign each transaction a timestamp
- Transaction sees all writes committed before its start time
- Consistent point-in-time view of data
- Used by PostgreSQL MVCC, MySQL InnoDB
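A toy MVCC visibility check illustrates the rule (field and function names are mine, not PostgreSQL's internals):

```python
# Snapshot isolation sketch: a version is visible iff it committed
# at or before the reader's snapshot timestamp.
def read(versions, snapshot_ts):
    visible = [v for v in versions
               if v["commit_ts"] is not None          # uncommitted: invisible
               and v["commit_ts"] <= snapshot_ts]
    if not visible:
        return None
    # Newest visible version wins.
    return max(visible, key=lambda v: v["commit_ts"])["value"]

versions = [
    {"value": "A", "commit_ts": 5},
    {"value": "B", "commit_ts": 12},
    {"value": "C", "commit_ts": None},   # in-flight write
]
# read(versions, 10) -> "A"; read(versions, 15) -> "B"
```

Two transactions with different snapshot timestamps can each see a stable, internally consistent view without blocking each other.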
Interview Tips
- Know the difference between Lamport and vector clocks
- Understand why physical timestamps are insufficient
- Be able to explain happens-before relationship
- Know that LWW can silently lose data
- Mention HLC as practical compromise
- Google Spanner's TrueTime is great for bonus points
Resources
- 📖 DDIA Chapter 8: The Trouble with Distributed Systems (clocks section)
- 📖 DDIA Chapter 9: Consistency and Consensus (ordering section)
- 🔗 Lamport — Time, Clocks, and the Ordering of Events
- 🔗 Google Spanner paper
- 🔗 CockroachDB — Living Without Atomic Clocks
- 🎥 Martin Kleppmann — Distributed Systems lectures (ordering)
Previous: 13 - Distributed Transactions | Next: 15 - Scaling Strategies