Clocks & Ordering in Distributed Systems
Why This Matters
In a distributed system, there is no global clock. "What happened first?" is surprisingly hard to answer. Understanding clocks is crucial for conflict resolution, consistency, and debugging.
The Problem with Physical Clocks
Clock Skew
- Different machines have slightly different times
- Even with NTP synchronization, residual skew of 1-10ms is common
- Across datacenters, skew can be 100ms+
NTP (Network Time Protocol)
- Synchronizes clocks with time servers
- Accuracy: ~1ms within LAN, ~10-100ms across internet
- Not reliable enough for ordering events in distributed systems
Why Physical Time Fails for Ordering
Machine A (clock slightly ahead): Event at T=100.001
Machine B (clock slightly behind): Event at T=100.000
Physical time says B happened first, but A might have actually happened first!
Last Write Wins (LWW) with physical timestamps can silently drop writes.
Logical Clocks
Lamport Clocks
Algorithm:
Each node maintains a counter C
On local event:
C = C + 1
On sending message:
C = C + 1
Attach C to message
On receiving message with timestamp T:
C = max(C, T) + 1
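The update rules above can be sketched in a few lines of Python (the class and method names are mine, for illustration):

```python
# Minimal Lamport clock sketch; illustrative, not a library API.
class LamportClock:
    def __init__(self):
        self.c = 0

    def local_event(self):
        self.c += 1          # tick on every local event
        return self.c

    def send(self):
        self.c += 1          # same rule as a local event;
        return self.c        # the returned value travels with the message

    def receive(self, t):
        self.c = max(self.c, t) + 1   # merge sender's timestamp, then tick
        return self.c

a, b = LamportClock(), LamportClock()
t = a.send()        # a.c = 1, message carries timestamp 1
b.local_event()     # b.c = 1
b.receive(t)        # b.c = max(1, 1) + 1 = 2
```

Note that after this exchange b's counter (2) exceeds a's (1), reflecting that b's receive causally follows a's send.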
Properties:
- If event A happened before B → L(A) < L(B) ✅
- If L(A) < L(B) → A MIGHT have happened before B (or they're concurrent) ⚠️
- Cannot distinguish concurrent events from ordered events
Use: Simple ordering where causality tracking isn't needed.
Vector Clocks
Algorithm:
N nodes, each maintains a vector of N counters: [C1, C2, ..., CN]
Node i on local event:
V[i] = V[i] + 1
Node i on sending message:
V[i] = V[i] + 1
Attach V to message
Node i on receiving message with vector T:
For each j: V[j] = max(V[j], T[j])
V[i] = V[i] + 1
Comparing vectors:
V1 < V2 (V1 happened before V2) if: all V1[i] ≤ V2[i] AND at least one V1[i] < V2[i]
V1 || V2 (concurrent) if: neither V1 < V2 nor V2 < V1
Example:
Node A: [2, 0, 0] Event a2
Node B: [1, 1, 0] Event b1 (after receiving from A)
Node C: [0, 0, 1] Event c1
[2,0,0] vs [0,0,1] → neither ≤ → CONCURRENT (correct!)
[1,1,0] vs [2,0,0] → not ≤ either way → CONCURRENT
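The comparison rules can be sketched directly from the definitions above (function names are mine):

```python
# Vector-clock comparison sketch; vectors are plain lists, one slot per node.
def happened_before(v1, v2):
    """True iff v1 < v2: every entry <=, and at least one strictly <."""
    return (all(x <= y for x, y in zip(v1, v2))
            and any(x < y for x, y in zip(v1, v2)))

def concurrent(v1, v2):
    """True iff neither vector dominates the other."""
    return not happened_before(v1, v2) and not happened_before(v2, v1)

# Replaying the example:
assert concurrent([2, 0, 0], [0, 0, 1])        # a2 || c1
assert concurrent([1, 1, 0], [2, 0, 0])        # b1 || a2
assert happened_before([1, 0, 0], [1, 1, 0])   # a1 -> b1
```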
Properties:
- If A → B, then V(A) < V(B) ✅
- If V(A) < V(B), then A → B ✅ (stronger than Lamport!)
- Can detect concurrent events ✅
Downside: Vector size grows with number of nodes. For thousands of nodes, use dotted version vectors or interval tree clocks.
Used by: Amazon's Dynamo (the paper behind DynamoDB), Riak
Hybrid Logical Clocks (HLC)
Combines physical time and logical counters:
HLC = (physical_time, logical_counter, node_id)
Properties:
- Close to physical time (useful for humans)
- Preserves causality (like logical clocks)
- Bounded divergence from real time
- Fixed size (unlike vector clocks)
Used by: CockroachDB, YugabyteDB, MongoDB
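The HLC update rules are not spelled out above; the sketch below follows the rules from the Kulkarni et al. HLC paper, with `pt()` standing in for the node's physical clock (all names are mine):

```python
# Hedged HLC sketch: l tracks the max physical time seen,
# c breaks ties among events sharing the same l.
import time

def pt():
    return time.time_ns() // 1_000_000   # physical time in ms

class HLC:
    def __init__(self):
        self.l = 0
        self.c = 0

    def send_or_local(self):
        l_old = self.l
        self.l = max(l_old, pt())
        self.c = self.c + 1 if self.l == l_old else 0
        return (self.l, self.c)

    def receive(self, l_msg, c_msg):
        l_old = self.l
        self.l = max(l_old, l_msg, pt())
        if self.l == l_old == l_msg:
            self.c = max(self.c, c_msg) + 1   # tie on both sides
        elif self.l == l_old:
            self.c += 1                        # our clock is ahead
        elif self.l == l_msg:
            self.c = c_msg + 1                 # sender's clock is ahead
        else:
            self.c = 0                         # physical time moved past both
        return (self.l, self.c)
```

Timestamps are tuples, so ordinary tuple comparison gives a causality-respecting order that stays close to wall-clock time.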
Happens-Before Relationship
Definition (Lamport, 1978)
Event A happens before event B (written A → B) if:
- A and B are in the same process, and A comes before B
- A is a send event and B is the corresponding receive
- Transitivity: if A → B and B → C, then A → C
If neither A → B nor B → A, events are concurrent (A ‖ B).
Causal Ordering
- Events with causal relationship MUST be ordered
- Concurrent events can be ordered arbitrarily
- Causal ordering is weaker than total ordering but often sufficient
Total vs Partial Ordering
Total Order
Every pair of events is ordered:
e1 < e2 < e3 < e4 (no ties, no incomparable events)
Required for: linearizability, consensus, single-leader replication
Partial Order
Some events are incomparable (concurrent):
e1 < e3
e2 < e3
e1 ‖ e2 (concurrent, no order between them)
Sufficient for: causal consistency, CRDTs
Total Order Broadcast
All nodes deliver messages in the same total order.
- Equivalent to consensus (Raft, Paxos)
- Used for replicated state machines
- Every node applies same operations in same order → same state
TrueTime (Google Spanner)
Innovation
Google's Spanner uses GPS and atomic clocks to bound clock uncertainty:
TrueTime.now() returns interval [earliest, latest]
"The real time is somewhere in this interval"
Typical uncertainty: 1-7ms
Commit Wait
Spanner commits:
1. Assign commit timestamp = TrueTime.now().latest
2. WAIT until TrueTime.now().earliest > commit timestamp
3. Now safe to acknowledge the commit (any transaction starting afterwards is guaranteed a later timestamp)
This gives external consistency (linearizability) across globally distributed nodes.
Why This Is Special
- Requires GPS receivers and atomic clocks in every datacenter, so few operators besides Google deploy it
- Enables globally consistent reads without cross-datacenter coordination
- CockroachDB approximates this with HLC (without special hardware)
Practical Applications
Conflict Resolution with Timestamps
Write 1: {key: "x", value: "A", timestamp: 10}
Write 2: {key: "x", value: "B", timestamp: 12}
LWW → value = "B" (higher timestamp wins)
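A one-line merge function makes the failure mode concrete (names are mine):

```python
# LWW merge sketch: the losing write vanishes with no trace.
def lww_merge(w1, w2):
    # Higher timestamp wins; real systems also need a tiebreaker
    # (e.g. node id) for equal timestamps.
    return w1 if w1["timestamp"] > w2["timestamp"] else w2

w1 = {"key": "x", "value": "A", "timestamp": 10}
w2 = {"key": "x", "value": "B", "timestamp": 12}
winner = lww_merge(w1, w2)   # value "B"; "A" is silently dropped
```

If the two writes were actually concurrent and the timestamps came from skewed physical clocks, "A" may have been the write the user cared about.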
Version Vectors for Shopping Cart (Amazon Dynamo)
User adds item via US server: cart: [shirt], vector: [US:1, EU:0]
User adds item via EU server: cart: [pants], vector: [US:0, EU:1]
Conflict detected! Vectors are concurrent.
Dynamo returns BOTH versions → application merges:
cart: [shirt, pants], vector: [US:1, EU:1]
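The detect-and-merge step can be sketched as below, using dicts keyed by replica name (all function names are mine):

```python
# Sketch of conflict detection + application-level merge for carts.
def dominates(v1, v2):
    """True iff v1 >= v2 in every component."""
    keys = v1.keys() | v2.keys()
    return all(v1.get(k, 0) >= v2.get(k, 0) for k in keys)

def merge_carts(c1, v1, c2, v2):
    if dominates(v1, v2):
        return c1, v1            # v1 strictly newer or equal
    if dominates(v2, v1):
        return c2, v2
    # Concurrent: union the items, take the elementwise max of the vectors.
    merged_vec = {k: max(v1.get(k, 0), v2.get(k, 0))
                  for k in v1.keys() | v2.keys()}
    return sorted(set(c1) | set(c2)), merged_vec

cart, vec = merge_carts(["shirt"], {"US": 1, "EU": 0},
                        ["pants"], {"US": 0, "EU": 1})
# cart contains both items; vec == {"US": 1, "EU": 1}
```

Union works for add-only carts; the Dynamo paper notes this merge can resurrect deleted items, which is what CRDTs later solved properly.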
Snapshot Isolation
- Assign each transaction a timestamp
- Transaction sees all writes committed before its start time
- Consistent point-in-time view of data
- Used by PostgreSQL MVCC, MySQL InnoDB
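A toy MVCC visibility check illustrates the rule (field and function names are mine, not PostgreSQL's internals):

```python
# Snapshot isolation sketch: a version is visible iff it committed
# at or before the reader's snapshot timestamp.
def read(versions, snapshot_ts):
    visible = [v for v in versions
               if v["commit_ts"] is not None          # uncommitted: invisible
               and v["commit_ts"] <= snapshot_ts]
    if not visible:
        return None
    # Newest visible version wins.
    return max(visible, key=lambda v: v["commit_ts"])["value"]

versions = [
    {"value": "A", "commit_ts": 5},
    {"value": "B", "commit_ts": 12},
    {"value": "C", "commit_ts": None},   # in-flight write
]
# read(versions, 10) -> "A"; read(versions, 15) -> "B"
```

Two transactions with different snapshot timestamps can each see a stable, internally consistent view without blocking each other.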
Interview Tips
- Know the difference between Lamport and vector clocks
- Understand why physical timestamps are insufficient
- Be able to explain happens-before relationship
- Know that LWW can silently lose data
- Mention HLC as practical compromise
- Google Spanner's TrueTime is great for bonus points
Resources
- 📖 DDIA Chapter 8: The Trouble with Distributed Systems (clocks section)
- 📖 DDIA Chapter 9: Consistency and Consensus (ordering section)
- 🔗 Lamport — Time, Clocks, and the Ordering of Events
- 🔗 Google Spanner paper
- 🔗 CockroachDB — Living Without Atomic Clocks
- 🎥 Martin Kleppmann — Distributed Systems lectures (ordering)
Previous: 13 - Distributed Transactions | Next: 15 - Scaling Strategies