42 - Design Payment System
1. Problem Statement
Design a payment system that processes charges, refunds, and multi-currency transactions reliably. Every dollar must be accounted for: no transaction lost, no double charge. Think Stripe's backend, Shopify Payments, or any fintech payment platform.
2. Requirements
Functional
| Requirement | Detail |
|---|---|
| Process payments | Authorize, capture, void |
| Refunds | Full and partial refunds |
| Multi-currency | Accept and settle in different currencies |
| Idempotency | Retry-safe: same request never charges twice |
| Reconciliation | Match internal records with PSP records daily |
| Webhook handling | Receive async status updates from PSPs |
| Ledger | Double-entry bookkeeping for every transaction |
Non-Functional
| Requirement | Target |
|---|---|
| Availability | 99.99% |
| Consistency | Strong (money cannot be lost or duplicated) |
| Latency | < 2s for payment authorization |
| Auditability | Full audit trail, immutable records |
| Compliance | PCI DSS Level 1 |
3. Payment Flow
Customer places order
|
v
+----+--------+
| Order | 1. Create order, calculate total
| Service | 2. Send payment request to Payment Service
+----+--------+
|
v
+----+--------+
| Payment | 3. Create payment intent (idempotent)
| Service | 4. Call PSP to authorize
+----+--------+ 5. Record result in ledger
|
v
+----+--------+
| PSP | 6. Validate card, check funds, authorize hold
| (Stripe) | 7. Return auth token
+----+--------+
|
v
Payment Service
| 8. On success: capture (or defer capture for later)
| 9. On failure: return error to Order Service
v
+----+--------+
| Ledger | 10. Record debit + credit entries
| Service | 11. Immutable append-only log
+-------------+
Authorization vs Capture
Two-phase payment:
AUTHORIZE (hold funds on card)
|
|--- hold may last 3-7 days
|
CAPTURE (actually charge the held funds)
|
v
Settlement (PSP transfers money to merchant)
Why two phases?
- E-commerce: authorize at checkout, capture at shipment
- Hotels: authorize at booking, capture at checkout
- Prevents charging for items you cannot fulfill
4. Payment State Machine
                +------------+
                |  CREATED   |
                +-----+------+
                      |
                 authorize()
                      |
                +-----v------+
                | AUTHORIZED |
                +-----+------+
                      |
      +---------------+---------------+
      |               |               |
    void()        capture()        timeout
      |               |               |
+-----v----+    +-----v----+    +-----v----+
|  VOIDED  |    | CAPTURED |    | EXPIRED  |
+----------+    +-----+----+    +----------+
                      |
                   refund()
                      |
                +-----v------+
                |  REFUNDED  |
                |  (full or  |
                |  partial)  |
                +------------+
| State | Description |
|---|---|
| CREATED | Payment intent created, not yet sent to PSP |
| AUTHORIZED | Funds held on customer's card |
| CAPTURED | Funds charged, settlement pending |
| VOIDED | Authorization released before capture |
| EXPIRED | Authorization expired (not captured in time) |
| REFUNDED | Funds returned to customer |
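The transitions above can be enforced in code with a lookup table, so that an illegal move (for example, capturing a voided payment) fails loudly instead of silently corrupting state. The following is a minimal sketch, not a definitive implementation; the names `PaymentState`, `TRANSITIONS`, `InvalidTransition`, and `next_state` are illustrative assumptions.

```python
from enum import Enum


class PaymentState(Enum):
    CREATED = "CREATED"
    AUTHORIZED = "AUTHORIZED"
    CAPTURED = "CAPTURED"
    VOIDED = "VOIDED"
    EXPIRED = "EXPIRED"
    REFUNDED = "REFUNDED"


# Allowed transitions, keyed by (current state, event name).
TRANSITIONS = {
    (PaymentState.CREATED, "authorize"): PaymentState.AUTHORIZED,
    (PaymentState.AUTHORIZED, "void"): PaymentState.VOIDED,
    (PaymentState.AUTHORIZED, "capture"): PaymentState.CAPTURED,
    (PaymentState.AUTHORIZED, "timeout"): PaymentState.EXPIRED,
    (PaymentState.CAPTURED, "refund"): PaymentState.REFUNDED,
}


class InvalidTransition(Exception):
    """Raised when an event is not allowed in the current state."""


def next_state(current: PaymentState, event: str) -> PaymentState:
    """Return the state reached by applying `event`, or raise InvalidTransition."""
    try:
        return TRANSITIONS[(current, event)]
    except KeyError:
        raise InvalidTransition(f"{event!r} not allowed from {current.name}") from None
```

For example, `next_state(PaymentState.CREATED, "authorize")` yields `PaymentState.AUTHORIZED`, while `next_state(PaymentState.VOIDED, "capture")` raises `InvalidTransition`.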
5. Idempotency (Critical for Payments)
Idempotency is the most important concept in payment system design. Network failures, retries, and timeouts all produce duplicate requests, and a duplicate must never move money a second time.
Without idempotency:
Client -> Payment Service: "charge $50" [timeout, no response]
Client -> Payment Service: "charge $50" [retry]
Result: Customer charged $100!
With idempotency:
Client -> Payment Service: "charge $50", Idempotency-Key: "abc-123" [timeout]
Client -> Payment Service: "charge $50", Idempotency-Key: "abc-123" [retry]
Result: Customer charged $50 (second request returns cached result)
Implementation
def process_payment(idempotency_key, amount, currency):
    # Take the lock first so concurrent duplicates queue up behind it.
    lock = acquire_lock(idempotency_key)
    try:
        # Look up inside the lock: checking before acquiring it would let two
        # racing requests both miss the cache and both call the PSP.
        existing = db.find_by_idempotency_key(idempotency_key)
        if existing:
            return existing.result  # return cached result, no re-processing
        result = call_psp(amount, currency)
        db.store(idempotency_key, result)  # persist for future lookups
        return result
    finally:
        release_lock(lock)
Idempotency Key Storage
CREATE TABLE idempotency_keys (
    key           TEXT PRIMARY KEY,
    request_hash  TEXT NOT NULL,     -- hash of request body (detect different requests with same key)
    response      JSONB,
    status        TEXT,              -- PROCESSING, COMPLETED, FAILED
    created_at    TIMESTAMPTZ DEFAULT NOW(),
    expires_at    TIMESTAMPTZ        -- clean up after 24-72 hours
);
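The `request_hash` column exists to catch a client that reuses one key for two different requests. The sketch below shows one way that check might work, using an in-memory dictionary as a stand-in for the table. The class and function names are illustrative assumptions, and a real service would also need the locking and expiry described above.

```python
import hashlib
import json


class IdempotencyConflict(Exception):
    """Same idempotency key reused with a different request body."""


def request_hash(body: dict) -> str:
    # Canonical JSON (sorted keys, fixed separators) so equal bodies hash equally.
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class InMemoryIdempotencyStore:
    """Stand-in for the idempotency_keys table; not safe across processes."""

    def __init__(self):
        self._rows = {}

    def execute_once(self, key: str, body: dict, handler):
        digest = request_hash(body)
        row = self._rows.get(key)
        if row is not None:
            if row["request_hash"] != digest:
                raise IdempotencyConflict(key)
            return row["response"]  # replay the stored response
        response = handler(body)
        self._rows[key] = {
            "request_hash": digest,
            "response": response,
            "status": "COMPLETED",
        }
        return response
```

Calling `execute_once` twice with the same key and body runs `handler` once and returns the stored response the second time; the same key with a different body raises `IdempotencyConflict`.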
Interview Tip
Stress that idempotency is non-negotiable for payments. Mention that Stripe's API accepts an Idempotency-Key header on all POST requests, and that the key must be generated client-side (typically a UUID).
6. Double-Entry Ledger
Every financial transaction creates at least two entries: a debit and a credit for the same amount. The sum of all debits must equal the sum of all credits.
Transaction: Customer pays $100 for order
Debit: Customer Account -$100
Credit: Merchant Account +$100
Transaction: Refund $30
Debit: Merchant Account -$30
Credit: Customer Account +$30
Ledger Table:
+----+----------------+----------+--------+--------+------------+
| ID | transaction_id | account | debit | credit | timestamp |
+----+----------------+----------+--------+--------+------------+
| 1 | txn_001 | customer | 100.00 | 0.00 | 2024-01-15 |
| 2 | txn_001 | merchant | 0.00 | 100.00 | 2024-01-15 |
| 3 | txn_002 | merchant | 30.00 | 0.00 | 2024-01-16 |
| 4 | txn_002 | customer | 0.00 | 30.00 | 2024-01-16 |
+----+----------------+----------+--------+--------+------------+
Invariant: SUM(debit) = SUM(credit) -- ALWAYS
Why Double-Entry?
- Auditability -- Every dollar movement is traceable
- Error detection -- Imbalance indicates a bug
- Regulatory compliance -- Required for financial systems
- Reconciliation -- Easy to match against PSP records
Interview Tip
The ledger is append-only. Never update or delete entries. Corrections are made by adding reversal entries. This makes the system auditable and compliant.
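A minimal sketch of an append-only double-entry ledger follows, assuming amounts in a single currency and using `Decimal` to avoid floating-point rounding. The `Entry` and `Ledger` names, and the choice to report balances as credits minus debits, are assumptions made for illustration.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Entry:
    transaction_id: str
    account: str
    debit: Decimal
    credit: Decimal


class Ledger:
    def __init__(self):
        self._entries = []  # append-only: the class offers no update or delete

    def post(self, transaction_id, debit_account, credit_account, amount):
        amount = Decimal(amount)
        if amount <= 0:
            raise ValueError("amount must be positive")
        # Both legs are appended together so the ledger never holds half a transaction.
        self._entries.extend([
            Entry(transaction_id, debit_account, amount, Decimal("0")),
            Entry(transaction_id, credit_account, Decimal("0"), amount),
        ])

    def is_balanced(self):
        total_debit = sum((e.debit for e in self._entries), Decimal("0"))
        total_credit = sum((e.credit for e in self._entries), Decimal("0"))
        return total_debit == total_credit

    def balance(self, account):
        # Net position as credits minus debits (a convention chosen for this sketch).
        return sum(
            (e.credit - e.debit for e in self._entries if e.account == account),
            Decimal("0"),
        )
```

Posting `txn_001` (debit customer, credit merchant, 100.00) and then the refund `txn_002` (debit merchant, credit customer, 30.00) leaves the merchant with a net balance of 70.00. The refund is a new pair of entries, not an edit of the original ones.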
7. Reconciliation
Match internal ledger records against PSP settlement reports to catch discrepancies.
Daily Reconciliation Process:
+------------------+ +------------------+
| Internal Ledger | | PSP Settlement |
| (our records) | | Report (Stripe) |
+--------+---------+ +--------+---------+
| |
+----------+ +------------+
| |
+------v--v------+
| Reconciliation |
| Engine |
+-------+--------+
|
+-----------+-----------+
| | |
+-----v----+ +----v-----+ +--v---------+
| Matched | | Missing | | Discrepant |
| (OK) | | in PSP | | (amount |
| | | or ours | | differs) |
+----------+ +----------+ +------------+
| |
investigate & resolve
| Status | Meaning | Action |
|---|---|---|
| Matched | Both records agree | No action |
| Missing internal | PSP has record, we don't | Investigate: did we lose a webhook? |
| Missing PSP | We have record, PSP doesn't | Investigate: was auth actually processed? |
| Amount mismatch | Amounts differ | Investigate: currency conversion? partial capture? |
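The four statuses in the table map onto a simple comparison of two keyed record sets. The sketch below assumes both sides have already been reduced to a mapping from a shared payment reference to an amount; the function name `reconcile` and the report keys are hypothetical.

```python
from decimal import Decimal


def reconcile(internal: dict, psp: dict) -> dict:
    """Classify every payment reference seen on either side.

    Both inputs map a shared payment reference to a Decimal amount.
    """
    report = {
        "matched": [],
        "missing_internal": [],   # PSP has the record, we don't
        "missing_psp": [],        # we have the record, PSP doesn't
        "amount_mismatch": [],
    }
    for ref in sorted(internal.keys() | psp.keys()):
        if ref not in internal:
            report["missing_internal"].append(ref)
        elif ref not in psp:
            report["missing_psp"].append(ref)
        elif internal[ref] != psp[ref]:
            report["amount_mismatch"].append(ref)
        else:
            report["matched"].append(ref)
    return report
```

Everything outside `matched` would feed the investigation step; a production engine would also compare currency and status, which this sketch leaves out.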
8. Full System Architecture
+------------------+
| Client / App |
+--------+---------+
|
+--------v---------+
| API Gateway |
| (auth, rate |
| limit, TLS) |
+--------+---------+
|
+-----------------------+-----------------------+
| | |
+--------v-------+ +---------v-------+ +---------v-------+
| Order Service | | Payment Service | | Refund Service |
| - create order | | - payment intent| | - process refund|
| - order status | | - idempotency | | - partial refund|
+--------+-------+ | - PSP routing | +---------+-------+
| | - retry logic | |
| +--------+--------+ |
| | |
| +-------------+-------------+ |
| | | |
| +------v-------+ +--------v------+ |
| | PSP Adapter | | PSP Adapter | |
| | (Stripe) | | (PayPal) | |
| +------+-------+ +--------+------+ |
| | | |
| +-------------+-------------+ |
| | |
| +--------v--------+ |
| | Ledger Service | |
| | (double-entry, | |
| | append-only) | |
| +--------+--------+ |
| | |
+--------v-----------------------v------------------------v--+
| PostgreSQL |
| - orders, payments, ledger_entries, idempotency_keys |
+-------------------------------------------------------------+
+-------------------+ +-------------------+
| Webhook Handler | | Reconciliation |
| (async PSP events)| | Engine (daily) |
+-------------------+ +-------------------+
+-------------------+ +-------------------+
| Fraud Detection | | Notification Svc |
| (rules + ML) | | (email receipts) |
+-------------------+ +-------------------+
9. Handling Failures
Payment systems must handle every failure mode gracefully. Money is at stake.
Retry Strategy
import time

# call_psp, poll_psp_status, exponential_backoff, NetworkTimeout, and FAILED
# are helpers assumed to be defined elsewhere.
def retry_payment(request, max_retries=3):
    for attempt in range(1, max_retries + 1):
        try:
            # Every attempt reuses request.idempotency_key so the PSP can deduplicate.
            response = call_psp(request)
            if response.status == "success":
                return response
            if response.status == "declined":
                return response  # don't retry declines
            # response.status == "error": transient error, fall through to backoff
        except NetworkTimeout:
            # CRITICAL: we don't know whether the PSP processed the request.
            # Check its status before sending the request again.
            status = poll_psp_status(request.idempotency_key)
            if status == "completed":
                return status
        if attempt < max_retries:
            time.sleep(exponential_backoff(attempt))  # no wait after the final attempt
    return FAILED  # all retries exhausted, manual investigation needed
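The retry loop relies on an `exponential_backoff` helper that the text does not define. One possible version is sketched below, assuming a 0.5-second base delay, a 30-second cap, and "full jitter" (a random wait between zero and the exponential ceiling). These parameter values are illustrative, not recommendations from the source.

```python
import random


def exponential_backoff(attempt: int, base: float = 0.5, cap: float = 30.0,
                        rng=random.random) -> float:
    """Seconds to wait before retry number `attempt` (1-based), with full jitter."""
    if attempt < 1:
        raise ValueError("attempt is 1-based")
    # Ceiling doubles each attempt: base, 2*base, 4*base, ... up to cap.
    ceiling = min(cap, base * (2 ** (attempt - 1)))
    # Full jitter: pick uniformly below the ceiling so clients that failed
    # together do not all retry at the same instant.
    return rng() * ceiling
```

The `rng` parameter exists only so the function can be tested deterministically; callers would normally leave it at its default.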
Critical Failure Scenarios
| Scenario | Danger | Solution |
|---|---|---|
| Timeout after PSP call | Don't know if charged | Poll PSP status, use idempotency key |
| DB write fails after PSP success | Money charged but no record | Retry DB write; dead-letter queue for manual fix |
| PSP down | Can't process payments | Circuit breaker, fallback to secondary PSP |
| Webhook never arrives | Status stuck as "pending" | Scheduled polling job checks pending payments |
| Double webhook delivery | Process refund twice | Idempotent webhook processing |
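For the "PSP down" row, a circuit breaker stops sending traffic to a failing PSP and routes new requests to a secondary one. The sketch below is one simple way to build it, with hypothetical names and thresholds. Note that it falls back only when the circuit is already open: an individual failed call is re-raised, because after an ambiguous failure the primary PSP may still have charged the card.

```python
import time


class CircuitOpen(Exception):
    """Raised instead of calling a PSP whose circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_after=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpen()
            self.opened_at = None  # half-open: let one trial call through
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open (or re-open) the circuit
            raise
        self.failures = 0  # a success closes the circuit fully
        return result


def charge_with_fallback(primary, secondary, breaker, request):
    """Call the primary PSP through its breaker; use the secondary only while the circuit is open."""
    try:
        return breaker.call(primary, request)
    except CircuitOpen:
        return secondary(request)
```

Here `primary` and `secondary` stand for PSP adapter callables; the `clock` parameter is injectable so the reset window can be tested without sleeping.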
10. Webhook Handling
PSPs send asynchronous status updates via webhooks.
PSP (Stripe) -> POST /webhooks/stripe
{
"type": "payment_intent.succeeded",
"data": { "id": "pi_123", "amount": 5000, ... }
}
Webhook Processing:
1. Verify signature (HMAC with shared secret)
2. Check idempotency (have we already processed this event ID?)
3. Update internal payment status
4. Update ledger entries
5. Trigger downstream actions (send receipt, fulfill order)
6. Return 200 OK to PSP (must respond quickly, < 5s)
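If processing takes longer:
- Persist the raw event durably first (queue or webhook_events table)
- Then ACK (200 OK) and process the event asynchronously
- PSP won't retry once it receives 200, so an event acknowledged before it is stored can be lost for good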
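Steps 1, 2, and 6 of the processing list can be sketched as follows. This uses a generic HMAC-SHA256 over the raw body and is not any specific PSP's signing scheme: real providers define their own header format and often sign a timestamp as well, so their official libraries should be preferred. `WebhookReceiver` and its in-memory set and list are stand-ins for the `webhook_events` table and a durable queue.

```python
import hashlib
import hmac


def sign(secret: bytes, payload: bytes) -> str:
    """Hex HMAC-SHA256 of the raw request body."""
    return hmac.new(secret, payload, hashlib.sha256).hexdigest()


def verify_signature(secret: bytes, payload: bytes, signature: str) -> bool:
    # compare_digest runs in constant time, which avoids leaking the
    # expected signature through response timing.
    expected = sign(secret, payload).encode("utf-8")
    return hmac.compare_digest(expected, signature.encode("utf-8"))


class WebhookReceiver:
    def __init__(self, secret: bytes):
        self.secret = secret
        self.seen_event_ids = set()  # stand-in for the webhook_events table
        self.queue = []              # stand-in for a durable queue

    def receive(self, event_id: str, payload: bytes, signature: str) -> int:
        """Return the HTTP status code to send back to the PSP."""
        if not verify_signature(self.secret, payload, signature):
            return 400
        if event_id in self.seen_event_ids:
            return 200  # duplicate delivery: acknowledge and do nothing
        self.seen_event_ids.add(event_id)
        self.queue.append((event_id, payload))  # enqueue before acknowledging
        return 200
```

A duplicate delivery of the same `event_id` returns 200 without enqueueing a second copy, so downstream handlers run once per event.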
11. Fraud Detection
| Layer | Technique | Example |
|---|---|---|
| Rule-based | Hard rules that block suspicious transactions | Amount > $10K, velocity > 5 txn/min, mismatched country |
| Risk scoring | ML model assigns risk score 0-100 | Score > 80 -> block, 50-80 -> 3DS challenge, < 50 -> allow |
| 3D Secure | Redirect to card issuer for additional verification | Shifts liability to issuer |
| Address Verification | Match billing address with card issuer records | AVS check |
| Device fingerprinting | Track device characteristics | Known fraudster device |
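The first two layers of the table reduce to a few comparisons. The sketch below uses the example thresholds from the table; it assumes "mismatched country" means the card's issuing country differs from the request's IP country, which the table does not specify. Function and parameter names are hypothetical.

```python
def route_by_risk(score: int) -> str:
    """Map a 0-100 risk score to an action using the table's example thresholds."""
    if not 0 <= score <= 100:
        raise ValueError("score must be in 0..100")
    if score > 80:
        return "block"
    if score >= 50:
        return "3ds_challenge"
    return "allow"


def violates_hard_rules(amount_usd: float, txns_last_minute: int,
                        card_country: str, ip_country: str) -> bool:
    """Rule layer: any single hit marks the transaction as suspicious."""
    return (
        amount_usd > 10_000
        or txns_last_minute > 5
        or card_country != ip_country  # assumed meaning of "mismatched country"
    )
```

In practice the rule layer would usually run first, since it is cheap and predictable, with the score consulted only for transactions that pass it.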
12. PCI DSS Compliance
Payment Card Industry Data Security Standard. Required for any system handling card data.
Strategy: Minimize PCI scope by NEVER touching raw card numbers.
Client Browser
|
v
+------+-------+
| Stripe.js / | Card number entered in Stripe's iframe
| PSP SDK | (never touches your servers)
+------+-------+
|
v
+------+-------+
| PSP (Stripe) | Returns a token (tok_abc123)
+------+-------+
|
v
+------+-------+
| Your Payment | Only sees the token, never the card number
| Service | Charges using token via PSP API
+--------------+
PCI scope: SAQ-A (minimal) instead of SAQ-D (full audit)
| SAQ Type | Card Data Handling | Effort |
|---|---|---|
| SAQ-A | PSP handles all card data via hosted fields or redirect (you see tokens only) | Low (self-assessment) |
| SAQ-A-EP | Your page controls how card data is sent to the PSP (e.g. direct post), but your servers never receive it | Medium |
| SAQ-D | You store, process, or transmit card data on your own systems | High (full audit, very expensive) |
13. Currency Conversion
Multi-Currency Flow:
Customer pays in EUR -> PSP converts -> Merchant receives USD
Exchange Rate Handling:
1. Display price in customer's currency (using cached rate)
2. At payment time: lock rate for the transaction
3. PSP performs actual conversion at settlement
4. Record both currencies in ledger:
Debit: customer_eur EUR 85.00
Credit: merchant_usd USD 100.00
Metadata: rate = 1.1765, rate_source = "ECB", rate_locked_at = "2024-01-15T10:00:00Z"
| Approach | When Rate Locked | Risk |
|---|---|---|
| At display time | User sees price -> that's what they pay | Merchant absorbs rate fluctuation |
| At authorization | Rate locked when card authorized | Small discrepancy from displayed price |
| At settlement | PSP's rate at settlement (days later) | Unpredictable for merchant |
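The ledger example above (EUR 85.00 at a rate of 1.1765 becoming USD 100.00) can be reproduced with `Decimal` arithmetic. This sketch assumes the rate is quoted as units of the merchant's currency per unit of the customer's currency, and that both currencies use two decimal places; `LockedRate` and `convert` are illustrative names.

```python
from dataclasses import dataclass
from decimal import Decimal, ROUND_HALF_UP


@dataclass(frozen=True)
class LockedRate:
    base: str        # currency the customer pays in
    quote: str       # currency the merchant settles in
    rate: Decimal    # units of quote per one unit of base
    source: str
    locked_at: str   # ISO 8601 timestamp of the lock


def convert(amount: Decimal, locked: LockedRate) -> Decimal:
    """Convert a base-currency amount using the locked rate."""
    # Round to 2 decimal places; currencies with other minor units
    # (JPY has none, for instance) need a different quantum.
    return (amount * locked.rate).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
```

With `LockedRate("EUR", "USD", Decimal("1.1765"), "ECB", "2024-01-15T10:00:00Z")`, converting `Decimal("85.00")` gives `Decimal("100.00")`. Storing the whole `LockedRate` alongside the ledger entries keeps the conversion auditable later.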
14. Database Schema (Simplified)
-- Payments
CREATE TABLE payments (
    payment_id       UUID PRIMARY KEY,
    order_id         UUID NOT NULL,
    idempotency_key  TEXT UNIQUE NOT NULL,
    amount           DECIMAL(12,2) NOT NULL,
    currency         CHAR(3) NOT NULL,
    status           TEXT NOT NULL DEFAULT 'CREATED',
    psp_name         TEXT NOT NULL,   -- 'stripe', 'paypal'
    psp_payment_id   TEXT,            -- external reference
    created_at       TIMESTAMPTZ DEFAULT NOW(),
    updated_at       TIMESTAMPTZ DEFAULT NOW()
);

-- Ledger entries (append-only, never update/delete)
CREATE TABLE ledger_entries (
    entry_id        BIGSERIAL PRIMARY KEY,
    transaction_id  UUID NOT NULL,
    account_id      TEXT NOT NULL,
    entry_type      TEXT NOT NULL,    -- 'DEBIT' or 'CREDIT'
    amount          DECIMAL(12,2) NOT NULL,
    currency        CHAR(3) NOT NULL,
    description     TEXT,
    created_at      TIMESTAMPTZ DEFAULT NOW()
);

-- Ensure double-entry balance
-- For every transaction_id: SUM(DEBIT) must equal SUM(CREDIT)

-- Webhook events (for idempotent processing)
CREATE TABLE webhook_events (
    event_id      TEXT PRIMARY KEY,   -- PSP's event ID
    psp_name      TEXT NOT NULL,
    event_type    TEXT NOT NULL,
    payload       JSONB NOT NULL,
    processed     BOOLEAN DEFAULT FALSE,
    processed_at  TIMESTAMPTZ,
    received_at   TIMESTAMPTZ DEFAULT NOW()
);
15. Key Trade-offs Discussion
| Decision | Option A | Option B |
|---|---|---|
| PSP strategy | Single PSP (simpler) | Multi-PSP with routing (resilience, cost optimization) |
| Capture timing | Immediate capture (simpler) | Delayed capture (better for fulfillment-based businesses) |
| Ledger storage | Single DB (simpler, ACID) | Event-sourced (immutable log, replay-friendly) |
| Currency conversion | Lock at display (better UX) | Lock at settlement (PSP default) |
| Fraud detection | Rules only (predictable) | ML scoring (more accurate, more complex) |
| Webhook processing | Synchronous (simpler) | Async queue (more reliable, handles spikes) |
16. Interview Checklist
- Explained the full payment flow: order -> intent -> authorize -> capture
- Stressed idempotency as the most critical safety mechanism
- Designed double-entry ledger with append-only entries
- Covered reconciliation process (internal vs PSP records)
- Failure handling for every stage (timeout, PSP down, DB failure)
- Webhook handling with signature verification and idempotent processing
- PCI DSS compliance via tokenization (never touch raw card data)
- Payment state machine with all transitions
- Multi-currency with rate locking strategy
- Mentioned fraud detection layers
17. Resources
- System Design Interview (Alex Xu, Vol 2) -- Chapter: Payment System
- Stripe Documentation -- stripe.com/docs (gold standard for payment API design)
- Paper: "Designing Reliable Payment Systems" (ACM Queue)
- YouTube: System Design Interview -- Design Payment System
- Book: "Payment Systems in the U.S." by Carol Coye Benson
- PCI DSS Official Documentation -- pcisecuritystandards.org