42 - Design Payment System



1. Problem Statement

Design a payment system that reliably processes charges, refunds, and multi-currency transactions. Every dollar must be accounted for: no transaction may be lost and no customer may be charged twice. Think of Stripe's backend, Shopify Payments, or any fintech payment platform.


2. Requirements

Functional

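+------------------+------------------------------------------------+
| Requirement      | Detail                                         |
+------------------+------------------------------------------------+
| Process payments | Authorize, capture, void                       |
| Refunds          | Full and partial refunds                       |
| Multi-currency   | Accept and settle in different currencies      |
| Idempotency      | Retry-safe: same request never charges twice   |
| Reconciliation   | Match internal records with PSP records daily  |
| Webhook handling | Receive async status updates from PSPs         |
| Ledger           | Double-entry bookkeeping for every transaction |
+------------------+------------------------------------------------+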

Non-Functional

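+--------------+---------------------------------------------+
| Requirement  | Target                                      |
+--------------+---------------------------------------------+
| Availability | 99.99%                                      |
| Consistency  | Strong (money cannot be lost or duplicated) |
| Latency      | < 2s for payment authorization              |
| Auditability | Full audit trail, immutable records         |
| Compliance   | PCI DSS Level 1                             |
+--------------+---------------------------------------------+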

3. Payment Flow

  Customer places order
       |
       v
  +----+--------+
  | Order       |  1. Create order, calculate total
  | Service     |  2. Send payment request to Payment Service
  +----+--------+
       |
       v
  +----+--------+
  | Payment     |  3. Create payment intent (idempotent)
  | Service     |  4. Call PSP to authorize
  +----+--------+  5. Record result in ledger
       |
       v
  +----+--------+
  | PSP         |  6. Validate card, check funds, authorize hold
  | (Stripe)    |  7. Return auth token
  +----+--------+
       |
       v
  Payment Service
       |  8. On success: capture (or defer capture for later)
       |  9. On failure: return error to Order Service
       v
  +----+--------+
  | Ledger      |  10. Record debit + credit entries
  | Service     |  11. Immutable append-only log
  +-------------+

Authorization vs Capture

Two-phase payment:

  AUTHORIZE (hold funds on card)
       |
       |--- hold typically lasts a few days (varies by card network and merchant type)
       |
  CAPTURE (actually charge the held funds)
       |
       v
  Settlement (PSP transfers money to merchant)

Why two phases?
  - E-commerce: authorize at checkout, capture at shipment
  - Hotels: authorize at booking, capture at checkout
  - Prevents charging for items you cannot fulfill

4. Payment State Machine

                    +------------+
                    |  CREATED   |
                    +-----+------+
                          |
                     authorize()
                          |
                    +-----v------+
               +----+ AUTHORIZED +----+
               |    +-----+------+    |
               |          |           |
            void()    capture()    timeout
               |          |           |
          +----v---+ +----v-----+ +---v-----+
          | VOIDED | | CAPTURED | | EXPIRED |
          +--------+ +----+-----+ +---------+
                          |
                      refund()
                          |
                    +-----v-----+
                    | REFUNDED  |
                    | (full or  |
                    |  partial) |
                    +-----------+
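
+------------+----------------------------------------------+
| State      | Description                                  |
+------------+----------------------------------------------+
| CREATED    | Payment intent created, not yet sent to PSP  |
| AUTHORIZED | Funds held on customer's card                |
| CAPTURED   | Funds charged, settlement pending            |
| VOIDED     | Authorization released before capture        |
| EXPIRED    | Authorization expired (not captured in time) |
| REFUNDED   | Funds returned to customer                   |
+------------+----------------------------------------------+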
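
The transitions in the diagram can be enforced in code so that an illegal move (for example, capturing a voided payment) is rejected rather than silently applied. The following is a minimal sketch, not a definitive implementation; the names `PaymentState`, `ALLOWED_TRANSITIONS`, `InvalidTransition`, and `transition` are hypothetical.

```python
from enum import Enum


class PaymentState(Enum):
    CREATED = "CREATED"
    AUTHORIZED = "AUTHORIZED"
    CAPTURED = "CAPTURED"
    VOIDED = "VOIDED"
    EXPIRED = "EXPIRED"
    REFUNDED = "REFUNDED"


# Allowed transitions, read off the state diagram.
# States that do not appear as keys are terminal.
ALLOWED_TRANSITIONS = {
    PaymentState.CREATED: {PaymentState.AUTHORIZED},
    PaymentState.AUTHORIZED: {
        PaymentState.CAPTURED,
        PaymentState.VOIDED,
        PaymentState.EXPIRED,
    },
    PaymentState.CAPTURED: {PaymentState.REFUNDED},
}


class InvalidTransition(Exception):
    """Raised when a requested state change is not in the diagram."""


def transition(current: PaymentState, target: PaymentState) -> PaymentState:
    """Return the new state, or raise if the move is not allowed."""
    if target not in ALLOWED_TRANSITIONS.get(current, set()):
        raise InvalidTransition(f"{current.value} -> {target.value}")
    return target
```

In a real service the check would typically run inside the same database transaction that updates the payment row, so that two concurrent transitions cannot both succeed.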

5. Idempotency (Critical for Payments)

Idempotency is the most important concept in payment system design: network failures, retries, and timeouts can all cause the same request to arrive more than once.

Without idempotency:

  Client -> Payment Service: "charge $50"      [timeout, no response]
  Client -> Payment Service: "charge $50"      [retry]
  Result: Customer charged $100!

With idempotency:

  Client -> Payment Service: "charge $50", Idempotency-Key: "abc-123"  [timeout]
  Client -> Payment Service: "charge $50", Idempotency-Key: "abc-123"  [retry]
  Result: Customer charged $50 (second request returns cached result)

Implementation

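process_payment(idempotency_key, amount, currency):
  lock = acquire_lock(idempotency_key)  # serialize requests that share a key
  try:
    # Look up the key only after taking the lock; checking first would let
    # two concurrent requests both miss and both call the PSP.
    existing = db.find_by_idempotency_key(idempotency_key)
    if existing:
      if existing.request_hash != hash(amount, currency):
        return error("idempotency key reused with a different request")
      return existing.result           # return cached result, no re-processing

    # Forward the key to the PSP so a retry after a crash here is also deduplicated
    result = call_psp(amount, currency, idempotency_key)
    db.store(idempotency_key, hash(amount, currency), result)   # persist for future lookups
    return result
  finally:
    release_lock(lock)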

Idempotency Key Storage

sql
CREATE TABLE idempotency_keys (
  key          TEXT PRIMARY KEY,
  request_hash TEXT NOT NULL,              -- hash of request body (detect different requests with same key)
  response     JSONB,
  status       TEXT,                       -- PROCESSING, COMPLETED, FAILED
  created_at   TIMESTAMPTZ DEFAULT NOW(),
  expires_at   TIMESTAMPTZ                 -- clean up after 24-72 hours
);
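
One way to use this table is to let the primary key do the locking: a request claims its key with an INSERT, and a uniqueness violation means the key has been seen before. The sketch below illustrates that idea under some assumptions: it uses Python's built-in `sqlite3` as a stand-in for PostgreSQL so that it runs anywhere, it keeps only the columns it needs, and the helper names `request_hash` and `claim_key` are hypothetical.

```python
import hashlib
import json
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the real PostgreSQL database
conn.execute(
    """CREATE TABLE idempotency_keys (
           key TEXT PRIMARY KEY,
           request_hash TEXT NOT NULL,
           response TEXT,
           status TEXT NOT NULL
       )"""
)


def request_hash(body: dict) -> str:
    """Hash a canonical JSON form so that key order does not matter."""
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()


def claim_key(key: str, body: dict) -> str:
    """Return 'NEW', 'REPLAY', or 'CONFLICT' for this (key, body) pair."""
    digest = request_hash(body)
    try:
        with conn:  # commits on success, rolls back on exception
            conn.execute(
                "INSERT INTO idempotency_keys (key, request_hash, status) "
                "VALUES (?, ?, 'PROCESSING')",
                (key, digest),
            )
        return "NEW"  # first time this key is seen: go on and call the PSP
    except sqlite3.IntegrityError:
        row = conn.execute(
            "SELECT request_hash FROM idempotency_keys WHERE key = ?", (key,)
        ).fetchone()
        # Same body: a retry, so return the stored result. Different body: reject.
        return "REPLAY" if row[0] == digest else "CONFLICT"
```

A caller that receives `REPLAY` would return the stored response (or report that the original request is still processing), and one that receives `CONFLICT` would reject the request with a client error.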

Interview Tip

Stress that idempotency is non-negotiable for payments. Mention that Stripe's API accepts an Idempotency-Key header on POST requests and returns the stored response when a key is reused. The key must be generated client-side (typically a UUID).


6. Double-Entry Ledger

Every financial transaction creates at least two entries, a debit and a credit, and within each transaction the debits must equal the credits. As a result, the sum of all debits in the ledger always equals the sum of all credits.

Transaction: Customer pays $100 for order

  Debit:  Customer Account     -$100
  Credit: Merchant Account     +$100

Transaction: Refund $30

  Debit:  Merchant Account     -$30
  Credit: Customer Account     +$30

Ledger Table:
+----+----------------+----------+--------+--------+------------+
| ID | transaction_id | account  | debit  | credit | timestamp  |
+----+----------------+----------+--------+--------+------------+
|  1 | txn_001        | customer | 100.00 |   0.00 | 2024-01-15 |
|  2 | txn_001        | merchant |   0.00 | 100.00 | 2024-01-15 |
|  3 | txn_002        | merchant |  30.00 |   0.00 | 2024-01-16 |
|  4 | txn_002        | customer |   0.00 |  30.00 | 2024-01-16 |
+----+----------------+----------+--------+--------+------------+

Invariant: SUM(debit) = SUM(credit) -- ALWAYS

Why Double-Entry?

  • Auditability -- Every dollar movement is traceable
  • Error detection -- Imbalance indicates a bug
  • Regulatory compliance -- Required for financial systems
  • Reconciliation -- Easy to match against PSP records

Interview Tip

The ledger is append-only. Never update or delete entries. Corrections are made by adding reversal entries. This makes the system auditable and compliant.
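
As a rough illustration of the append-only rule and the balance invariant, the following in-memory sketch posts both legs of a transaction together and corrects mistakes by appending mirror-image entries. It is an assumption-laden toy, not a production ledger: the `Entry` and `Ledger` names and their methods are hypothetical, and a real system would persist entries in the database.

```python
from dataclasses import dataclass
from decimal import Decimal

ZERO = Decimal("0")


@dataclass(frozen=True)  # frozen: an entry cannot be modified after creation
class Entry:
    transaction_id: str
    account: str
    debit: Decimal
    credit: Decimal


class Ledger:
    def __init__(self) -> None:
        self._entries = []  # only ever appended to

    def post(self, transaction_id: str, debit_account: str,
             credit_account: str, amount: Decimal) -> None:
        """Append the debit leg and the credit leg of one transaction."""
        if amount <= 0:
            raise ValueError("amount must be positive")
        self._entries.append(Entry(transaction_id, debit_account, amount, ZERO))
        self._entries.append(Entry(transaction_id, credit_account, ZERO, amount))

    def reverse(self, transaction_id: str, reversal_id: str) -> None:
        """Correct a transaction by appending entries with the sides swapped."""
        originals = [e for e in self._entries if e.transaction_id == transaction_id]
        if not originals:
            raise KeyError(transaction_id)
        for e in originals:
            self._entries.append(Entry(reversal_id, e.account, e.credit, e.debit))

    def is_balanced(self) -> bool:
        """Invariant: total debits equal total credits."""
        total_debit = sum((e.debit for e in self._entries), ZERO)
        total_credit = sum((e.credit for e in self._entries), ZERO)
        return total_debit == total_credit

    def balance(self, account: str) -> Decimal:
        """Net credits minus debits for one account."""
        return sum((e.credit - e.debit for e in self._entries
                    if e.account == account), ZERO)
```

Replaying the two example transactions (a 100.00 payment followed by a 30.00 refund) leaves the merchant account at 70.00 and the ledger balanced.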


7. Reconciliation

Match internal ledger records against PSP settlement reports to catch discrepancies.

Daily Reconciliation Process:

  +------------------+        +------------------+
  | Internal Ledger  |        | PSP Settlement   |
  | (our records)    |        | Report (Stripe)  |
  +--------+---------+        +--------+---------+
           |                           |
           +----------+  +------------+
                      |  |
               +------v--v------+
               | Reconciliation |
               | Engine         |
               +-------+--------+
                       |
           +-----------+-----------+
           |           |           |
     +-----v----+ +----v-----+ +--v---------+
     | Matched  | | Missing  | | Discrepant |
     | (OK)     | | in PSP   | | (amount    |
     |          | | or ours  | |  differs)  |
     +----------+ +----------+ +------------+
                       |           |
                  investigate & resolve
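
+------------------+-----------------------------+----------------------------------------------------+
| Status           | Meaning                     | Action                                             |
+------------------+-----------------------------+----------------------------------------------------+
| Matched          | Both records agree          | No action                                          |
| Missing internal | PSP has record, we don't    | Investigate: did we lose a webhook?                |
| Missing PSP      | We have record, PSP doesn't | Investigate: was auth actually processed?          |
| Amount mismatch  | Amounts differ              | Investigate: currency conversion? partial capture? |
+------------------+-----------------------------+----------------------------------------------------+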
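
The four outcomes can be computed with a simple comparison keyed on a shared payment reference. The sketch below assumes both sides have already been reduced to a mapping from payment ID to settled amount; the function name `reconcile` and the report keys are hypothetical.

```python
from decimal import Decimal


def reconcile(internal: dict, psp: dict) -> dict:
    """Classify every payment ID seen on either side into one of four buckets."""
    report = {
        "matched": [],
        "missing_internal": [],  # PSP has the record, we don't
        "missing_psp": [],       # we have the record, PSP doesn't
        "amount_mismatch": [],
    }
    for payment_id in sorted(internal.keys() | psp.keys()):
        if payment_id not in internal:
            report["missing_internal"].append(payment_id)
        elif payment_id not in psp:
            report["missing_psp"].append(payment_id)
        elif internal[payment_id] != psp[payment_id]:
            report["amount_mismatch"].append(payment_id)
        else:
            report["matched"].append(payment_id)
    return report


ours = {"pay_1": Decimal("50.00"), "pay_2": Decimal("20.00"), "pay_3": Decimal("9.99")}
theirs = {"pay_1": Decimal("50.00"), "pay_2": Decimal("25.00"), "pay_4": Decimal("5.00")}
print(reconcile(ours, theirs))
```

Everything outside the `matched` bucket would be routed to an investigation queue rather than corrected automatically.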

8. Full System Architecture

                        +------------------+
                        |  Client / App    |
                        +--------+---------+
                                 |
                        +--------v---------+
                        |  API Gateway     |
                        |  (auth, rate     |
                        |   limit, TLS)    |
                        +--------+---------+
                                 |
         +-----------------------+-----------------------+
         |                       |                       |
+--------v-------+     +---------v-------+     +---------v-------+
| Order Service  |     | Payment Service |     | Refund Service  |
| - create order |     | - payment intent|     | - process refund|
| - order status |     | - idempotency   |     | - partial refund|
+--------+-------+     | - PSP routing   |     +---------+-------+
         |              | - retry logic   |               |
         |              +--------+--------+               |
         |                       |                        |
         |         +-------------+-------------+          |
         |         |                           |          |
         |  +------v-------+          +--------v------+   |
         |  | PSP Adapter  |          | PSP Adapter   |   |
         |  | (Stripe)     |          | (PayPal)      |   |
         |  +------+-------+          +--------+------+   |
         |         |                           |          |
         |         +-------------+-------------+          |
         |                       |                        |
         |              +--------v--------+               |
         |              | Ledger Service  |               |
         |              | (double-entry,  |               |
         |              |  append-only)   |               |
         |              +--------+--------+               |
         |                       |                        |
+--------v-----------------------v------------------------v--+
|                     PostgreSQL                              |
|  - orders, payments, ledger_entries, idempotency_keys       |
+-------------------------------------------------------------+

         +-------------------+        +-------------------+
         | Webhook Handler   |        | Reconciliation    |
         | (async PSP events)|        | Engine (daily)    |
         +-------------------+        +-------------------+

         +-------------------+        +-------------------+
         | Fraud Detection   |        | Notification Svc  |
         | (rules + ML)      |        | (email receipts)  |
         +-------------------+        +-------------------+

9. Handling Failures

Payment systems must handle every failure mode gracefully. Money is at stake.

Retry Strategy

retry_payment(request, max_retries=3):
  # Every attempt reuses request.idempotency_key so the PSP can deduplicate
  for attempt in 1..max_retries:
    try:
      response = call_psp(request)
      if response.status == "success":
        return response
      if response.status == "declined":
        return response          # don't retry declines
      # any other status is treated as a transient error and retried
    catch NetworkTimeout:
      # CRITICAL: we don't know whether the PSP processed the request.
      # Check its status before sending the request again.
      result = poll_psp_status(request.idempotency_key)
      if result.status in ("success", "declined"):
        return result            # the PSP did process it; do not resend
    if attempt < max_retries:
      wait(exponential_backoff(attempt))

  return FAILED  # all retries exhausted, manual investigation needed

Critical Failure Scenarios

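+----------------------------------+-----------------------------+--------------------------------------------------+
| Scenario                         | Danger                      | Solution                                         |
+----------------------------------+-----------------------------+--------------------------------------------------+
| Timeout after PSP call           | Don't know if charged       | Poll PSP status, use idempotency key             |
| DB write fails after PSP success | Money charged but no record | Retry DB write; dead-letter queue for manual fix |
| PSP down                         | Can't process payments      | Circuit breaker, fallback to secondary PSP       |
| Webhook never arrives            | Status stuck as "pending"   | Scheduled polling job checks pending payments    |
| Double webhook delivery          | Process refund twice        | Idempotent webhook processing                    |
+----------------------------------+-----------------------------+--------------------------------------------------+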
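
The "PSP down" row relies on a circuit breaker, which stops sending traffic to a failing PSP for a cool-down period and then lets a trial request through. The following is a minimal sketch under stated assumptions: the class `CircuitBreaker`, the function `choose_psp`, and the threshold and cool-down values are all hypothetical, and the clock is injectable only to make the behavior easy to test.

```python
import time


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_after_s=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after_s = reset_after_s
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed (healthy)

    def allow(self) -> bool:
        """True if a request may be sent to this PSP right now."""
        if self.opened_at is None:
            return True
        # After the cool-down, let a trial request through (half-open).
        return self.clock() - self.opened_at >= self.reset_after_s

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = self.clock()  # open (or re-open) the circuit


def choose_psp(breakers: dict, preference: list):
    """Return the first PSP in preference order whose breaker allows traffic."""
    for name in preference:
        if breakers[name].allow():
            return name
    return None  # every PSP is unavailable: fail fast or queue the payment
```

The payment service would call `record_success` or `record_failure` after each PSP call, so that a run of errors on the primary shifts traffic to the secondary until the cool-down elapses.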

10. Webhook Handling

PSPs send asynchronous status updates via webhooks.

PSP (Stripe) -> POST /webhooks/stripe
                {
                  "type": "payment_intent.succeeded",
                  "data": { "id": "pi_123", "amount": 5000, ... }
                }

Webhook Processing:

1. Verify signature (HMAC with shared secret)
2. Check idempotency (have we already processed this event ID?)
3. Update internal payment status
4. Update ledger entries
5. Trigger downstream actions (send receipt, fulfill order)
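6. Return 200 OK to PSP (respond within the PSP's timeout, typically a few seconds)

If processing takes longer:
  - Persist or enqueue the raw event durably first
  - Then ACK (200 OK)
  - Process the event asynchronously from the queue
  - The PSP stops retrying once it receives a 2xx, so an event that is ACKed but never persisted is lost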
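
Steps 1 and 2 of the processing list can be sketched with the standard library. This is a generic HMAC-SHA256 check, not any particular PSP's signing scheme: real providers define their own header format and usually sign a timestamp along with the body to prevent replay. The function names, the top-level `id` field, and the in-memory set (a stand-in for the `webhook_events` table) are assumptions.

```python
import hashlib
import hmac
import json


def sign(secret: bytes, payload: bytes) -> str:
    """Hex HMAC-SHA256 of the raw request body."""
    return hmac.new(secret, payload, hashlib.sha256).hexdigest()


def verify_signature(secret: bytes, payload: bytes, signature: str) -> bool:
    expected = sign(secret, payload)
    # compare_digest runs in constant time, which avoids timing side channels
    return hmac.compare_digest(expected.encode(), signature.encode())


processed_event_ids = set()  # stand-in for a durable webhook_events table


def handle_webhook(secret: bytes, payload: bytes, signature: str):
    """Return (http_status, outcome) for one webhook delivery."""
    if not verify_signature(secret, payload, signature):
        return 400, "bad_signature"
    event = json.loads(payload)
    if event["id"] in processed_event_ids:
        return 200, "duplicate"  # ACK again, but do not reprocess
    processed_event_ids.add(event["id"])
    # A real handler would enqueue the event for async processing here.
    return 200, "processed"
```

The signature is computed over the raw bytes of the body, so verification has to happen before any JSON parsing or re-serialization.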

11. Fraud Detection

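+-----------------------+-----------------------------------------------------+------------------------------------------------------------+
| Layer                 | Technique                                           | Example                                                    |
+-----------------------+-----------------------------------------------------+------------------------------------------------------------+
| Rule-based            | Hard rules that block suspicious transactions       | Amount > $10K, velocity > 5 txn/min, mismatched country    |
| Risk scoring          | ML model assigns risk score 0-100                   | Score > 80 -> block, 50-80 -> 3DS challenge, < 50 -> allow |
| 3D Secure             | Redirect to card issuer for additional verification | Shifts liability to issuer                                 |
| Address Verification  | Match billing address with card issuer records      | AVS check                                                  |
| Device fingerprinting | Track device characteristics                        | Known fraudster device                                     |
+-----------------------+-----------------------------------------------------+------------------------------------------------------------+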
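
The first two layers combine naturally: hard rules run first, and the risk score decides whatever the rules let through. The sketch below uses the example thresholds from the table; the function name `evaluate`, its parameters, and the interpretation of "mismatched country" as card country versus IP country are assumptions made for illustration.

```python
from decimal import Decimal


def evaluate(amount_usd: Decimal, txns_last_minute: int,
             card_country: str, ip_country: str, risk_score: int) -> str:
    """Return 'block', 'challenge_3ds', or 'allow' for one transaction."""
    if not 0 <= risk_score <= 100:
        raise ValueError("risk_score must be between 0 and 100")

    # Layer 1: hard rules block outright, regardless of the model's score.
    if amount_usd > 10_000 or txns_last_minute > 5 or card_country != ip_country:
        return "block"

    # Layer 2: score bands (> 80 block, 50-80 challenge, < 50 allow).
    if risk_score > 80:
        return "block"
    if risk_score >= 50:
        return "challenge_3ds"
    return "allow"
```

In practice the thresholds would be configuration rather than constants, since they are tuned against observed fraud and false-positive rates.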

12. PCI DSS Compliance

PCI DSS (Payment Card Industry Data Security Standard) applies to any system that stores, processes, or transmits cardholder data.

Strategy: Minimize PCI scope by NEVER touching raw card numbers.

                Client Browser
                     |
                     v
              +------+-------+
              | Stripe.js /  |  Card number entered in Stripe's iframe
              | PSP SDK      |  (never touches your servers)
              +------+-------+
                     |
                     v
              +------+-------+
              | PSP (Stripe) |  Returns a token (tok_abc123)
              +------+-------+
                     |
                     v
              +------+-------+
              | Your Payment |  Only sees the token, never the card number
              | Service      |  Charges using token via PSP API
              +--------------+

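PCI scope: SAQ-A (minimal) instead of SAQ-D (all requirements)

+----------+-------------------------------------------------+-----------------------------------------+
| SAQ type | Card Data Handling                              | Effort                                  |
+----------+-------------------------------------------------+-----------------------------------------+
| SAQ-A    | PSP handles all card data (tokens only)         | Low (self-assessment)                   |
| SAQ-A-EP | Your page affects how card data reaches the PSP | Medium                                  |
| SAQ-D    | You store/process card data directly            | High (all requirements, very expensive) |
+----------+-------------------------------------------------+-----------------------------------------+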

13. Currency Conversion

Multi-Currency Flow:

Customer pays in EUR -> PSP converts -> Merchant receives USD

Exchange Rate Handling:
  1. Display price in customer's currency (using cached rate)
  2. At payment time: lock rate for the transaction
  3. PSP performs actual conversion at settlement
  4. Record both currencies in ledger:
     Debit:  customer_eur   EUR 85.00
     Credit: merchant_usd   USD 100.00
     Metadata: rate = 1.1765 (USD per EUR), rate_source = "ECB", rate_locked_at = "2024-01-15T10:00:00Z"
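
+------------------+-----------------------------------------+----------------------------------------+
| Approach         | When Rate Locked                        | Risk                                   |
+------------------+-----------------------------------------+----------------------------------------+
| At display time  | User sees price -> that's what they pay | Merchant absorbs rate fluctuation      |
| At authorization | Rate locked when card authorized        | Small discrepancy from displayed price |
| At settlement    | PSP's rate at settlement (days later)   | Unpredictable for merchant             |
+------------------+-----------------------------------------+----------------------------------------+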
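
Whichever approach is chosen, the locked rate should travel with the transaction and the arithmetic should use exact decimals rather than floats. The sketch below reproduces the EUR 85.00 to USD 100.00 example; `LockedRate` and `convert` are hypothetical names, and rounding to two decimal places is an assumption that only holds for currencies with two minor-unit digits.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from decimal import Decimal, ROUND_HALF_EVEN


@dataclass(frozen=True)
class LockedRate:
    base: str            # currency the customer pays in
    quote: str           # currency the merchant receives
    rate: Decimal        # units of quote per one unit of base
    source: str
    locked_at: datetime


def convert(amount: Decimal, locked: LockedRate) -> Decimal:
    """Convert a base-currency amount using the locked rate, rounded to cents."""
    return (amount * locked.rate).quantize(Decimal("0.01"), rounding=ROUND_HALF_EVEN)


rate = LockedRate(
    base="EUR",
    quote="USD",
    rate=Decimal("1.1765"),
    source="ECB",
    locked_at=datetime(2024, 1, 15, 10, 0, tzinfo=timezone.utc),
)
print(convert(Decimal("85.00"), rate))  # 85.00 * 1.1765 = 100.0025, rounds to 100.00
```

Storing the `LockedRate` alongside the ledger entries keeps the conversion reproducible during reconciliation.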

14. Database Schema (Simplified)

sql
-- Payments
CREATE TABLE payments (
  payment_id      UUID PRIMARY KEY,
  order_id        UUID NOT NULL,
  idempotency_key TEXT UNIQUE NOT NULL,
  amount          DECIMAL(12,2) NOT NULL,
  currency        CHAR(3) NOT NULL,
  status          TEXT NOT NULL DEFAULT 'CREATED',
  psp_name        TEXT NOT NULL,            -- 'stripe', 'paypal'
  psp_payment_id  TEXT,                     -- external reference
  created_at      TIMESTAMPTZ DEFAULT NOW(),
  updated_at      TIMESTAMPTZ DEFAULT NOW()
);

-- Ledger entries (append-only, never update/delete)
CREATE TABLE ledger_entries (
  entry_id        BIGSERIAL PRIMARY KEY,
  transaction_id  UUID NOT NULL,
  account_id      TEXT NOT NULL,
  entry_type      TEXT NOT NULL,            -- 'DEBIT' or 'CREDIT'
  amount          DECIMAL(12,2) NOT NULL,
  currency        CHAR(3) NOT NULL,
  description     TEXT,
  created_at      TIMESTAMPTZ DEFAULT NOW()
);

-- Ensure double-entry balance
-- For every transaction_id: SUM(DEBIT) must equal SUM(CREDIT)

-- Webhook events (for idempotent processing)
CREATE TABLE webhook_events (
  event_id        TEXT PRIMARY KEY,         -- PSP's event ID
  psp_name        TEXT NOT NULL,
  event_type      TEXT NOT NULL,
  payload         JSONB NOT NULL,
  processed       BOOLEAN DEFAULT FALSE,
  processed_at    TIMESTAMPTZ,
  received_at     TIMESTAMPTZ DEFAULT NOW()
);
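
The balance rule noted in the schema comments can be checked with a single grouped query. The sketch below runs it against an in-memory SQLite database so that it is self-contained; it is an illustration under assumptions, not the schema above verbatim. Amounts are stored as integer minor units (cents) because SQLite has no exact decimal type, each transaction is assumed to use one currency, and the names `ledger_db`, `UNBALANCED_SQL`, and `unbalanced_transactions` are hypothetical.

```python
import sqlite3

ledger_db = sqlite3.connect(":memory:")
ledger_db.execute(
    """CREATE TABLE ledger_entries (
           entry_id INTEGER PRIMARY KEY,
           transaction_id TEXT NOT NULL,
           account_id TEXT NOT NULL,
           entry_type TEXT NOT NULL CHECK (entry_type IN ('DEBIT', 'CREDIT')),
           amount_minor INTEGER NOT NULL,
           currency TEXT NOT NULL
       )"""
)

# Debits count as positive and credits as negative; a balanced transaction sums to 0.
UNBALANCED_SQL = """
SELECT transaction_id
FROM ledger_entries
GROUP BY transaction_id
HAVING SUM(CASE WHEN entry_type = 'DEBIT' THEN amount_minor
                ELSE -amount_minor END) <> 0
ORDER BY transaction_id
"""


def unbalanced_transactions(db: sqlite3.Connection) -> list:
    """Return the IDs of transactions whose debits and credits differ."""
    return [row[0] for row in db.execute(UNBALANCED_SQL)]


rows = [
    ("txn_001", "customer", "DEBIT", 10000, "USD"),
    ("txn_001", "merchant", "CREDIT", 10000, "USD"),
    ("txn_002", "merchant", "DEBIT", 3000, "USD"),
    ("txn_002", "customer", "CREDIT", 2500, "USD"),  # deliberately wrong
]
with ledger_db:
    ledger_db.executemany(
        "INSERT INTO ledger_entries "
        "(transaction_id, account_id, entry_type, amount_minor, currency) "
        "VALUES (?, ?, ?, ?, ?)",
        rows,
    )
print(unbalanced_transactions(ledger_db))
```

A query like this could run as part of the daily reconciliation job, with any returned transaction ID treated as a bug to investigate.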

15. Key Trade-offs Discussion

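+---------------------+-----------------------------+-----------------------------------------------------------+
| Decision            | Option A                    | Option B                                                  |
+---------------------+-----------------------------+-----------------------------------------------------------+
| PSP strategy        | Single PSP (simpler)        | Multi-PSP with routing (resilience, cost optimization)    |
| Capture timing      | Immediate capture (simpler) | Delayed capture (better for fulfillment-based businesses) |
| Ledger storage      | Single DB (simpler, ACID)   | Event-sourced (immutable log, replay-friendly)            |
| Currency conversion | Lock at display (better UX) | Lock at settlement (PSP default)                          |
| Fraud detection     | Rules only (predictable)    | ML scoring (more accurate, more complex)                  |
| Webhook processing  | Synchronous (simpler)       | Async queue (more reliable, handles spikes)               |
+---------------------+-----------------------------+-----------------------------------------------------------+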

16. Interview Checklist

  • Explained the full payment flow: order -> intent -> authorize -> capture
  • Stressed idempotency as the most critical safety mechanism
  • Designed double-entry ledger with append-only entries
  • Covered reconciliation process (internal vs PSP records)
  • Failure handling for every stage (timeout, PSP down, DB failure)
  • Webhook handling with signature verification and idempotent processing
  • PCI DSS compliance via tokenization (never touch raw card data)
  • Payment state machine with all transitions
  • Multi-currency with rate locking strategy
  • Mentioned fraud detection layers

17. Resources

  • System Design Interview (Alex Xu, Vol 2) -- Chapter: Payment System
  • Stripe Documentation -- stripe.com/docs (gold standard for payment API design)
  • Paper: "Designing Reliable Payment Systems" (ACM Queue)
  • YouTube: System Design Interview -- Design Payment System
  • Book: "Payment Systems in the U.S." by Carol Coye Benson
  • PCI DSS Official Documentation -- pcisecuritystandards.org
