26 - Security in Distributed Systems

Previous: 25 - Monitoring, Logging & Tracing | Next: 27 - Design URL Shortener


Why This Matters in Interviews

Security is a first-class concern in system design interviews at large tech companies. Interviewers probe with questions such as "How does Service A trust Service B?", "Where do you store secrets?", and "How do you prevent token theft?" Showing that you think about security by default lifts an answer from good to great.


Authentication vs Authorization

Authentication (AuthN):  WHO are you?     --> Identity verification
Authorization  (AuthZ):  WHAT can you do? --> Permission checking

Flow:
  User --> [Authenticate: verify identity] --> [Authorize: check permissions] --> Resource

| Aspect | Authentication | Authorization |
|---|---|---|
| Question | "Who is this?" | "Can they do this?" |
| Mechanism | Passwords, tokens, certificates | Roles, policies, ACLs |
| Failure | 401 Unauthorized | 403 Forbidden |
| Happens | First | After authentication |

OAuth 2.0

OAuth 2.0 is an authorization delegation framework: it lets a third-party app access resources on behalf of a user without the user handing their credentials to that app.

Authorization Code Flow (Most Common for Web Apps)

+--------+                               +---------------+
|        |---(1) Authorization Request-->|               |
|        |                               | Authorization |
| Client |<--(2) Authorization Code------|    Server     |
| (App)  |                               |               |
|        |---(3) Code + Client Secret--->|               |
|        |<--(4) Access Token + Refresh--|               |
+--------+                               +---------------+
     |                                          |
     |---(5) API Request with Access Token----->|
     |                                   +------+-------+
     |                                   |   Resource   |
     |<--(6) Protected Resource----------|    Server    |
     |                                   +--------------+

Client Credentials Flow (Service-to-Service)

+----------+                          +---------------+
|          |---(1) Client ID +------->|               |
|  Service |     Client Secret        | Authorization |
|    A     |<--(2) Access Token ------|    Server     |
|          |                          +---------------+
|          |---(3) Call Service B with token---------->|
+----------+                                    +------+-----+
                                                | Service B  |
                                                +------------+

No user involved. Used for backend-to-backend communication.
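
As a rough sketch of step (1), the snippet below builds the token request that Service A would send. It assumes the authorization server accepts HTTP Basic client authentication, which is one common option; the client ID, secret, and scope name are made-up placeholders, and nothing is actually sent over the network.

```python
import base64
from urllib.parse import quote_plus, urlencode


def build_token_request(client_id: str, client_secret: str, scope: str):
    """Return (headers, body) for a client-credentials token request."""
    # Form-encode each part, then Base64 the "id:secret" pair for HTTP Basic.
    raw = f"{quote_plus(client_id)}:{quote_plus(client_secret)}".encode()
    headers = {
        "Authorization": "Basic " + base64.b64encode(raw).decode(),
        "Content-Type": "application/x-www-form-urlencoded",
    }
    body = urlencode({"grant_type": "client_credentials", "scope": scope})
    return headers, body


# Placeholder values for illustration only.
headers, body = build_token_request("service-a", "s3cret", "orders.read")
print(body)  # grant_type=client_credentials&scope=orders.read
```

The returned access token would then be cached by Service A until shortly before it expires, so that not every call to Service B triggers a token request.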

OAuth 2.0 Flow Comparison

| Flow | Use Case | Involves User? | Secrets on Client? |
|---|---|---|---|
| Authorization Code | Web apps, mobile (with PKCE) | Yes | Server-side only |
| Client Credentials | Service-to-service | No | Yes (server-to-server) |
| Implicit (deprecated) | Legacy SPAs | Yes | No (tokens in URL) |
| Device Code | Smart TVs, CLI tools | Yes (on separate device) | No |
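
PKCE, mentioned above for mobile clients, replaces the client secret with a one-time proof. The sketch below shows how a client might derive the `code_challenge` it sends in step (1) from the `code_verifier` it later sends in step (3), using the S256 method. The function names are illustrative, not from any particular library.

```python
import base64
import hashlib
import secrets


def challenge_for(verifier: str) -> str:
    """code_challenge = base64url(SHA-256(code_verifier)), without padding."""
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def make_pkce_pair():
    """Return (code_verifier, code_challenge) for one authorization attempt."""
    verifier = secrets.token_urlsafe(32)  # 32 random bytes -> 43 URL-safe chars
    return verifier, challenge_for(verifier)


verifier, challenge = make_pkce_pair()
# The server recomputes the challenge from the verifier sent in step (3)
# and rejects the code exchange if the two do not match.
print(challenge_for(verifier) == challenge)  # True
```

An attacker who intercepts the authorization code still cannot redeem it, because the verifier never leaves the client until the token exchange.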

OpenID Connect (OIDC)

OIDC is an identity layer on top of OAuth 2.0. OAuth gives you an access token (authorization); OIDC also gives you an ID token (authentication).

OAuth 2.0:   "Here's a token to access the user's photos"  (AuthZ)
OIDC:        "Here's proof that this is user@example.com"   (AuthN + AuthZ)

ID Token = JWT containing claims about the user (sub, email, name, etc.)


JWT (JSON Web Tokens)

Structure

Header.Payload.Signature

eyJhbGciOiJSUzI1NiJ9.eyJzdWIiOiIxMjM0NSIsImVtYWlsIjoiam9obkBleC5jb20ifQ.signature
|_____________________||_____________________________________________||_________|
       Header                          Payload                        Signature

| Part | Contains | Encoding |
|---|---|---|
| Header | Algorithm (e.g. RS256), token type (JWT) | Base64url |
| Payload | Claims: sub, exp, iat, iss, custom data | Base64url |
| Signature | HMAC, RSA, or ECDSA signature over the encoded header and payload | Base64url |
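
To make the structure concrete, the following sketch decodes the header and payload of the example token shown above. It deliberately skips signature verification, so it only illustrates that the first two parts are readable by anyone; the helper names are made up for this example.

```python
import base64
import json


def b64url_decode(segment: str) -> bytes:
    # JWT segments omit '=' padding; restore it before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def peek(token: str):
    """Decode header and payload WITHOUT verifying the signature."""
    header_b64, payload_b64, _signature = token.split(".")
    return json.loads(b64url_decode(header_b64)), json.loads(b64url_decode(payload_b64))


token = "eyJhbGciOiJSUzI1NiJ9.eyJzdWIiOiIxMjM0NSIsImVtYWlsIjoiam9obkBleC5jb20ifQ.signature"
header, payload = peek(token)
print(header)   # {'alg': 'RS256'}
print(payload)  # {'sub': '12345', 'email': 'john@ex.com'}
```

Because decoding needs no key, nothing sensitive should be placed in JWT claims.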

Pros and Cons

| Pros | Cons |
|---|---|
| Stateless (no server-side session lookup) | Cannot be revoked individually before expiry without a server-side denylist |
| Contains claims (roles, user info) | Payload is readable (Base64url-encoded, not encrypted) |
| Works across services (shared secret or public key) | Larger than opaque tokens |
| Standard format (interoperable) | Clock skew issues with expiry |
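
The stateless verification and the clock-skew issue can both be seen in a small sketch. The code below signs and verifies a token with HS256 (the shared-secret variant) using only the standard library. It is an illustration under simplifying assumptions, not a replacement for a maintained JWT library; the `leeway` parameter and function names are choices made for this example.

```python
import base64
import hashlib
import hmac
import json


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(segment: str) -> bytes:
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def sign_hs256(claims: dict, key: bytes) -> str:
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url(json.dumps(claims).encode())
    signature = hmac.new(key, f"{header}.{payload}".encode("ascii"), hashlib.sha256).digest()
    return f"{header}.{payload}.{_b64url(signature)}"


def verify_hs256(token: str, key: bytes, now: float, leeway: int = 30) -> dict:
    header_b64, payload_b64, signature_b64 = token.split(".")
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected algorithm")  # do not let the token pick the algorithm
    expected = hmac.new(key, f"{header_b64}.{payload_b64}".encode("ascii"), hashlib.sha256).digest()
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("bad signature")
    claims = json.loads(_b64url_decode(payload_b64))
    if now > claims["exp"] + leeway:  # leeway absorbs small clock skew between hosts
        raise ValueError("token expired")
    return claims


key = b"demo-key-not-for-production"
token = sign_hs256({"sub": "12345", "exp": 1_000}, key)
print(verify_hs256(token, key, now=1_020)["sub"])  # 12345 (within the 30 s leeway)
```

Note that verification never consults a database, which is exactly why a stolen token stays valid until `exp` unless a denylist is added.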

Refresh Token Pattern

+--------+                         +----------+
| Client |---(1) Login----------->| Auth      |
|        |<--(2) Access (15min) + | Server    |
|        |       Refresh (7d)-----|           |
|        |                        +----------+
|        |---(3) API call with Access Token-->| API |
|        |                                     |     |
|        |  (Access token expires)             |     |
|        |                                     +-----+
|        |---(4) Refresh Token--------------->| Auth  |
|        |<--(5) New Access + New Refresh-----|       |
+--------+                                    +-------+

Security rules for refresh tokens:

  • Store securely (httpOnly cookie, not localStorage)
  • Rotate on every use (detect theft if old one is reused)
  • Bind to device/IP when possible
  • Shorter lifetimes for sensitive applications
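
The "rotate on every use" rule can be sketched as follows. This is a minimal in-memory illustration, assuming tokens are grouped into a "family" per login; a real store would persist hashed tokens with expiry, and the class and method names here are hypothetical.

```python
import secrets


class RefreshTokenStore:
    """In-memory sketch of refresh token rotation with reuse detection."""

    def __init__(self):
        self._active = {}      # current token -> family id
        self._used = {}        # rotated-out token -> family id
        self._revoked = set()  # family ids revoked after suspected theft

    def issue(self, family_id: str) -> str:
        token = secrets.token_urlsafe(32)
        self._active[token] = family_id
        return token

    def rotate(self, token: str) -> str:
        if token in self._used:
            # An already-rotated token came back: assume theft, revoke the family.
            self._revoke_family(self._used[token])
            raise PermissionError("refresh token reuse detected")
        family_id = self._active.pop(token, None)
        if family_id is None or family_id in self._revoked:
            raise PermissionError("unknown or revoked refresh token")
        self._used[token] = family_id
        return self.issue(family_id)

    def _revoke_family(self, family_id: str) -> None:
        self._revoked.add(family_id)
        for t in [t for t, f in self._active.items() if f == family_id]:
            del self._active[t]


store = RefreshTokenStore()
first = store.issue("login-1")
second = store.rotate(first)   # normal refresh: old token retired, new one issued
```

If `first` is presented again, the store cannot tell whether the attacker or the legitimate client is calling, so it revokes the whole family and forces a fresh login.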

API Key Management

| Concern | Best Practice |
|---|---|
| Storage | Hash API keys before storing (like passwords) |
| Transmission | Always over HTTPS, in headers not URLs |
| Rotation | Support multiple active keys for zero-downtime rotation |
| Scoping | Limit keys to specific endpoints/permissions |
| Rate limiting | Tie rate limits to API keys |
| Revocation | Immediate revocation capability |
| Logging | Log key usage but never log the key itself |
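
A minimal sketch of the storage rule: generate a random key, hand the plaintext to the caller once, and keep only a digest. It assumes keys are long random strings, in which case a fast hash such as SHA-256 is a common choice (unlike low-entropy passwords, which need a slow password hash). The `sk_` prefix and function names are made up for illustration.

```python
import hashlib
import hmac
import secrets


def new_api_key():
    """Return (plaintext_key, stored_digest). Show the plaintext to the caller once."""
    key = "sk_" + secrets.token_urlsafe(32)
    return key, hashlib.sha256(key.encode()).hexdigest()


def key_matches(presented: str, stored_digest: str) -> bool:
    digest = hashlib.sha256(presented.encode()).hexdigest()
    return hmac.compare_digest(digest, stored_digest)  # constant-time comparison


plaintext, stored = new_api_key()
print(key_matches(plaintext, stored))        # True
print(key_matches("sk_wrong-key", stored))   # False
```

With this layout a leaked database reveals only digests, and supporting several stored digests per customer gives zero-downtime rotation.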

mTLS (Mutual TLS)

Standard TLS: client verifies server. mTLS: both sides verify each other.

Standard TLS:
  Client ---[verify server cert]---> Server
  Client <------[encrypted]-------> Server

Mutual TLS:
  Client ---[verify server cert]---> Server
  Client <--[verify client cert]---- Server
  Client <------[encrypted]-------> Server

Use case: Service-to-service communication in microservices (service mesh).

Service A (client cert) <--mTLS--> Service B (server cert)

Managed by service mesh (Istio, Linkerd):
  - Automatic certificate issuance and rotation
  - Zero application code changes
  - Every connection authenticated and encrypted

Benefits:

  • Strong identity for every service (not just "someone with a valid token")
  • Encryption in transit by default
  • No shared secrets to manage
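
When no service mesh handles certificates for you, the server side of mTLS can be configured in application code. The sketch below uses Python's standard `ssl` module; the certificate file paths in the commented-out call are placeholders, and the helper names are made up for this example.

```python
import ssl


def new_mtls_server_context() -> ssl.SSLContext:
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # CERT_REQUIRED is what makes the TLS "mutual": the handshake fails
    # unless the client presents a certificate that validates.
    ctx.verify_mode = ssl.CERT_REQUIRED
    return ctx


def load_identity(ctx: ssl.SSLContext, certfile: str, keyfile: str, client_ca_file: str) -> None:
    ctx.load_cert_chain(certfile=certfile, keyfile=keyfile)  # this service's own identity
    ctx.load_verify_locations(cafile=client_ca_file)         # CA trusted to sign client certs


ctx = new_mtls_server_context()
# Placeholder paths; uncomment once real certificate files exist:
# load_identity(ctx, "service-b.crt", "service-b.key", "internal-ca.pem")
```

The client side mirrors this: it loads its own certificate chain and trusts the same internal CA for the server's certificate.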

Zero Trust Architecture

Traditional (castle-and-moat):
  [Firewall] --> Trusted internal network --> Services trust each other

Zero Trust:
  Every request authenticated + authorized, regardless of network location
  "Never trust, always verify"

Principles

| Principle | Implementation |
|---|---|
| Verify explicitly | AuthN + AuthZ on every request |
| Least privilege | Minimal permissions, scoped tokens |
| Assume breach | Encrypt everything, segment networks, monitor all |

Zero Trust in Practice

Request --> [Identity Proxy/Gateway]
              |
              +--> Authenticate (JWT, mTLS, device cert)
              +--> Authorize (policy engine: OPA, Zanzibar)
              +--> Check device health
              +--> Check network context
              +--> Allow/Deny
              |
              v
           Service

RBAC vs ABAC

RBAC (Role-Based Access Control)

User --> has Role --> Role has Permissions

Example:
  alice --> "editor" --> [create_post, edit_post, delete_own_post]
  bob   --> "admin"  --> [create_post, edit_post, delete_any_post, manage_users]
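
The alice/bob example above maps directly onto two lookup tables. The sketch below is a minimal in-memory version; a real system would load roles from a database or identity provider, and the function name is chosen for this example.

```python
ROLE_PERMISSIONS = {
    "editor": {"create_post", "edit_post", "delete_own_post"},
    "admin": {"create_post", "edit_post", "delete_any_post", "manage_users"},
}
USER_ROLES = {"alice": {"editor"}, "bob": {"admin"}}


def rbac_allows(user: str, permission: str) -> bool:
    """True if any of the user's roles grants the permission."""
    return any(
        permission in ROLE_PERMISSIONS.get(role, set())
        for role in USER_ROLES.get(user, set())
    )


print(rbac_allows("alice", "edit_post"))        # True
print(rbac_allows("alice", "delete_any_post"))  # False
print(rbac_allows("bob", "manage_users"))       # True
```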

ABAC (Attribute-Based Access Control)

Policy: ALLOW if user.department == resource.department
                AND user.clearance >= resource.classification
                AND time.current BETWEEN 09:00 AND 18:00
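
The policy above can be written as a plain predicate over attributes. This sketch assumes clearance and classification are comparable integers and treats the time window as inclusive; the class and function names are illustrative.

```python
from dataclasses import dataclass
from datetime import time


@dataclass(frozen=True)
class User:
    department: str
    clearance: int


@dataclass(frozen=True)
class Resource:
    department: str
    classification: int


def abac_allows(user: User, resource: Resource, now: time) -> bool:
    """ALLOW only if all three attribute conditions of the policy hold."""
    return (
        user.department == resource.department
        and user.clearance >= resource.classification
        and time(9, 0) <= now <= time(18, 0)
    )


analyst = User(department="finance", clearance=3)
report = Resource(department="finance", classification=2)
print(abac_allows(analyst, report, time(10, 30)))  # True
print(abac_allows(analyst, report, time(22, 0)))   # False (outside working hours)
```

Unlike the role lookup in RBAC, the decision depends on the resource and on request context, which is what makes ABAC finer-grained and harder to audit.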

Comparison

| Aspect | RBAC | ABAC |
|---|---|---|
| Complexity | Simple | Complex |
| Granularity | Coarse (role-level) | Fine (attribute-level) |
| Scalability | Role explosion with complex rules | Scales with policy engine |
| Auditability | Easy (who has what role?) | Harder (policy evaluation) |
| Best for | Most applications | Multi-tenant, regulatory, context-dependent |

Google Zanzibar: A relationship-based access control (ReBAC) system used at Google scale. It inspired open-source projects such as SpiceDB and Ory Keto.


Rate Limiting for Security

Beyond traffic management, rate limiting is a security control:

| Attack | Rate Limiting Defense |
|---|---|
| Brute force login | Max 5 attempts per account per 15 min |
| Credential stuffing | Max 100 login attempts per IP per hour |
| API abuse | Per-key request limits |
| Scraping | Per-IP + per-session limits |
| DDoS | Global rate limits + geographic filtering |
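
The brute-force row ("max 5 attempts per account per 15 min") could be enforced with a sliding-window counter like the sketch below. It keeps state in process memory and takes the current time as an argument for easy testing; a distributed deployment would need a shared store, and the class name is made up for this example.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    def __init__(self, max_events: int, window_seconds: float):
        self.max_events = max_events
        self.window = window_seconds
        self._events = defaultdict(deque)  # key -> timestamps of allowed events

    def allow(self, key: str, now: float) -> bool:
        events = self._events[key]
        while events and now - events[0] >= self.window:
            events.popleft()  # drop attempts that have left the window
        if len(events) >= self.max_events:
            return False
        events.append(now)
        return True


login_limiter = SlidingWindowLimiter(max_events=5, window_seconds=15 * 60)
results = [login_limiter.allow("alice", now=t) for t in range(6)]
print(results)  # [True, True, True, True, True, False]
```

Keying the same limiter by IP address instead of account name gives the credential-stuffing defense from the next row.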

Encryption

Three Layers

At Rest:        Data on disk encrypted (AES-256)
                [Disk] --> [Encrypted blocks]

In Transit:     Data between services encrypted (TLS 1.3)
                Service A --[TLS]--> Service B

End-to-End:     Data encrypted by sender, decrypted only by recipient
                User A --[E2E encrypted]--> User B
                (Server cannot read the data)

| Type | Protects Against | Implementation |
|---|---|---|
| At rest | Disk theft, unauthorized DB access | AES-256, KMS-managed keys |
| In transit | Network sniffing, MITM | TLS 1.3, mTLS |
| End-to-end | Server compromise, insider threat | Client-side encryption (Signal protocol) |

Key Management

+--------+     wraps     +----------+     wraps      +---------+
|  Root  | ------------> |  Master  | -------------> |  Data   |
|  Key   |   (HSM/KMS)   |   Key    |   (envelope)   |  Key    |
| (never |               | (per     |                | (per    |
|  leaves|               |  service)|                |  record)|
|  HSM)  |               +----------+                +---------+
+--------+

Envelope encryption: Encrypt data with a data key, encrypt the data key with a master key. Only the encrypted data key is stored alongside the data.
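
A small sketch of envelope encryption follows. It assumes the third-party `cryptography` package is installed and uses AES-256-GCM for both layers; holding the master key in process memory is a simplification for illustration, since in practice the wrap and unwrap steps would be calls to a KMS. The function and field names are hypothetical.

```python
import os

from cryptography.hazmat.primitives.ciphers.aead import AESGCM


def encrypt_record(master_key: bytes, plaintext: bytes) -> dict:
    data_key = AESGCM.generate_key(bit_length=256)  # fresh data key per record
    data_nonce, key_nonce = os.urandom(12), os.urandom(12)
    return {
        "ciphertext": AESGCM(data_key).encrypt(data_nonce, plaintext, None),
        "data_nonce": data_nonce,
        # Only the wrapped (encrypted) data key is stored with the record.
        "wrapped_key": AESGCM(master_key).encrypt(key_nonce, data_key, None),
        "key_nonce": key_nonce,
    }


def decrypt_record(master_key: bytes, record: dict) -> bytes:
    data_key = AESGCM(master_key).decrypt(record["key_nonce"], record["wrapped_key"], None)
    return AESGCM(data_key).decrypt(record["data_nonce"], record["ciphertext"], None)


master_key = AESGCM.generate_key(bit_length=256)  # stand-in for a KMS-held key
record = encrypt_record(master_key, b"card=4111...")
print(decrypt_record(master_key, record))  # b'card=4111...'
```

Rotating the master key then only requires re-wrapping the small data keys, not re-encrypting the data itself.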


Secrets Management

HashiCorp Vault

+----------+                          +-------+
| Service  |---(1) Authenticate------>| Vault |
|          |   (AppRole, K8s SA,      |       |
|          |    AWS IAM)              |       |
|          |<--(2) Lease + Secret-----|       |
|          |   (DB creds, API key,    |       |
|          |    TLS cert)             |       |
+----------+                          +-------+
                                         |
                                    [Auto-rotate]
                                    [Audit log]
                                    [Lease expiry]
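
On the service side, the lease in step (2) means a secret must be re-fetched before it expires. The sketch below shows that caching logic in isolation. It does not use any real Vault client API: `fetch` is a hypothetical callable standing in for whatever call returns a secret and its lease duration, and the 30-second renewal margin is an arbitrary choice.

```python
class LeasedSecret:
    """Caches a secret and re-fetches it shortly before its lease expires."""

    def __init__(self, fetch, renew_margin: float = 30.0):
        self._fetch = fetch          # hypothetical callable -> (secret, lease_seconds)
        self._margin = renew_margin
        self._secret = None
        self._expires_at = 0.0

    def get(self, now: float) -> str:
        if self._secret is None or now >= self._expires_at - self._margin:
            self._secret, lease_seconds = self._fetch()
            self._expires_at = now + lease_seconds
        return self._secret


calls = []

def fake_fetch():
    # Stand-in for a secrets-manager call; returns a new credential each time.
    calls.append(1)
    return f"db-password-v{len(calls)}", 300.0


db_password = LeasedSecret(fake_fetch)
print(db_password.get(now=0))    # db-password-v1
print(db_password.get(now=100))  # db-password-v1 (lease still valid)
print(db_password.get(now=271))  # db-password-v2 (inside the renewal margin)
```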

Never do:

  • Secrets in source code or config files committed to git
  • Secrets in environment variables without a secrets manager
  • Shared secrets across environments (dev key = prod key)

Common Attack Vectors in Distributed Systems

| Attack | Description | Mitigation |
|---|---|---|
| Man-in-the-Middle (MITM) | Attacker intercepts communication between services | mTLS, certificate pinning |
| Replay attack | Attacker captures and re-sends a valid request | Nonces, timestamps, idempotency keys |
| Token theft | Stolen JWT/session used to impersonate user | Short expiry, token binding, refresh rotation |
| Confused deputy | Service tricked into misusing its authority | Scoped tokens, request validation |
| Privilege escalation | User gains higher permissions than intended | Least privilege, input validation on role claims |
| Supply chain attack | Compromised dependency/container image | Image signing, SBOMs, dependency scanning |
| Insider threat | Malicious employee with access | Audit logging, least privilege, secrets rotation |
| DNS hijacking | Traffic redirected to malicious server | DNSSEC, certificate transparency |
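
As one example from the table, the replay-attack mitigation (nonces plus timestamps) might look like the sketch below. It assumes client and server share an HMAC key and that seen nonces are kept in memory; a real service would use a shared store with a TTL equal to the allowed age. The class name, message layout, and 300-second window are choices made for this example.

```python
import hashlib
import hmac


class ReplayGuard:
    def __init__(self, key: bytes, max_age: int = 300):
        self._key = key
        self._max_age = max_age
        self._seen = set()  # nonces already accepted

    def sign(self, body: bytes, nonce: str, timestamp: int) -> str:
        # The signature covers timestamp and nonce, so neither can be swapped out.
        message = f"{timestamp}.{nonce}.".encode() + body
        return hmac.new(self._key, message, hashlib.sha256).hexdigest()

    def accept(self, body: bytes, nonce: str, timestamp: int, signature: str, now: int) -> bool:
        if not hmac.compare_digest(self.sign(body, nonce, timestamp), signature):
            return False  # tampered request or wrong key
        if abs(now - timestamp) > self._max_age:
            return False  # too old, or too far in the future
        if nonce in self._seen:
            return False  # same request seen before: replay
        self._seen.add(nonce)
        return True


guard = ReplayGuard(key=b"shared-demo-key")
sig = guard.sign(b'{"amount": 10}', nonce="n-1", timestamp=1_000)
print(guard.accept(b'{"amount": 10}', "n-1", 1_000, sig, now=1_005))  # True
print(guard.accept(b'{"amount": 10}', "n-1", 1_000, sig, now=1_006))  # False (replayed)
```

The timestamp check bounds how long nonces must be remembered, which keeps the seen-set small.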

Interview Tips

  1. JWT trade-offs are a classic question. Know the revocation problem and when to use opaque tokens instead.
  2. mTLS for service-to-service is expected at FAANG. Mention service mesh (Istio) for automated cert management.
  3. Zero trust > perimeter security. Show you think beyond firewalls.
  4. "Where do you store secrets?" Never in code. Vault, KMS, or managed secrets service.
  5. OAuth 2.0 flows come up in API design. Know which flow for which client type.
  6. Encryption layers -- mention all three (rest, transit, E2E) and when each applies.

Common Interview Questions

  • "How do microservices authenticate with each other?"
  • "Design an authentication system for a multi-tenant SaaS"
  • "What happens when a JWT is stolen? How do you mitigate?"
  • "Explain the difference between OAuth 2.0 and OpenID Connect"
  • "How would you implement fine-grained authorization at scale?"

Resources

  • DDIA Chapter 4: Encoding and Evolution (touches on security of data formats)
  • OAuth 2.0 Simplified - Aaron Parecki
  • Zanzibar Paper: Google's global authorization system
  • Auth0 blog: "Which OAuth 2.0 Flow Should I Use?"
  • HashiCorp Vault documentation
  • NIST Zero Trust Architecture (SP 800-207)
  • OWASP Top 10 for APIs
