26 - Security in Distributed Systems
Previous: 25 - Monitoring, Logging & Tracing | Next: 27 - Design URL Shortener
Why This Matters in Interviews
Security is a first-class concern in every FAANG system design. Interviewers probe: "How does Service A trust Service B?", "Where do you store secrets?", "How do you prevent token theft?" Demonstrating security thinking elevates your answer from good to great.
Authentication vs Authorization
Authentication (AuthN): WHO are you? --> Identity verification
Authorization (AuthZ): WHAT can you do? --> Permission checking
Flow:
User --> [Authenticate: verify identity] --> [Authorize: check permissions] --> Resource
| Aspect | Authentication | Authorization |
|---|---|---|
| Question | "Who is this?" | "Can they do this?" |
| Mechanism | Passwords, tokens, certificates | Roles, policies, ACLs |
| Failure | 401 Unauthorized | 403 Forbidden |
| Happens | First | After authentication |
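The 401 vs 403 distinction in the table can be sketched as a tiny request check. This is a minimal illustration, not a real framework API: the `TOKENS` and `PERMISSIONS` dictionaries and the `check_request` function are hypothetical names invented for this example.

```python
from typing import Optional

# Hypothetical in-memory stores, for illustration only.
TOKENS = {"token-alice": "alice", "token-bob": "bob"}
PERMISSIONS = {
    "alice": {"read_report"},
    "bob": {"read_report", "delete_report"},
}


def check_request(token: Optional[str], action: str) -> int:
    """Return the HTTP status a gateway might send for this request."""
    # Step 1: authentication. Who is this?
    user = TOKENS.get(token) if token else None
    if user is None:
        return 401  # identity could not be verified

    # Step 2: authorization. Can this identity perform the action?
    if action not in PERMISSIONS.get(user, set()):
        return 403  # identity is known, but the action is not permitted

    return 200
```

Authorization only runs once authentication has succeeded, which is why a missing or unknown token never produces a 403 here.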
OAuth 2.0
OAuth 2.0 is a delegation framework -- it lets a third-party app access resources on behalf of a user without the user sharing their credentials with that app.
Authorization Code Flow (Most Common for Web Apps)
+--------+ +---------------+
| |---(1) Authorization Request-->| |
| | | Authorization |
| Client |<--(2) Authorization Code-----| Server |
| (App) | | |
| |---(3) Code + Client Secret--->| |
| |<--(4) Access Token + Refresh--| |
+--------+ +---------------+
| |
|---(5) API Request with Access Token----->|
| +------+------+
| | Resource |
|<--(6) Protected Resource---------| Server |
| +-------------+
Client Credentials Flow (Service-to-Service)
+----------+ +---------------+
| |---(1) Client ID +------->| |
| Service | Client Secret | Authorization |
| A |<--(2) Access Token ------| Server |
| | +---------------+
| |---(3) Call Service B with token---------->|
+----------+ +-----+-----+
| Service B |
+-----------+
No user involved. Used for backend-to-backend communication.
OAuth 2.0 Flow Comparison
| Flow | Use Case | Involves User? | Secrets on Client? |
|---|---|---|---|
| Authorization Code | Web apps, mobile (with PKCE) | Yes | Server-side only (mobile/SPA clients use PKCE instead of a secret) |
| Client Credentials | Service-to-service | No | Yes (server-to-server) |
| Implicit (deprecated) | Legacy SPAs | Yes | No (tokens returned in the URL fragment) |
| Device Code | Smart TVs, CLI tools | Yes (on separate device) | No |
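The table mentions PKCE for clients that cannot keep a secret. As a rough sketch of the S256 method from RFC 7636, the client creates a random verifier, sends its SHA-256 hash (the challenge) with the authorization request, and later proves possession by sending the verifier with the code exchange. The function names below are illustrative, not part of any library.

```python
import base64
import hashlib
import secrets


def make_code_verifier() -> str:
    """Create a random PKCE code verifier (43 URL-safe characters)."""
    # 32 random bytes encode to 43 base64url characters once padding is removed;
    # RFC 7636 allows verifiers of 43 to 128 characters.
    raw = secrets.token_bytes(32)
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def make_code_challenge(verifier: str) -> str:
    """Derive the S256 challenge: BASE64URL(SHA256(verifier)), without padding."""
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


if __name__ == "__main__":
    verifier = make_code_verifier()
    print("code_verifier: ", verifier)
    print("code_challenge:", make_code_challenge(verifier))
```

Because only the hash travels in step (1), an attacker who intercepts the authorization code still cannot redeem it without the original verifier.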
OpenID Connect (OIDC)
OIDC is an identity layer on top of OAuth 2.0. OAuth gives you an access token (authorization); OIDC also gives you an ID token (authentication).
OAuth 2.0: "Here's a token to access the user's photos" (AuthZ)
OIDC: "Here's proof that this is user@example.com" (AuthN + AuthZ)
ID Token = JWT containing claims about the user (sub, email, name, etc.)
JWT (JSON Web Tokens)
Structure
Header.Payload.Signature
eyJhbGciOiJSUzI1NiJ9.eyJzdWIiOiIxMjM0NSIsImVtYWlsIjoiam9obkBleC5jb20ifQ.signature
|_____________________||_____________________________________________||_________|
Header Payload Signature
| Part | Contains | Encoded |
|---|---|---|
| Header | Algorithm (RS256), token type (JWT) | Base64url |
| Payload | Claims: sub, exp, iat, iss, custom data | Base64url |
| Signature | HMAC, RSA, or ECDSA signature over the encoded header + payload | Base64url |
Pros and Cons
| Pros | Cons |
|---|---|
| Stateless (no server-side session lookup) | Cannot be revoked before expiry without server-side state (e.g., a denylist) |
| Contains claims (roles, user info) | Payload is readable by anyone (Base64url-encoded, not encrypted) |
| Works across services (shared secret or public key) | Larger than opaque tokens |
| Standard format (interoperable) | Clock skew issues with expiry |
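To make the structure and trade-offs concrete, here is a minimal sketch of signing and verifying a JWT using only the Python standard library. It uses HS256 (a shared-secret HMAC) rather than the RS256 shown in the example token, because the standard library has no RSA signing; the `leeway` parameter is an assumption added to illustrate tolerance for clock skew. Production code would normally rely on a maintained JWT library.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(text: str) -> bytes:
    # Restore the padding that the JWT encoding strips.
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign_jwt(claims: dict, secret: bytes) -> str:
    header = {"alg": "HS256", "typ": "JWT"}
    parts = [
        b64url(json.dumps(header, separators=(",", ":")).encode("utf-8")),
        b64url(json.dumps(claims, separators=(",", ":")).encode("utf-8")),
    ]
    signing_input = ".".join(parts)
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return signing_input + "." + b64url(signature)


def verify_jwt(token: str, secret: bytes, now: Optional[float] = None, leeway: int = 30) -> dict:
    """Return the claims if the signature and expiry are valid, else raise ValueError."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("malformed token")
    header_b64, payload_b64, sig_b64 = parts

    # Pin the algorithm instead of trusting whatever the header claims.
    header = json.loads(b64url_decode(header_b64))
    if header.get("alg") != "HS256":
        raise ValueError("unexpected algorithm")

    expected = hmac.new(secret, f"{header_b64}.{payload_b64}".encode("ascii"), hashlib.sha256).digest()
    if not hmac.compare_digest(expected, b64url_decode(sig_b64)):
        raise ValueError("bad signature")

    claims = json.loads(b64url_decode(payload_b64))
    current = time.time() if now is None else now
    # Allow a small leeway so minor clock skew between services does not reject valid tokens.
    if "exp" in claims and current > claims["exp"] + leeway:
        raise ValueError("token expired")
    return claims
```

Note that verification needs no database lookup, which is the stateless benefit, and that nothing here can invalidate a token before its `exp`, which is the revocation drawback.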
Refresh Token Pattern
+--------+ +----------+
| Client |---(1) Login----------->| Auth |
| |<--(2) Access (15min) + | Server |
| | Refresh (7d)-----| |
| | +----------+
| |---(3) API call with Access Token-->| API |
| | | |
| | (Access token expires) | |
| | +-----+
| |---(4) Refresh Token--------------->| Auth |
| |<--(5) New Access + New Refresh-----| |
+--------+ +-------+
Security rules for refresh tokens:
- Store securely (HttpOnly + Secure cookie, not localStorage)
- Rotate on every use (detect theft if old one is reused)
- Bind to device/IP when possible
- Shorter lifetimes for sensitive applications
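The "rotate on every use" rule can be sketched as follows. This is an in-memory illustration under stated assumptions: the `RefreshTokenStore` class and its "token family" bookkeeping are hypothetical names for this example, and a real system would persist hashed tokens in a database.

```python
import secrets


class RefreshTokenStore:
    """In-memory sketch of refresh token rotation with reuse detection."""

    def __init__(self) -> None:
        self._active = {}  # current token -> family id
        self._used = {}    # already-rotated token -> family id

    def issue(self, user_id: str) -> str:
        """Start a new token family at login."""
        family = f"{user_id}:{secrets.token_hex(8)}"
        return self._new_token(family)

    def _new_token(self, family: str) -> str:
        token = secrets.token_urlsafe(32)
        self._active[token] = family
        return token

    def rotate(self, token: str) -> str:
        """Exchange a refresh token for a new one; the old one becomes invalid."""
        if token in self._used:
            # An already-rotated token came back: either the client or an attacker
            # holds a stolen copy. Revoke every live token in the family.
            family = self._used[token]
            for live in [t for t, f in self._active.items() if f == family]:
                del self._active[live]
            raise PermissionError("refresh token reuse detected; family revoked")

        family = self._active.pop(token, None)
        if family is None:
            raise PermissionError("unknown or revoked refresh token")

        self._used[token] = family
        return self._new_token(family)
```

Revoking the whole family on reuse forces both the legitimate user and the attacker to log in again, which limits how long a stolen token stays useful.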
API Key Management
| Concern | Best Practice |
|---|---|
| Storage | Hash API keys before storing (like passwords) |
| Transmission | Always over HTTPS, in headers not URLs |
| Rotation | Support multiple active keys for zero-downtime rotation |
| Scoping | Limit keys to specific endpoints/permissions |
| Rate limiting | Tie rate limits to API keys |
| Revocation | Immediate revocation capability |
| Logging | Log key usage but never log the key itself |
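The storage and logging rows can be sketched with the standard library. The helper names (`generate_api_key`, `verify_api_key`, `redact`) and the `sk` prefix are hypothetical. The sketch assumes keys are long random strings, in which case a fast hash such as SHA-256 is commonly considered sufficient; low-entropy secrets like passwords would need a slow password hash instead.

```python
import hashlib
import hmac
import secrets
from typing import Tuple


def hash_api_key(key: str) -> str:
    """Hash a key for storage; the plaintext key is never persisted."""
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


def generate_api_key(prefix: str = "sk") -> Tuple[str, str]:
    """Return (plaintext key shown once to the user, hash to store)."""
    key = f"{prefix}_{secrets.token_urlsafe(32)}"
    return key, hash_api_key(key)


def verify_api_key(presented: str, stored_hash: str) -> bool:
    """Compare in constant time to avoid leaking information through timing."""
    return hmac.compare_digest(hash_api_key(presented), stored_hash)


def redact(key: str) -> str:
    """Safe form for logs: a short identifying prefix only."""
    return key[:6] + "..."


if __name__ == "__main__":
    key, stored = generate_api_key()
    print("log line:", redact(key))
    print("valid:", verify_api_key(key, stored))
```

Storing several hashes per client, each with its own status, is one way to support the multiple-active-keys rotation row.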
mTLS (Mutual TLS)
Standard TLS: client verifies server. mTLS: both sides verify each other.
Standard TLS:
Client ---[verify server cert]---> Server
Client <------[encrypted]-------> Server
Mutual TLS:
Client ---[verify server cert]---> Server
Client <--[verify client cert]---- Server
Client <------[encrypted]-------> Server
Use case: Service-to-service communication in microservices (service mesh).
Service A (client cert) <--mTLS--> Service B (server cert)
Managed by service mesh (Istio, Linkerd):
- Automatic certificate issuance and rotation
- Zero application code changes
- Every connection authenticated and encrypted
Benefits:
- Strong identity for every service (not just "someone with a valid token")
- Encryption in transit by default
- No shared secrets to manage
Zero Trust Architecture
Traditional (castle-and-moat):
[Firewall] --> Trusted internal network --> Services trust each other
Zero Trust:
Every request authenticated + authorized, regardless of network location
"Never trust, always verify"
Principles
| Principle | Implementation |
|---|---|
| Verify explicitly | AuthN + AuthZ on every request |
| Least privilege | Minimal permissions, scoped tokens |
| Assume breach | Encrypt everything, segment networks, monitor all |
Zero Trust in Practice
Request --> [Identity Proxy/Gateway]
|
+--> Authenticate (JWT, mTLS, device cert)
+--> Authorize (policy engine: OPA, Zanzibar)
+--> Check device health
+--> Check network context
+--> Allow/Deny
|
v
Service
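The gateway pipeline above can be sketched as an ordered list of checks where any failure denies the request. Everything here is hypothetical scaffolding for illustration: the `Request` fields, the `POLICY` and `BLOCKED_ZONES` tables, and the check functions stand in for real JWT/mTLS verification, a policy engine, and device posture services.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Request:
    identity: Optional[str]  # set only after a JWT / client cert was verified upstream
    action: str
    device_healthy: bool
    network_zone: str


# Hypothetical policy data, for illustration only.
POLICY = {"alice": {"read_orders"}, "svc-billing": {"read_orders", "create_invoice"}}
BLOCKED_ZONES = {"tor-exit"}

# Each check returns a denial reason, or None when the request passes.
Check = Callable[[Request], Optional[str]]


def authenticate(req: Request) -> Optional[str]:
    return None if req.identity else "unauthenticated"


def authorize(req: Request) -> Optional[str]:
    return None if req.action in POLICY.get(req.identity, set()) else "not authorized"


def device_health(req: Request) -> Optional[str]:
    return None if req.device_healthy else "unhealthy device"


def network_context(req: Request) -> Optional[str]:
    return "blocked network" if req.network_zone in BLOCKED_ZONES else None


CHECKS: List[Check] = [authenticate, authorize, device_health, network_context]


def decide(req: Request) -> Tuple[bool, str]:
    """Run every check in order; the first failure denies the request."""
    for check in CHECKS:
        reason = check(req)
        if reason is not None:
            return False, reason
    return True, "allow"
```

The network zone can only cause a denial in this sketch. Coming from an internal network never skips authentication or authorization, which is the core difference from the castle-and-moat model.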
RBAC vs ABAC
RBAC (Role-Based Access Control)
User --> has Role --> Role has Permissions
Example:
alice --> "editor" --> [create_post, edit_post, delete_own_post]
bob --> "admin" --> [create_post, edit_post, delete_any_post, manage_users]
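The RBAC example above maps directly onto two lookup tables. This is a minimal sketch; the `is_allowed` helper and the dictionary names are illustrative.

```python
# Role -> permissions, taken from the example above.
ROLE_PERMISSIONS = {
    "editor": {"create_post", "edit_post", "delete_own_post"},
    "admin": {"create_post", "edit_post", "delete_any_post", "manage_users"},
}

# User -> roles. A user may hold more than one role.
USER_ROLES = {
    "alice": {"editor"},
    "bob": {"admin"},
}


def is_allowed(user: str, permission: str) -> bool:
    """A user is allowed if any of their roles grants the permission."""
    return any(
        permission in ROLE_PERMISSIONS.get(role, set())
        for role in USER_ROLES.get(user, set())
    )


if __name__ == "__main__":
    print(is_allowed("alice", "edit_post"))
    print(is_allowed("alice", "manage_users"))
```

The check never inspects the resource itself, which is why rules like "only within the same department" tend to require extra roles or a move to attribute-based policies.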
ABAC (Attribute-Based Access Control)
Policy: ALLOW if user.department == resource.department
AND user.clearance >= resource.classification
AND time.current BETWEEN 09:00 AND 18:00
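The policy above can be written as a plain function over attributes. This is a sketch under assumptions: the `User` and `Resource` dataclasses and numeric clearance levels are invented for illustration, and a real deployment would more likely express the rule in a policy engine.

```python
from dataclasses import dataclass
from datetime import time


@dataclass(frozen=True)
class User:
    department: str
    clearance: int


@dataclass(frozen=True)
class Resource:
    department: str
    classification: int


def abac_allow(user: User, resource: Resource, now: time) -> bool:
    """ALLOW only if all three attribute conditions from the policy hold."""
    return (
        user.department == resource.department
        and user.clearance >= resource.classification
        and time(9, 0) <= now <= time(18, 0)
    )


if __name__ == "__main__":
    analyst = User(department="finance", clearance=3)
    ledger = Resource(department="finance", classification=2)
    print(abac_allow(analyst, ledger, time(10, 30)))
    print(abac_allow(analyst, ledger, time(22, 0)))
```

Unlike the RBAC lookup, the decision depends on the resource and on request context (the current time), so the same user can get different answers for different requests.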
Comparison
| Aspect | RBAC | ABAC |
|---|---|---|
| Complexity | Simple | Complex |
| Granularity | Coarse (role-level) | Fine (attribute-level) |
| Scalability | Role explosion with complex rules | Scales with policy engine |
| Auditability | Easy (who has what role?) | Harder (policy evaluation) |
| Best for | Most applications | Multi-tenant, regulatory, context-dependent |
Google Zanzibar: A relationship-based access control (ReBAC) system used at Google scale. It inspired open-source projects such as SpiceDB and Ory Keto.
Rate Limiting for Security
Beyond traffic management, rate limiting is a security control:
| Attack | Rate Limiting Defense |
|---|---|
| Brute force login | Max 5 attempts per account per 15 min |
| Credential stuffing | Max 100 login attempts per IP per hour |
| API abuse | Per-key request limits |
| Scraping | Per-IP + per-session limits |
| DDoS | Global rate limits + geographic filtering |
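The brute-force row (5 attempts per account per 15 minutes) can be sketched as a sliding-window limiter keyed by account. The class name is illustrative, and the clock is passed in explicitly so the behavior is easy to test; a distributed deployment would typically keep these counters in a shared store rather than in process memory.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most max_events per key within any window_seconds span."""

    def __init__(self, max_events: int, window_seconds: float) -> None:
        self.max_events = max_events
        self.window = window_seconds
        self._events = defaultdict(deque)  # key -> timestamps of allowed events

    def allow(self, key: str, now: float) -> bool:
        events = self._events[key]
        # Drop timestamps that have fallen out of the window.
        while events and events[0] <= now - self.window:
            events.popleft()
        if len(events) >= self.max_events:
            return False  # over the limit; rejected attempts are not recorded
        events.append(now)
        return True


if __name__ == "__main__":
    login_limiter = SlidingWindowLimiter(max_events=5, window_seconds=15 * 60)
    for second in range(7):
        print(second, login_limiter.allow("alice", now=float(second)))
```

Keying by account defends against brute force on one user, while a second limiter keyed by IP would cover the credential-stuffing row, where an attacker spreads attempts across many accounts.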
Encryption
Three Layers
At Rest: Data on disk encrypted (AES-256)
[Disk] --> [Encrypted blocks]
In Transit: Data between services encrypted (TLS 1.3)
Service A --[TLS]--> Service B
End-to-End: Data encrypted by sender, decrypted only by recipient
User A --[E2E encrypted]--> User B
(Server cannot read the data)
| Type | Protects Against | Implementation |
|---|---|---|
| At rest | Disk or backup theft, unauthorized access to raw storage | AES-256, KMS-managed keys |
| In transit | Network sniffing, MITM | TLS 1.3, mTLS |
| End-to-end | Server compromise, insider threat | Client-side encryption (Signal protocol) |
Key Management
+--------+ wraps +----------+ wraps +--------+
| Root | ------------> | Master | ------------> | Data |
| Key | (HSM/KMS) | Key | (envelope) | Key |
| (never | | (per | | (per |
| leaves| | service)| | record)|
| HSM) | +----------+ +--------+
+--------+
Envelope encryption: Encrypt data with a data key, then encrypt (wrap) the data key with a master key. Only the wrapped data key is stored alongside the data; the plaintext data key is discarded after use.
Secrets Management
HashiCorp Vault
+----------+ +-------+
| Service |---(1) Authenticate------>| Vault |
| | (AppRole, K8s SA, | |
| | AWS IAM) | |
| |<--(2) Lease + Secret-----| |
| | (DB creds, API key, | |
| | TLS cert) | |
+----------+ +-------+
|
[Auto-rotate]
[Audit log]
[Lease expiry]
Never do:
- Secrets in source code or config files committed to git
- Secrets in environment variables without a secrets manager
- Shared secrets across environments (dev key = prod key)
Common Attack Vectors in Distributed Systems
| Attack | Description | Mitigation |
|---|---|---|
| Man-in-the-Middle (MITM) | Attacker intercepts communication between services | mTLS, certificate pinning |
| Replay attack | Attacker captures and re-sends a valid request | Nonces, timestamps, idempotency keys |
| Token theft | Stolen JWT/session used to impersonate user | Short expiry, token binding, refresh rotation |
| Confused deputy | Service tricked into misusing its authority | Scoped tokens, request validation |
| Privilege escalation | User gains higher permissions than intended | Least privilege, server-side validation of role claims |
| Supply chain attack | Compromised dependency/container image | Image signing, SBOMs, dependency scanning |
| Insider threat | Malicious employee with access | Audit logging, least privilege, secrets rotation |
| DNS hijacking | Traffic redirected to malicious server | DNSSEC, certificate transparency |
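The replay-attack mitigation (nonces plus timestamps) can be sketched with an HMAC-signed request. The message layout, the 300-second skew limit, and the `ReplayGuard` class are assumptions made for this example, not a standard protocol; the nonce cache is in memory, whereas multiple server instances would need a shared store.

```python
import hashlib
import hmac

MAX_SKEW_SECONDS = 300  # assumed tolerance for clock differences


def sign_request(secret: bytes, body: bytes, timestamp: int, nonce: str) -> str:
    """Sign timestamp, nonce, and body together so none can be swapped out."""
    message = f"{timestamp}.{nonce}.".encode("utf-8") + body
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


class ReplayGuard:
    """Server-side check: valid signature, fresh timestamp, unseen nonce."""

    def __init__(self, secret: bytes) -> None:
        self._secret = secret
        self._seen = {}  # nonce -> timestamp of the accepted request

    def accept(self, body: bytes, timestamp: int, nonce: str, signature: str, now: int) -> bool:
        expected = sign_request(self._secret, body, timestamp, nonce)
        if not hmac.compare_digest(expected, signature):
            return False  # tampered or signed with the wrong key
        if abs(now - timestamp) > MAX_SKEW_SECONDS:
            return False  # too old (or too far in the future) to trust
        # Forget nonces whose timestamps can no longer pass the freshness check.
        self._seen = {n: t for n, t in self._seen.items() if now - t <= MAX_SKEW_SECONDS}
        if nonce in self._seen:
            return False  # exact request was already processed
        self._seen[nonce] = timestamp
        return True
```

The timestamp check bounds how long nonces must be remembered: anything older than the skew window is rejected on freshness alone, so its nonce can be dropped.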
Interview Tips
- JWT trade-offs are a classic question. Know the revocation problem and when to use opaque tokens instead.
- mTLS for service-to-service is expected at FAANG. Mention service mesh (Istio) for automated cert management.
- Zero trust > perimeter security. Show you think beyond firewalls.
- "Where do you store secrets?" Never in code. Vault, KMS, or managed secrets service.
- OAuth 2.0 flows come up in API design. Know which flow for which client type.
- Encryption layers -- mention all three (rest, transit, E2E) and when each applies.
Common Interview Questions
- "How do microservices authenticate with each other?"
- "Design an authentication system for a multi-tenant SaaS"
- "What happens when a JWT is stolen? How do you mitigate?"
- "Explain the difference between OAuth 2.0 and OpenID Connect"
- "How would you implement fine-grained authorization at scale?"
Resources
- DDIA Chapter 4: Encoding and Evolution (touches on security of data formats)
- OAuth 2.0 Simplified - Aaron Parecki
- Zanzibar Paper: Google's global authorization system
- Auth0 blog: "Which OAuth 2.0 Flow Should I Use?"
- HashiCorp Vault documentation
- NIST Zero Trust Architecture (SP 800-207)
- OWASP Top 10 for APIs