27 - Design URL Shortener
Previous: 26 - Security in Distributed Systems | Next: 28 - Design Rate Limiter
Why This Matters in Interviews
The URL shortener is one of the most frequently asked system design problems. It appears simple but unfolds into discussions about hashing, encoding, database design, caching, analytics, and scale. Interviewers gauge your ability to start simple and layer in complexity.
Step 1: Requirements
Functional Requirements
- Given a long URL, generate a short URL
- Given a short URL, redirect to the original long URL
- (Optional) Custom short URLs
- (Optional) URL expiration (TTL)
- (Optional) Analytics (click count, referrer, geography)
Non-Functional Requirements
| Requirement | Target |
|---|---|
| Availability | 99.99% (reads are critical) |
| Latency | Redirect in < 50ms (p99) |
| Durability | URLs must not be lost |
| Scale | 100M new URLs/month, 10B redirects/month |
| Short URL length | As short as possible |
Step 2: Estimation
Traffic
Writes: 100M URLs/month
= ~40 URLs/sec (average)
= ~400 URLs/sec (peak, 10x)
Reads: 10B redirects/month (100:1 read:write ratio)
= ~4,000 QPS (average)
= ~40,000 QPS (peak)
Storage
Per URL record:
short_url: 7 bytes
long_url: ~200 bytes (average)
created_at: 8 bytes
expires_at: 8 bytes
user_id: 8 bytes
metadata: ~50 bytes
Total: ~280 bytes
5 years of data:
100M/month * 12 * 5 = 6B URLs
6B * 280 bytes = ~1.7 TB
URL Space
With Base62 encoding (a-z, A-Z, 0-9):
6 characters: 62^6 = 56.8 billion (sufficient for years)
7 characters: 62^7 = 3.5 trillion (very safe)
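The key-space arithmetic above can be checked directly:

```python
# Verify the Base62 key-space numbers quoted above.
ALPHABET_SIZE = 62  # a-z, A-Z, 0-9

six_char_space = ALPHABET_SIZE ** 6    # 56,800,235,584  (~56.8 billion)
seven_char_space = ALPHABET_SIZE ** 7  # 3,521,614,606,208 (~3.5 trillion)

# The 5-year estimate of 6B URLs uses only ~10% of the 6-character space.
print(six_char_space, seven_char_space)
```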
Step 3: API Design
POST /api/v1/urls
Body: { "long_url": "https://example.com/very/long/path",
"custom_alias": "my-link", // optional
"expires_at": "2025-12-31" } // optional
Response: { "short_url": "https://tny.io/aB3xY7",
"expires_at": "2025-12-31" }
GET /{short_url_key}
Response: 301/302 redirect to long URL
GET /api/v1/urls/{short_url_key}/stats
Response: { "clicks": 15234,
"created_at": "2024-01-15",
"top_referrers": [...],
"clicks_by_country": {...} }
Step 4: URL Encoding Strategies
Option A: Hash + Base62
long_url --> MD5/SHA256 --> take first 41 bits --> Base62 encode --> 7-char key
(62^7 ~= 2^41.7, so any 41-bit value is guaranteed to fit in 7 Base62 characters)
Example:
"https://example.com/path" --> MD5 --> "d41d8cd..." --> first 41 bits
--> decimal: 1234567890123 --> Base62 --> "aB3xY7k"
Collision handling:
- Check database if key exists
- If collision, append a counter or re-hash with salt
- Bloom filter for fast existence check
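A minimal sketch of Option A, assuming MD5 and a hypothetical `exists` callback against the key store. It takes the first 41 bits of the digest because 62^7 ~= 2^41.7, so a 41-bit value always fits in 7 Base62 characters:

```python
import hashlib

ALPHABET = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"

def base62(n: int) -> str:
    """Encode a non-negative integer in Base62."""
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, r = divmod(n, 62)
        digits.append(ALPHABET[r])
    return "".join(reversed(digits))

def hash_key(long_url: str, salt: str = "") -> str:
    digest = hashlib.md5((long_url + salt).encode()).digest()
    # First 41 bits of the 128-bit digest: guaranteed < 62^7,
    # so the key is at most 7 Base62 characters.
    n = int.from_bytes(digest, "big") >> (128 - 41)
    return base62(n)

def shorten(long_url: str, exists) -> str:
    # `exists` is a hypothetical key-store lookup (e.g. Bloom filter + DB).
    for attempt in range(4):  # max 3 retries, as above
        key = hash_key(long_url, salt=str(attempt) if attempt else "")
        if not exists(key):
            return key
    raise RuntimeError("too many collisions; fall back to counter-based keys")
```

Note the side effect listed in the comparison below: the same long URL always hashes to the same key, which gives deduplication for free.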
Option B: Counter-Based (Pre-Generated IDs)
Auto-increment counter --> Base62 encode --> short URL key
Counter: 1000000001 --> Base62 --> "aB3xY7"
Counter: 1000000002 --> Base62 --> "aB3xY8"
Distributed counter options:
- Single DB auto-increment: Simple but single point of failure
- Range-based: Each server gets a range (Server 1: 1M-2M, Server 2: 2M-3M)
- Snowflake-like ID: Timestamp + machine ID + sequence
- ZooKeeper coordination: Servers lease ranges from ZooKeeper
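As one concrete option from the list, a Snowflake-style generator can be sketched as follows. The bit layout (41-bit millisecond timestamp, 10-bit machine ID, 12-bit sequence) follows Twitter's published scheme; the custom epoch is an arbitrary assumption, and a production version must wait out the millisecond when the sequence overflows:

```python
import threading
import time

class SnowflakeIds:
    """Snowflake-style IDs: 41-bit ms timestamp | 10-bit machine | 12-bit seq."""
    EPOCH_MS = 1_600_000_000_000  # custom epoch (Sep 2020); an arbitrary choice

    def __init__(self, machine_id: int):
        assert 0 <= machine_id < 1024, "machine_id must fit in 10 bits"
        self.machine_id = machine_id
        self.last_ms = -1
        self.seq = 0
        self.lock = threading.Lock()

    def next_id(self) -> int:
        with self.lock:
            now = int(time.time() * 1000) - self.EPOCH_MS
            if now == self.last_ms:
                # Same millisecond: bump the 12-bit sequence.
                self.seq = (self.seq + 1) & 0xFFF
            else:
                self.seq = 0
                self.last_ms = now
            return (now << 22) | (self.machine_id << 12) | self.seq
```

The resulting integer is then Base62-encoded into the short key; because it embeds a timestamp it is roughly sortable by creation time, but not purely sequential.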
Comparison
| Approach | Pros | Cons |
|---|---|---|
| Hash-based | No coordination needed, same URL = same hash | Collisions, longer computation |
| Counter-based | No collisions, predictable | Needs coordination, guessable |
| Snowflake ID | No coordination, no collision | Longer IDs, not purely sequential |
| Pre-generated keys | Fast, no collision, no coordination | Key DB management overhead |
Pre-Generated Key Service (Recommended for interviews)
+------------------+
| Key Generation |
| Service (KGS) |
| |
| [Unused Keys DB] |----> Generate millions of keys offline
| [Used Keys DB] | Move to "used" after assignment
+------------------+
|
| (batch of keys)
v
+------------------+
| App Server 1 | --> Keeps ~1000 keys in memory
| App Server 2 | --> Keeps ~1000 keys in memory
| App Server N | --> Keeps ~1000 keys in memory
+------------------+
- Keys pre-generated and stored in a database
- Each app server requests a batch (1000 keys at a time)
- No coordination between app servers
- If a server dies, those unused keys are lost (acceptable waste)
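The app-server side of this can be sketched as a small in-memory buffer. `fetch_batch` here is a hypothetical KGS client call that atomically moves a batch of keys from the unused to the used table:

```python
from collections import deque

class KeyPool:
    """In-memory buffer of pre-generated keys held by one app server."""

    def __init__(self, fetch_batch, batch_size: int = 1000):
        self.fetch_batch = fetch_batch  # hypothetical KGS RPC
        self.batch_size = batch_size
        self.keys = deque()

    def next_key(self) -> str:
        if not self.keys:
            # One KGS round-trip buys ~batch_size local assignments.
            self.keys.extend(self.fetch_batch(self.batch_size))
        return self.keys.popleft()
```

Because the KGS marks the whole batch as used the moment it hands it out, a crashed server wastes at most one batch of keys — the "acceptable waste" noted above.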
Step 5: System Architecture
+-------------------+
| Load Balancer |
+--------+----------+
|
+------------------+------------------+
| | |
+-------+------+ +-------+------+ +-------+------+
| App Server 1 | | App Server 2 | | App Server N |
| (key cache) | | (key cache) | | (key cache) |
+-------+------+ +-------+------+ +-------+------+
| | |
+--------+---------+--------+--------+
| |
+-------+------+ +-------+-------+
| Cache | | Database |
| (Redis) | | (Primary + |
| [key->url] | | Replicas) |
+--------------+ +----------------+
|
+-------+-------+
| Key Gen |
| Service |
+---------------+
|
+-------+-------+
| Analytics |
| (Kafka --> |
| ClickHouse) |
+---------------+
Step 6: Database Choice
| Option | Pros | Cons |
|---|---|---|
| NoSQL (DynamoDB, Cassandra) | Simple key-value lookup, horizontal scaling, high write throughput | Less flexible querying |
| SQL (PostgreSQL, MySQL) | ACID, flexible queries for analytics | Sharding complexity at scale |
Recommended: NoSQL (DynamoDB or Cassandra) for the core URL mapping. Simple key-value access pattern fits perfectly.
Schema
Table: url_mappings
+-----------+-----------------------------------------+---------------------+-----+
| short_key | long_url | created_at | ttl |
+-----------+-----------------------------------------+---------------------+-----+
| aB3xY7 | https://example.com/very/long/path | 2024-01-15 10:30:00 | ... |
+-----------+-----------------------------------------+---------------------+-----+
Partition key: short_key (ensures even distribution)
Step 7: Redirect -- 301 vs 302
| Code | Type | Behavior | Use When |
|---|---|---|---|
| 301 | Permanent Redirect | Browser caches, won't hit server again | Don't need analytics (less server load) |
| 302 | Temporary Redirect | Browser always hits server | Need analytics tracking (recommended) |
Interview recommendation: Use 302 because:
- Every click hits the server, enabling analytics
- Allows changing the destination URL later
- URL expiration works correctly
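A framework-agnostic sketch of the redirect handler; `lookup` is a hypothetical function that consults cache then database and returns a record dict or None:

```python
import time

def handle_redirect(short_key: str, lookup, now=time.time):
    """Return an (HTTP status, headers) pair for a redirect request."""
    record = lookup(short_key)
    if record is None:
        return 404, {}
    expires_at = record.get("expires_at")
    if expires_at is not None and now() > expires_at:
        return 404, {}  # expired links behave like missing ones
    # 302, not 301: the browser re-requests on every click, so analytics,
    # expiration, and destination changes all keep working.
    return 302, {"Location": record["long_url"]}
```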
Step 8: Cache Strategy
Read path (redirect):
Client --> App Server --> [Check Redis Cache]
|
Hit? YES --> Return long_url (< 1ms)
|
NO --> Query Database --> Store in cache --> Return
- Cache eviction: LRU (Least Recently Used)
- Cache size: caching the hottest ~20% of URLs serves the large majority of traffic (Pareto / 80-20 rule)
- Cache warm-up: Pre-load top URLs on deployment
Cache sizing (rough upper bound):
Daily redirects: ~330M
Assume ~20% as many distinct hot URLs: ~66M entries
66M * 280 bytes = ~18 GB (fits in a Redis cluster)
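The read path above is classic cache-aside; a minimal sketch, assuming hypothetical `redis_client` and `db` objects (the `url:` key prefix and the TTL are assumptions):

```python
CACHE_TTL_SECONDS = 24 * 3600  # keep hot URLs for a day (assumed policy)

def resolve(short_key: str, redis_client, db):
    """Cache-aside lookup: Redis first, database on a miss."""
    cache_key = f"url:{short_key}"
    cached = redis_client.get(cache_key)
    if cached is not None:
        return cached.decode() if isinstance(cached, bytes) else cached
    row = db.get(short_key)  # e.g. a DynamoDB GetItem on the partition key
    if row is None:
        return None  # consider negative caching to absorb bad-key floods
    redis_client.setex(cache_key, CACHE_TTL_SECONDS, row["long_url"])
    return row["long_url"]
```

The `get`/`setex` calls mirror redis-py's interface; the LRU eviction mentioned above comes from Redis itself via `maxmemory-policy allkeys-lru`.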
Step 9: Analytics Tracking
Redirect flow with analytics:
Client --> App Server --> [302 redirect]
|
+--> Async: Publish click event to Kafka
{
"short_key": "aB3xY7",
"timestamp": "...",
"referrer": "twitter.com",
"user_agent": "...",
"ip": "...",
"country": "US"
}
|
Kafka --> ClickHouse/Druid (OLAP)
|
Dashboard / API for stats
Key principle: Never let analytics slow down the redirect. Fire-and-forget to Kafka.
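That fire-and-forget step can be sketched as follows; the `produce(topic, value)` call mirrors confluent-kafka's non-blocking producer API, and the topic name is an assumption:

```python
import json
import time

CLICK_TOPIC = "click-events"  # assumed topic name

def emit_click(producer, short_key: str, meta: dict) -> None:
    """Publish a click event without ever blocking the redirect response."""
    event = {
        "short_key": short_key,
        "timestamp": time.time(),
        "referrer": meta.get("referrer"),
        "user_agent": meta.get("user_agent"),
        "ip": meta.get("ip"),
        "country": meta.get("country"),
    }
    try:
        # Async produce: the message is buffered and sent in the background.
        producer.produce(CLICK_TOPIC, json.dumps(event).encode())
    except Exception:
        pass  # fire-and-forget: dropping an event beats delaying a redirect
```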
Step 10: Handling Edge Cases
Collisions (Hash-Based)
Generate hash --> Check DB --> Exists?
NO --> Store and return
YES --> Append salt/counter, rehash --> Check again
(max 3 retries, then fall back to counter-based)
Custom Short URLs
- Validate: length, allowed characters, not reserved words
- Check uniqueness against existing keys
- Store with a flag indicating "custom" (don't recycle)
Expiration
- Store an expires_at field per record
- On read: check if expired and return 404 if so
- Background job: periodically clean up expired entries
- DynamoDB TTL handles this natively
Rate Limiting
- Limit URL creation per user/IP (prevent abuse)
- Limit redirects per short URL per second (prevent DDoS amplification)
- See 28 - Design Rate Limiter
Deep Dive Topics
Deduplication
Should the same long URL always return the same short URL?
| Approach | Pros | Cons |
|---|---|---|
| Deduplicate | Saves storage, consistent | Need secondary index on long_url, slower writes |
| Allow duplicates | Simpler, faster writes | More storage, different users get different short URLs |
Recommendation: Allow duplicates unless storage is a critical concern. Simplicity wins.
Global Distribution
User (Europe) --> DNS (GeoDNS) --> CDN Edge (Europe)
|
[Edge cache hit?]
YES --> redirect
NO --> Origin (us-east)
- GeoDNS routes to nearest edge
- CDN caches 301 redirects naturally
- For 302, use regional caches + replicated database
Interview Walkthrough Checklist
[ ] Clarify requirements (read vs write ratio, custom URLs, analytics)
[ ] Back-of-envelope estimation (QPS, storage, URL space)
[ ] API design (create, redirect, stats)
[ ] Encoding strategy (justify your choice)
[ ] High-level architecture diagram
[ ] Database choice (justify NoSQL for key-value)
[ ] Cache strategy (Redis, LRU, sizing)
[ ] 301 vs 302 trade-off
[ ] Analytics (async, Kafka)
[ ] Deep dive: collisions, expiration, rate limiting
Interview Tips
- Start with requirements and estimation. Don't jump to the solution.
- Justify encoding choice. Pre-generated keys are safest in an interview (no collision, no coordination).
- 302 over 301 if analytics matter. State the trade-off explicitly.
- Cache is critical. A URL shortener is read-heavy. Show the math on cache sizing.
- Analytics must be async. Never block the redirect for telemetry.
- Mention rate limiting unprompted. Shows security awareness.
Resources
- System Design Interview (Alex Xu): Chapter 8 - Design a URL Shortener
- Grokking the System Design Interview: URL Shortening Service
- Bitly Engineering Blog
- Martin Kleppmann's talks on unique ID generation
Previous: 26 - Security in Distributed Systems | Next: 28 - Design Rate Limiter