27 - Design URL Shortener
Previous: 26 - Security in Distributed Systems | Next: 28 - Design Rate Limiter
Why This Matters in Interviews
The URL shortener is one of the most frequently asked system design problems. It appears simple but unfolds into discussions about hashing, encoding, database design, caching, analytics, and scale. Interviewers gauge your ability to start simple and layer in complexity.
Step 1: Requirements
Functional Requirements
- Given a long URL, generate a short URL
- Given a short URL, redirect to the original long URL
- (Optional) Custom short URLs
- (Optional) URL expiration (TTL)
- (Optional) Analytics (click count, referrer, geography)
Non-Functional Requirements
| Requirement | Target |
|---|---|
| Availability | 99.99% (reads are critical) |
| Latency | Redirect in < 50ms (p99) |
| Durability | URLs must not be lost |
| Scale | 100M new URLs/month, 10B redirects/month |
| Short URL length | As short as possible |
Step 2: Estimation
Traffic
Writes: 100M URLs/month
= ~40 URLs/sec (average)
= ~400 URLs/sec (peak, 10x)
Reads: 10B redirects/month (100:1 read:write ratio)
= ~4,000 QPS (average)
= ~40,000 QPS (peak)
Storage
Per URL record:
short_url: 7 bytes
long_url: ~200 bytes (average)
created_at: 8 bytes
expires_at: 8 bytes
user_id: 8 bytes
metadata: ~50 bytes
Total: ~280 bytes
5 years of data:
100M/month * 12 * 5 = 6B URLs
6B * 280 bytes = ~1.7 TB
URL Space
With Base62 encoding (a-z, A-Z, 0-9):
6 characters: 62^6 = 56.8 billion (sufficient for years)
7 characters: 62^7 = 3.5 trillion (very safe)
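The key-space arithmetic above can be checked directly:

```python
# Verify the Base62 key-space numbers quoted above.
ALPHABET_SIZE = 62  # a-z, A-Z, 0-9

six_char_space = ALPHABET_SIZE ** 6    # 56,800,235,584  (~56.8 billion)
seven_char_space = ALPHABET_SIZE ** 7  # 3,521,614,606,208 (~3.5 trillion)

# The 5-year estimate of 6B URLs uses only ~10% of the 6-character space.
print(six_char_space, seven_char_space)
```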
Step 3: API Design
POST /api/v1/urls
Body: { "long_url": "https://example.com/very/long/path",
"custom_alias": "my-link", // optional
"expires_at": "2025-12-31" } // optional
Response: { "short_url": "https://tny.io/aB3xY7",
"expires_at": "2025-12-31" }
GET /{short_url_key}
Response: 301/302 redirect to long URL
GET /api/v1/urls/{short_url_key}/stats
Response: { "clicks": 15234,
"created_at": "2024-01-15",
"top_referrers": [...],
"clicks_by_country": {...} }
Step 4: URL Encoding Strategies
Option A: Hash + Base62
long_url --> MD5/SHA256 --> take first 41 bits --> Base62 encode --> 7-char key
(62^7 ~= 2^41.7, so any 41-bit value is guaranteed to fit in 7 Base62 characters)
Example:
"https://example.com/path" --> MD5 --> "d41d8cd..." --> first 41 bits
--> decimal: 1234567890123 --> Base62 --> "aB3xY7k"
Collision handling:
- Check database if key exists
- If collision, append a counter or re-hash with salt
- Bloom filter for fast existence check
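A minimal sketch of Option A, assuming MD5 and a hypothetical `exists` callback against the key store. It takes the first 41 bits of the digest because 62^7 ~= 2^41.7, so a 41-bit value always fits in 7 Base62 characters:

```python
import hashlib

ALPHABET = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"

def base62(n: int) -> str:
    """Encode a non-negative integer in Base62."""
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, r = divmod(n, 62)
        digits.append(ALPHABET[r])
    return "".join(reversed(digits))

def hash_key(long_url: str, salt: str = "") -> str:
    digest = hashlib.md5((long_url + salt).encode()).digest()
    # First 41 bits of the 128-bit digest: guaranteed < 62^7,
    # so the key is at most 7 Base62 characters.
    n = int.from_bytes(digest, "big") >> (128 - 41)
    return base62(n)

def shorten(long_url: str, exists) -> str:
    # `exists` is a hypothetical key-store lookup (e.g. Bloom filter + DB).
    for attempt in range(4):  # max 3 retries, as above
        key = hash_key(long_url, salt=str(attempt) if attempt else "")
        if not exists(key):
            return key
    raise RuntimeError("too many collisions; fall back to counter-based keys")
```

Note the side effect listed in the comparison below: the same long URL always hashes to the same key, which gives deduplication for free.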
Option B: Counter-Based (Pre-Generated IDs)
Auto-increment counter --> Base62 encode --> short URL key
Counter: 1000000001 --> Base62 --> "aB3xY7"
Counter: 1000000002 --> Base62 --> "aB3xY8"
Distributed counter options:
- Single DB auto-increment: Simple but single point of failure
- Range-based: Each server gets a range (Server 1: 1M-2M, Server 2: 2M-3M)
- Snowflake-like ID: Timestamp + machine ID + sequence
- ZooKeeper coordination: Servers lease ranges from ZooKeeper
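As one concrete option from the list, a Snowflake-style generator can be sketched as follows. The bit layout (41-bit millisecond timestamp, 10-bit machine ID, 12-bit sequence) follows Twitter's published scheme; the custom epoch is an arbitrary assumption, and a production version must wait out the millisecond when the sequence overflows:

```python
import threading
import time

class SnowflakeIds:
    """Snowflake-style IDs: 41-bit ms timestamp | 10-bit machine | 12-bit seq."""
    EPOCH_MS = 1_600_000_000_000  # custom epoch (Sep 2020); an arbitrary choice

    def __init__(self, machine_id: int):
        assert 0 <= machine_id < 1024, "machine_id must fit in 10 bits"
        self.machine_id = machine_id
        self.last_ms = -1
        self.seq = 0
        self.lock = threading.Lock()

    def next_id(self) -> int:
        with self.lock:
            now = int(time.time() * 1000) - self.EPOCH_MS
            if now == self.last_ms:
                # Same millisecond: bump the 12-bit sequence.
                self.seq = (self.seq + 1) & 0xFFF
            else:
                self.seq = 0
                self.last_ms = now
            return (now << 22) | (self.machine_id << 12) | self.seq
```

The resulting integer is then Base62-encoded into the short key; because it embeds a timestamp it is roughly sortable by creation time, but not purely sequential.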
Comparison
| Approach | Pros | Cons |
|---|---|---|
| Hash-based | No coordination needed, same URL = same hash | Collisions, longer computation |
| Counter-based | No collisions, predictable | Needs coordination, guessable |
| Snowflake ID | No coordination, no collision | Longer IDs, not purely sequential |
| Pre-generated keys | Fast, no collision, no coordination | Key DB management overhead |
Pre-Generated Key Service (Recommended for interviews)
+------------------+
| Key Generation |
| Service (KGS) |
| |
| [Unused Keys DB] |----> Generate millions of keys offline
| [Used Keys DB] | Move to "used" after assignment
+------------------+
|
| (batch of keys)
v
+------------------+
| App Server 1 | --> Keeps ~1000 keys in memory
| App Server 2 | --> Keeps ~1000 keys in memory
| App Server N | --> Keeps ~1000 keys in memory
+------------------+
- Keys pre-generated and stored in a database
- Each app server requests a batch (1000 keys at a time)
- No coordination between app servers
- If a server dies, those unused keys are lost (acceptable waste)
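The app-server side of this can be sketched as a small in-memory buffer. `fetch_batch` here is a hypothetical KGS client call that atomically moves a batch of keys from the unused to the used table:

```python
from collections import deque

class KeyPool:
    """In-memory buffer of pre-generated keys held by one app server."""

    def __init__(self, fetch_batch, batch_size: int = 1000):
        self.fetch_batch = fetch_batch  # hypothetical KGS RPC
        self.batch_size = batch_size
        self.keys = deque()

    def next_key(self) -> str:
        if not self.keys:
            # One KGS round-trip buys ~batch_size local assignments.
            self.keys.extend(self.fetch_batch(self.batch_size))
        return self.keys.popleft()
```

Because the KGS marks the whole batch as used the moment it hands it out, a crashed server wastes at most one batch of keys — the "acceptable waste" noted above.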
Step 5: System Architecture
+-------------------+
| Load Balancer |
+--------+----------+
|
+------------------+------------------+
| | |
+-------+------+ +-------+------+ +-------+------+
| App Server 1 | | App Server 2 | | App Server N |
| (key cache) | | (key cache) | | (key cache) |
+-------+------+ +-------+------+ +-------+------+
| | |
+--------+---------+--------+--------+
| |
+-------+------+ +-------+-------+
| Cache | | Database |
| (Redis) | | (Primary + |
| [key->url] | | Replicas) |
+--------------+ +----------------+
|
+-------+-------+
| Key Gen |
| Service |
+---------------+
|
+-------+-------+
| Analytics |
| (Kafka --> |
| ClickHouse) |
+---------------+
Step 6: Database Choice
| Option | Pros | Cons |
|---|---|---|
| NoSQL (DynamoDB, Cassandra) | Simple key-value lookup, horizontal scaling, high write throughput | Less flexible querying |
| SQL (PostgreSQL, MySQL) | ACID, flexible queries for analytics | Sharding complexity at scale |
Recommended: NoSQL (DynamoDB or Cassandra) for the core URL mapping. Simple key-value access pattern fits perfectly.
Schema
Table: url_mappings
+-----------+-----------------------------------------+---------------------+-----+
| short_key | long_url | created_at | ttl |
+-----------+-----------------------------------------+---------------------+-----+
| aB3xY7 | https://example.com/very/long/path | 2024-01-15 10:30:00 | ... |
+-----------+-----------------------------------------+---------------------+-----+
Partition key: short_key (ensures even distribution)
Step 7: Redirect -- 301 vs 302
| Code | Type | Behavior | Use When |
|---|---|---|---|
| 301 | Permanent Redirect | Browser caches, won't hit server again | Don't need analytics (less server load) |
| 302 | Temporary Redirect | Browser always hits server | Need analytics tracking (recommended) |
Interview recommendation: Use 302 because:
- Every click hits the server, enabling analytics
- Allows changing the destination URL later
- URL expiration works correctly
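A framework-agnostic sketch of the redirect handler; `lookup` is a hypothetical function that consults cache then database and returns a record dict or None:

```python
import time

def handle_redirect(short_key: str, lookup, now=time.time):
    """Return an (HTTP status, headers) pair for a redirect request."""
    record = lookup(short_key)
    if record is None:
        return 404, {}
    expires_at = record.get("expires_at")
    if expires_at is not None and now() > expires_at:
        return 404, {}  # expired links behave like missing ones
    # 302, not 301: the browser re-requests on every click, so analytics,
    # expiration, and destination changes all keep working.
    return 302, {"Location": record["long_url"]}
```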
Step 8: Cache Strategy
Read path (redirect):
Client --> App Server --> [Check Redis Cache]
|
Hit? YES --> Return long_url (< 1ms)
|
NO --> Query Database --> Store in cache --> Return
- Cache eviction: LRU (Least Recently Used)
- Cache size: caching the hottest ~20% of URLs serves the large majority of traffic (Pareto / 80-20 rule)
- Cache warm-up: Pre-load top URLs on deployment
Cache sizing (rough upper bound):
Daily redirects: ~330M
Assume ~20% as many distinct hot URLs: ~66M entries
66M * 280 bytes = ~18 GB (fits in a Redis cluster)
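The read path above is classic cache-aside; a minimal sketch, assuming hypothetical `redis_client` and `db` objects (the `url:` key prefix and the TTL are assumptions):

```python
CACHE_TTL_SECONDS = 24 * 3600  # keep hot URLs for a day (assumed policy)

def resolve(short_key: str, redis_client, db):
    """Cache-aside lookup: Redis first, database on a miss."""
    cache_key = f"url:{short_key}"
    cached = redis_client.get(cache_key)
    if cached is not None:
        return cached.decode() if isinstance(cached, bytes) else cached
    row = db.get(short_key)  # e.g. a DynamoDB GetItem on the partition key
    if row is None:
        return None  # consider negative caching to absorb bad-key floods
    redis_client.setex(cache_key, CACHE_TTL_SECONDS, row["long_url"])
    return row["long_url"]
```

The `get`/`setex` calls mirror redis-py's interface; the LRU eviction mentioned above comes from Redis itself via `maxmemory-policy allkeys-lru`.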
Step 9: Analytics Tracking
Redirect flow with analytics:
Client --> App Server --> [302 redirect]
|
+--> Async: Publish click event to Kafka
{
"short_key": "aB3xY7",
"timestamp": "...",
"referrer": "twitter.com",
"user_agent": "...",
"ip": "...",
"country": "US"
}
|
Kafka --> ClickHouse/Druid (OLAP)
|
Dashboard / API for stats
Key principle: Never let analytics slow down the redirect. Fire-and-forget to Kafka.
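That fire-and-forget step can be sketched as follows; the `produce(topic, value)` call mirrors confluent-kafka's non-blocking producer API, and the topic name is an assumption:

```python
import json
import time

CLICK_TOPIC = "click-events"  # assumed topic name

def emit_click(producer, short_key: str, meta: dict) -> None:
    """Publish a click event without ever blocking the redirect response."""
    event = {
        "short_key": short_key,
        "timestamp": time.time(),
        "referrer": meta.get("referrer"),
        "user_agent": meta.get("user_agent"),
        "ip": meta.get("ip"),
        "country": meta.get("country"),
    }
    try:
        # Async produce: the message is buffered and sent in the background.
        producer.produce(CLICK_TOPIC, json.dumps(event).encode())
    except Exception:
        pass  # fire-and-forget: dropping an event beats delaying a redirect
```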
Step 10: Handling Edge Cases
Collisions (Hash-Based)
Generate hash --> Check DB --> Exists?
NO --> Store and return
YES --> Append salt/counter, rehash --> Check again
(max 3 retries, then fall back to counter-based)
Custom Short URLs
- Validate: length, allowed characters, not reserved words
- Check uniqueness against existing keys
- Store with a flag indicating "custom" (don't recycle)
Expiration
- Store an expires_at field per record
- On read: check if expired and return 404 if so
- Background job: periodically clean up expired entries
- DynamoDB TTL handles this natively
Rate Limiting
- Limit URL creation per user/IP (prevent abuse)
- Limit redirects per short URL per second (prevent DDoS amplification)
- See 28 - Design Rate Limiter
Deep Dive Topics
Deduplication
Should the same long URL always return the same short URL?
| Approach | Pros | Cons |
|---|---|---|
| Deduplicate | Saves storage, consistent | Need secondary index on long_url, slower writes |
| Allow duplicates | Simpler, faster writes | More storage, different users get different short URLs |
Recommendation: Allow duplicates unless storage is a critical concern. Simplicity wins.
Global Distribution
User (Europe) --> DNS (GeoDNS) --> CDN Edge (Europe)
|
[Edge cache hit?]
YES --> redirect
NO --> Origin (us-east)
- GeoDNS routes to nearest edge
- CDN caches 301 redirects naturally
- For 302, use regional caches + replicated database
Interview Walkthrough Checklist
[ ] Clarify requirements (read vs write ratio, custom URLs, analytics)
[ ] Back-of-envelope estimation (QPS, storage, URL space)
[ ] API design (create, redirect, stats)
[ ] Encoding strategy (justify your choice)
[ ] High-level architecture diagram
[ ] Database choice (justify NoSQL for key-value)
[ ] Cache strategy (Redis, LRU, sizing)
[ ] 301 vs 302 trade-off
[ ] Analytics (async, Kafka)
[ ] Deep dive: collisions, expiration, rate limiting
Interview Tips
- Start with requirements and estimation. Don't jump to the solution.
- Justify encoding choice. Pre-generated keys are safest in an interview (no collision, no coordination).
- 302 over 301 if analytics matter. State the trade-off explicitly.
- Cache is critical. A URL shortener is read-heavy. Show the math on cache sizing.
- Analytics must be async. Never block the redirect for telemetry.
- Mention rate limiting unprompted. Shows security awareness.
Resources
- System Design Interview (Alex Xu): Chapter 8 - Design a URL Shortener
- Grokking the System Design Interview: URL Shortening Service
- Bitly Engineering Blog
- Martin Kleppmann's talks on unique ID generation
Previous: 26 - Security in Distributed Systems | Next: 28 - Design Rate Limiter