27 - Design URL Shortener

Previous: 26 - Security in Distributed Systems | Next: 28 - Design Rate Limiter


Why This Matters in Interviews

The URL shortener is one of the most frequently asked system design problems. It appears simple but unfolds into discussions about hashing, encoding, database design, caching, analytics, and scale. Interviewers gauge your ability to start simple and layer in complexity.


Step 1: Requirements

Functional Requirements

  1. Given a long URL, generate a short URL
  2. Given a short URL, redirect to the original long URL
  3. (Optional) Custom short URLs
  4. (Optional) URL expiration (TTL)
  5. (Optional) Analytics (click count, referrer, geography)

Non-Functional Requirements

Requirement       | Target
------------------+------------------------------------------
Availability      | 99.99% (reads are critical)
Latency           | Redirect in < 50ms (p99)
Durability        | URLs must not be lost
Scale             | 100M new URLs/month, 10B redirects/month
Short URL length  | As short as possible

Step 2: Estimation

Traffic

Writes: 100M URLs/month
  = ~40 URLs/sec (average)
  = ~400 URLs/sec (peak, 10x)

Reads: 10B redirects/month (100:1 read:write ratio)
  = ~4,000 QPS (average)
  = ~40,000 QPS (peak)

Storage

Per URL record:
  short_url:    7 bytes
  long_url:     ~200 bytes (average)
  created_at:   8 bytes
  expires_at:   8 bytes
  user_id:      8 bytes
  metadata:     ~50 bytes
  Total:        ~280 bytes

5 years of data:
  100M/month * 12 * 5 = 6B URLs
  6B * 280 bytes = ~1.7 TB
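The estimates above can be sanity-checked in a few lines (a sketch using the same assumptions as the numbers above: a 30-day month and a 10x peak factor):

```python
SECONDS_PER_MONTH = 30 * 24 * 3600                # ~2.6M seconds (30-day month)

writes_per_sec = 100_000_000 / SECONDS_PER_MONTH       # ~39   -> "~40 URLs/sec"
reads_per_sec = 10_000_000_000 / SECONDS_PER_MONTH     # ~3,860 -> "~4,000 QPS"
peak_reads = reads_per_sec * 10                        # 10x peak factor

bytes_per_record = 7 + 200 + 8 + 8 + 8 + 50            # ~280 bytes per URL record
urls_in_5_years = 100_000_000 * 12 * 5                 # 6B URLs over 5 years
storage_tb = urls_in_5_years * bytes_per_record / 1e12 # ~1.7 TB
```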

URL Space

With Base62 encoding (a-z, A-Z, 0-9):

6 characters: 62^6 = 56.8 billion (sufficient for years)
7 characters: 62^7 = 3.5 trillion (very safe)

Step 3: API Design

POST /api/v1/urls
  Body: { "long_url": "https://example.com/very/long/path",
          "custom_alias": "my-link",    // optional
          "expires_at": "2025-12-31" }  // optional
  Response: { "short_url": "https://tny.io/aB3xY7",
              "expires_at": "2025-12-31" }

GET /{short_url_key}
  Response: 301/302 redirect to long URL

GET /api/v1/urls/{short_url_key}/stats
  Response: { "clicks": 15234,
              "created_at": "2024-01-15",
              "top_referrers": [...],
              "clicks_by_country": {...} }

Step 4: URL Encoding Strategies

Option A: Hash + Base62

long_url --> MD5/SHA256 --> take first 41 bits --> Base62 encode --> 7-char key

(62^7 ≈ 2^41.7, so 41 bits is the largest truncation guaranteed to fit in 7 characters)

Example (illustrative values):
  "https://example.com/path" --> MD5 --> "d41d8cd..." --> first 41 bits
  --> decimal: 8234567890 --> Base62 --> "aB3xY7k"

Collision handling:

  • Check database if key exists
  • If collision, append a counter or re-hash with salt
  • Bloom filter for fast existence check
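The hash-truncate-encode flow with collision retries can be sketched as follows (a minimal in-memory sketch: the `existing` set stands in for the database/Bloom-filter existence check, and the `#attempt` salt format is an arbitrary choice):

```python
import hashlib
import string

# a-z, A-Z, 0-9 -- the same 62-character alphabet as above.
ALPHABET = string.ascii_lowercase + string.ascii_uppercase + string.digits

def _base62(n: int) -> str:
    out = []
    while n:
        n, rem = divmod(n, 62)
        out.append(ALPHABET[rem])
    return "".join(reversed(out)) or ALPHABET[0]

def short_key(long_url: str, existing: set, max_retries: int = 3) -> str:
    """Hash, truncate to 41 bits, Base62-encode; re-hash with a salt on collision."""
    for attempt in range(max_retries + 1):
        salted = f"{long_url}#{attempt}" if attempt else long_url
        digest = hashlib.md5(salted.encode()).digest()
        n = int.from_bytes(digest[:6], "big") >> 7   # 48 bits -> top 41 bits
        key = _base62(n).rjust(7, ALPHABET[0])       # pad to a fixed 7 chars
        if key not in existing:
            existing.add(key)
            return key
    raise RuntimeError("too many collisions: fall back to counter-based keys")
```

Calling `short_key` twice with the same URL demonstrates the retry path: the second call collides on attempt 0 and succeeds with the salted re-hash.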

Option B: Counter-Based (Pre-Generated IDs)

Auto-increment counter --> Base62 encode --> short URL key

Counter: 1000000001 --> Base62 --> "aB3xY7"
Counter: 1000000002 --> Base62 --> "aB3xY8"
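The counter-to-key step above is plain base conversion. A minimal sketch (the alphabet order a-z, A-Z, 0-9 is a convention choice, so the exact output strings differ from the illustrative "aB3xY7" above):

```python
import string

# a-z, A-Z, 0-9: 62 symbols, digit value = index in this string.
ALPHABET = string.ascii_lowercase + string.ascii_uppercase + string.digits

def encode(n: int) -> str:
    """Counter value -> Base62 short key."""
    if n == 0:
        return ALPHABET[0]
    out = []
    while n:
        n, rem = divmod(n, 62)
        out.append(ALPHABET[rem])
    return "".join(reversed(out))

def decode(key: str) -> int:
    """Base62 short key -> counter value (the inverse of encode)."""
    n = 0
    for ch in key:
        n = n * 62 + ALPHABET.index(ch)
    return n
```

Note that a counter around 10^9 encodes in just 6 characters, since 62^6 ≈ 56.8B.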

Distributed counter options:

  • Single DB auto-increment: Simple but single point of failure
  • Range-based: Each server gets a range (Server 1: 1M-2M, Server 2: 2M-3M)
  • Snowflake-like ID: Timestamp + machine ID + sequence
  • ZooKeeper coordination: Servers lease ranges from ZooKeeper
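The Snowflake-like option can be sketched as a bit-packed 64-bit ID: 41 bits of milliseconds since a custom epoch, 10 bits of machine ID, 12 bits of per-millisecond sequence (the classic layout; the epoch constant here is an arbitrary choice):

```python
import time

EPOCH_MS = 1_600_000_000_000   # custom epoch (arbitrary); keeps timestamps small

class SnowflakeLike:
    """41-bit timestamp | 10-bit machine id | 12-bit sequence."""
    def __init__(self, machine_id: int):
        assert 0 <= machine_id < 1024          # must fit in 10 bits
        self.machine_id = machine_id
        self.last_ms = -1
        self.seq = 0

    def next_id(self) -> int:
        now = int(time.time() * 1000) - EPOCH_MS
        if now == self.last_ms:
            self.seq = (self.seq + 1) & 0xFFF  # 4096 ids per ms per machine
            if self.seq == 0:                  # sequence exhausted: spin to next ms
                while now <= self.last_ms:
                    now = int(time.time() * 1000) - EPOCH_MS
        else:
            self.seq = 0
        self.last_ms = now
        return (now << 22) | (self.machine_id << 12) | self.seq
```

IDs from one generator are unique and time-ordered without any cross-server coordination; the trade-off, as the comparison below notes, is that they are larger than a dense counter.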

Comparison

Approach           | Pros                                         | Cons
-------------------+----------------------------------------------+-----------------------------------
Hash-based         | No coordination needed, same URL = same hash | Collisions, longer computation
Counter-based      | No collisions, predictable                   | Needs coordination, guessable
Snowflake ID       | No coordination, no collisions               | Longer IDs, not purely sequential
Pre-generated keys | Fast, no collisions, no coordination         | Key DB management overhead

Pre-Generated Key Service (Recommended for interviews)

+------------------+
| Key Generation   |
| Service (KGS)    |
|                  |
| [Unused Keys DB] |----> Generate millions of keys offline
| [Used Keys DB]   |      Move to "used" after assignment
+------------------+
        |
        | (batch of keys)
        v
+------------------+
| App Server 1     | --> Keeps ~1000 keys in memory
| App Server 2     | --> Keeps ~1000 keys in memory
| App Server N     | --> Keeps ~1000 keys in memory
+------------------+
  • Keys pre-generated and stored in a database
  • Each app server requests a batch (1000 keys at a time)
  • No coordination between app servers
  • If a server dies, those unused keys are lost (acceptable waste)
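The app-server side of this scheme is just a buffer that refills itself from the KGS. A minimal sketch (the `fake_kgs` stub stands in for the real service, which would also move each handed-out batch to the used-keys table):

```python
import itertools
from collections import deque

class KeyCache:
    """App-server-side key buffer: pulls unused keys from the KGS in batches."""
    def __init__(self, fetch_batch, batch_size: int = 1000):
        self.fetch_batch = fetch_batch   # callable(n) -> list of n unused keys
        self.batch_size = batch_size
        self.keys = deque()

    def next_key(self) -> str:
        if not self.keys:                # batch exhausted: ask the KGS for more
            self.keys.extend(self.fetch_batch(self.batch_size))
        return self.keys.popleft()

# Stand-in for the KGS: hands out keys from a counter. A real KGS serves
# pre-generated keys and marks each batch as used when it is handed out.
_counter = itertools.count()
def fake_kgs(n: int) -> list:
    return [f"key{next(_counter)}" for _ in range(n)]
```

Because each server draws from disjoint batches, no coordination is needed at request time; keys buffered in a crashed server's memory are simply lost, which is the acceptable waste noted above.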

Step 5: System Architecture

                        +-------------------+
                        |   Load Balancer   |
                        +--------+----------+
                                 |
              +------------------+------------------+
              |                  |                  |
       +--------------+  +--------------+  +--------------+
       | App Server 1 |  | App Server 2 |  | App Server N |
       | (key cache)  |  | (key cache)  |  | (key cache)  |
       +--------------+  +--------------+  +--------------+
              |                  |                  |
              +--------+---------+--------+--------+
                       |                  |
                +--------------+   +----------------+
               |    Cache     |   |    Database    |
               |   (Redis)    |   |   (Primary +   |
               | [key->url]   |   |    Replicas)   |
               +--------------+   +----------------+
                                          |
                                  +-------+-------+
                                  |  Key Gen      |
                                  |  Service      |
                                  +---------------+
                                          |
                                  +-------+-------+
                                  | Analytics     |
                                  | (Kafka -->    |
                                  |  ClickHouse)  |
                                  +---------------+

Step 6: Database Choice

Option                      | Pros                                                                | Cons
----------------------------+---------------------------------------------------------------------+------------------------------
NoSQL (DynamoDB, Cassandra) | Simple key-value lookup, horizontal scaling, high write throughput  | Less flexible querying
SQL (PostgreSQL, MySQL)     | ACID, flexible queries for analytics                                | Sharding complexity at scale

Recommended: NoSQL (DynamoDB or Cassandra) for the core URL mapping. Simple key-value access pattern fits perfectly.

Schema

Table: url_mappings
+-----------+-----------------------------------------+---------------------+-----+
| short_key | long_url                                | created_at          | ttl |
+-----------+-----------------------------------------+---------------------+-----+
| aB3xY7    | https://example.com/very/long/path      | 2024-01-15 10:30:00 | ... |
+-----------+-----------------------------------------+---------------------+-----+

Partition key: short_key (ensures even distribution)

Step 7: Redirect -- 301 vs 302

Code | Type               | Behavior                               | Use When
-----+--------------------+----------------------------------------+-----------------------------------------
301  | Permanent redirect | Browser caches, won't hit server again | Don't need analytics (less server load)
302  | Temporary redirect | Browser always hits server             | Need analytics tracking (recommended)

Interview recommendation: Use 302 because:

  • Every click hits the server, enabling analytics
  • Allows changing the destination URL later
  • URL expiration works correctly

Step 8: Cache Strategy

Read path (redirect):

Client --> App Server --> [Check Redis Cache]
                              |
                    Hit?  YES --> Return long_url (< 1ms)
                              |
                          NO --> Query Database --> Store in cache --> Return
  • Cache eviction: LRU (Least Recently Used)
  • Cache size: caching the hottest ~20% of URLs serves the bulk of traffic (Pareto principle)
  • Cache warm-up: Pre-load top URLs on deployment

Cache sizing:
  Daily redirects: ~330M
  Hot set (top ~20% of daily entries): ~66M URLs
  66M * 280 bytes = ~18 GB (fits in a Redis cluster)
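The cache-aside read path with LRU eviction can be sketched in-memory (a sketch standing in for Redis; the real system would also apply a TTL to cached entries):

```python
from collections import OrderedDict

class LRUCache:
    """Minimal LRU cache: OrderedDict keeps entries in recency order."""
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)          # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        self.data[key] = value
        self.data.move_to_end(key)
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)   # evict least recently used

def resolve(short_key, cache, db):
    """Cache-aside read path: try the cache, fall back to the DB, populate cache."""
    url = cache.get(short_key)
    if url is None:
        url = db.get(short_key)             # None here means 404
        if url is not None:
            cache.put(short_key, url)
    return url
```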

Step 9: Analytics Tracking

Redirect flow with analytics:

Client --> App Server --> [302 redirect]
                |
                +--> Async: Publish click event to Kafka
                     {
                       "short_key": "aB3xY7",
                       "timestamp": "...",
                       "referrer": "twitter.com",
                       "user_agent": "...",
                       "ip": "...",
                       "country": "US"
                     }
                              |
                     Kafka --> ClickHouse/Druid (OLAP)
                              |
                     Dashboard / API for stats

Key principle: Never let analytics slow down the redirect. Fire-and-forget to Kafka.
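The fire-and-forget handoff can be sketched with a bounded queue between the redirect path and a background publisher (a sketch: `publish` stands in for the Kafka producer, and dropping events on a full queue is the deliberate trade-off):

```python
import json
import queue
import threading

events = queue.Queue(maxsize=10_000)   # buffer between redirect path and publisher

def record_click(event: dict) -> None:
    """Called on the redirect path. Never blocks, never raises."""
    try:
        events.put_nowait(event)
    except queue.Full:
        pass   # drop the event rather than slow down the redirect

def consume(publish) -> None:
    """Background worker: drain the queue and hand events to Kafka (stubbed)."""
    while True:
        event = events.get()
        if event is None:              # sentinel for shutdown
            break
        publish(json.dumps(event))
```

A usage sketch: start `consume` in a daemon thread at startup, call `record_click` from the redirect handler, and enqueue the `None` sentinel on shutdown.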


Step 10: Handling Edge Cases

Collisions (Hash-Based)

Generate hash --> Check DB --> Exists?
  NO  --> Store and return
  YES --> Append salt/counter, rehash --> Check again
          (max 3 retries, then fall back to counter-based)

Custom Short URLs

  • Validate: length, allowed characters, not reserved words
  • Check uniqueness against existing keys
  • Store with a flag indicating "custom" (don't recycle)
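The validation step can be sketched as below (the length bounds, allowed characters, and reserved-word list are illustrative choices, not fixed requirements):

```python
import re

RESERVED = {"api", "admin", "stats", "login"}    # example reserved words
ALIAS_RE = re.compile(r"^[A-Za-z0-9_-]{4,30}$")  # illustrative length/charset rules

def validate_alias(alias: str, existing_keys) -> bool:
    """True if the custom alias is well-formed, not reserved, and unused."""
    if not ALIAS_RE.fullmatch(alias):
        return False
    if alias.lower() in RESERVED:
        return False
    return alias not in existing_keys            # uniqueness check against key store
```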

Expiration

  • Store expires_at field
  • On read: check if expired, return 404 if so
  • Background job: periodically clean up expired entries
  • DynamoDB TTL handles this natively
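The check-on-read step can be sketched as follows (a sketch: `record` stands in for the fetched DB row, with `expires_at` as an optional timezone-aware datetime):

```python
from datetime import datetime, timezone

def lookup(record, now=None):
    """Return the long URL, or None (-> 404) if the entry is missing or expired."""
    if record is None:
        return None
    now = now or datetime.now(timezone.utc)
    expires_at = record.get("expires_at")
    if expires_at is not None and expires_at <= now:
        return None   # expired: treat as gone (cleanup job removes it later)
    return record["long_url"]
```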

Rate Limiting

  • Limit URL creation per user/IP (prevent abuse)
  • Limit redirects per short URL per second (prevent DDoS amplification)
  • See 28 - Design Rate Limiter

Deep Dive Topics

Deduplication

Should the same long URL always return the same short URL?

Approach         | Pros                      | Cons
-----------------+---------------------------+-------------------------------------------------------
Deduplicate      | Saves storage, consistent | Needs secondary index on long_url, slower writes
Allow duplicates | Simpler, faster writes    | More storage, different users get different short URLs

Recommendation: Allow duplicates unless storage is a critical concern. Simplicity wins.

Global Distribution

User (Europe) --> DNS (GeoDNS) --> CDN Edge (Europe)
                                     |
                             [Edge cache hit?]
                               YES --> redirect
                               NO  --> Origin (us-east)
  • GeoDNS routes to nearest edge
  • CDN caches 301 redirects naturally
  • For 302, use regional caches + replicated database

Interview Walkthrough Checklist

[ ] Clarify requirements (read vs write ratio, custom URLs, analytics)
[ ] Back-of-envelope estimation (QPS, storage, URL space)
[ ] API design (create, redirect, stats)
[ ] Encoding strategy (justify your choice)
[ ] High-level architecture diagram
[ ] Database choice (justify NoSQL for key-value)
[ ] Cache strategy (Redis, LRU, sizing)
[ ] 301 vs 302 trade-off
[ ] Analytics (async, Kafka)
[ ] Deep dive: collisions, expiration, rate limiting

Interview Tips

  1. Start with requirements and estimation. Don't jump to the solution.
  2. Justify encoding choice. Pre-generated keys are safest in an interview (no collision, no coordination).
  3. 302 over 301 if analytics matter. State the trade-off explicitly.
  4. Cache is critical. A URL shortener is read-heavy. Show the math on cache sizing.
  5. Analytics must be async. Never block the redirect for telemetry.
  6. Mention rate limiting unprompted. Shows security awareness.

Resources

  • System Design Interview (Alex Xu): Chapter 8 - Design a URL Shortener
  • Grokking the System Design Interview: URL Shortening Service
  • Bitly Engineering Blog
  • Martin Kleppmann's talks on unique ID generation

Previous: 26 - Security in Distributed Systems | Next: 28 - Design Rate Limiter