bitly.excalidraw
1️⃣ NFR - CAP Theorem Decision
Need Strong Consistency?
Required for Banking, Ticket Booking, and Trading Systems.
- Reason: Every read must see the latest write.
Bitly?
Not necessary. Scenario: Create URL → Immediately open URL → Fails for a few seconds. This is acceptable.
Decision
- CP ❌ / AP ✅
- Choose: High Availability, Eventual Consistency
2️⃣ 301 vs 302 Redirect
| Aspect | 301 Permanent Redirect | 302 Temporary Redirect |
|---|---|---|
| Meaning | Redirect is permanent | Redirect is temporary |
| Browser Caching | Client/browser cache the redirect | not cached |
| Future Requests | May bypass Bitly after caching | Always reach Bitly first |
| Server Traffic | Lower | Higher |
| Infrastructure Cost | Lower | Higher |
| Analytics Tracking | ❌ Limited / Lost after caching | ✅ Full tracking |
| Click Counting | ❌ Not reliable | ✅ Accurate |
| Monitoring & Observability | ❌ Harder | ✅ Easier |
| A/B Testing Support | ❌ Difficult | ✅ Easy |
| Common Use Case | Permanent URL migration | URL shorteners, tracking links |
| Bitly Interview Choice | ❌ Usually not preferred | ✅ Preferred |
3️⃣ Deep Dive #1 - Generating Unique Short Codes
Option 1: URL Prefix
Take first few characters.
- Example:
https://google.com→https: - Problem: Many URLs share same prefix (e.g.,
https://twitter.com,https://facebook.com), leading to a high collision rate. - Decision: ❌ Reject
Option 2: Random Number + Base62
Generate a random number and convert to Base62 (characters 0-9, A-Z, a-z, total 62 characters).
- Why Base62? A 6-character code yields
62^6 ≈ 56 Billioncombinations. - Problem: Birthday Paradox. Even with 56 billion combinations, storing
1 Billion URLscan still produce collisions. - Solution (Check-and-Put):
Generate Code→Check DB→Collision?→ If yes,Generate Again.
Option 3: Hash Long URL
SHA256(url)→Base62 Encode→ Take first6 chars.- Problem: Still has collisions.
- Solution: Same fix: check DB first.
Option 4: Counter (Recommended)
Use a counter (1, 2, 3, 4...) and convert using Base62.
- Example:
1→1,10→A,61→z,62→10. - Benefits: Fast, unique, no collision checks.
- Drawback: Predictable IDs. Competitors can guess links (e.g.,
bit.ly/1,bit.ly/2,bit.ly/3) and scrape them. - Solution: Use a bijective function with obfuscation:
Counter→Obfuscation→Base62(e.g., using libraries like Sqids to produce random-looking codes likefG8xT2orkL9pQ1).
4️⃣ Deep Dive #2 - Reducing Redirect Latency
Step 1: Database Indexing
Without index, querying 1 Billion rows requires a slow Full Scan.
- Solution: Set
PRIMARY KEY(short_url). - The database builds a B-Tree Index, reducing lookup time to
O(log N).
Step 2: Redis Cache
Flow: Client → Server → Redis → Database
Cache Structure
- Key: Short URL (e.g.,
abc123) - Value: Original URL (e.g.,
https://google.com)
Read-Through Cache Strategy
- Check Redis.
- If Hit: Return immediately.
- If Miss: Read DB → Update Cache → Return result.
- Eviction Policy:
LRU(Least Recently Used) to automatically remove old/inactive URLs.
Step 3: CDN (Optional)
Store redirects at edge locations (e.g., China User → China CDN instead of going to California Server).
- Pros: Faster global latency.
- Cons: Lose analytics visibility.
5️⃣ Deep Dive #3 - Scaling
Estimate Traffic
- Given:
100 Million DAU - Approx:
10^8 users - Seconds/day:
10^5 - RPS:
10^8 / 10^5 ≈ 10^3 ≈ 1000 RPS - Peak: multiply by 10 & 100
10k - 100k RPS
Scale Application Layer
- Vertical Scaling: Bigger server (More CPU/Memory).
- Horizontal Scaling (Preferred):
Client→Load Balancer→Server 1/2/3.
Split Services
Instead of one Monolith, use: API Gateway → Read Service & Write Service.
- Reason: Reads >> Writes (redirect traffic dominates).
Auto Scaling
Scale out automatically when CPU > 75% or Memory > 75%.
6️⃣ Scaling Counter Service
- Problem: Multiple Write Servers maintaining their own counter leads to collisions (e.g.,
Server A -> 10,Server B -> 10). - Solution: Global Counter (
Write Service→Redis CounterviaINCR). Since Redis is single-threaded, it guarantees unique sequential IDs. - Optimization (Batching): Allocate batches of IDs to servers (e.g.,
Server Agets1-1000,Server Bgets1001-2000) to reduce Redis calls. As we can afford to loose some count as total allowed is62^6 ≈ 56 Billion ≈ 10^10
7️⃣ Database Scaling
- Storage Estimate:
~500 bytes per row*1 Billion URLs=10^11 bytes/10^6 byte per gb=500 GB. - Verdict: A single DB is enough since modern SSDs support multiple TBs.
- Sharding (If needed): Shard by
short_url(e.g.,hash(short_url) % N) to route requests to the correct shard.
8️⃣ High Availability
- Redis Cache Failure: Non-critical. Cache miss leads to DB query which rebuilds the cache.
- Global Counter Failure: Critical. Requires replication, snapshots, and failover.
- Database Failure: Use a
Primary DB→Replica DBsetup with hourly snapshots to S3. Recovery is done via replica promotion or restoring a snapshot.
Final Architecture
+----------------+
| API Gateway |
+----------------+
/ \
/ \
v v
+---------------+ +---------------+
| Read Service | | Write Service |
+---------------+ +---------------+
| |
| |
v v
+---------+ +------------+
| Redis | | Global |
| Cache | | Counter |
+---------+ +------------+
| |
+--------+--------+
|
v
+-------------+
| PostgreSQL |
+-------------+
|
v
Replica + BackupInterview Summary
Functional Requirements
✅ Create short URL
✅ Redirect users
✅ Custom Alias
✅ Expiration Time
Non-Functional Requirements
✅ Low Latency (<200ms)
✅ Scalability (100M DAU)
✅ Unique Short Codes
✅ High Availability
✅ Eventual Consistency
Key Technologies
- PostgreSQL
- Redis Cache
- Redis Global Counter (
INCR) - Base62 Encoding
- B-Tree Index
- API Gateway
- Horizontal Scaling
- Read/Write Service Separation
- Database Replication
- Snapshots/Backups