1️⃣ What is the Problem?
Imagine you have a URL Shortener / Cache / Database.
Initially only 1 server:
S1As users grow, you add more servers:
S1 S2 S3Now question:
How do we decide which server stores a user/key/data?
2️⃣ Normal Hashing (Simple Hash + Mod)
We use:
server = hash(key) % NWhere:
key= user ID / cache key / request IDN= number of servers
Example:
User ID = 25
25 % 3 = 1Store in:
S1Works fine.
3️⃣ Problem with Normal Hashing
Now add one more server.
Before:
25 % 3 = 1 → S1After adding S4:
25 % 4 = 1 → S1Maybe okay for this key, but many keys change.
Example:
User ID = 14
Before:
14 % 3 = 2 → S2After:
14 % 4 = 2 → S2Try another:
User ID = 10
Before:
10 % 3 = 1 → S1After:
10 % 4 = 2 → S2Now data moves.
Big issue:
When adding/removing server → many keys move
That means:
- heavy redistribution
- more network traffic
- downtime risk
- poor scalability
4️⃣ Consistent Hashing Idea
Instead of hash(key) % N
We create a circular ring.
Servers are placed on ring.
Keys are also placed on ring.
Then assign key to nearest clockwise server.
Simple.
5️⃣ Easy Example
Assume ring:
0 → 100Place servers:
S1 = 20
S2 = 50
S3 = 80Ring:
S1(20)
S3(80) S2(50)(Circular)
6️⃣ Add Key
Suppose key = 35
Now move clockwise.
After 35, next server = 50
So:
35 → S2Suppose key = 70
Clockwise next server = 80
So:
70 → S3Suppose key = 90
Move clockwise.
Wrap around.
Next server = 20
So:
90 → S1This wrap-around is very important.
7️⃣ Add New Server
Now add:
S4 = 65Ring:
20(S1) → 50(S2) → 65(S4) → 80(S3)Who moves?
Only keys between:
50 → 65Earlier → S3
Now → S4
Only small part moves.
Good scalability ✅
8️⃣ Remove Server
Suppose S2 fails.
Earlier:
Keys between:
20 → 50went to S2.
Now nearest clockwise server = S4 or S3.
Only S2’s keys move.
Not whole system.
Very efficient.
9️⃣ Why Better Than Modulo Hashing?
Normal hashing:
Add 1 server → almost all keys may change ❌
Consistent hashing:
Add/remove 1 server → only nearby keys move ✅
This is the biggest interview point.
🔟 Virtual Nodes (Very Important)
Problem:
If each server appears only once:
S1 S2 S3Load may become uneven.
One server can get too much traffic.
Fix → Virtual Nodes
Same server placed multiple times.
Example:
S1 -> 10, 40, 70
S2 -> 20, 50, 90
S3 -> 30, 60, 80Now data spreads more evenly.
Benefits:
- better load balancing
- less hotspot
- fault tolerance
- smoother scaling
Most interviewers like hearing virtual nodes.
1️⃣1️⃣ Where Used?
Use in distributed systems like:
- Distributed Cache
- Distributed DB
- Sharding
- Load Balancer
- CDN
- Message Queue
Common examples:
- Redis
- Apache Cassandra
1️⃣2️⃣ Interview Answer (Best Short Answer)
If interviewer asks “What is consistent hashing?”
Say:
Consistent hashing is a data distribution technique where both servers and keys are mapped on a circular ring. A key is assigned to the nearest clockwise server. When a server is added or removed, only a small subset of keys are remapped, which improves scalability and fault tolerance. Virtual nodes are used for better load balancing.
1️⃣3️⃣ Quick Comparison
| Feature | Modulo Hashing | Consistent Hashing |
|---|---|---|
| Add server | Many keys move | Few keys move |
| Remove server | Many keys move | Few keys move |
| Rebalancing | High | Low |
| Scaling | Poor | Good |
| Load balancing | Weak | Better (VNodes) |
1️⃣4️⃣ Easy Memory Trick 🧠
Remember:
- Modulo Hashing → reshuffle almost everything
- Consistent Hashing → move only nearby keys
- Virtual Nodes → distribute load evenly