1️⃣ What is the Problem?

Imagine you have a URL Shortener / Cache / Database.

Initially only 1 server:

S1

As users grow, you add more servers:

S1   S2   S3

Now question:

How do we decide which server stores a user/key/data?


2️⃣ Normal Hashing (Simple Hash + Mod)

We use:

server = hash(key) % N

Where:

  • key = user ID / cache key / request ID
  • N = number of servers

Example:

User ID = 25

25 % 3 = 1

Store in:

S1

Works fine.


3️⃣ Problem with Normal Hashing

Now add one more server.

Before:

25 % 3 = 1 → S1

After adding S4:

25 % 4 = 1 → S1

Maybe okay for this key, but many keys change.

Example:

User ID = 14

Before:

14 % 3 = 2 → S2

After:

14 % 4 = 2 → S2

Try another:

User ID = 10

Before:

10 % 3 = 1 → S1

After:

10 % 4 = 2 → S2

Now data moves.

Big issue:

When adding/removing server → many keys move

That means:

  • heavy redistribution
  • more network traffic
  • downtime risk
  • poor scalability

4️⃣ Consistent Hashing Idea

Instead of hash(key) % N

We create a circular ring.

Servers are placed on ring.

Keys are also placed on ring.

Then assign key to nearest clockwise server.

Simple.


5️⃣ Easy Example

Assume ring:

0 → 100

Place servers:

S1 = 20
S2 = 50
S3 = 80

Ring:

        S1(20)
 
   S3(80)      S2(50)

(Circular)


6️⃣ Add Key

Suppose key = 35

Now move clockwise.

After 35, next server = 50

So:

35 → S2

Suppose key = 70

Clockwise next server = 80

So:

70 → S3

Suppose key = 90

Move clockwise.

Wrap around.

Next server = 20

So:

90 → S1

This wrap-around is very important.


7️⃣ Add New Server

Now add:

S4 = 65

Ring:

20(S1) → 50(S2) → 65(S4) → 80(S3)

Who moves?

Only keys between:

50 → 65

Earlier → S3
Now → S4

Only small part moves.

Good scalability ✅


8️⃣ Remove Server

Suppose S2 fails.

Earlier:

Keys between:

20 → 50

went to S2.

Now nearest clockwise server = S4 or S3.

Only S2’s keys move.

Not whole system.

Very efficient.


9️⃣ Why Better Than Modulo Hashing?

Normal hashing:

Add 1 server → almost all keys may change ❌

Consistent hashing:

Add/remove 1 server → only nearby keys move ✅

This is the biggest interview point.


🔟 Virtual Nodes (Very Important)

Problem:

If each server appears only once:

S1   S2   S3

Load may become uneven.

One server can get too much traffic.

Fix → Virtual Nodes

Same server placed multiple times.

Example:

S1 -> 10, 40, 70
S2 -> 20, 50, 90
S3 -> 30, 60, 80

Now data spreads more evenly.

Benefits:

  • better load balancing
  • less hotspot
  • fault tolerance
  • smoother scaling

Most interviewers like hearing virtual nodes.


1️⃣1️⃣ Where Used?

Use in distributed systems like:

  • Distributed Cache
  • Distributed DB
  • Sharding
  • Load Balancer
  • CDN
  • Message Queue

Common examples:

  • Redis
  • Apache Cassandra

1️⃣2️⃣ Interview Answer (Best Short Answer)

If interviewer asks “What is consistent hashing?”

Say:

Consistent hashing is a data distribution technique where both servers and keys are mapped on a circular ring. A key is assigned to the nearest clockwise server. When a server is added or removed, only a small subset of keys are remapped, which improves scalability and fault tolerance. Virtual nodes are used for better load balancing.


1️⃣3️⃣ Quick Comparison

FeatureModulo HashingConsistent Hashing
Add serverMany keys moveFew keys move
Remove serverMany keys moveFew keys move
RebalancingHighLow
ScalingPoorGood
Load balancingWeakBetter (VNodes)

1️⃣4️⃣ Easy Memory Trick 🧠

Remember:

  • Modulo Hashing → reshuffle almost everything
  • Consistent Hashing → move only nearby keys
  • Virtual Nodes → distribute load evenly