1. Requirements
whatsapp-req.excalidraw
WIP below. 👷🏻‍♂️
whatsapp-hdl.excalidraw
2. Core Entities
User
└── userId
Client(Device)
├── clientId
└── userId
Chat
├── chatId
└── metadata
ChatParticipant
├── chatId
└── userId
Message
├── messageId
├── chatId
├── senderId
├── content
└── timestamp
Inbox
├── recipientId/clientId
└── messageId
Why Client Entity?
User
├── Mobile
├── Laptop
└── Tablet
A user may have multiple devices.
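The entities above can be sketched as plain records. This is only an illustration, not a schema: the field types and the `devices_of` helper are assumptions added for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Client:
    """One device (mobile, laptop, tablet) belonging to a user."""
    client_id: str
    user_id: str


@dataclass(frozen=True)
class Message:
    message_id: str
    chat_id: str
    sender_id: str
    content: str
    timestamp: float  # assumed: seconds since epoch


@dataclass(frozen=True)
class InboxEntry:
    """Marks one message as not yet delivered to one client."""
    recipient_client_id: str
    message_id: str


def devices_of(user_id: str, clients: list[Client]) -> list[str]:
    """Hypothetical helper: all clientIds registered for a user."""
    return [c.client_id for c in clients if c.user_id == user_id]


clients = [
    Client("mobile-1", "alice"),
    Client("laptop-1", "alice"),
    Client("mobile-2", "bob"),
]
print(devices_of("alice", clients))
```

Keeping Client separate from User is what later allows delivery to be tracked per device rather than per person.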
3. API Design
Use:
WebSockets
Reason:
Bidirectional
Low Latency
Persistent Connection
Client → Server Commands
Create Chat
{
  "participants": [],
  "name": ""
}
Response
{
  "chatId": ""
}
Send Message
{
  "chatId": "",
  "message": "",
  "attachments": []
}
Response
{
  "status": "SUCCESS",
  "messageId": ""
}
Upload Attachment
{
  "body": "...",
  "hash": "..."
}
Modify Participants
{
  "chatId": "",
  "userId": "",
  "operation": "ADD | REMOVE"
}
ACK Message
{
  "messageId": ""
}
Used to confirm delivery.
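Since every command travels over the same socket, the server needs some way to tell them apart. The notes do not say how; one possible sketch wraps each payload in an envelope with a `type` field. The envelope shape, the command names `sendMessage` and `ack`, and the `ERROR` status are all assumptions made for this example.

```python
import json
import uuid

# Assumed envelope: {"type": "<command>", "payload": {...}}.
# The "type" field is not part of the payloads above.
messages = {}          # messageId -> payload (stand-in for the Message table)
pending_acks = set()   # messageIds stored but not yet acknowledged


def handle_send_message(payload):
    message_id = str(uuid.uuid4())
    messages[message_id] = payload
    pending_acks.add(message_id)
    return {"status": "SUCCESS", "messageId": message_id}


def handle_ack(payload):
    # discard() ignores unknown ids, so a duplicate ACK is harmless.
    pending_acks.discard(payload["messageId"])
    return {"status": "SUCCESS"}


HANDLERS = {"sendMessage": handle_send_message, "ack": handle_ack}


def dispatch(raw_frame: str) -> dict:
    """Decode one text frame and route it to the matching handler."""
    frame = json.loads(raw_frame)
    handler = HANDLERS.get(frame.get("type"))
    if handler is None:
        return {"status": "ERROR", "reason": "unknown command"}
    return handler(frame.get("payload", {}))


reply = dispatch(json.dumps({
    "type": "sendMessage",
    "payload": {"chatId": "c1", "message": "hi", "attachments": []},
}))
```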
Server → Client Commands
New Message
{
  "chatId": "",
  "senderId": "",
  "message": ""
}
Chat Updated
{
  "chatId": "",
  "participants": []
}
4. High Level Design (HLD)
Step 1 — Create Chat
Components
Client
│
WebSocket
│
Chat Service
│
DynamoDB
Chat Table
PK = chatId
chatId
name
metadata
ChatParticipant Table
PK = chatId
SK = participantId
Query:
Get participants of chat
GSI
PK = participantId
SK = chatId
Query:
Get all chats for user
Step 2 — Send / Receive Messages
Single Server Version
Client
│
WebSocket
│
Chat Server
In-Memory Connection Map
unordered_map<userId, websocketConnection>
Flow
Send Message
│
Find Participants
│
Find WebSocket
│
Push Message
Works only when everyone is online.
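The single-server flow can be sketched with a dictionary as the connection map. `FakeSocket` is a hypothetical stand-in for a real WebSocket connection, and the sample chat is made up for the example.

```python
class FakeSocket:
    """Stand-in for a WebSocket connection; records what was pushed."""
    def __init__(self):
        self.sent = []

    def send(self, frame):
        self.sent.append(frame)


connections = {}                                     # userId -> socket
participants = {"chat1": ["alice", "bob", "carol"]}  # chatId -> userIds


def send_message(chat_id, sender_id, text):
    """Push to every online participant; return the users that were missed."""
    frame = {"chatId": chat_id, "senderId": sender_id, "message": text}
    missed = []
    for user_id in participants[chat_id]:
        if user_id == sender_id:
            continue
        socket = connections.get(user_id)
        if socket is None:
            missed.append(user_id)   # offline: nothing stores the message
        else:
            socket.send(frame)
    return missed


connections["alice"] = FakeSocket()
connections["bob"] = FakeSocket()
missed = send_message("chat1", "alice", "hello")
print(missed)
```

Here `carol` never connected, so her copy is dropped, which is the gap the next step closes.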
Step 3 — Offline Messages
Need persistence.
Message Table
messageId
chatId
senderId
content
timestamp
Inbox Table
recipientId
messageId
Purpose:
Track undelivered messages
Send Flow
Sender
│
Send Message
│
Chat Service
│
Write Message
│
Create Inbox Entry
│
Deliver Message
ACK Flow
Client Receives Message
│
ACK
│
Delete Inbox Entry
Reconnect Flow
Client Connects
│
Read Inbox
│
Read Messages
│
Deliver
│
ACK
Step 4 — Media Attachments
Bad
Client
│
Video
│
Database
Better
Client
│
Video
│
Chat Server
│
S3
Best
Client
│
Request Upload URL
│
Chat Server
│
Pre-Signed URL
│
S3
Flow
1. Get URL
2. Upload directly to S3
3. Receive URL
4. Send URL inside message
Final HLD
                  L4 Load Balancer
                          │
       ┌──────────────────┼──────────────────┐
       │                  │                  │
 Chat Server 1      Chat Server 2      Chat Server 3
       │                  │                  │
       └──────────────────┼──────────────────┘
                          │
                    Redis PubSub
                          │
                      DynamoDB
                          │
          ┌───────────────┴───────────────┐
          │                               │
     Chat Tables                   Message Tables
          │                               │
          └───────────────┬───────────────┘
                          │
                         S3
5. Deep Dives
Deep Dive 1 — Scaling Chat Servers
Problem
User A → Server 1
User B → Server 2
Server 1 cannot directly access B’s WebSocket.
Solution A — Consistent Hashing
hash(userId)
│
▼
Chat Server
Pros
Predictable Routing
Cons
Complex Rebalancing
Solution B — Redis Pub/Sub (Preferred)
Subscription
user123
user456
user789
Each server subscribes to the channels of its connected users.
Flow
Server 1
│
Publish(userB)
│
Redis
│
Server 2
│
WebSocket
│
User B
Why Not Kafka?
Need:
Topic per User
Not feasible for billions of users.
Redis channels are lightweight.
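The Pub/Sub routing in Solution B can be sketched without a real Redis instance. `Broker` below is an in-process stand-in, not the Redis client API, and `ChatServer` with its method names is hypothetical. Like Redis PUBLISH, `publish` returns the number of subscribers that received the message.

```python
from collections import defaultdict


class Broker:
    """In-process stand-in for Redis Pub/Sub: fire-and-forget channels."""
    def __init__(self):
        self.subscribers = defaultdict(list)   # channel -> callbacks

    def subscribe(self, channel, callback):
        self.subscribers[channel].append(callback)

    def publish(self, channel, message):
        receivers = self.subscribers.get(channel, [])
        for callback in receivers:
            callback(message)
        return len(receivers)


class ChatServer:
    def __init__(self, name, broker):
        self.name = name
        self.broker = broker
        self.delivered = defaultdict(list)     # userId -> frames pushed locally

    def on_connect(self, user_id):
        # One channel per user, subscribed while the user is connected here.
        self.broker.subscribe(
            user_id, lambda msg, u=user_id: self.delivered[u].append(msg)
        )

    def send(self, recipient_id, message):
        # The sender's server does not need to know where the recipient is.
        return self.broker.publish(recipient_id, message)


broker = Broker()
server1 = ChatServer("server-1", broker)
server2 = ChatServer("server-2", broker)
server2.on_connect("userB")
receivers = server1.send("userB", "hi")
print(receivers, server2.delivered["userB"])
```

A publish to a user with no subscriber returns 0 and the message is gone, which is why the durable inbox is still needed.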
Deep Dive 2 — Redis Reliability
Redis Pub/Sub provides:
At Most Once Delivery
Message may be lost.
Why Still Safe?
Redis = Fast Path
Inbox Table = Reliable Path
Message already exists in DB.
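A small sketch of why a dropped publish is survivable: the write to the Message and Inbox tables happens before the publish, so a reconnecting client can still replay whatever was never acknowledged. The dictionaries stand in for the tables, and `lossy_publish` is a made-up publisher that reaches nobody.

```python
messages = {}   # messageId -> content        (stand-in for the Message table)
inbox = {}      # recipientId -> [messageId]  (stand-in for the Inbox table)


def send(message_id, recipient_id, content, publish):
    """Persist first, then attempt best-effort delivery."""
    messages[message_id] = content
    inbox.setdefault(recipient_id, []).append(message_id)
    try:
        publish(recipient_id, message_id)    # fast path, may drop silently
    except ConnectionError:
        pass                                 # the inbox entry still exists


def ack(recipient_id, message_id):
    """Delivery confirmed: the inbox entry is no longer needed."""
    inbox[recipient_id].remove(message_id)


def on_reconnect(recipient_id):
    """Reliable path: replay everything still sitting in the inbox."""
    return [(m, messages[m]) for m in inbox.get(recipient_id, [])]


def lossy_publish(channel, payload):
    """Simulates a Pub/Sub message that reaches no subscriber."""
    return 0


send("m1", "bob", "hello", lossy_publish)
send("m2", "bob", "still there?", lossy_publish)
ack("bob", "m1")
print(on_reconnect("bob"))
```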
Deep Dive 3 — WebSocket Failure
Bad
Rely on TCP Timeout
May take minutes.
Better
ACK Timeout
Message Sent
│
No ACK
│
Retry
Best
Heartbeats
PING
PONG
every few seconds.
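Heartbeat bookkeeping on the server can be sketched as follows. The 10-second timeout and the `HeartbeatMonitor` class are assumptions for illustration; times are passed in explicitly so the example is deterministic.

```python
class HeartbeatMonitor:
    """Tracks the last PONG per connection and flags the silent ones."""
    def __init__(self, timeout_seconds):
        self.timeout_seconds = timeout_seconds
        self.last_pong = {}            # connectionId -> time of last PONG

    def on_pong(self, connection_id, now):
        self.last_pong[connection_id] = now

    def dead_connections(self, now):
        """Connections whose last PONG is older than the timeout."""
        return sorted(
            conn for conn, seen in self.last_pong.items()
            if now - seen > self.timeout_seconds
        )

    def drop(self, connection_id):
        self.last_pong.pop(connection_id, None)


monitor = HeartbeatMonitor(timeout_seconds=10)
monitor.on_pong("conn-a", now=0)
monitor.on_pong("conn-b", now=0)
monitor.on_pong("conn-a", now=8)       # conn-b stays silent
dead = monitor.dead_connections(now=12)
print(dead)
```

Dropping a dead connection promptly lets the server stop pushing to it and leave its messages in the inbox for the next reconnect.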
Deep Dive 4 — Lost Redis Messages
Solution 1
Polling
Check Inbox Every N Seconds
Solution 2
Sequence Numbers
101
102
104
Missing:
103
Fetch from DB.
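Gap detection on the client can be sketched like this, assuming the server assigns consecutive sequence numbers within one stream (per chat or per recipient; the notes do not say which). The function name is hypothetical.

```python
def missing_sequences(last_seen, received):
    """Return sequence numbers skipped between last_seen and max(received)."""
    if not received:
        return []
    got = set(received)
    return [n for n in range(last_seen + 1, max(received) + 1) if n not in got]


print(missing_sequences(100, [101, 102, 104]))
```

The returned numbers are the ones the client would then request from the database.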
Best
Heartbeat
+
Sequence Numbers
Deep Dive 5 — Multi Device Support
Client Table
clientId
userId
Inbox Change
Before
recipientId
After
recipientClientId
Delivery
User
├── Mobile
├── Laptop
└── Tablet
Send to every device.
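Keying the inbox by client rather than by user might look like the sketch below. The dictionaries stand in for the Client and Inbox tables, and the device names are invented.

```python
clients = {                       # userId -> clientIds (the Client table)
    "bob": ["bob-mobile", "bob-laptop", "bob-tablet"],
}
inbox = {}                        # recipientClientId -> [messageId]


def enqueue_for_user(user_id, message_id):
    """Create one inbox entry per device so each one ACKs independently."""
    for client_id in clients.get(user_id, []):
        inbox.setdefault(client_id, []).append(message_id)


def ack(client_id, message_id):
    """Clear the entry for this device only."""
    inbox[client_id].remove(message_id)


enqueue_for_user("bob", "m1")
ack("bob-mobile", "m1")           # only the phone has received it so far
print(inbox)
```

The laptop and tablet keep their entries until they connect and ACK on their own.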
Deep Dive 6 — Message Ordering
Distributed systems cannot guarantee perfect ordering.
Solution
All servers sync via:
NTP
On Ingestion
Server Timestamp added.
Client
ORDER BY timestamp
Users may occasionally see messages reorder.
Acceptable tradeoff.
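A sketch of stamping on ingestion and sorting on the client. A counter stands in for the NTP-synced server clock so the example is deterministic; the `ingest` helper is hypothetical.

```python
import itertools

_clock = itertools.count(1000)    # deterministic stand-in for server time


def ingest(content):
    """Stamp the message with server time when it arrives."""
    return {"content": content, "timestamp": next(_clock)}


first = ingest("hi")
second = ingest("how are you?")

# Suppose delivery reorders them on the way to the client.
arrived = [second, first]
displayed = sorted(arrived, key=lambda m: m["timestamp"])
print([m["content"] for m in displayed])
```

With real clocks, two servers can disagree by a few milliseconds, which is where the occasional visible reorder comes from.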
Deep Dive 7 — Presence / Last Seen
Presence Table
userId
status
lastSeen
Connected
ONLINE
Disconnected
lastSeen = disconnect time
Real-Time Updates
Reuse:
Redis Pub/Sub for online/offline notifications.
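Presence updates can be sketched as two connection hooks. The `OFFLINE` status value is an assumption (the notes only name `ONLINE`), and the `notifications` list stands in for publishing on a Pub/Sub channel.

```python
presence = {}       # userId -> {"status": ..., "lastSeen": ...}
notifications = []  # stand-in for messages published to subscribers


def on_connect(user_id):
    presence[user_id] = {"status": "ONLINE", "lastSeen": None}
    notifications.append((user_id, "ONLINE"))


def on_disconnect(user_id, now):
    # lastSeen is set to the disconnect time.
    presence[user_id] = {"status": "OFFLINE", "lastSeen": now}
    notifications.append((user_id, "OFFLINE"))


on_connect("alice")
on_disconnect("alice", now=1700000000)
print(presence["alice"])
```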
Key Interview Takeaways
WebSockets
↓
Chat + Participant Tables
↓
Message + Inbox Tables
↓
ACK Mechanism
↓
S3 + Pre-Signed URLs
↓
Multiple Chat Servers
↓
Redis Pub/Sub
↓
Heartbeats
↓
Sequence Numbers
↓
Multi Device Support
↓
Presence / Last Seen