1. Requirements
(Diagram: whatsapp-req.excalidraw)
WIP on the sections below 👷🏻‍♂️
(Diagram: whatsapp-hdl.excalidraw)
2. Core Entities
User
└── userId
Client (Device)
├── clientId
└── userId
Chat
├── chatId
└── metadata
ChatParticipant
├── chatId
└── userId
Message
├── messageId
├── chatId
├── senderId
├── content
└── timestamp
Inbox
├── recipientId/clientId
└── messageId
Why Client Entity?
User
├── Mobile
├── Laptop
└── Tablet
A user may have multiple devices, each with its own connection, so delivery is tracked per client (device) rather than per user.
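A minimal sketch of the entities above as Python dataclasses. Field names follow the notes, converted to snake_case; the integer millisecond `timestamp`, the `metadata` dict, and the `participants_of` helper are illustrative assumptions, not part of the original design.

```python
from dataclasses import dataclass, field

@dataclass
class User:
    user_id: str

@dataclass
class Client:
    client_id: str  # one row per device (mobile, laptop, tablet)
    user_id: str    # the user who owns the device

@dataclass
class Chat:
    chat_id: str
    metadata: dict = field(default_factory=dict)  # e.g. {"name": ...}; keys are assumed

@dataclass
class ChatParticipant:
    chat_id: str
    user_id: str

@dataclass
class Message:
    message_id: str
    chat_id: str
    sender_id: str
    content: str
    timestamp: int  # assumed: epoch milliseconds assigned by the server

@dataclass
class InboxEntry:
    recipient_client_id: str  # the notes allow recipientId or clientId here
    message_id: str

def participants_of(chat_id: str, rows: list[ChatParticipant]) -> list[str]:
    """Hypothetical helper: the 'get participants of chat' lookup over ChatParticipant rows."""
    return [row.user_id for row in rows if row.chat_id == chat_id]

rows = [ChatParticipant("c1", "alice"), ChatParticipant("c1", "bob"), ChatParticipant("c2", "bob")]
print(participants_of("c1", rows))  # ['alice', 'bob']
```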
3. API Design
Use:
WebSockets
Reason:
Bidirectional
Low Latency
Persistent Connection
Client → Server Commands
Create Chat
{
"participants": [],
"name": ""
}
Response
{
"chatId": ""
}
Send Message
{
"chatId": "",
"message": "",
"attachments": []
}
Response
{
"status": "SUCCESS",
"messageId": ""
}
Upload Attachment
{
"body": "...",
"hash": "..."
}
Modify Participants
{
"chatId": "",
"userId": "",
"operation": "ADD | REMOVE"
}
ACK Message
{
"messageId": ""
}
Used to confirm delivery.
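A sketch of how a chat server might dispatch the client-to-server commands above. It assumes each WebSocket text frame is a JSON object with a hypothetical `type` field naming the command (the payloads above do not specify one) and uses in-memory dicts in place of the real tables; the `ERROR` status is also an assumption. Upload Attachment and ACK are left out for brevity.

```python
import json
import uuid

# Hypothetical in-memory stores standing in for the Chat and Message tables.
chats: dict[str, dict] = {}     # chatId -> {"name": str, "participants": set of userIds}
messages: dict[str, dict] = {}  # messageId -> stored message

def create_chat(cmd: dict) -> dict:
    chat_id = str(uuid.uuid4())
    chats[chat_id] = {"name": cmd.get("name", ""), "participants": set(cmd["participants"])}
    return {"chatId": chat_id}

def send_message(cmd: dict) -> dict:
    if cmd["chatId"] not in chats:
        return {"status": "ERROR"}  # assumed error shape; the notes only define SUCCESS
    message_id = str(uuid.uuid4())
    messages[message_id] = {"chatId": cmd["chatId"], "message": cmd["message"],
                            "attachments": cmd.get("attachments", [])}
    return {"status": "SUCCESS", "messageId": message_id}

def modify_participants(cmd: dict) -> dict:
    chat = chats.get(cmd["chatId"])
    if chat is None or cmd["operation"] not in ("ADD", "REMOVE"):
        return {"status": "ERROR"}
    if cmd["operation"] == "ADD":
        chat["participants"].add(cmd["userId"])
    else:
        chat["participants"].discard(cmd["userId"])
    return {"status": "SUCCESS"}

# The "type" field is an assumption: the payloads above do not name the command.
HANDLERS = {
    "createChat": create_chat,
    "sendMessage": send_message,
    "modifyParticipants": modify_participants,
}

def handle_frame(raw: str) -> str:
    """Parse one WebSocket text frame, run its command, and return the response frame."""
    cmd = json.loads(raw)
    handler = HANDLERS.get(cmd.get("type"))
    if handler is None:
        return json.dumps({"status": "ERROR"})
    return json.dumps(handler(cmd))

chat = json.loads(handle_frame(json.dumps(
    {"type": "createChat", "participants": ["alice", "bob"], "name": "Trip"})))
sent = json.loads(handle_frame(json.dumps(
    {"type": "sendMessage", "chatId": chat["chatId"], "message": "hi", "attachments": []})))
print(sent["status"])  # SUCCESS
```

A lookup table of handlers keeps each command isolated, so adding ACK or Upload Attachment later means adding one function and one entry.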
Server → Client Commands
New Message
{
"chatId": "",
"messageId": "",
"senderId": "",
"message": ""
}
Chat Updated
{
"chatId": "",
"participants": []
}
4. High Level Design (HLD)
Step 1: Create Chat
Components
Client
↓
WebSocket
↓
Chat Service
↓
DynamoDB
Chat Table
PK = chatId
Attributes:
chatId
name
metadata
ChatParticipant Table
PK = chatId
SK = participantId
Query: Get participants of chat
GSI
PK = participantId
SK = chatId
Query: Get all chats for user
Step 2: Send / Receive Messages
Single Server Version
Client
↓
WebSocket
↓
Chat Server
In-Memory Connection Map
unordered_map<userId, websocketConnection>
Flow
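Send Message
↓
Find Participants
↓
Find WebSocket
↓
Push Message
Works only when every participant is online and connected to this one server: a message for an offline user has no socket to go to and is lost.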
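The single-server flow can be sketched as follows. `Connection` is a hypothetical stand-in for a WebSocket that just records what it was sent, and the participant lookup is an in-memory dict; the sketch makes the limitation visible, since an offline participant is simply skipped.

```python
from dataclasses import dataclass, field

@dataclass
class Connection:
    """Hypothetical stand-in for a WebSocket; records the frames pushed to it."""
    user_id: str
    sent: list = field(default_factory=list)

    def send(self, frame: dict) -> None:
        self.sent.append(frame)

class SingleChatServer:
    def __init__(self, participants: dict[str, list[str]]) -> None:
        self.participants = participants              # chatId -> [userId]
        self.connections: dict[str, Connection] = {}  # the in-memory connection map

    def connect(self, conn: Connection) -> None:
        self.connections[conn.user_id] = conn

    def disconnect(self, user_id: str) -> None:
        self.connections.pop(user_id, None)

    def send_message(self, chat_id: str, sender_id: str, text: str) -> list[str]:
        """Find participants, find their sockets, push; returns who received it."""
        delivered = []
        for user_id in self.participants.get(chat_id, []):
            if user_id == sender_id:
                continue
            conn = self.connections.get(user_id)
            if conn is None:
                continue  # offline: nothing stores the message, so it is lost
            conn.send({"chatId": chat_id, "senderId": sender_id, "message": text})
            delivered.append(user_id)
        return delivered

server = SingleChatServer({"c1": ["alice", "bob", "carol"]})
bob = Connection("bob")
server.connect(bob)
print(server.send_message("c1", "alice", "hi"))  # ['bob'] (carol is offline)
```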
Step 3: Offline Messages
Messages need to be persisted so offline recipients can receive them later.
Message Table
messageId
chatId
senderId
content
timestamp
Inbox Table
recipientId
messageId
Purpose: Track undelivered messages
Send Flow
Sender
↓
Send Message
↓
Chat Service
↓
Write Message
↓
Create Inbox Entry
↓
Deliver Message
ACK Flow
Client Receives Message
↓
ACK
↓
Delete Inbox Entry
Reconnect Flow
Client Connects
↓
Read Inbox
↓
Read Messages
↓
Deliver
↓
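ACK
Step 4: Media Attachments
Bad
Client
↓
Video
↓
Database
Large binary blobs bloat the database and make it expensive to store and scale.
Better
Client
↓
Video
↓
Chat Server
↓
S3
S3 holds the file, but every byte still flows through the chat server and uses its bandwidth.
Best
Client
↓
Request Upload URL
↓
Chat Server
↓
Pre-Signed URL
↓
S3
The client uploads straight to S3 with a short-lived pre-signed URL, so the chat server never touches the file bytes.
Flow
1. Get URL
2. Upload directly to S3
3. Receive URL
4. Send URL inside message
5. Deep Dives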
Deep Dive 1: Scaling Chat Servers
Load balancer
“The important requirement is maintaining long-lived TCP connections for WebSockets. A Layer-4 load balancer forwards the TCP connection to a chat server and keeps that connection pinned to the same server for its lifetime. We also don't need any Layer-7 features like path-based routing or HTTP inspection, so an NLB is simpler and has lower overhead. Although modern Layer-7 load balancers support WebSockets, a Layer-4 load balancer is sufficient and generally a better fit for this architecture.”
Problem
User A → Server 1
User B → Server 2
Server 1 cannot directly access B's WebSocket.
Solution A: Consistent Hashing
hash(userId)
│
▼
Chat Server
Pros
Predictable Routing
Cons
Complex Rebalancing
Adding or removing a server remaps some users, whose connections must then move.
Solution B: Redis Pub/Sub (Preferred)
Subscription
user123
user456
user789
Each server subscribes to the channel of every user connected to it.
Flow
Server 1
↓
Publish(userB)
↓
Redis
↓
Server 2
↓
WebSocket
↓
User B
Why Not Kafka?
Need:
Topic per User
Kafka topics are heavyweight, so one topic per user is not feasible for billions of users. Redis channels are lightweight.
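To show the cross-server routing without a running Redis, this sketch swaps Redis for a tiny in-process `FakePubSub` (hypothetical, not the redis-py API) with the same fire-and-forget behavior: each chat server subscribes to one channel per connected user, and publishing to a channel nobody is subscribed to just drops the message. The `user:<userId>` channel naming is an assumption. With redis-py, the corresponding calls would be `Redis.publish(channel, message)` and `PubSub.subscribe(channel)`.

```python
from collections import defaultdict
from typing import Callable

class FakePubSub:
    """In-process stand-in for Redis Pub/Sub: fire-and-forget, nothing is stored."""
    def __init__(self) -> None:
        self.subscribers: dict[str, list[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, channel: str, callback: Callable[[dict], None]) -> None:
        self.subscribers[channel].append(callback)

    def publish(self, channel: str, message: dict) -> int:
        # Like Redis PUBLISH: returns how many subscribers got the message.
        # With no subscriber, the message is simply dropped (at most once).
        receivers = list(self.subscribers.get(channel, []))
        for callback in receivers:
            callback(message)
        return len(receivers)

class ChatServer:
    def __init__(self, bus: FakePubSub) -> None:
        self.bus = bus
        self.sockets: dict[str, list[dict]] = {}  # userId -> frames written to its WebSocket

    def on_connect(self, user_id: str) -> None:
        self.sockets[user_id] = []
        # Subscribe to this user's channel so any server can reach the socket held here.
        self.bus.subscribe(f"user:{user_id}", self.sockets[user_id].append)

    def send(self, recipient_id: str, message: dict) -> int:
        return self.bus.publish(f"user:{recipient_id}", message)

bus = FakePubSub()
server1, server2 = ChatServer(bus), ChatServer(bus)
server1.on_connect("userA")
server2.on_connect("userB")
print(server1.send("userB", {"chatId": "c1", "senderId": "userA", "message": "hi"}))  # 1
```

Because `publish` reports zero receivers for a user nobody holds a socket for, the sending server learns that nothing was pushed; the Inbox table is what covers that case.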
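Deep Dive 2: Redis Reliability
Redis Pub/Sub provides:
At-Most-Once Delivery
A message may be lost, for example if no server is subscribed at the moment it is published.
Why Still Safe?
Redis = Fast Path
Inbox Table = Reliable Path
The message and its Inbox entry are already in the DB before the publish, so anything lost on Redis is delivered from the Inbox later.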
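A sketch of why losing a Redis message is safe, combining the Step 3 send, ACK and reconnect flows. In-memory dicts stand in for the Message and Inbox tables, and `publish` is a hypothetical callable for the Redis fast path; the example passes one that always returns 0 to model every Pub/Sub message being lost.

```python
import itertools
from typing import Callable

class MessageService:
    def __init__(self, publish: Callable[[str, dict], int]) -> None:
        self.publish = publish                # fast path: Redis Pub/Sub, may drop messages
        self.messages: dict[int, dict] = {}   # Message table: messageId -> message
        self.inbox: dict[str, set[int]] = {}  # Inbox table: recipientId -> undelivered messageIds
        self._next_id = itertools.count(1)

    def send(self, chat_id: str, sender_id: str, content: str, recipients: list[str]) -> int:
        message_id = next(self._next_id)
        # Reliable path first: write the message and one inbox entry per recipient.
        self.messages[message_id] = {"messageId": message_id, "chatId": chat_id,
                                     "senderId": sender_id, "content": content}
        for r in recipients:
            self.inbox.setdefault(r, set()).add(message_id)
        # Fast path second: best-effort push; a drop here is recovered from the inbox.
        for r in recipients:
            self.publish(r, self.messages[message_id])
        return message_id

    def ack(self, recipient_id: str, message_id: int) -> None:
        self.inbox.get(recipient_id, set()).discard(message_id)

    def on_reconnect(self, recipient_id: str) -> list[dict]:
        """Read Inbox, then read Messages; the client still ACKs each one it gets."""
        return [self.messages[m] for m in sorted(self.inbox.get(recipient_id, set()))]

# publish always returns 0: every Redis message is "lost".
service = MessageService(publish=lambda recipient, msg: 0)
mid = service.send("c1", "alice", "hello", ["bob"])
print([m["content"] for m in service.on_reconnect("bob")])  # ['hello']
service.ack("bob", mid)
print(service.on_reconnect("bob"))  # []
```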
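Deep Dive 3: WebSocket Failure
Bad
Rely on TCP Timeout
Detecting a dead connection this way may take minutes.
Better
ACK Timeout
Message Sent
↓
No ACK
↓
Retry
Best
Heartbeats
PING
PONG
exchanged every few seconds, so a silent connection is detected and closed quickly.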
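A sketch of ACK-timeout retries and heartbeat-based failure detection. The 5-second ACK timeout and 10-second heartbeat timeout are illustrative assumptions, and `now` is passed in explicitly (rather than read from a clock) so the logic can be checked without sleeping; a real server would drive these methods from a timer.

```python
class ConnectionMonitor:
    """ACK-timeout retries plus heartbeat-based dead-connection detection."""
    def __init__(self, ack_timeout: float = 5.0, heartbeat_timeout: float = 10.0) -> None:
        self.ack_timeout = ack_timeout              # assumed value, in seconds
        self.heartbeat_timeout = heartbeat_timeout  # assumed value, in seconds
        self.last_pong: dict[str, float] = {}            # clientId -> time of last PONG
        self.pending: dict[str, tuple[str, float]] = {}  # messageId -> (clientId, sent_at)

    def on_pong(self, client_id: str, now: float) -> None:
        self.last_pong[client_id] = now

    def on_send(self, client_id: str, message_id: str, now: float) -> None:
        self.pending[message_id] = (client_id, now)

    def on_ack(self, message_id: str) -> None:
        self.pending.pop(message_id, None)

    def due_for_retry(self, now: float) -> list[str]:
        """Messages with no ACK within ack_timeout; the timer restarts for each retry."""
        due = [m for m, (_, sent_at) in self.pending.items() if now - sent_at >= self.ack_timeout]
        for m in due:
            client_id, _ = self.pending[m]
            self.pending[m] = (client_id, now)
        return due

    def dead_clients(self, now: float) -> list[str]:
        """Clients whose last PONG is older than heartbeat_timeout; close their sockets."""
        return [c for c, t in self.last_pong.items() if now - t >= self.heartbeat_timeout]

mon = ConnectionMonitor()
mon.on_pong("phone-1", now=0.0)
mon.on_send("phone-1", "m1", now=0.0)
mon.on_send("phone-1", "m2", now=0.0)
mon.on_ack("m1")
print(mon.due_for_retry(now=6.0))  # ['m2']
print(mon.dead_clients(now=12.0))  # ['phone-1']
```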
Deep Dive 4: Lost Redis Messages
Solution 1
Polling
Check Inbox Every N Seconds
Solution 2
Sequence Numbers
101
102
104
Missing:
103
Fetch from DB.
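A sketch of client-side gap detection for Solution 2, assuming the server assigns consecutive sequence numbers per chat (the notes only show the numbers, not how they are assigned). `SequenceTracker` is a hypothetical name; whatever `on_message` returns is what the client would fetch from the DB.

```python
class SequenceTracker:
    def __init__(self, last_seen: int = 0) -> None:
        self.last_contiguous = last_seen  # every seq <= this has been received
        self.received: set[int] = set()   # out-of-order seqs above last_contiguous

    def on_message(self, seq: int) -> list[int]:
        """Record seq and return the gaps that should be fetched from the DB."""
        if seq <= self.last_contiguous:
            return []  # duplicate, e.g. a retried message; ignore it
        self.received.add(seq)
        # Advance past any run of now-contiguous numbers.
        while self.last_contiguous + 1 in self.received:
            self.last_contiguous += 1
            self.received.remove(self.last_contiguous)
        return self.missing()

    def missing(self) -> list[int]:
        if not self.received:
            return []
        top = max(self.received)
        return [s for s in range(self.last_contiguous + 1, top) if s not in self.received]

tracker = SequenceTracker(last_seen=100)
tracker.on_message(101)
tracker.on_message(102)
print(tracker.on_message(104))  # [103]
```

Duplicates are ignored because they are at or below `last_contiguous`, which makes retries from the ACK-timeout path harmless.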
Best
Heartbeat
+
Sequence Numbers
Deep Dive 5: Multi Device Support
Client Table
clientId
userId
Inbox Change
Before
recipientId
After
recipientClientId
Delivery
User
├── Mobile
├── Laptop
└── Tablet
Send to every device; each device has its own Inbox entry and ACKs independently.
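A sketch of per-device fan-out with the Inbox keyed by `recipientClientId`. The in-memory Client table and the `MultiDeviceInbox` class are hypothetical stand-ins; the point is that each device gets its own Inbox entry and clears it with its own ACK.

```python
from collections import defaultdict

class MultiDeviceInbox:
    def __init__(self, clients: dict[str, str]) -> None:
        self.clients = clients                              # Client table: clientId -> userId
        self.inbox: dict[str, set[str]] = defaultdict(set)  # recipientClientId -> messageIds

    def devices_of(self, user_id: str) -> list[str]:
        return sorted(c for c, u in self.clients.items() if u == user_id)

    def fan_out(self, message_id: str, recipient_user_ids: list[str]) -> list[str]:
        """One Inbox entry per recipient device; returns the clientIds to push to."""
        targets = [c for u in recipient_user_ids for c in self.devices_of(u)]
        for client_id in targets:
            self.inbox[client_id].add(message_id)
        return targets

    def ack(self, client_id: str, message_id: str) -> None:
        # Only this device's entry is removed; the user's other devices still get the message.
        self.inbox[client_id].discard(message_id)

box = MultiDeviceInbox({"bob-phone": "bob", "bob-laptop": "bob", "alice-phone": "alice"})
print(box.fan_out("m1", ["bob"]))  # ['bob-laptop', 'bob-phone']
box.ack("bob-phone", "m1")
print(box.inbox["bob-laptop"], box.inbox["bob-phone"])  # {'m1'} set()
```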
Deep Dive 6: Message Ordering
Distributed systems cannot cheaply guarantee a perfect global order of messages across servers.
Solution
All servers sync clocks via:
NTP
On Ingestion
A server timestamp is added to each message.
Client
ORDER BY timestamp
NTP keeps clocks close but not identical, so users may occasionally see messages reorder.
Acceptable tradeoff.
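A sketch of ingestion-time stamping and client-side ordering. It assumes the server clock returns epoch milliseconds and adds a hypothetical tie-break on `messageId` so that two messages with the same timestamp sort the same way on every device; `clock_ms` is injectable only so the example is deterministic.

```python
import time
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class StampedMessage:
    message_id: str
    chat_id: str
    content: str
    server_ts_ms: int  # set by the chat server on ingestion, never by the client

def ingest(message_id: str, chat_id: str, content: str,
           clock_ms: Callable[[], int] = lambda: time.time_ns() // 1_000_000) -> StampedMessage:
    """Stamp the message with the (assumed NTP-synced) server clock in milliseconds."""
    return StampedMessage(message_id, chat_id, content, clock_ms())

def display_order(messages: list[StampedMessage]) -> list[StampedMessage]:
    """Client-side ORDER BY timestamp, tie-broken by messageId for a stable order."""
    return sorted(messages, key=lambda m: (m.server_ts_ms, m.message_id))

first = ingest("m1", "c1", "first", clock_ms=lambda: 1_700_000_000_100)
second = ingest("m2", "c1", "second", clock_ms=lambda: 1_700_000_000_250)
third = ingest("m3", "c1", "third", clock_ms=lambda: 1_700_000_000_250)
print([m.content for m in display_order([third, second, first])])  # ['first', 'second', 'third']
```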
Deep Dive 7: Presence / Last Seen
Presence Table
userId
status
lastSeen
Connected
status = ONLINE
Disconnected
lastSeen = disconnect time
Real-Time Updates
Reuse:
Redis Pub/Sub for online/offline notifications.
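A sketch of the presence rules above. `notify` is a hypothetical callback standing in for a Redis Pub/Sub publish to whoever is watching the user, the `OFFLINE` status value is an assumption (the notes only name `ONLINE`), and timestamps are passed in explicitly.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Presence:
    user_id: str
    status: str = "OFFLINE"            # assumed value for "not connected"
    last_seen: Optional[float] = None  # epoch seconds of the last disconnect

class PresenceService:
    def __init__(self, notify: Callable[[str, dict], None]) -> None:
        self.table: dict[str, Presence] = {}  # Presence table: userId -> row
        self.notify = notify                  # stand-in for a Redis Pub/Sub publish

    def on_connect(self, user_id: str) -> None:
        row = self.table.setdefault(user_id, Presence(user_id))
        row.status = "ONLINE"
        self.notify(user_id, {"userId": user_id, "status": "ONLINE"})

    def on_disconnect(self, user_id: str, now: float) -> None:
        row = self.table.setdefault(user_id, Presence(user_id))
        row.status, row.last_seen = "OFFLINE", now  # lastSeen = disconnect time
        self.notify(user_id, {"userId": user_id, "status": "OFFLINE", "lastSeen": now})

events: list[dict] = []
presence = PresenceService(notify=lambda user_id, event: events.append(event))
presence.on_connect("bob")
presence.on_disconnect("bob", now=1_700_000_000.0)
print(presence.table["bob"])  # Presence(user_id='bob', status='OFFLINE', last_seen=1700000000.0)
```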
Key Interview Takeaways
WebSockets
↓
Chat + Participant Tables
↓
Message + Inbox Tables
↓
ACK Mechanism
↓
S3 + Pre-Signed URLs
↓
Multiple Chat Servers
↓
Redis Pub/Sub
↓
Heartbeats
↓
Sequence Numbers
↓
Multi Device Support
↓
Presence / Last Seen