0️⃣ HLD
dropbox.excalidraw
1️⃣ Upload Flow
Initial Approach (Naive)
Client ──> File Service ──> S3
- Problem: The file is uploaded twice (Client → File Service and File Service → S3).
- Waste of: Bandwidth, CPU, and network resources.
Better Approach
Instead of transferring the file through the server, we use presigned URLs.
- Step 1: Client sends metadata only (Client → File Service).
- Step 2: File Service generates a presigned URL for the target S3 object. (The AWS SDK signs the URL locally with the service's credentials, so no round trip to S3 is needed.)
- Step 3: File Service sends the presigned URL back to the client.
- Step 4: Client uploads the file directly to S3 (Client → S3).
Benefits
- ✅ No double upload
- ✅ Less server load
- ✅ Faster uploads
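To show why a presigned URL lets the client talk to storage directly, here is a toy version of the idea: the File Service signs (method, key, expiry) with a secret the storage layer also knows, and storage checks the signature and expiry before accepting bytes. This is a simplified HMAC scheme, not AWS Signature Version 4 (in practice the AWS SDK builds the real URL); `SECRET`, `STORAGE_HOST`, and both function names are hypothetical, and object keys are assumed to be URL-safe.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical values for illustration only.
SECRET = b"demo-secret"                        # shared by File Service and storage
STORAGE_HOST = "https://storage.example.com"   # stand-in for the S3 endpoint


def _sign(method: str, key: str, expires: int) -> str:
    """HMAC-SHA256 over exactly the request the URL is allowed to perform."""
    message = f"{method}\n{key}\n{expires}".encode()
    return hmac.new(SECRET, message, hashlib.sha256).hexdigest()


def generate_presigned_url(key: str, method: str = "PUT", ttl_seconds: int = 900,
                           now: Optional[float] = None) -> str:
    """File Service side: issue a time-limited URL for a single object key."""
    issued_at = int(time.time() if now is None else now)
    expires = issued_at + ttl_seconds
    query = urlencode({"expires": expires, "signature": _sign(method, key, expires)})
    return f"{STORAGE_HOST}/{key}?{query}"


def verify_presigned_url(url: str, method: str = "PUT", now: Optional[float] = None) -> bool:
    """Storage side: accept only an unexpired URL whose signature matches."""
    parsed = urlparse(url)
    key = parsed.path.lstrip("/")
    params = parse_qs(parsed.query)
    try:
        expires = int(params["expires"][0])
        signature = params["signature"][0]
    except (KeyError, ValueError):
        return False  # malformed URL
    current = time.time() if now is None else now
    if current > expires:
        return False  # link has expired
    return hmac.compare_digest(_sign(method, key, expires), signature)
```

A URL issued at t=1000 with the default 900-second TTL verifies at t=1500, but is rejected after expiry or if the key or HTTP method is changed, so the client can only perform the one upload the service authorized.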
2️⃣ Why Chunking Is Needed
- Problem: Uploading a large file is slow (e.g., a 50 GB upload at 100 Mbps takes ~72 minutes). If the connection drops, the client has to start from the beginning ❌.
- Solution (Chunking): Split the file into small, manageable pieces (e.g., 5 MB chunks).
  - Example: A 50 GB file becomes Chunk 1, Chunk 2, Chunk 3 ... Chunk N.
  - The client uploads chunks independently.
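The chunking step and the arithmetic behind the example can be sketched as below. It assumes fixed-size 5 MiB chunks and treats "50 GB" as 50 GiB, which is what reproduces the ~72-minute figure; the function names are illustrative.

```python
import io
from typing import BinaryIO, Iterator, Tuple

CHUNK_SIZE = 5 * 1024 * 1024  # 5 MiB, the example chunk size from the text


def iter_chunks(stream: BinaryIO, chunk_size: int = CHUNK_SIZE) -> Iterator[Tuple[int, bytes]]:
    """Yield (index, bytes) pairs, holding only one chunk in memory at a time."""
    index = 0
    while True:
        data = stream.read(chunk_size)
        if not data:  # b"" signals end of stream
            break
        yield index, data
        index += 1


def chunk_count(file_size: int, chunk_size: int = CHUNK_SIZE) -> int:
    """Chunks needed for a file; integer ceiling so the last partial chunk counts."""
    return -(-file_size // chunk_size)


def upload_minutes(file_size: int, link_mbps: float) -> float:
    """Ideal transfer time in minutes, ignoring protocol overhead."""
    return file_size * 8 / (link_mbps * 1_000_000) / 60


if __name__ == "__main__":
    fifty_gib = 50 * 1024**3
    print(chunk_count(fifty_gib))                 # 10240 chunks of 5 MiB
    print(round(upload_minutes(fifty_gib, 100)))  # 72 minutes at 100 Mbps
    print(list(iter_chunks(io.BytesIO(b"abcdefghijkl"), chunk_size=5)))
```

Reading from a stream instead of loading the whole file keeps memory flat even for a 50 GB file.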
3️⃣ File Metadata With Chunks
{
"fileId": "string",
"chunks": [
{
"chunkId": "string",
"status": "string",
"s3Link": "string"
}
]
}
4️⃣ Resumable Uploads
If the upload stops, the client can resume by comparing already uploaded chunks against local chunks:
- Already uploaded: Chunk 1, Chunk 2, Chunk 3
- Missing: Chunk 4, Chunk 5
The upload resumes from Chunk 4 instead of starting from the beginning.
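Using the `fileId`/`chunks` metadata shape above, the resume decision is a set difference kept in file order. This is a sketch: the `"uploaded"` status value is an assumption, since the metadata only declares `status` as a string.

```python
from typing import Dict, List


def chunks_to_upload(local_chunk_ids: List[str], file_metadata: Dict) -> List[str]:
    """Return local chunk IDs, in file order, not yet confirmed on the server.

    Assumes the server marks finished chunks with status == "uploaded".
    """
    uploaded = {
        chunk["chunkId"]
        for chunk in file_metadata.get("chunks", [])
        if chunk.get("status") == "uploaded"
    }
    return [chunk_id for chunk_id in local_chunk_ids if chunk_id not in uploaded]
```

With chunks 1 to 3 recorded as uploaded and chunk 4 still pending, a local list of chunks 1 to 5 yields `["c4", "c5"]`, matching the example.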
5️⃣ Fingerprinting
- Problem: How do we uniquely identify chunks?
- Solution: Hash(chunk bytes). For example, SHA-256(chunk) produces a fingerprint that is unique in practice (finding a collision is computationally infeasible), and it is used as the chunkId.
- Benefits:
- Detect duplicate chunks (de-duplication)
- Resume uploads easily
- Verify chunk/file integrity
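A minimal sketch of fingerprinting with Python's `hashlib`, covering the three benefits listed: the chunkId is the SHA-256 hex digest, a store keyed by it deduplicates identical chunks, and re-hashing stored bytes checks integrity. `ChunkStore` is a hypothetical in-memory stand-in for real chunk storage.

```python
import hashlib
from typing import Dict


def fingerprint(chunk: bytes) -> str:
    """SHA-256 of the chunk bytes, hex-encoded, used as the chunkId."""
    return hashlib.sha256(chunk).hexdigest()


class ChunkStore:
    """Hypothetical in-memory chunk storage keyed by fingerprint."""

    def __init__(self) -> None:
        self._chunks: Dict[str, bytes] = {}

    def put(self, chunk: bytes) -> bool:
        """Store the chunk; return False if an identical chunk already exists (dedup)."""
        chunk_id = fingerprint(chunk)
        if chunk_id in self._chunks:
            return False
        self._chunks[chunk_id] = chunk
        return True

    def verify(self, chunk_id: str) -> bool:
        """Integrity check: stored bytes must still hash to their own ID."""
        data = self._chunks.get(chunk_id)
        return data is not None and fingerprint(data) == chunk_id
```

Because the ID is derived from content, two users uploading the same bytes produce the same chunkId, and the second upload can be skipped.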
6️⃣ Trust But Verify Pattern
- Problem: The client claims: “Chunk uploaded successfully.” Can we trust the client? No.
- Verification Flow:
  - Client notifies the Metadata Service that the chunk is uploaded.
  - Metadata Service queries S3 to check: Does the chunk really exist?
  - If yes, update the chunk status in the database.
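The verification flow can be sketched with storage behind a small interface. With S3 the existence check would typically be a HeadObject request; here `ObjectStore`, `InMemoryStore`, and `MetadataService` are hypothetical, and comparing the object's size against an `expected_size` sent by the client is an extra assumption beyond a plain existence check.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Protocol


class ObjectStore(Protocol):
    """Hypothetical interface; with S3 this would wrap a HeadObject request."""

    def object_size(self, key: str) -> Optional[int]: ...


@dataclass
class InMemoryStore:
    objects: Dict[str, bytes] = field(default_factory=dict)

    def object_size(self, key: str) -> Optional[int]:
        data = self.objects.get(key)
        return None if data is None else len(data)


@dataclass
class MetadataService:
    store: ObjectStore
    chunk_status: Dict[str, str] = field(default_factory=dict)

    def on_client_reports_uploaded(self, chunk_id: str, expected_size: int) -> bool:
        """Trust but verify: mark the chunk uploaded only if storage confirms it."""
        actual = self.store.object_size(chunk_id)
        if actual is None or actual != expected_size:
            return False  # claim not confirmed; status is left unchanged
        self.chunk_status[chunk_id] = "uploaded"
        return True
```

The client's message only triggers the check; the database is updated from what storage reports, never from the claim itself.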
7️⃣ Alternative: S3 Notifications
Instead of the client telling the server, we use event notifications:
Chunk Uploaded to S3 ──> S3 Notification Event ──> Metadata Service
- Advantages: More reliable and removes client dependency.
- Tradeoff: Increases architectural complexity.
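A sketch of the Metadata Service side of this flow, consuming the JSON body of an S3 event notification (a `Records` list whose entries carry `eventName` and `s3.object.key`). It assumes each chunk is stored under an object key equal to its chunkId; object keys arrive URL-encoded, hence `unquote_plus`. How the event reaches the service (SQS, SNS, Lambda) is left out.

```python
import json
from typing import Dict, List
from urllib.parse import unquote_plus


def handle_s3_event(body: str, chunk_status: Dict[str, str]) -> List[str]:
    """Mark chunks uploaded from ObjectCreated records; return the confirmed keys.

    Assumes object key == chunkId. Setting the status is idempotent, so a
    duplicate delivery of the same event is harmless.
    """
    event = json.loads(body)
    confirmed = []
    for record in event.get("Records", []):
        if not record.get("eventName", "").startswith("ObjectCreated:"):
            continue  # ignore deletions and other event types
        key = unquote_plus(record["s3"]["object"]["key"])  # keys are URL-encoded
        chunk_status[key] = "uploaded"
        confirmed.append(key)
    return confirmed
```

Writing the handler to be idempotent matters because S3 notifications are delivered at least once, so the same record can arrive twice.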
8️⃣ Download Flow
- Client queries the Metadata Service (backed by the Metadata DB) to get the S3 links for the file's chunks. For a private bucket, these links are presigned download URLs.
- Client downloads the chunks directly from S3.
- Why Direct Download? It avoids routing bytes through our server (S3 → File Service → Client), which saves an extra network hop and server bandwidth.
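On the client, the download reduces to fetching chunks in order and checking each one against its SHA-256 chunkId before reassembling. In this sketch `fetch_chunk` is a hypothetical callable standing in for an HTTP GET on the chunk's S3 link.

```python
import hashlib
from typing import Callable, List


def download_file(chunk_ids: List[str], fetch_chunk: Callable[[str], bytes]) -> bytes:
    """Fetch chunks in file order, verify each against its SHA-256 chunkId, and join them."""
    parts = []
    for chunk_id in chunk_ids:
        data = fetch_chunk(chunk_id)  # hypothetical: e.g. GET on a presigned S3 URL
        if hashlib.sha256(data).hexdigest() != chunk_id:
            raise ValueError(f"chunk {chunk_id[:8]}... failed integrity check")
        parts.append(data)
    return b"".join(parts)
```

Because the fingerprint doubles as a checksum, a corrupted or tampered chunk is caught before it is written into the local folder.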
9️⃣ Optimizing Upload Speed
A. Parallel Chunk Uploads
Instead of uploading chunks sequentially (Chunk 1 → Chunk 2 → Chunk 3), upload multiple chunks in parallel.
- Benefits: Better bandwidth utilization and faster uploads.
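Parallel chunk uploads map naturally onto a bounded thread pool. In this sketch `upload_one` is a hypothetical callable (for example, a PUT to a chunk's presigned URL), and `max_workers=4` is an arbitrary illustrative cap; failed chunk IDs are returned so they can be retried, which fits the resumable-upload model.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, List


def upload_chunks_parallel(
    chunks: Dict[str, bytes],
    upload_one: Callable[[str, bytes], None],
    max_workers: int = 4,
) -> List[str]:
    """Upload chunks concurrently; return the sorted IDs that failed, for retry."""
    failed: List[str] = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(upload_one, cid, data): cid for cid, data in chunks.items()}
        for future in as_completed(futures):
            if future.exception() is not None:  # future is already done here
                failed.append(futures[future])
    return sorted(failed)
```

Capping the worker count keeps the client from opening so many connections that it saturates the user's link or trips rate limits.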
B. Compression
Compress files on the client side before uploading.
- Good Candidates (highly compressible): Text files, CSV.
- Poor Candidates (already compressed): JPEG, PNG, MP4, and DOCX (a DOCX file is already a ZIP archive).
- Note: Store the compressionAlgorithm inside the metadata so the client knows how to decompress the file.
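A sketch of the compress-only-if-it-helps rule using the standard `zlib` module. The choice of zlib and the 10% minimum-saving threshold are assumptions; the point is that the chosen algorithm (or `"none"`) is recorded as `compressionAlgorithm` so the downloader knows what to do.

```python
import zlib
from typing import Dict, Tuple


def maybe_compress(data: bytes, min_saving: float = 0.1) -> Tuple[bytes, Dict[str, str]]:
    """Compress with zlib only if it shrinks the data by at least `min_saving`.

    Returns the payload plus metadata recording the compressionAlgorithm.
    """
    compressed = zlib.compress(data, level=6)
    if len(compressed) <= len(data) * (1 - min_saving):
        return compressed, {"compressionAlgorithm": "zlib"}
    return data, {"compressionAlgorithm": "none"}  # e.g. JPEG/MP4: not worth it


def decompress(payload: bytes, metadata: Dict[str, str]) -> bytes:
    """Reverse maybe_compress using the algorithm stored in the metadata."""
    algorithm = metadata.get("compressionAlgorithm", "none")
    if algorithm == "zlib":
        return zlib.decompress(payload)
    if algorithm == "none":
        return payload
    raise ValueError(f"unsupported compressionAlgorithm: {algorithm}")
```

Repetitive CSV text shrinks dramatically, while random bytes (a proxy for already-compressed media) come out slightly larger under zlib and are stored as-is.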
C. CDN (Optional)
Use Client → CDN → S3 upload/download paths.
- Use Case: Mostly useful for shared public files where many users download the same file.
- Limitation: Not very useful when users mostly access private, unique files. Those files are already stored in a nearby S3 region (e.g., Mumbai for users in Pune), so a CDN cache adds little.
🔟 Client Architecture
+-------------------+
| Client App        |
+-------------------+
| Local Folder      |
| Local DB          |
+-------------------+
- Local Folder: The designated directory (e.g., “Dropbox Folder”) storing the actual files.
- Local DB: A lightweight client-side DB (e.g., SQLite) storing metadata (fileId, hash, timestamps, chunk info).
  - Used For: Sync decisions, duplicate detection, and reconciliation.
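Since the text suggests SQLite, here is a minimal Local DB sketch using Python's built-in `sqlite3`. The table and column names are hypothetical; the query shows one sync decision the DB enables: finding files modified locally since their last successful sync.

```python
import sqlite3
from typing import List

# Hypothetical schema for the client-side metadata store.
SCHEMA = """
CREATE TABLE IF NOT EXISTS files (
    path        TEXT PRIMARY KEY,  -- location inside the local Dropbox folder
    file_id     TEXT,              -- server fileId (NULL until first upload)
    hash        TEXT NOT NULL,     -- fingerprint of the whole file
    modified_at INTEGER NOT NULL,  -- local mtime, epoch seconds
    synced_at   INTEGER            -- last successful sync, epoch seconds
);
CREATE TABLE IF NOT EXISTS chunks (
    path        TEXT NOT NULL REFERENCES files(path),
    chunk_index INTEGER NOT NULL,
    chunk_id    TEXT NOT NULL,     -- SHA-256 of the chunk bytes
    status      TEXT NOT NULL,     -- e.g. 'pending' or 'uploaded'
    PRIMARY KEY (path, chunk_index)
);
"""


def open_local_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def files_needing_upload(conn: sqlite3.Connection) -> List[str]:
    """Files never synced, or modified locally after their last sync."""
    rows = conn.execute(
        "SELECT path FROM files "
        "WHERE synced_at IS NULL OR modified_at > synced_at ORDER BY path"
    )
    return [path for (path,) in rows]
```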
1️⃣1️⃣ Detecting Local Changes
The OS provides file watcher APIs to notify the app when files change:
- Windows: FileSystemWatcher
- macOS: FSEvents
Change Detection Flow:
Local File Changed ──> File Watcher ──> Trigger Upload
1️⃣2️⃣ Sync Design (Remote → Local)
Polling
The client periodically queries GET /changes to retrieve a list of changed files and downloads the updates.
- Why Polling?
- Simple, reliable, and easy to scale.
- Why not WebSockets?
- WebSockets require persistent connections, adding overhead like connection management and stateful servers. For a file-sync service like Dropbox, standard HTTP polling is typically sufficient and easier to scale.
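One polling round against `GET /changes` can be sketched as below. The cursor-based contract (send the last cursor, receive changes plus a new cursor) is an assumption for illustration; `fetch_changes` and `apply_change` are hypothetical callables for the HTTP request and the local write.

```python
from typing import Callable, Dict, List, Tuple

Change = Dict[str, str]
# Hypothetical contract for GET /changes?cursor=...: returns (changes, next_cursor).
FetchChanges = Callable[[str], Tuple[List[Change], str]]


def poll_once(cursor: str, fetch_changes: FetchChanges,
              apply_change: Callable[[Change], None]) -> str:
    """Pull remote changes since `cursor`, apply them locally, return the new cursor.

    The caller persists the returned cursor only after this returns, so a crash
    mid-batch re-fetches the same changes; apply_change should be idempotent.
    """
    changes, next_cursor = fetch_changes(cursor)
    for change in changes:
        apply_change(change)
    return next_cursor
```

A stateless request like this is easy to load-balance across servers, which is the scaling advantage the text attributes to polling.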
1️⃣3️⃣ Adaptive Polling
Adjust polling frequency dynamically based on user activity:
- High Activity (app open): Poll more frequently (e.g., every 10 sec).
- Idle State: Poll less frequently.
- Benefits: Lowers server load while still giving a fast sync experience when the user is actively working.
We can also add a Refresh button in the client app so the user can trigger a sync immediately.
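One way to pick the next polling interval, as a sketch: 10 seconds while the app is open (the figure from the text), and exponential backoff after consecutive empty polls when idle. The backoff rule and the 300-second cap are assumptions, not part of the original design.

```python
def next_poll_interval(app_open: bool, empty_polls_in_a_row: int,
                       active_interval: float = 10.0,
                       max_idle_interval: float = 300.0) -> float:
    """Seconds to wait before the next poll.

    App open: poll every `active_interval` seconds.
    Idle: double the wait after each poll that returned no changes, capped.
    """
    if app_open:
        return active_interval
    exponent = min(empty_polls_in_a_row, 16)  # bound the exponent to avoid huge ints
    return min(active_interval * (2 ** exponent), max_idle_interval)
```

Any detected change (or a manual refresh) would reset `empty_polls_in_a_row` to 0, snapping the client back to frequent polling.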
1️⃣4️⃣ Delta Sync - When File Edited
- Problem: Without delta sync, changing a tiny portion (e.g., 1 KB) of a large file (e.g., 50 GB) would require re-transferring (uploading or downloading) the entire file ❌.
- Solution: Sync only the modified chunks.
  - Example: For a file with Chunk 1, Chunk 2, Chunk 3 (changed), and Chunk 4, the client only downloads/uploads Chunk 3.
- Benefits: Faster syncing and minimal bandwidth usage.
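Given the ordered chunkId lists (SHA-256 fingerprints) of the old and new versions, finding what to transfer is a positional comparison. A minimal sketch, assuming fixed-size chunks:

```python
from typing import List


def changed_chunk_indices(old_ids: List[str], new_ids: List[str]) -> List[int]:
    """Indices of chunks that must be transferred for the new version.

    Covers in-place changes and chunks appended at the end; if the file
    shrank, chunks beyond len(new_ids) are simply no longer referenced.
    """
    changed = [i for i, (old, new) in enumerate(zip(old_ids, new_ids)) if old != new]
    changed.extend(range(len(old_ids), len(new_ids)))  # newly appended chunks
    return changed
```

This matches the example (only Chunk 3 differs). With fixed-size chunks, an insertion near the start shifts every later chunk boundary, so an edit of that kind marks most chunks as changed.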
1️⃣5️⃣ Consistency Options
Option 1: Poll Metadata DB (Client → Metadata DB)
Query changes since a specific timestamp.
- Pros: Simple and easy to implement.
- Cons: High read load on the database.
Option 2: Event Bus + Cursor (e.g., Kafka)
File Changed Service ──> Kafka ──> Client (via syncCursor)
- Pros: Full audit trail, version history, ability to replay events, and rollback support.
- Cons: High complexity and requires more infrastructure.
- Verdict: Start with polling the DB as the simpler solution, and present the Kafka/Event Bus model as the advanced optimization tradeoff.
1️⃣6️⃣ Reconciliation
- Problem: Even with continuous sync, local and remote state can diverge (Local State != Remote State), e.g., after a missed change or a crash mid-sync.
- Solution: Run a periodic reconciliation task (e.g., daily or weekly) to compare the Local DB against the remote metadata.
- Process: Compare file fingerprints, chunk IDs, and metadata, then resolve mismatches.
- Purpose: Catches drift that incremental sync missed, preserving data integrity over the long term.
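The comparison step of reconciliation can be sketched as a diff of two `path -> fingerprint` maps, one built from the Local DB and one from remote metadata. The map shape is an assumption; how each category is resolved (upload, download, delete, or conflict copy) depends on sync history and is left to the caller.

```python
from typing import Dict, List, NamedTuple


class ReconcileReport(NamedTuple):
    only_local: List[str]   # present in the Local DB but not remotely
    only_remote: List[str]  # present remotely but not in the Local DB
    mismatched: List[str]   # on both sides with different fingerprints


def reconcile(local: Dict[str, str], remote: Dict[str, str]) -> ReconcileReport:
    """Diff path -> fingerprint maps from the Local DB and remote metadata."""
    local_paths, remote_paths = set(local), set(remote)
    return ReconcileReport(
        only_local=sorted(local_paths - remote_paths),
        only_remote=sorted(remote_paths - local_paths),
        mismatched=sorted(p for p in local_paths & remote_paths if local[p] != remote[p]),
    )
```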
1️⃣7️⃣ Interview Deep Dives
- Mid-Level (L4 / E4): Focus on Chunking and basic Syncing mechanisms.
- Senior (L5): Discuss Chunking, Delta Sync, Presigned URLs, and Reconciliation.
- Staff (L6+): Additionally cover Event-driven sync, Kafka cursor model, Multi-region architecture, Conflict resolution, and Storage optimization.
1️⃣8️⃣ Core Features Enabled:
- ✅ Presigned URLs
- ✅ Chunking & Fingerprinting
- ✅ Resumable Uploads & Delta Sync
- ✅ Adaptive Polling & Reconciliation
- ✅ High Availability