0️⃣ HLD

dropbox.excalidraw

1️⃣ Upload Flow

Initial Approach (Naive)

Client ──> File Service ──> S3
  • Problem: The file's bytes cross the network twice (Client → File Service, then File Service → S3).
  • Cost: Wasted bandwidth and File Service CPU spent relaying bytes.

Better Approach

Instead of transferring the file through the server, we use presigned URLs.

  1. Client sends metadata only (Client → File Service).
  2. File Service obtains a presigned URL for the target S3 object (with the AWS SDKs this is typically a local signing step using the service's credentials rather than a network call to S3).
  3. File Service returns the presigned URL to the client.
  4. Client uploads the file directly to S3 (Client → S3).
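
To make the idea behind a presigned URL concrete, here is a minimal sketch of a signed, expiring upload URL. It is a simplified illustration only, not S3's actual signing scheme; the host `storage.example.com`, the `SECRET_KEY`, and the function names are all hypothetical.

```python
import hashlib
import hmac
from urllib.parse import parse_qs, urlencode, urlparse

SECRET_KEY = b"demo-secret"  # hypothetical key shared by the signer and the storage side


def presign(path: str, expires_at: int) -> str:
    """Return a URL that authorizes a PUT to `path` until `expires_at` (Unix seconds)."""
    message = f"PUT\n{path}\n{expires_at}".encode()
    signature = hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()
    query = urlencode({"expires": expires_at, "signature": signature})
    return f"https://storage.example.com{path}?{query}"


def verify(url: str, now: int) -> bool:
    """Storage-side check: the signature matches and the URL has not expired."""
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    expires_at = int(params["expires"][0])
    message = f"PUT\n{parsed.path}\n{expires_at}".encode()
    expected = hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()
    return now <= expires_at and hmac.compare_digest(expected, params["signature"][0])
```

The point of the sketch is that the client never holds credentials: it only holds a URL that is valid for one path and a limited time, so the file bytes can skip the File Service entirely.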

Benefits

  • ✅ No double upload
  • ✅ Less server load
  • ✅ Faster uploads

2️⃣ Why Chunking Is Needed

  • Problem: Uploading a large file in a single request is fragile. For example, 50 GB at 100 Mbps takes roughly 70 minutes, and if the connection drops, the client has to start from the beginning ❌.
  • Solution (Chunking): Split the file into small, manageable pieces (e.g., 5 MB chunks).
    • Example: A 50 GB File becomes Chunk 1, Chunk 2, Chunk 3 ... Chunk N.
    • The client uploads chunks independently.
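
A minimal sketch of client-side chunking, assuming fixed-size chunks of 5 MiB as in the example above. The names `chunk_count` and `iter_chunks` are illustrative, not part of any real client.

```python
import io
from typing import BinaryIO, Iterator

CHUNK_SIZE = 5 * 1024 * 1024  # 5 MiB, matching the example chunk size


def chunk_count(file_size: int, chunk_size: int = CHUNK_SIZE) -> int:
    """Number of chunks needed for a file of `file_size` bytes (ceiling division)."""
    return (file_size + chunk_size - 1) // chunk_size


def iter_chunks(stream: BinaryIO, chunk_size: int = CHUNK_SIZE) -> Iterator[bytes]:
    """Yield successive chunks without loading the whole file into memory."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        yield chunk


# Tiny demo with a 10-byte "file" and 4-byte chunks: the last chunk is shorter.
sizes = [len(c) for c in iter_chunks(io.BytesIO(b"x" * 10), chunk_size=4)]
```

With these assumptions, a 50 GiB file splits into 10,240 chunks, so a dropped connection costs at most one 5 MiB chunk of rework instead of the whole upload.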

3️⃣ File Metadata With Chunks

{
  "fileId": "string",
  "chunks": [
    {
      "chunkId": "string",
      "status": "string",
      "s3Link": "string"
    }
  ]
}
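
The same shape can be sketched as typed records on the server or client. The status values "pending" and "uploaded" are an assumption for illustration; the notes only say the field is a string.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class ChunkMetadata:
    chunkId: str
    status: str  # assumed values: "pending" or "uploaded"
    s3Link: str


@dataclass
class FileMetadata:
    fileId: str
    chunks: List[ChunkMetadata] = field(default_factory=list)

    def is_complete(self) -> bool:
        """A file is complete once it has chunks and every chunk is uploaded."""
        return bool(self.chunks) and all(c.status == "uploaded" for c in self.chunks)


record = FileMetadata(
    fileId="file-1",
    chunks=[
        ChunkMetadata("a1", "uploaded", "s3://bucket/a1"),
        ChunkMetadata("b2", "pending", "s3://bucket/b2"),
    ],
)
payload = json.dumps(asdict(record))  # serializes to the JSON shape shown above
```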

4️⃣ Resumable Uploads

If the upload stops, the client can resume by comparing already uploaded chunks against local chunks:

  • Already Uploaded: Chunk 1, Chunk 2, Chunk 3
  • Missing: Chunk 4, Chunk 5

The upload resumes from Chunk 4 instead of starting from the beginning.
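
The resume decision is a set difference that preserves file order. A minimal sketch, with `chunks_to_upload` as a hypothetical helper name:

```python
from typing import List, Set


def chunks_to_upload(local_chunk_ids: List[str], uploaded_chunk_ids: Set[str]) -> List[str]:
    """Return local chunks the server has not confirmed, preserving file order."""
    return [cid for cid in local_chunk_ids if cid not in uploaded_chunk_ids]


local = ["chunk-1", "chunk-2", "chunk-3", "chunk-4", "chunk-5"]
uploaded = {"chunk-1", "chunk-2", "chunk-3"}
missing = chunks_to_upload(local, uploaded)
```

Here `uploaded` would come from the file metadata (chunks whose status is marked as uploaded), and `missing` holds the chunks the client still has to send.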


5️⃣ Fingerprinting

  • Problem: How do we uniquely identify chunks?
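  • Solution: Hash(chunk bytes). For example, SHA-256(chunk) produces a fingerprint that is unique for all practical purposes (collisions are computationally infeasible to find), so it can be used as the chunkId.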
  • Benefits:
    • Detect duplicate chunks (de-duplication)
    • Resume uploads easily
    • Verify chunk/file integrity
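
A minimal sketch of fingerprinting and de-duplication using SHA-256 from the standard library. The in-memory `store` dictionary stands in for whatever chunk storage the system uses.

```python
import hashlib
from typing import Dict, List


def fingerprint(chunk: bytes) -> str:
    """Content-derived chunkId: the hex SHA-256 digest of the chunk bytes."""
    return hashlib.sha256(chunk).hexdigest()


def deduplicate(chunks: List[bytes]) -> Dict[str, bytes]:
    """Map each distinct fingerprint to its bytes; repeated chunks are stored once."""
    store: Dict[str, bytes] = {}
    for chunk in chunks:
        store.setdefault(fingerprint(chunk), chunk)
    return store
```

Because the ID depends only on the bytes, two users uploading the same chunk produce the same chunkId, which is what makes duplicate detection and resume checks cheap.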

6️⃣ Trust But Verify Pattern

  • Problem: The client claims: “Chunk uploaded successfully.” Can we trust the client? No.
  • Verification Flow:
    1. Client notifies the Metadata Service that the chunk is uploaded.
    2. Metadata Service queries S3 to check: Does the chunk really exist?
    3. If yes, update the chunk status in the database.
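
The verification flow can be sketched with an in-memory stand-in for S3. In a real deployment the existence check would be a request to the object store (for S3, a HEAD on the object); the class and method names here are hypothetical.

```python
from typing import Dict, Set


class MetadataService:
    def __init__(self, object_store_keys: Set[str]) -> None:
        # Stand-in for S3: the set of keys that really exist in the bucket.
        self.object_store_keys = object_store_keys
        self.chunk_status: Dict[str, str] = {}

    def confirm_chunk(self, chunk_id: str) -> bool:
        """Handle a client's 'chunk uploaded' claim; record it only if verified."""
        if chunk_id not in self.object_store_keys:
            return False  # claim rejected, status stays unchanged
        self.chunk_status[chunk_id] = "uploaded"
        return True
```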

7️⃣ Alternative: S3 Notifications

Instead of the client telling the server, we use event notifications:

Chunk Uploaded to S3 ──> S3 Notification Event ──> Metadata Service
  • Advantages: More reliable and removes client dependency.
  • Tradeoff: Increases architectural complexity.
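
A minimal sketch of a notification handler. The event is trimmed to the fields used here (`Records`, `eventName`, `s3.object.key`), and it assumes the object key identifies the chunk; S3 URL-encodes keys in event payloads, hence the decoding step.

```python
from typing import Dict, List
from urllib.parse import unquote_plus


def uploaded_keys(event: dict) -> List[str]:
    """Extract decoded object keys from ObjectCreated records, ignoring other events."""
    keys = []
    for record in event.get("Records", []):
        if record.get("eventName", "").startswith("ObjectCreated:"):
            keys.append(unquote_plus(record["s3"]["object"]["key"]))
    return keys


def handle_event(event: dict, chunk_status: Dict[str, str]) -> None:
    """Mark every newly created object as uploaded in the metadata store stand-in."""
    for key in uploaded_keys(event):
        chunk_status[key] = "uploaded"
```

Since notifications can arrive more than once, setting the status to a fixed value keeps the handler idempotent.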

8️⃣ Download Flow

  1. Client queries the Metadata DB to get the S3 Link for the file chunks.
  2. Client downloads chunks directly from S3.
  • Why Direct Download? Avoids routing bytes through our server (S3 → File Service → Client), which saves an extra network hop and server bandwidth.
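
On the client, downloading is then a matter of fetching chunks in order and stitching them together. The sketch below assumes chunkIds are SHA-256 fingerprints of the chunk bytes, so each chunk can be verified on arrival; `fetch_chunk` is a hypothetical callable that downloads one chunk from its S3 link.

```python
import hashlib
from typing import Callable, List


def assemble_file(chunk_ids: List[str], fetch_chunk: Callable[[str], bytes]) -> bytes:
    """Fetch chunks in file order, verify each against its ID, and concatenate."""
    parts = []
    for chunk_id in chunk_ids:
        data = fetch_chunk(chunk_id)
        if hashlib.sha256(data).hexdigest() != chunk_id:
            raise ValueError(f"chunk {chunk_id} failed integrity check")
        parts.append(data)
    return b"".join(parts)
```

For very large files a real client would stream each chunk to disk rather than hold the whole file in memory.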

9️⃣ Optimizing Upload Speed

A. Parallel Chunk Uploads

Instead of uploading chunks sequentially (Chunk 1 → Chunk 2 → Chunk 3), upload multiple chunks in parallel.

  • Benefits: Better bandwidth utilization and faster uploads.
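
A minimal sketch of parallel chunk uploads with a thread pool. `upload_chunk` is a hypothetical callable (for example, an HTTP PUT to a presigned URL), and the worker count of 4 is an arbitrary default.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Optional, Tuple


def upload_all(
    chunks: Dict[str, bytes],
    upload_chunk: Callable[[str, bytes], None],
    max_workers: int = 4,
) -> List[str]:
    """Upload chunks concurrently and return the IDs of chunks that failed."""

    def attempt(item: Tuple[str, bytes]) -> Optional[str]:
        chunk_id, data = item
        try:
            upload_chunk(chunk_id, data)
            return None
        except Exception:
            return chunk_id  # report the failure so the caller can retry it

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(attempt, chunks.items()))
    return [chunk_id for chunk_id in results if chunk_id is not None]
```

Returning the failed IDs instead of raising lets one bad chunk be retried without interrupting the others, which fits the resumable upload model.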

B. Compression

Compress files on the client-side before uploading.

  • Good Candidates (Highly compressible): Text files, DOCX, CSV.
  • Poor Candidates (Already compressed): JPEG, PNG, MP4.
  • Note: Store the compressionAlgorithm inside the metadata so the client knows how to decompress it.
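
One way to avoid guessing by file type is to compress and keep the result only if it saves enough. This is a sketch using zlib from the standard library; the 10% threshold and the algorithm labels "zlib" and "none" are assumptions.

```python
import zlib
from typing import Tuple


def maybe_compress(data: bytes, min_saving: float = 0.1) -> Tuple[bytes, str]:
    """Return (payload, compressionAlgorithm); keep raw bytes if savings are small."""
    compressed = zlib.compress(data)
    if len(compressed) <= len(data) * (1 - min_saving):
        return compressed, "zlib"
    return data, "none"


def restore(payload: bytes, algorithm: str) -> bytes:
    """Reverse maybe_compress using the algorithm recorded in the metadata."""
    return zlib.decompress(payload) if algorithm == "zlib" else payload
```

The returned label is what would be stored as compressionAlgorithm in the metadata, so the downloading client knows whether to decompress.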

C. CDN (Optional)

Use Client → CDN → S3 upload/download paths.

  • Use Case: Mostly useful for shared public files where many users download the same file.
  • Limitation: Not very useful when users mostly access private, unique files. Those files are already uploaded to a nearby S3 region anyway (e.g., Mumbai for users in Pune).

🔟 Client Architecture

+-------------------+
| Client App        |
+-------------------+
| Local Folder      |
| Local DB          |
+-------------------+
  • Local Folder: The designated directory (e.g., “Dropbox Folder”) storing the actual files.
  • Local DB: A lightweight client-side DB (e.g., SQLite) storing metadata (fileId, hash, timestamps, chunk info).
    • Used For: Sync decisions, duplicate detection, and reconciliation.

1️⃣1️⃣ Detecting Local Changes

The OS provides file watcher APIs to notify the app when files change:

  • Windows: FileSystemWatcher (.NET class, built on the Win32 ReadDirectoryChangesW API)
  • macOS: FSEvents

Change Detection Flow:

Local File Changed ──> File Watcher ──> Trigger Upload
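
The OS watcher APIs push events to the app. As a portable stand-in (not those APIs), the sketch below detects the same kinds of changes by comparing two snapshots of the sync folder. The snapshot shape, path to (mtime, size), is an assumption.

```python
from typing import Dict, List, Tuple

Snapshot = Dict[str, Tuple[float, int]]  # path -> (modification time, size in bytes)


def diff_snapshots(old: Snapshot, new: Snapshot) -> Dict[str, List[str]]:
    """Classify paths as created, deleted, or modified between two snapshots."""
    return {
        "created": sorted(new.keys() - old.keys()),
        "deleted": sorted(old.keys() - new.keys()),
        "modified": sorted(p for p in old.keys() & new.keys() if old[p] != new[p]),
    }
```

Each path in "created" or "modified" would then trigger the upload flow; a snapshot diff like this is also useful at startup, to catch changes made while the app was not running.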

1️⃣2️⃣ Sync Design (Remote → Local)

Polling

The client periodically queries GET /changes to retrieve a list of changed files and downloads the updates.

  • Why Polling?
    • Simple, reliable, and easy to scale.
  • Why not WebSockets?
    • WebSockets require persistent connections, adding overhead like connection management and stateful servers. For a file-sync service like Dropbox, standard HTTP polling is typically sufficient and easier to scale.

1️⃣3️⃣ Adaptive Polling

Adjust polling frequency dynamically based on user activity:

  • High Activity (app open): Poll more frequently (e.g., every 10 seconds).
  • Idle State: Poll less frequently.
  • Benefits: Lowers server load and provides a faster sync experience when the user is actively working.

We can also add a Refresh button to the local app so the user can trigger a sync on demand.
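
A minimal sketch of an adaptive polling policy. The 10-second active interval comes from the notes; the doubling per quiet minute and the 300-second cap are assumptions chosen for illustration.

```python
def next_poll_interval(
    app_open: bool,
    seconds_since_last_change: float,
    active_interval: float = 10.0,
    max_interval: float = 300.0,
) -> float:
    """Return how long the client should wait before the next poll, in seconds."""
    if app_open:
        return active_interval
    # Idle: double the interval for each quiet minute, capped at max_interval.
    quiet_minutes = int(seconds_since_last_change // 60)
    return min(max_interval, active_interval * (2 ** min(quiet_minutes, 16)))
```

A manual refresh would bypass this schedule and poll immediately.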


1️⃣4️⃣ Delta Sync - When File Edited

  • Problem: Without delta sync, changing a tiny portion (e.g., 1 KB) of a large file (e.g., 50 GB) would require re-uploading the entire file, and every other device would have to re-download it ❌.
  • Solution: Sync only the modified chunks.
    • Example: For a file with Chunk 1, Chunk 2, Chunk 3 (changed), and Chunk 4, the client only downloads/uploads Chunk 3.
  • Benefits: Faster syncing and minimal bandwidth usage.
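
Given the ordered chunk fingerprints of the old and new versions, the changed chunks fall out of a position-by-position comparison. This sketch assumes fixed-size chunks and in-place edits; an insertion that shifts later bytes would change every following fingerprint.

```python
from typing import List


def changed_chunk_indexes(old_ids: List[str], new_ids: List[str]) -> List[int]:
    """Indexes in the new version whose fingerprint is new or differs from the old one."""
    return [
        i
        for i, chunk_id in enumerate(new_ids)
        if i >= len(old_ids) or old_ids[i] != chunk_id
    ]
```

Only the chunks at the returned indexes need to be transferred; the rest are reused from the previous version.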

1️⃣5️⃣ Consistency Options

Option 1: Poll Metadata DB (Client → Metadata DB)

Query changes since a specific timestamp.

  • Pros: Simple and easy to implement.
  • Cons: High read load on the database.
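
A minimal sketch of the timestamp query, using an in-memory SQLite database as a stand-in for the Metadata DB. The table and column names are hypothetical.

```python
import sqlite3
from typing import List, Tuple


def changes_since(conn: sqlite3.Connection, since_ts: int) -> List[Tuple[str, int]]:
    """Return (file_id, updated_at) rows modified after since_ts, oldest first."""
    cursor = conn.execute(
        "SELECT file_id, updated_at FROM files WHERE updated_at > ? ORDER BY updated_at",
        (since_ts,),
    )
    return cursor.fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (file_id TEXT PRIMARY KEY, updated_at INTEGER)")
# An index on updated_at keeps each poll from scanning the whole table.
conn.execute("CREATE INDEX idx_files_updated_at ON files (updated_at)")
conn.executemany(
    "INSERT INTO files VALUES (?, ?)",
    [("f1", 100), ("f2", 200), ("f3", 300)],
)
changed = changes_since(conn, 150)
```

The client would remember the largest updated_at it has seen and pass it as `since_ts` on the next poll.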

Option 2: Event Bus + Cursor (e.g., Kafka)

File Changed Service ──> Kafka ──> Client (via syncCursor)
  • Pros: Full audit trail, version history, ability to replay events, and rollback support.
  • Cons: High complexity and requires more infrastructure.
  • Verdict: Start with polling the DB as the simpler solution, and present the Kafka/Event Bus model as the advanced optimization tradeoff.
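
The cursor model can be sketched with an in-memory append-only log standing in for a Kafka topic. Here the syncCursor is simply the offset of the next unseen event; the class and method names are hypothetical.

```python
from typing import List, Tuple


class ChangeLog:
    """Append-only event log; a client's syncCursor is the offset of the next unseen event."""

    def __init__(self) -> None:
        self._events: List[dict] = []

    def append(self, file_id: str, change: str) -> None:
        self._events.append({"fileId": file_id, "change": change})

    def read_from(self, sync_cursor: int, limit: int = 100) -> Tuple[List[dict], int]:
        """Return up to `limit` events starting at the cursor, plus the new cursor."""
        batch = self._events[sync_cursor:sync_cursor + limit]
        return batch, sync_cursor + len(batch)
```

Because events are never rewritten, a client can replay from any earlier cursor, which is where the audit trail and rollback properties come from.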

1️⃣6️⃣ Reconciliation

  • Problem: Even with continuous sync, divergence can occur (Local State != Remote State), for example after missed events, crashes, or partial failures.
  • Solution: Run a periodic reconciliation task (e.g., daily or weekly) to compare Local DB vs Remote Metadata.
  • Process: Compare file fingerprints, chunk IDs, and metadata, then resolve mismatches.
  • Purpose: Catches drift that incremental sync missed, so local and remote state converge over the long term.
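
A minimal sketch of the comparison step, assuming both sides can be reduced to a map of fileId to file fingerprint. How each mismatch is resolved (re-download, re-upload, or conflict handling) is left out.

```python
from typing import Dict, List


def reconcile(local: Dict[str, str], remote: Dict[str, str]) -> Dict[str, List[str]]:
    """Compare fileId -> fingerprint maps and classify the mismatches."""
    return {
        "missing_locally": sorted(remote.keys() - local.keys()),
        "missing_remotely": sorted(local.keys() - remote.keys()),
        "content_mismatch": sorted(
            f for f in local.keys() & remote.keys() if local[f] != remote[f]
        ),
    }
```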

1️⃣7️⃣ Interview Deep Dives

  • Mid-Level (L4 / E4): Focus on Chunking and basic Syncing mechanisms.
  • Senior (L5): Discuss Chunking, Delta Sync, Presigned URLs, and Reconciliation.
  • Staff (L6+): Additionally cover Event-driven sync, Kafka cursor model, Multi-region architecture, Conflict resolution, and Storage optimization.

1️⃣8️⃣ Core Features Enabled:

  • ✅ Presigned URLs
  • ✅ Chunking & Fingerprinting
  • ✅ Resumable Uploads & Delta Sync
  • ✅ Adaptive Polling & Reconciliation
  • ✅ High Availability