⭐ Interview Intro

“User content contains large files, so I would separate metadata from the actual file storage. Metadata such as the user ID and file URL goes into the database, while images and videos go into object storage such as S3. Uploads go through pre-signed URLs to reduce backend load, and large uploads use multipart upload. Downloads are served through a CDN for low latency.”

🎯 Interview Keywords

✅ Metadata in DB - Metadata -> SQL/NoSQL DB, files -> object storage.
✅ Pre-Signed URL - Clients upload and download directly, so files are not routed through the backend.
✅ Multipart Upload - Large files are uploaded in chunks.
✅ CDN in front of storage - Object storage is usually fronted by a CDN.

Client → CDN → Object Storage

📌 Object Storage (Blob Storage)

Storage system for large binary files (BLOBs). Examples:

  • Images
  • Videos
  • PDFs
  • Logs
  • ML datasets
  • Static assets

Core Idea

Metadata → DB
Files → Object Storage

Example:

Post:
- post_id      → DB
- caption      → DB
- image_url    → DB
- actual image → S3

Metadata Service → stores where the file lives (its URL or object key)
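
The split above can be sketched in a few lines. This is a minimal illustration, not a real storage client: `object_store`, `metadata_db`, `create_post` and the `cdn.example.com` host are all hypothetical stand-ins (plain dicts play the role of S3 and the database).

```python
from dataclasses import dataclass

# Hypothetical in-memory stand-ins for the two stores.
object_store: dict[str, bytes] = {}   # object key -> file bytes (plays the role of S3)
metadata_db: dict[str, "Post"] = {}   # post_id -> small metadata row


@dataclass
class Post:
    post_id: str
    caption: str
    image_url: str  # pointer to the object, not the bytes themselves


def create_post(post_id: str, caption: str, image: bytes) -> Post:
    key = f"posts/{post_id}/image.jpg"
    object_store[key] = image                                  # large payload -> object storage
    post = Post(post_id, caption, f"https://cdn.example.com/{key}")
    metadata_db[post_id] = post                                # small row -> database
    return post


post = create_post("p1", "Sunset", b"\xff\xd8fake-jpeg-bytes")
print(post.image_url)
```

The database row stays small no matter how large the image is, which is the point of the split.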


🚀 Pre-Signed URL

Avoid sending files through backend. ❌

Client → Backend → Get URL
Client → S3 (Direct Upload)

Backend returns:

  • Temporary permission
  • Expiry
  • Allowed operation

Benefits:

  • Lower backend load
  • Better scalability
  • Faster upload/download
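
A rough sketch of the idea, using only the standard library. This is NOT S3's actual signing scheme (S3 uses Signature Version 4); it is an assumed, simplified HMAC scheme that shows how one URL can carry a temporary permission, an expiry and an allowed operation. `SECRET`, `presign`, `verify` and `storage.example.com` are hypothetical names.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse

SECRET = b"server-side-secret"  # hypothetical; shared by backend and storage only


def _sign(method: str, key: str, expires: int) -> str:
    msg = f"{method}\n{key}\n{expires}".encode()
    return hmac.new(SECRET, msg, hashlib.sha256).hexdigest()


def presign(method: str, key: str, ttl_seconds: int, now: Optional[float] = None) -> str:
    """Backend side: grant one operation on one key until the expiry time."""
    current = time.time() if now is None else now
    expires = int(current + ttl_seconds)
    query = urlencode({"expires": expires, "sig": _sign(method, key, expires)})
    return f"https://storage.example.com/{key}?{query}"


def verify(method: str, url: str, now: Optional[float] = None) -> bool:
    """Storage side: accept only an unexpired URL signed for this method and key."""
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    expires = int(params["expires"][0])
    current = time.time() if now is None else now
    if current > expires:
        return False
    expected = _sign(method, parsed.path.lstrip("/"), expires)
    return hmac.compare_digest(expected, params["sig"][0])


url = presign("PUT", "user123/profile/image.jpg", ttl_seconds=300, now=1_000)
print(verify("PUT", url, now=1_100))  # signed operation, before expiry
print(verify("GET", url, now=1_100))  # different operation than the one signed
print(verify("PUT", url, now=2_000))  # after expiry
```

The backend never touches the file bytes; it only hands out a short-lived signed URL, and the storage service checks the signature itself.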

📦 Multipart Upload

Used for large files

10 GB
→ Chunk
→ Upload chunks
→ S3 merges

Benefits:

  • Parallel upload
  • Resume on failure
  • Reliable large transfers
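
The chunk, upload, merge flow can be sketched as below. It is an in-memory illustration under assumptions: `uploaded_parts` stands in for the storage service, `PART_SIZE` is tiny on purpose, and `split`, `upload_part` and `complete_upload` are hypothetical helpers rather than any real SDK's API.

```python
from concurrent.futures import ThreadPoolExecutor

PART_SIZE = 4  # bytes; tiny for illustration, real parts are megabytes

uploaded_parts: dict[int, bytes] = {}  # part number -> bytes (hypothetical store)


def split(data: bytes, part_size: int) -> list[tuple[int, bytes]]:
    """Number parts from 1 so they can be reassembled in order."""
    return [
        (i // part_size + 1, data[i : i + part_size])
        for i in range(0, len(data), part_size)
    ]


def upload_part(part: tuple[int, bytes]) -> int:
    number, body = part
    uploaded_parts[number] = body  # a real client would send an HTTP request here
    return number


def complete_upload() -> bytes:
    """Storage side: concatenate parts in part-number order."""
    return b"".join(uploaded_parts[n] for n in sorted(uploaded_parts))


data = b"0123456789abcdef!!"
parts = split(data, PART_SIZE)

# Parts are independent, so they can be sent in parallel and in any order.
with ThreadPoolExecutor(max_workers=4) as pool:
    done = set(pool.map(upload_part, parts))

# Resume on failure: only parts missing from `done` would need re-sending.
missing = [n for n, _ in parts if n not in done]
merged = complete_upload()
print(len(parts), missing, merged == data)
```

Because each part carries its own number, a failed part can be retried alone instead of restarting the whole transfer.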

☁️ Popular Services

Cloud   Service
AWS     Amazon S3
GCP     Google Cloud Storage
Azure   Azure Blob Storage

⚙️ Why Object Storage Scales Well

1. Flat Namespace

No real folders → each object is stored internally under a single unique key:

user123/profile/image.jpg

Benefits:

  • Fast lookup
  • Easy scaling
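
A small sketch of what "no real folders" means in practice, assuming a hypothetical in-memory `bucket` dict: a lookup uses the whole key, and a "folder" listing is just a prefix filter over keys.

```python
# Hypothetical in-memory bucket: one flat mapping from full key to bytes.
bucket: dict[str, bytes] = {
    "user123/profile/image.jpg": b"...",
    "user123/posts/1.jpg": b"...",
    "user456/profile/image.jpg": b"...",
}


def get_object(key: str) -> bytes:
    # The lookup uses the whole key; there is no directory tree to walk.
    return bucket[key]


def list_prefix(prefix: str) -> list[str]:
    # "Folders" are only a prefix filter over the flat key space.
    return sorted(k for k in bucket if k.startswith(prefix))


print(list_prefix("user123/"))
```

Since every object is addressed by one key, keys can be spread across many servers (for example by hashing or key range) without keeping a directory hierarchy consistent.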

2. Immutable Writes

Objects are not updated in-place.

Allowed:

  • Create new version
  • Overwrite entire object

Benefits:

  • No locking
  • No partial-update races (a reader sees a whole old or whole new object)
  • Simpler distributed systems
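
A minimal sketch of whole-object writes with versioning, assuming a hypothetical in-memory `versions` store. Every write appends a complete new object; nothing is patched in place.

```python
from collections import defaultdict
from typing import Optional

# Hypothetical versioned store: each key maps to an append-only list of versions.
versions: dict[str, list[bytes]] = defaultdict(list)


def put_object(key: str, body: bytes) -> int:
    """Write a whole new object version; existing bytes are never modified."""
    versions[key].append(body)
    return len(versions[key])  # version number, starting at 1


def get_object(key: str, version: Optional[int] = None) -> bytes:
    """Return the latest version by default, or a specific older one."""
    history = versions[key]
    return history[-1] if version is None else history[version - 1]


v1 = put_object("user123/profile/image.jpg", b"old avatar")
v2 = put_object("user123/profile/image.jpg", b"new avatar")
print(get_object("user123/profile/image.jpg"))
print(get_object("user123/profile/image.jpg", version=v1))
```

There is no "modify bytes 100 to 200" operation to coordinate, which is why no byte-range locking is needed.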

3. Replication / Erasure Coding

Data stored across:

  • Multiple servers
  • Multiple racks
  • Multiple DCs

Result:

11 nines of durability (99.999999999%)
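
To make erasure coding concrete, here is a sketch of its simplest form: k data shards plus one XOR parity shard, which survives the loss of any single shard. Production systems typically use stronger codes such as Reed-Solomon that tolerate several losses; `encode` and `recover` are hypothetical helper names.

```python
from functools import reduce
from typing import Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> list[bytes]:
    """Split into k equal data shards (zero-padded) plus one XOR parity shard."""
    shard_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(shard_len * k, b"\0")
    shards = [padded[i * shard_len : (i + 1) * shard_len] for i in range(k)]
    return shards + [reduce(xor_bytes, shards)]


def recover(shards: list[Optional[bytes]]) -> list[bytes]:
    """Rebuild at most one missing shard by XOR-ing all surviving shards."""
    missing = [i for i, s in enumerate(shards) if s is None]
    if len(missing) > 1:
        raise ValueError("single parity tolerates only one lost shard")
    repaired = list(shards)
    if missing:
        survivors = [s for s in shards if s is not None]
        repaired[missing[0]] = reduce(xor_bytes, survivors)
    return repaired


data = b"hello object storage"
shards = encode(data, k=4)   # 4 data shards + 1 parity, e.g. one per server
damaged = list(shards)
damaged[2] = None            # simulate losing one server
restored = recover(damaged)
original = b"".join(restored[:4])[: len(data)]
print(original == data)
```

With 4 data shards and 1 parity shard the storage overhead is 1.25x, compared with 3x for keeping three full replicas, at the cost of extra work on recovery.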