Requirements

leetcode-req.excalidraw


HLD

leetcode-hld.excalidraw


Deep Dive

leetcode-deep-dive.excalidraw


🔥 Deep Dive 1: Sandbox Security & Safe Code Execution

Problem:

Running arbitrary user code directly on the API servers exposes them to several threats:

  • Security breaches (taking over the server, reading the DB)
  • Infinite loops (monopolizing CPU)
  • Resource exhaustion (OOM attacks)
  • Crypto mining / system compromise

Solution Comparison:

Option | Pros | Cons
API Server | ❌ None | Resource exhaustion, high security risk
Virtual Machines (VMs) | ✅ Strongest isolation | Slow startup, expensive, heavy resource footprint
Docker Containers (Recommended) | ✅ Lightweight, fast startup, resource-efficient | Slightly less isolated than VMs (but safe with proper configuration)

Sandbox Security Measures:

  • Execution Timeout: Kill the process after a limit (e.g., 5 seconds) to prevent infinite loops:
    while True:
        pass
  • CPU & Memory Limits: Set container limits (e.g., --cpus="0.5" --memory="256m") to prevent OOM attacks.
  • Read-Only Filesystem: Mount directories as read-only to prevent malicious actions:
    rm -rf /
  • Network Isolation: Run container with --network none to prevent outgoing network requests (accessing database or calling external APIs).
  • Seccomp (Secure Computing Mode): Restrict system calls, disabling dangerous kernel calls.
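The measures above could be combined into a single locked-down `docker run` invocation, roughly as sketched below. This is a minimal sketch, not a hardened implementation: the function names, the `/sandbox/main.py` entry point, and the `seccomp.json` profile path are hypothetical, and `code_dir` is assumed to be an absolute host path.

```python
import subprocess
import uuid


def build_docker_command(image, code_dir, name, seccomp_profile="seccomp.json"):
    """Return the argv for a disposable, restricted `docker run` (names are illustrative)."""
    return [
        "docker", "run",
        "--rm",                                   # disposable: remove the container on exit
        "--name", name,                           # named so it can be killed on timeout
        "--network", "none",                      # network isolation
        "--read-only",                            # read-only root filesystem
        "--cpus", "0.5",                          # CPU limit from the text
        "--memory", "256m",                       # memory limit from the text
        "--security-opt", f"seccomp={seccomp_profile}",  # assumed profile file
        "-v", f"{code_dir}:/sandbox:ro",          # user code mounted read-only
        image,
        "python", "/sandbox/main.py",             # hypothetical entry point
    ]


def run_submission(image, code_dir, timeout_seconds=5):
    """Run user code and enforce the execution timeout."""
    name = f"run-{uuid.uuid4().hex}"
    cmd = build_docker_command(image, code_dir, name)
    try:
        result = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_seconds
        )
        return {"status": "FINISHED", "exit_code": result.returncode,
                "stdout": result.stdout}
    except subprocess.TimeoutExpired:
        # Killing the docker CLI process does not stop the container itself,
        # so stop the container explicitly by name.
        subprocess.run(["docker", "kill", name], capture_output=True)
        return {"status": "TIME_LIMIT_EXCEEDED", "exit_code": None, "stdout": ""}
```

Keeping the command construction separate from the execution makes the security flags easy to unit test without a Docker daemon.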

Strong Interview Answer

To execute untrusted user code safely, I will run each submission inside a disposable Docker container with strict resource limits. I'll configure a timeout (e.g., 5 s) to kill infinite loops, set CPU and memory limits to contain resource exhaustion, mount the root filesystem as read-only, disable container networking entirely (--network none) to prevent access to the database or external APIs, and apply a seccomp profile to block dangerous system calls.


🔥 Deep Dive 2: Scaling Submissions & Async Execution Pattern

Problem:

During a contest, 100K users may submit code at nearly the same moment. Running code is slow (it requires spawning containers and executing tests), so handling it synchronously would tie up API threads and could take the system down.

Solution: Introduce a Message Queue (e.g., RabbitMQ, Kafka)

Client -> API Server -> [Queue] -> Workers -> Docker Sandbox

Flow:

  1. User clicks Submit.
  2. API Server inserts submission record into DB with status PENDING, pushes job to Queue, and returns submissionId immediately.
  3. Client receives 202 Accepted response.
  4. Client polls GET /submissions/{submissionId} every few seconds to check status.
  5. Background worker consumes job, spins up a Docker container, runs tests, updates DB status to COMPLETED/ACCEPTED/etc.
  6. Client poll sees the finished state and displays results.

✔ Buffers traffic spikes
✔ Prevents API server resource exhaustion
✔ Durable queues persist jobs, so submissions survive worker crashes and restarts
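The submit, enqueue, and poll flow can be illustrated with an in-process stand-in. In this sketch a `queue.Queue` plays the role of RabbitMQ/Kafka and a dict plays the role of the submissions table; `submit`, `get_submission`, `run_in_sandbox`, and `worker` are hypothetical names, and the sandbox call is a placeholder that always reports ACCEPTED.

```python
import queue
import threading
import uuid

submissions = {}             # stands in for the submissions table
job_queue = queue.Queue()    # stands in for the message queue


def submit(user_id, code):
    """API handler: record PENDING, enqueue the job, return immediately (HTTP 202)."""
    submission_id = uuid.uuid4().hex
    submissions[submission_id] = {"user_id": user_id, "status": "PENDING"}
    job_queue.put({"submission_id": submission_id, "code": code})
    return 202, {"submissionId": submission_id}


def get_submission(submission_id):
    """Polling endpoint, i.e. GET /submissions/{submissionId}."""
    return submissions[submission_id]["status"]


def run_in_sandbox(code):
    """Placeholder for the Docker sandbox run; always passes in this sketch."""
    return "ACCEPTED"


def worker():
    """Background worker: consume jobs until a None sentinel arrives."""
    while True:
        job = job_queue.get()
        if job is None:
            job_queue.task_done()
            break
        submissions[job["submission_id"]]["status"] = run_in_sandbox(job["code"])
        job_queue.task_done()


status_code, body = submit("user-1", "print('hi')")
status_before = get_submission(body["submissionId"])   # no worker has run yet

thread = threading.Thread(target=worker)
thread.start()
job_queue.join()       # wait until the worker has processed the job
job_queue.put(None)    # tell the worker to stop
thread.join()
status_after = get_submission(body["submissionId"])
```

The API handler never waits on execution; the only shared state between the handler and the worker is the queue and the status row, which is what lets the two scale independently.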


🔥 Deep Dive 3: Real-Time Contest Leaderboards

Problem:

A naive leaderboard queries the submissions database using expensive aggregations:

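SELECT user_id,
       COUNT(DISTINCT problem_id) AS solved,  -- count each solved problem once
       MAX(completion_time) AS finished_at    -- time of the last accepted solve
FROM submissions
WHERE contest_id = 1
  AND status = 'ACCEPTED'                     -- assumed status and problem_id columns
GROUP BY user_id;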

❌ Too expensive to run at scale with 100K active users.

  • Solution 1: run the aggregation on a cron schedule (set aside)
  • Solution 2: run a cron job that periodically updates Redis
  • Solution 3: use CDC (change data capture)

Solution: Redis Sorted Sets (ZSET)

Use Redis ZSET to store rankings in memory.

  • Key: leaderboard:{contestId}
  • Value/Member: userId
  • Score: A composite score representing solved problems and completion time.
    • Example Score Construction: ProblemsSolved * 10^10 - CompletionTimeInSeconds (so more solved = higher score, earlier time = higher score).

Why ZSET?

✔ Fast rank lookups and updates in O(log N)
✔ Built-in pagination support (ZREVRANGE)
✔ In-memory storage that scales well
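To check that the composite score orders users as intended, the score construction can be tried out in plain Python. This is a sketch only: `composite_score` and `top_n` are hypothetical helpers, the sample users and times are made up, and `top_n` merely imitates what ZREVRANGE would return from a real sorted set.

```python
SOLVED_WEIGHT = 10**10   # must exceed any possible completion time in seconds


def composite_score(problems_solved, completion_time_seconds):
    """More problems solved dominates; an earlier completion time breaks ties."""
    return problems_solved * SOLVED_WEIGHT - completion_time_seconds


def top_n(scores, n):
    """Imitate ZREVRANGE 0 n-1: members ordered by score, highest first."""
    ordered = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [user for user, _ in ordered[:n]]


# Illustrative data: member -> score, as it would be stored in leaderboard:{contestId}
leaderboard = {
    "alice": composite_score(3, 1800),
    "bob": composite_score(3, 1500),
    "carol": composite_score(2, 600),
}
ranking = top_n(leaderboard, 3)   # bob and alice both solved 3; bob finished earlier
```

Redis stores sorted-set scores as double-precision floats, which represent integers exactly only up to 2^53, so the weight has to leave enough headroom; 10^10 does for any realistic number of problems. With the redis-py client, the same data would typically be written with `zadd` and read back with `zrevrange`.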

Client Update Strategy:

  • Client polls the leaderboard API every 5 seconds.
  • Avoids the complexity and overhead of managing active WebSockets/SSE connections for 100K concurrent users when eventual consistency is acceptable.

🔥 Deep Dive 4: Test Case Execution

Design:

Store test cases in a language-agnostic format in the database (e.g., JSON).

{
  "type": "tree",
  "input": [3, 9, 20, null, null, 15, 7],
  "expected_output": 3
}

Execution Flow inside Sandbox:

  1. The worker writes the user's code and the JSON test cases into the container.
  2. A language-specific test harness inside the container parses the JSON.
  3. It converts the JSON input into the native data structure (e.g. constructing a binary tree from [3, 9, 20, null, null, 15, 7]).
  4. It calls the user's function with this object.
  5. It captures the return value, serializes it, and compares it with expected_output.
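A Python harness following these steps might look like the sketch below. Several things here are assumptions rather than part of the original design: the `TreeNode` class, the level-order convention where null marks a missing child, and `max_depth` as a stand-in for the user's submitted function (the sample test case is consistent with a maximum-depth problem, but the notes do not name one).

```python
import json
from collections import deque


class TreeNode:
    def __init__(self, val, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right


def build_tree(values):
    """Build a binary tree from a level-order list; None marks a missing child."""
    if not values or values[0] is None:
        return None
    root = TreeNode(values[0])
    pending = deque([root])   # nodes still waiting for their children
    i = 1
    while pending and i < len(values):
        node = pending.popleft()
        if values[i] is not None:
            node.left = TreeNode(values[i])
            pending.append(node.left)
        i += 1
        if i < len(values) and values[i] is not None:
            node.right = TreeNode(values[i])
            pending.append(node.right)
        i += 1
    return root


def max_depth(root):
    """Stand-in for the user's submitted function."""
    if root is None:
        return 0
    return 1 + max(max_depth(root.left), max_depth(root.right))


def run_test_case(raw_json, user_function):
    """Parse one JSON test case, call the user's function, compare the result."""
    case = json.loads(raw_json)
    argument = build_tree(case["input"]) if case["type"] == "tree" else case["input"]
    actual = user_function(argument)
    return {"passed": actual == case["expected_output"], "actual": actual}


test_case = '{"type": "tree", "input": [3, 9, 20, null, null, 15, 7], "expected_output": 3}'
result = run_test_case(test_case, max_depth)
```

Only `build_tree` and the comparison step are language-specific; the JSON test case itself can be reused unchanged by a harness written for any other supported language.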

✅ Good Math: Sizing the Worker Pool

Let's calculate how many workers we need during a contest:

  • Given: 100,000 active contest users
  • Submission Frequency: Average user submits code once every 2 minutes (120 seconds)
  • Peak Submissions Rate:
    RPS = Total Submissions / Time Window
    RPS = 100,000 / 120
    RPS ≈ 833 submissions/sec
  • Execution Time: Average code execution (container boot + run test cases) = 1 second
  • Required Concurrent Workers:
    • To keep up with arrivals without a growing backlog, we need arrival rate × execution time = 833 × 1 s ≈ 833 runs in flight at once (Little's law).
    • This requires about 833 concurrent worker threads/sandboxes.
    • Since this is resource-intensive, a Message Queue is needed to absorb peak bursts. It lets workers run at a fixed maximum capacity (e.g., 200 concurrent runs) while the remaining requests wait safely in the queue instead of crashing the system. The trade-off is that queue wait time grows for as long as arrivals exceed worker throughput.
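The same arithmetic can be written out as a short calculation, which also shows how fast a backlog would build if the pool were capped at the 200 concurrent runs used as an example above. The variable names are illustrative, and the 60-second burst length is an assumption chosen only to make the backlog concrete.

```python
active_users = 100_000
seconds_between_submissions = 120
execution_seconds = 1.0        # container boot + test run
worker_capacity = 200          # example cap on concurrent runs
burst_seconds = 60             # assumed length of a sustained peak

# Arrival rate in submissions per second
arrival_rate = active_users / seconds_between_submissions

# Little's law: jobs in flight = arrival rate * time per job
required_concurrency = round(arrival_rate * execution_seconds)

# With a capped pool, the queue grows by the gap between arrivals and throughput
throughput = worker_capacity / execution_seconds
backlog_growth_per_second = arrival_rate - throughput
backlog_after_burst = round(backlog_growth_per_second * burst_seconds)

print(f"arrival rate: {arrival_rate:.0f}/s")
print(f"required concurrency: {required_concurrency}")
print(f"backlog after {burst_seconds}s at capacity {worker_capacity}: {backlog_after_burst}")
```

Under these assumptions a 200-worker pool falls behind by roughly 633 jobs per second during the peak, so the queue buys stability at the cost of wait time, and the pool still has to be sized (or autoscaled) against the latency users will tolerate.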

Scaling Optimizations Summary

Sandbox Isolation

  • Use lightweight Docker containers with Timeout, CPU/Memory limits, Read-Only FS, Network isolation, and Seccomp.

Async Processing

  • Decouple submission API from code execution using a Message Queue and background workers.

Scalable Rankings

  • Use Redis Sorted Sets to maintain the contest leaderboard in O(log N) time instead of costly DB aggregations.

Simple Client Sync

  • Client pulls leaderboard updates via polling every 5 seconds, avoiding complex connection state management.

Language Agnostic Test Cases

  • Store test cases as JSON, parse into runtime structures using a test harness inside the sandbox.

Interview Nuggets

Why Polling over SSE/WebSockets for Leaderboard?

Contest leaderboards do not need sub-second real-time consistency. WebSockets would mean holding 100K open TCP connections, plus per-connection state, on the servers. Polling is stateless, so any API server can answer any request and capacity scales horizontally.

Why split Problem DB from Submission DB?

Problem DB has low write volume (only admins add problems) and high read volume. Submission DB has high write throughput. Splitting them keeps heavy submission write traffic from slowing down problem reads and lets each store be scaled for its own workload.