1. Functional Requirements vs Non-Functional Requirements (NFRs)

  • Functional Requirements: what the system does, i.e. the features it provides
  • Non-Functional Requirements (NFRs): how the system should behave

🔹 Functional Requirements

  • These define the features of the system
  • They describe what the user can do
  • They describe what the system must support

📌 Examples

  • User can upload a photo
  • User can like a post
  • User can search for products
  • User can reset their password
  • Driver can accept a ride request

🔹 Non-Functional Requirements (NFRs)

  • These define the quality attributes of the system
  • They describe how well the system should perform
  • They are very important in system design because they influence architecture choices

✅ Simple Meaning

They answer questions such as:

  • How fast should the system respond?
  • How many requests should it handle?
  • How available should it be?
  • How secure should it be?
  • How easily should it scale and be maintained?

⚖️ Quick Comparison

| Type | Meaning | Example |
|---|---|---|
| Functional Requirement | What the system does | User can upload a photo |
| Non-Functional Requirement | How the system behaves | Photo upload should finish within 2 seconds |

Important Non-Functional Requirements

1️⃣ Latency ⏱️

Time taken to process one request

“How fast is a single request?”

Example:

  • API response time = 120 ms
  • DB query time = 20 ms

Key point:

  • Low latency = fast response

Typical question:

  • What is the acceptable response time?

Services:

  • CDN - Caches content closer to users → reduces latency
  • Redis (ElastiCache) - Caches frequently accessed data in memory → avoids repeated slow DB queries
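The Redis bullet describes the cache-aside pattern: read from the cache first and only go to the database on a miss. Below is a minimal sketch of that pattern. It assumes a plain Python dict with expiry timestamps stands in for Redis and a hypothetical `slow_db_lookup` function stands in for the database. Neither is a real client API.

```python
import time

class CacheAside:
    """Cache-aside reads: try the cache, fall back to the DB on a miss, then fill the cache.

    A dict stands in for Redis here; real code would use a Redis client with a TTL.
    """

    def __init__(self, load_from_db, ttl_seconds=60.0):
        self.load_from_db = load_from_db   # hypothetical slow data source
        self.ttl_seconds = ttl_seconds
        self._store = {}                   # key -> (value, expires_at)
        self.db_calls = 0

    def get(self, key):
        now = time.monotonic()
        entry = self._store.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]                # cache hit: no DB round trip
        self.db_calls += 1                 # cache miss or expired entry
        value = self.load_from_db(key)
        self._store[key] = (value, now + self.ttl_seconds)
        return value


def slow_db_lookup(user_id):
    """Hypothetical DB query that takes about 20 ms."""
    time.sleep(0.02)
    return {"id": user_id, "name": f"user-{user_id}"}


cache = CacheAside(slow_db_lookup, ttl_seconds=30)
first = cache.get(42)    # miss: pays the ~20 ms "DB" latency
second = cache.get(42)   # hit: served from memory
print(cache.db_calls)    # 1
```

The TTL is a trade-off. A longer TTL means more hits and lower latency, but cached data can be staler.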

2️⃣ Throughput

Throughput is the amount of work a system can process in a given time.

Simple idea:

“How much data, or how many requests, can the system handle per second?”

Example:

  • Kafka throughput → messages/sec
  • API throughput → requests/sec
  • Disk throughput → MB/sec

Services:

  • Kafka (Kinesis) - High-throughput event streaming
  • RabbitMQ (SQS) - Queue for async processing
  • DynamoDB (Cassandra) - Handles massive read/write throughput
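Throughput is completed work divided by elapsed time. The sketch below drains a batch of messages with several consumer threads and reports messages/sec. It assumes Python's in-process `queue.Queue` as a stand-in for a broker like SQS or RabbitMQ, and simulates about 1 ms of work per message. The parameter names are illustrative.

```python
import queue
import threading
import time

def process_messages(messages, num_workers=4, work_seconds=0.001):
    """Drain `messages` through a shared queue using `num_workers` consumer threads.

    queue.Queue is an in-process stand-in for a broker such as SQS or RabbitMQ.
    Returns (processed_count, messages_per_second).
    """
    q = queue.Queue()
    for m in messages:
        q.put(m)                      # producer side: enqueue everything up front

    processed = 0
    lock = threading.Lock()

    def worker():
        nonlocal processed
        while True:
            try:
                q.get_nowait()
            except queue.Empty:
                return                # queue drained: this consumer stops
            time.sleep(work_seconds)  # simulated I/O-bound work per message
            with lock:
                processed += 1

    start = time.perf_counter()
    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = time.perf_counter() - start
    return processed, (processed / elapsed if elapsed > 0 else 0.0)


count, rate = process_messages(range(200), num_workers=8)
print(f"{count} messages at ~{rate:.0f} msg/sec")
```

The simulated work here is I/O-like (sleeping), so adding workers should raise throughput, much like adding consumers behind a queue. Per-message latency stays around 1 ms either way. This is why throughput and latency are tracked as separate NFRs.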

3️⃣ Availability 🟢

  • Availability means the system is up and accessible when users need it
  • Usually expressed as an uptime percentage

Examples:

  • 99.9% uptime
  • Amazon S3 Standard is designed for 99.99% availability (4 nines)

Common numbers:

  • 99% = around 3.65 days/year downtime
  • 99.9% = around 8.76 hours/year downtime
  • 99.99% = around 52.6 minutes/year downtime
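These downtime budgets come from simple arithmetic: allowed downtime = (1 - availability) × time period. A small sketch, assuming a 365-day year:

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; ignores leap years

def allowed_downtime_hours(availability_pct: float) -> float:
    """Maximum downtime per year permitted by an availability target."""
    return (1 - availability_pct / 100) * HOURS_PER_YEAR

for pct in (99.0, 99.9, 99.99):
    hours = allowed_downtime_hours(pct)
    print(f"{pct}% -> {hours:.2f} hours/year ({hours * 60:.1f} minutes)")
```

For 99%, 99.9% and 99.99% this gives about 87.6 hours (3.65 days), 8.76 hours, and 52.6 minutes per year. Each extra nine cuts the budget by a factor of 10.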

Services:

  • S3 - Stores data redundantly across multiple AZs
  • RDS (Multi-AZ) - Automatic failover if DB crashes
  • Route 53 - Health checks + routes traffic to a healthy region (Multi-Region failover)
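Health-check-based failover routing sends traffic to a primary endpoint while its health check passes. When the check fails, traffic goes to the next healthy one. Below is a minimal sketch of just that decision logic. The region hostnames are hypothetical, and the health check is a function supplied by the caller. This is not the Route 53 API.

```python
from typing import Callable, Sequence

def pick_endpoint(endpoints: Sequence[str], is_healthy: Callable[[str], bool]) -> str:
    """Return the first healthy endpoint in priority order (primary first).

    Mirrors DNS failover routing: the primary region serves traffic until its
    health check fails, then the next healthy region takes over.
    """
    for endpoint in endpoints:
        if is_healthy(endpoint):
            return endpoint
    raise RuntimeError("no healthy endpoint available")


# Hypothetical regional endpoints, in priority order.
regions = ["api.us-east-1.example.com", "api.eu-west-1.example.com"]
health = {"api.us-east-1.example.com": False, "api.eu-west-1.example.com": True}

print(pick_endpoint(regions, lambda e: health[e]))  # api.eu-west-1.example.com
```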

4️⃣ Scalability 📈

  • Ability of the system to handle increasing load
  • Load can grow in users, traffic, data volume, or requests

Types:

  • Vertical scaling: increase CPU/RAM of one machine
  • Horizontal scaling: add more machines

Example:

  • Adding more Kafka brokers
  • Increasing pod replicas in Kubernetes

Services:

  • DynamoDB - On-demand scaling
  • Kubernetes (EKS) - Scale pods horizontally
  • Lambda - Serverless → scales automatically with incoming requests (within account concurrency limits)
  • Auto Scaling Groups (EC2) - Automatically add/remove servers
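Horizontal autoscalers size a workload from a metric ratio. The Kubernetes Horizontal Pod Autoscaler, for example, uses desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric). Below is a simplified sketch of that rule. It is clamped to a minimum and maximum, like HPA's `minReplicas`/`maxReplicas`, and it omits HPA's tolerance band and scale-down stabilization window. The default bounds are illustrative.

```python
import math

def desired_replicas(current_replicas: int, current_metric: float, target_metric: float,
                     min_replicas: int = 1, max_replicas: int = 10) -> int:
    """Target-tracking scale rule: grow or shrink replicas in proportion to load.

    Simplified from the Kubernetes HPA formula; no tolerance band or cooldown.
    """
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    raw = math.ceil(current_replicas * (current_metric / target_metric))
    return max(min_replicas, min(max_replicas, raw))


# 4 pods at 90% CPU with a 60% target -> scale out to 6 pods.
print(desired_replicas(4, current_metric=90, target_metric=60))  # 6
```

The same proportional idea sits behind target-tracking policies on EC2 Auto Scaling Groups. The max bound protects cost, and the min bound protects availability.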

5️⃣ Reliability 🔒

  • Reliability means the system works correctly over time
  • It should not lose important operations or return wrong results

Examples:

  • Messages should not be lost
  • No data loss in Kafka
  • Messages delivered and processed correctly, even when they are retried

Related concepts:

  • Fault tolerance
  • Data consistency
  • Retry handling
  • Idempotency
  • Durability

Why it matters:

  • A system being available is not enough → It must also behave correctly

Services:

  • SQS - Stores messages redundantly; failed messages are retried and, after a configured number of receives, moved to a DLQ (Dead Letter Queue) for later recovery instead of being lost
  • Kafka (MSK - Amazon Managed Streaming for Apache Kafka) - Replicates partitions across brokers
  • S3 - Extremely high durability (11 nines)
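Retry handling, a DLQ, and idempotency work together. The consumer retries a failing message a limited number of times, then parks it in a DLQ instead of dropping it. It also skips message IDs it has already processed, so a redelivery does not apply the same operation twice. Below is a minimal in-memory sketch of that flow. It assumes hypothetical message dicts with an `id` field and a handler supplied by the caller. This is not the SQS or Kafka client API.

```python
from typing import Callable, Dict, List, Set, Tuple

def consume(messages: List[Dict], handler: Callable[[Dict], None],
            max_attempts: int = 3) -> Tuple[Set[str], List[Dict]]:
    """Process messages with bounded retries, a DLQ, and idempotent dedup by message ID.

    Returns (processed_ids, dead_letter_queue).
    """
    processed_ids: Set[str] = set()   # in production this would be durable storage
    dlq: List[Dict] = []
    for msg in messages:
        if msg["id"] in processed_ids:
            continue                  # duplicate delivery: already applied, skip it
        for attempt in range(1, max_attempts + 1):
            try:
                handler(msg)
                processed_ids.add(msg["id"])
                break
            except Exception:
                if attempt == max_attempts:
                    dlq.append(msg)   # keep the failed message for inspection/replay
    return processed_ids, dlq


balance = {"total": 0}

def apply_payment(msg: Dict) -> None:
    """Hypothetical handler: rejects negative amounts."""
    if msg["amount"] < 0:
        raise ValueError("invalid amount")
    balance["total"] += msg["amount"]


msgs = [
    {"id": "m1", "amount": 10},
    {"id": "m1", "amount": 10},   # redelivered duplicate
    {"id": "m2", "amount": -5},   # always fails -> DLQ
    {"id": "m3", "amount": 7},
]
done, dead = consume(msgs, apply_payment)
print(balance["total"], sorted(done), [m["id"] for m in dead])  # 17 ['m1', 'm3'] ['m2']
```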

6️⃣ Security 🔐

  • Security protects the system and data from unauthorized access and misuse

Examples:

  • Authentication and authorization
  • Encryption in transit and at rest
  • Rate limiting
  • Input validation
  • Secrets management

Services:

  • IAM (Keycloak) - Access control (who can do what)
  • Cognito - Authentication (login/signup)
  • KMS - Encryption keys
  • WAF - Protect from attacks (SQL injection, etc.)
  • Secrets Manager (HashiCorp Vault) - Store passwords/API keys securely
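Rate limiting, listed among the examples above, is often implemented with a token bucket. Each client gets a bucket that refills at a fixed rate up to a capacity, and each request spends one token. Below is a minimal single-process sketch. The clock is injectable so the logic can be tested deterministically, and the capacity and refill values are illustrative assumptions.

```python
import time
from typing import Callable

class TokenBucket:
    """Allow bursts of up to `capacity` requests, then `refill_rate` requests/second."""

    def __init__(self, capacity: float, refill_rate: float,
                 clock: Callable[[], float] = time.monotonic):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.clock = clock
        self.tokens = capacity          # start full so an initial burst is allowed
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Add tokens for the time elapsed since the last call, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.refill_rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False                    # caller should reject, e.g. with HTTP 429


fake_time = [0.0]
bucket = TokenBucket(capacity=3, refill_rate=1.0, clock=lambda: fake_time[0])
print([bucket.allow() for _ in range(4)])  # [True, True, True, False]
fake_time[0] = 1.0                          # one second later: one token refilled
print(bucket.allow())                       # True
```

In a multi-server deployment the bucket state usually lives in a shared store, so every instance enforces the same limit.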

7️⃣ Resiliency 💪

“What happens when the system fails?”

Key Ideas

  • Recover quickly
  • Continue operating
  • Handle partial failures
  • Graceful degradation

💡 Example

  • One server crashes → traffic shifts to another
  • DB fails → system switches to replica
  • Service timeout → retry happens

Resiliency Techniques

  • Retries with backoff
  • Circuit breakers
  • Load balancing
  • Failover (Multi-AZ)
  • Replication
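Retries with backoff and circuit breakers complement each other. Retries ride out brief, transient failures. A circuit breaker stops calling a dependency that keeps failing, which gives it time to recover and lets callers fail fast. Below is a minimal single-threaded sketch of both. It assumes an operation supplied by the caller and illustrative thresholds. The `sleep` and `clock` parameters are injectable so the logic can be tested without real waiting.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")

def retry_with_backoff(op: Callable[[], T], max_attempts: int = 4, base_delay: float = 0.1,
                       max_delay: float = 2.0,
                       sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `op`, retrying on exception with exponential backoff and full jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(max_attempts):
        try:
            return op()
        except Exception:
            if attempt == max_attempts - 1:
                raise                               # out of attempts: surface the error
            # Delay ceiling doubles each attempt: 0.1, 0.2, 0.4, ... capped at max_delay.
            ceiling = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, ceiling))       # jitter avoids synchronized retry storms
    raise AssertionError("unreachable")             # the loop always returns or raises


class CircuitOpenError(Exception):
    """Raised when the breaker is open and the call is rejected without trying."""


class CircuitBreaker:
    """Closed -> open after `failure_threshold` consecutive failures.
    After `reset_timeout` seconds one trial call (half-open) decides: success closes it,
    failure re-opens it."""

    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None      # None means the circuit is closed

    def call(self, op: Callable[[], T]) -> T:
        if self.opened_at is not None and self.clock() - self.opened_at < self.reset_timeout:
            raise CircuitOpenError("failing fast: dependency marked unhealthy")
        try:
            result = op()                           # normal call, or half-open probe
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()       # open (or re-open) the circuit
            raise
        self.failures = 0
        self.opened_at = None                       # success closes the circuit
        return result


calls = {"n": 0}

def flaky() -> str:
    """Hypothetical service that times out twice, then succeeds."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("service timeout")
    return "ok"

print(retry_with_backoff(flaky, sleep=lambda s: None))  # ok
```

In practice the two are often layered, with retries wrapped inside the breaker. That way a dependency that is truly down trips the breaker instead of absorbing endless retries.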

📱 Example: Photo Sharing App

Functional Requirements

  • User can upload photo
  • User can view photo feed
  • User can like and comment on photos
  • User can follow other users

Non-Functional Requirements

  • Feed should load in under 300 ms
  • System should support millions of users
  • Photos should be stored reliably without loss
  • Service should be 99.9% available
  • User data should be secure

🧭 Good System Design Habit

Whenever you get a system design question:

  1. List the functional requirements
  2. List the non-functional requirements
  3. Prioritize the NFRs

You usually cannot optimize for everything at once, so first decide what matters most.

Examples:

  • If low latency is most important, use caching and a CDN aggressively
  • If reliability is most important, use replication and durable queues
  • If scalability is most important, design for partitioning and horizontal scaling
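Designing for partitioning means choosing a key that spreads data and load across nodes, so that each node can be scaled out independently. Below is a minimal sketch of hash partitioning. It assumes a hypothetical `user-<n>` key format and a fixed shard count. It uses `hashlib` because Python's built-in `hash()` for strings is randomized per process, and every server must agree on where a key lives.

```python
import hashlib

def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard with a stable hash, so every server agrees on placement."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


counts = [0] * 4
for i in range(10_000):
    counts[shard_for(f"user-{i}", 4)] += 1
print(counts)  # inspect how evenly the keys spread across the 4 shards
```

A drawback of plain modulo placement is that changing `num_shards` remaps most keys. Systems that add nodes often use consistent hashing instead for this reason.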

💡 Interview Tip

A very strong start in a system design interview is:

“Let me first clarify the functional requirements and then define the non-functional requirements such as latency, scale, availability, and reliability.”