1. Functional Requirements vs Non-Functional Requirements (NFRs)
- Functional Requirements: what the system does (the features it provides)
- Non-Functional Requirements (NFRs): how the system should behave
🔹 Functional Requirements
- These define the features of the system
- They describe what the user can do
- They describe what the system must support
📌 Examples
- User can upload a photo
- User can like a post
- User can search for products
- User can reset their password
- Driver can accept a ride request
🔹 Non-Functional Requirements (NFRs)
- These define the quality attributes of the system
- They describe how well the system should perform
- They are very important in system design because they influence architecture choices
✅ Simple Meaning
- They answer:
- How fast should the system respond?
- How many requests should it handle?
- How available should it be?
- How secure should it be?
- How easily should it scale and be maintained?
⚖️ Quick Comparison
| Type | Meaning | Example |
|---|---|---|
| Functional Requirement | What the system does | User can upload a photo |
| Non-Functional Requirement | How the system behaves | Photo upload should finish within 2 seconds |
Important Non-Functional Requirements
1️⃣ Latency ⏱️
Time taken to process one request
“How fast is a single request?”
Example:
- API response time = 120 ms
- DB query time = 20 ms
Key point:
- Low latency = fast response
Typical question:
- What is the acceptable response time?
Services:
- CDN - Caches content closer to users → reduces latency
- Redis (ElastiCache) - Cache frequently accessed data
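
To make the latency idea concrete, here is a minimal sketch that times a single call before and after caching. `slow_lookup` is a hypothetical stand-in for a 20 ms database query, and the in-memory dict only imitates what a cache such as Redis would do; neither is a real service call.

```python
import time

def timed_ms(fn, *args):
    """Run fn(*args) and return (result, elapsed time in milliseconds)."""
    start = time.perf_counter()
    result = fn(*args)
    elapsed_ms = (time.perf_counter() - start) * 1000
    return result, elapsed_ms

def slow_lookup(key):
    """Hypothetical stand-in for a DB query that takes about 20 ms."""
    time.sleep(0.02)
    return f"value-for-{key}"

cache = {}  # assumption: a plain dict playing the role of a cache

def cached_lookup(key):
    """Serve from the cache when possible, otherwise fall back to slow_lookup."""
    if key not in cache:
        cache[key] = slow_lookup(key)
    return cache[key]

_, miss_ms = timed_ms(cached_lookup, "user:42")  # first call takes the slow path
_, hit_ms = timed_ms(cached_lookup, "user:42")   # second call is served from the cache
print(f"cache miss: {miss_ms:.1f} ms, cache hit: {hit_ms:.3f} ms")
```

The first call pays the full lookup cost; the second skips it, which is the effect a CDN or cache aims for.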
2️⃣ Throughput
Throughput is the amount of work a system can process in a given time.
Simple idea:
“How much data or how many requests can the system handle per second?”
Example:
- Kafka throughput → messages/sec
- API throughput → requests/sec
- Disk throughput → MB/sec
Services:
- Kafka (Kinesis) - High-throughput event streaming
- RabbitMQ (SQS) - Queue for async processing
- DynamoDB (Cassandra) - Handles massive read/write throughput
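
A rough sketch of how throughput can be computed, and how it relates to latency. The second function uses Little's Law (throughput ≈ concurrency / latency) as an estimate; the numbers and function names are illustrative assumptions, not measurements.

```python
def throughput_per_sec(completed, elapsed_seconds):
    """Measured throughput: units of work completed per second."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return completed / elapsed_seconds

def estimated_max_throughput(concurrency, latency_seconds):
    """Little's Law estimate: each of `concurrency` workers finishes
    one request every `latency_seconds`."""
    if latency_seconds <= 0:
        raise ValueError("latency_seconds must be positive")
    return concurrency / latency_seconds

measured = throughput_per_sec(6000, 60)          # 6000 requests in 60 s
estimate = estimated_max_throughput(8, 0.120)    # 8 workers, 120 ms per request
print(f"measured: {measured:.0f} req/s, estimated ceiling: {estimate:.1f} req/s")
```

This also shows why latency and throughput are different: the same 120 ms latency supports more requests per second simply by adding workers.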
3️⃣ Availability 🟢
- Availability means the system is up and accessible when users need it
- Usually written as uptime percentage
Examples:
- 99.9% uptime
- Amazon S3 (Standard) is designed for 99.99% availability (4 nines)
Common numbers:
- 99% = around 3.65 days/year downtime
- 99.9% = around 8.76 hours/year downtime
- 99.99% = around 52 minutes/year downtime
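
The downtime figures follow directly from the uptime percentage. A small sketch of the arithmetic, assuming a 365-day year (the function name is illustrative):

```python
HOURS_PER_YEAR = 365 * 24  # 8760, assuming a non-leap year

def downtime_hours_per_year(availability_percent):
    """Allowed downtime per year for a given uptime percentage."""
    return (1 - availability_percent / 100) * HOURS_PER_YEAR

for nines in (99, 99.9, 99.99):
    hours = downtime_hours_per_year(nines)
    print(f"{nines}% -> {hours:.2f} hours/year ({hours * 60:.0f} minutes)")
```

Each extra nine cuts the allowed downtime by a factor of ten, which is why higher targets get expensive quickly.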
Services:
- S3 - Multi-AZ availability
- RDS (Multi-AZ) - Automatic failover if DB crashes
- Route 53 - Health checks + routing to correct available region (Multi-Region)
4️⃣ Scalability 📈
- Ability of the system to handle increasing load
- Load can be measured in users, traffic, data, or requests
Types:
- Vertical scaling: increase CPU/RAM of one machine
- Horizontal scaling: add more machines
Example:
- Adding more Kafka brokers
- Increasing pod replicas in Kubernetes
Services:
- DynamoDB - On-demand scaling
- Kubernetes (EKS) - Scale pods horizontally
- Lambda - Serverless → scales automatically with request volume (within account concurrency limits)
- Auto Scaling Groups (EC2) - Automatically add/remove servers
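
For horizontal scaling, a common back-of-the-envelope step is estimating how many machines or pods a given load needs. The sketch below is one simple way to do it; the per-replica capacity and the 75% target utilization are made-up assumptions, not values from any autoscaler.

```python
import math

def replicas_needed(expected_rps, rps_per_replica, target_utilization=0.75):
    """Estimate replicas so each one runs at or below target_utilization.

    rps_per_replica is the (assumed) maximum load one replica can serve.
    """
    if rps_per_replica <= 0 or not 0 < target_utilization <= 1:
        raise ValueError("capacity and utilization must be positive")
    usable_rps = rps_per_replica * target_utilization
    return max(1, math.ceil(expected_rps / usable_rps))

print(replicas_needed(5000, 400))  # 17 replicas for 5000 req/s
print(replicas_needed(100, 400))   # 1 replica is enough for light load
```

Leaving headroom (here 25%) gives the system time to add replicas before existing ones saturate.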
5️⃣ Reliability 🔒
- Reliability means the system works correctly over time
- It should not lose important operations or return wrong results
Examples:
- Messages should not be lost
- No data loss in Kafka
- Messages delivered correctly
Related concepts:
- Fault tolerance
- Data consistency
- Retry handling
- Idempotency
- Durability
Why it matters:
- A system being available is not enough → It must also behave correctly
Services:
- SQS - Retries + DLQ (Dead Letter Queue) → messages are stored durably until processed; messages that keep failing go to the DLQ for recovery
- Kafka (MSK - Managed Streaming for Kafka) - Replication across brokers
- S3 - Extremely high durability (11 nines)
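
Retries and idempotency go together: when a queue redelivers a message, the consumer must not apply it twice. A minimal sketch of an idempotent consumer, assuming every message carries a unique ID (the `PaymentLedger` class and message IDs are hypothetical):

```python
class PaymentLedger:
    """Applies each credit at most once, keyed by message ID."""

    def __init__(self):
        self.balance = 0
        self._processed_ids = set()  # in production this would be durable storage

    def apply(self, message_id, amount):
        """Return True if applied, False if the message was a duplicate."""
        if message_id in self._processed_ids:
            return False
        self.balance += amount
        self._processed_ids.add(message_id)
        return True

ledger = PaymentLedger()
# "msg-1" arrives twice, as can happen with at-least-once delivery
deliveries = [("msg-1", 50), ("msg-2", 25), ("msg-1", 50)]
for message_id, amount in deliveries:
    ledger.apply(message_id, amount)

print(ledger.balance)  # 75, not 125
```

Without the ID check, the redelivered message would be counted twice and the system would be available but wrong.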
6️⃣ Security 🔐
- Security protects the system and data from unauthorized access and misuse
Examples:
- Authentication and authorization
- Encryption in transit and at rest
- Rate limiting
- Input validation
- Secrets management
Services:
- IAM (Keycloak) - Access control (who can do what)
- Cognito - Authentication (login/signup)
- KMS - Encryption keys
- WAF - Protect from attacks (SQL injection, etc.)
- Secrets Manager (HashiCorp Vault) - Store passwords/API keys securely
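
Rate limiting from the list above is often implemented as a token bucket. The following is a single-process sketch under that assumption; the class name, capacity, and refill rate are illustrative, and a real deployment would usually keep this state in a shared store.

```python
import time

class TokenBucket:
    """Allow bursts up to `capacity`, refilled at `refill_per_sec` tokens/second."""

    def __init__(self, capacity, refill_per_sec, clock=time.monotonic):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self.tokens = float(capacity)
        self.last_refill = clock()

    def allow(self):
        """Consume one token if available; return whether the request may proceed."""
        now = self.clock()
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

limiter = TokenBucket(capacity=5, refill_per_sec=1)
results = [limiter.allow() for _ in range(7)]
print(results)  # the first 5 requests pass, the rest are rejected until tokens refill
```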
7️⃣ Resiliency 💪
“What happens when the system fails?”
Key Ideas
- Recover quickly
- Continue operating
- Handle partial failures
- Graceful degradation
💡 Example
- One server crashes → traffic shifts to another
- DB fails → system switches to replica
- Service timeout → retry happens
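
The "DB fails → system switches to replica" case can be sketched as a read that falls back to a second data source. Both reader functions below are hypothetical stand-ins, and the sketch assumes failures surface as `ConnectionError`.

```python
def read_with_failover(key, primary, replica):
    """Try the primary first; on a connection failure, read from the replica."""
    try:
        return primary(key), "primary"
    except ConnectionError:
        return replica(key), "replica"

def broken_primary(key):
    """Hypothetical primary DB that is currently down."""
    raise ConnectionError("primary unreachable")

def healthy_replica(key):
    """Hypothetical read replica that still serves data."""
    return {"photo:1": "sunset.jpg"}[key]

value, source = read_with_failover("photo:1", broken_primary, healthy_replica)
print(value, "served by", source)  # sunset.jpg served by replica
```

A replica may lag behind the primary, so this trades some freshness for staying up, which is one form of graceful degradation.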
Resiliency Techniques
- Retries with backoff
- Circuit breakers
- Load balancing
- Failover (Multi-AZ)
- Replication
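
Two of these techniques, retries with backoff and circuit breakers, are easy to show in a few lines. This is a simplified sketch rather than a production library: the thresholds, delays, and class name are assumptions, and the sleep function and clock are injectable so the example runs instantly.

```python
import random
import time

def retry_with_backoff(operation, max_attempts=4, base_delay=0.1,
                       max_delay=2.0, sleep=time.sleep):
    """Retry a failing operation, doubling the delay cap each attempt (with jitter)."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            cap = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, cap))  # jitter avoids synchronized retries

class CircuitBreaker:
    """Fail fast after repeated failures, then allow a trial call after a timeout."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, operation):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                raise RuntimeError("circuit open: failing fast")
            # Timeout elapsed: half-open, let one trial call through.
        try:
            result = operation()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open (or re-open) the circuit
            raise
        self.failures = 0
        self.opened_at = None
        return result

attempts = {"count": 0}

def flaky_service():
    """Hypothetical service that times out twice, then succeeds."""
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise TimeoutError("simulated timeout")
    return "ok"

result = retry_with_backoff(flaky_service, sleep=lambda seconds: None)
print(result, "after", attempts["count"], "attempts")  # ok after 3 attempts
```

Retries help with brief glitches; the circuit breaker protects a struggling dependency from being hammered by those same retries.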
📱 Example: Photo Sharing App
Functional Requirements
- User can upload photo
- User can view photo feed
- User can like and comment on photos
- User can follow other users
Non-Functional Requirements
- Feed should load in under 300 ms
- System should support millions of users
- Photos should be stored reliably without loss
- Service should be 99.9% available
- User data should be secure
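
NFRs like these are most useful when they can be checked against measurements. The sketch below tests the 300 ms feed target against sample latencies; judging it at the 95th percentile is an assumption (the notes do not say which percentile), and the sample values are invented for illustration.

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(samples)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]

FEED_LATENCY_TARGET_MS = 300  # from the NFR list above

# Invented feed-load timings in milliseconds
feed_latencies_ms = [120, 140, 135, 180, 210, 150, 160, 290, 130, 145]

p95 = percentile(feed_latencies_ms, 95)
meets_target = p95 < FEED_LATENCY_TARGET_MS
print(f"p95 = {p95} ms, meets target: {meets_target}")
```

Using a high percentile instead of the average keeps a few very slow requests from hiding behind many fast ones.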
🧭 Good System Design Habit
Whenever you get a system design question:
- List the functional requirements
- List the non-functional requirements
- Prioritize the NFRs
You usually cannot optimize everything at once. So first decide what matters the most.
Examples:
- If low latency is most important, use caching and CDN aggressively
- If reliability is most important, use replication and durable queues
- If scalability is most important, design for partitioning and horizontal scaling
💡 Interview Tip
A very strong start in a system design interview is:
“Let me first clarify the functional requirements and then define the non-functional requirements such as latency, scale, availability, and reliability.”