ALBs have listeners for specific protocols, and each listener can route traffic to different target groups using listener rules
health check is done at the target group level using HTTP and HTTPS protocols
cross zone load balancing is enabled by default
Cannot attach elastic IP to ALB
An internet-facing ALB must be placed in public subnets (an internal ALB can use private subnets)
Also supports gRPC protocol
supports Weighted Target Groups routing
NLB
Works at transport layer (layer 4)
extreme performance (can handle millions of requests per second)
TCP and UDP protocols
has one static IP per AZ which can also be elastic IP
NLB target groups can be:
EC2 instances
Private IP addresses
ALBs
health check can be done via TCP, HTTP, or HTTPS protocols
cross zone load balancing is disabled by default
GWLB
Works at network layer (layer 3)
Routes traffic to 3rd party virtual appliances for processing such as security analysis before it reaches the servers
Uses the GENEVE protocol on port 6081
GWLB target groups can be:
EC2 instances
Private IP addresses
Cross zone load balancing is disabled by default
Cross-zone Load Balancing
Distributes traffic evenly across all registered targets in all enabled Availability Zones
ELBs have security groups too
ELBs are region bound
Sticky Sessions
to make sure the same client will always be routed to the same instance
support for CLB, ALB and NLB
ALB uses cookies which have expiration date that can be controlled
Cookies
Application based cookies
custom cookies: defined by application and name cannot be AWSALB, AWSALBAPP or AWSALBTG
application cookies: defined by load balancer and name is AWSALBAPP
Load balancer generated cookies / Duration based cookies:
generated by load balancer
name is AWSALB
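The naming rule for custom cookies can be sketched as a small check. This is only an illustration: is_valid_custom_cookie_name is a hypothetical helper, not part of any AWS SDK, and it checks nothing beyond the reserved names listed above.

```python
# Cookie names reserved by the load balancer, per the notes above.
RESERVED_COOKIE_NAMES = {"AWSALB", "AWSALBAPP", "AWSALBTG"}


def is_valid_custom_cookie_name(name: str) -> bool:
    """Return True if `name` may be used for an application-defined cookie.

    Hypothetical helper: it rejects empty names and exact matches of the
    reserved names, and does not validate any other cookie syntax rule.
    """
    return bool(name) and name not in RESERVED_COOKIE_NAMES
```

For example, a name such as MYAPPSESSION passes the check, while AWSALB does not.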
SSL/TLS
Server Name Indication (SNI) is a TLS extension that lets the client specify the domain name it wants to reach, so a single server endpoint can serve multiple domains and certificates
Connection Draining / Deregistration Delay
time allowed for instances to finish in-flight requests before they are deregistered
new requests are not sent to the draining instance but instead routed to other healthy instances
can set between 0-3600 seconds (default is 300)
can be disabled by setting it to 0
ASG
ASG uses launch templates to manage ec2 instances
it scales using scaling policy
ASG can use cloudwatch alarms as triggers to scale the instances
EC2 instances can be put into standby state to temporarily remove them from ASG
Scaling Policies
Dynamic scaling
Target tracking policy
Simple/step scaling
Scheduled scaling
Predictive scaling
Launch template
Only a launch template can be used to provision capacity across multiple instance types using both On-Demand Instances and Spot Instances to achieve the desired scale, performance, and cost
Termination Policy in order
Based on instance allocation strategy
Oldest Launch Configuration
Oldest Launch Template
Next Billing Hour
Instance states
Pending
InService
Terminating
Terminated
Standby
Lifecycle Hooks
autoscaling:EC2_INSTANCE_LAUNCHING
autoscaling:EC2_INSTANCE_TERMINATING
autoscaling:EC2_INSTANCE_LAUNCHING
When Amazon EC2 Auto Scaling responds to a scale-out event, it launches one or more instances
These instances start in the Pending state
If you added an autoscaling:EC2_INSTANCE_LAUNCHING lifecycle hook to your Auto Scaling group, the instances move from the Pending state to the Pending:Wait state
After you complete the lifecycle action, the instances enter the Pending:Proceed state
When the instances are fully configured, they are attached to the Auto Scaling group and they enter the InService state
autoscaling:EC2_INSTANCE_TERMINATING
When Amazon EC2 Auto Scaling responds to a scale-in event, it terminates one or more instances
These instances are detached from the Auto Scaling group and enter the Terminating state
If you added an autoscaling:EC2_INSTANCE_TERMINATING lifecycle hook to your Auto Scaling group, the instances move from the Terminating state to the Terminating:Wait state
After you complete the lifecycle action, the instances enter the Terminating:Proceed state
When the instances are fully terminated, they enter the Terminated state
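The two state sequences above can be sketched as a small lookup. This is a simplified model for study purposes: lifecycle_path is a hypothetical function, and it only covers the states named in these notes.

```python
from typing import List

# States from the notes above, in the order an instance passes through them.
SCALE_OUT_STATES = ["Pending", "Pending:Wait", "Pending:Proceed", "InService"]
SCALE_IN_STATES = ["Terminating", "Terminating:Wait", "Terminating:Proceed", "Terminated"]


def lifecycle_path(event: str, has_hook: bool) -> List[str]:
    """Return the ordered states for a "scale_out" or "scale_in" event."""
    if event == "scale_out":
        states = SCALE_OUT_STATES
    elif event == "scale_in":
        states = SCALE_IN_STATES
    else:
        raise ValueError(f"unknown event: {event!r}")
    if has_hook:
        return list(states)
    # Without a lifecycle hook, the Wait and Proceed sub-states are skipped.
    return [s for s in states if ":" not in s]
```

Calling lifecycle_path("scale_in", has_hook=False) returns only the Terminating and Terminated states, which shows what the hook adds: a pause in which custom actions can run.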
Cooldown period
ensures that the Auto Scaling group does not launch or terminate additional EC2 instances before the previous scaling activity takes effect
default is 300 seconds (5 minutes)
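A minimal sketch of the cooldown check, assuming the cooldown is measured from the end of the previous scaling activity. The function name can_scale and its parameters are hypothetical; the real service tracks this internally.

```python
from datetime import datetime, timedelta

# Default cooldown from the notes above: 300 seconds.
DEFAULT_COOLDOWN = timedelta(seconds=300)


def can_scale(last_activity_end: datetime, now: datetime,
              cooldown: timedelta = DEFAULT_COOLDOWN) -> bool:
    """Return True once the cooldown since the last scaling activity has elapsed."""
    return now - last_activity_end >= cooldown
```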
Databases
DynamoDB
Serverless
Fully managed, highly available NoSQL database with replication across multiple AZs
Millions of requests per second, trillions of rows, 100s of TB of storage
RDS
RDS storage scales automatically within set maximum storage threshold
Automatically scales the storage if:
free storage is less than 10% of allocated storage
low storage lasts at least 5 mins
6 hrs have passed since last modification
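The three conditions above, together with the maximum storage threshold, can be read as a single predicate. The sketch below assumes all conditions must hold at once; should_autoscale_storage and its parameter names are hypothetical and not an RDS API.

```python
def should_autoscale_storage(free_gib: float, allocated_gib: float,
                             low_storage_minutes: float,
                             hours_since_last_modification: float,
                             max_storage_gib: float) -> bool:
    """Return True if RDS storage autoscaling would be expected to trigger."""
    # Free storage is below 10% of allocated storage (written without floats).
    below_threshold = free_gib * 10 < allocated_gib
    # The low-storage condition has lasted at least 5 minutes.
    sustained = low_storage_minutes >= 5
    # At least 6 hours have passed since the last storage modification.
    rested = hours_since_last_modification >= 6
    # There is still room below the configured maximum storage threshold.
    has_headroom = allocated_gib < max_storage_gib
    return below_threshold and sustained and rested and has_headroom
```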
Read Replicas
up to 15 replicas
support within AZ, cross AZ or cross region
replication is ASYNC and can have some replication delay
each replica can be promoted to its own DB
each replica has a different endpoint, so applications have to manage which endpoint they call
for RDS, read replicas within the same region do not incur data transfer fees
Read replicas can also be used as disaster recovery although replication is ASYNC
Multi-AZ
RDS db can be replicated multi AZ for disaster recovery
same DNS endpoint for all multi-AZ replicas
automatic failover standby
can't be used for read scaling because the multi-AZ replica is only a standby
replication is SYNC
RDS Custom
Managed Oracle and Microsoft SQL Server Database with OS and database customization
RDS: entire database and the OS to be managed by AWS
RDS Custom: full admin access to the underlying OS and the database
Can SSH into underlying EC2 instance
Backup
Auto backup
daily full backup
transaction logs are backed up every 5 mins
can restore to any point in time, from the oldest backup up to 5 mins ago
can set 1 to 35 days of retention, 0 to disable backup
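The retention setting and the 5-minute log interval together define a restorable window. A rough sketch, assuming the latest restorable time is simply "now minus 5 minutes"; restorable_window is a hypothetical helper, not an RDS API.

```python
from datetime import datetime, timedelta
from typing import Optional, Tuple


def restorable_window(now: datetime,
                      retention_days: int) -> Optional[Tuple[datetime, datetime]]:
    """Return (earliest, latest) restorable times, or None if backups are disabled."""
    if not 0 <= retention_days <= 35:
        raise ValueError("retention must be between 0 and 35 days")
    if retention_days == 0:
        return None  # a retention of 0 disables automated backups
    earliest = now - timedelta(days=retention_days)
    latest = now - timedelta(minutes=5)  # transaction logs lag by up to 5 minutes
    return earliest, latest
```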
Manual backup
take db snapshot
retained for as long as the user wants
Can create backup and snapshots in multi-AZ
A stopped RDS DB still incurs charges (for example, for its storage)
Encrypting un-encrypted RDS database
Take a snapshot of the database
Copy it as an encrypted snapshot
Restore a database from the encrypted snapshot
Terminate the previous database
Enhanced Monitoring
Monitor the operating system of your DB instance in real time
When you want to see how different processes or threads use the CPU, Enhanced Monitoring metrics are useful
IAM DB Authentication
works with MySQL and PostgreSQL
An authentication token is a string of characters that you use instead of a password
it’s valid for 15 minutes before it expires
Ways to use SSL encryption
Force SSL
Encrypt from client side
Force SSL
Set the rds.force_ssl parameter to true to force connections to use SSL
The rds.force_ssl parameter is static, so after you change the value, you must reboot your DB instance for the change to take effect
Encrypt from client side
This sets up an SSL connection from a specific client computer, and you must do work on the client to encrypt connections
Must obtain certificates for the client computer, import certificates on the client computer, and then encrypt the connections from the client computer
RDS Proxy for RDS and Aurora
Serverless, autoscaling, highly available (multi-AZ)
RDS Proxy is never publicly accessible (must be accessed from VPC)
Aurora
proprietary of AWS
Aurora storage automatically grows in increments of 10GB, up to 128 TB
up to 15 replicas
sub 10ms replica lag
Aurora costs around 20% more than RDS
shared storage volume with up to 6 copies of the data across 3 AZs
can create custom endpoint from subset of read replicas
good for analytics or dev testing env
Aurora Serverless
Automated database instantiation and auto-scaling based on actual usage
pay per second
Cannot change from provisioned to serverless
Global Aurora
1 primary read-write region
up to 5 secondary read-only regions
less than 1 second replication lag
up to 16 read replicas per secondary region
Promoting another region (for disaster recovery) has an RTO of < 1 minute
DB Cloning
faster than snapshot-and-restore
initially, the cloned DB reads data from the same storage volume as the original DB
when data is added or updated, the changes are written to new storage (copy-on-write)
useful for staging db creation from the original prod db
Backup (Aurora)
Auto backup
1 to 35 days (can’t be disabled)
Manual backup
take db snapshot
retained for as long as the user wants
Read replicas failover priority
Watch the tier (smaller number, higher priority)
Watch the size (larger, the higher priority)
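The two rules above amount to a sort: lowest tier first, then largest size. In the sketch below, Replica, memory_gib (used as a stand-in for instance size), and pick_failover_target are all hypothetical names for illustration.

```python
from typing import List, NamedTuple


class Replica(NamedTuple):
    name: str
    tier: int        # promotion tier: a smaller number means higher priority
    memory_gib: int  # hypothetical proxy for instance size


def pick_failover_target(replicas: List[Replica]) -> Replica:
    """Pick the replica with the lowest tier, breaking ties by largest size."""
    if not replicas:
        raise ValueError("no replicas available for failover")
    return min(replicas, key=lambda r: (r.tier, -r.memory_gib))
```

With replicas in tiers 1, 0, and 0, the larger of the two tier-0 replicas is chosen.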
Aurora MySQL Native Function
Can create a native function or a stored procedure that invokes a Lambda function whenever a row in a table is modified in the database
Failover Scenarios
Single Instance
Aurora will attempt to create a new DB Instance in the same Availability Zone as the original instance
This replacement of the original instance is done on a best-effort basis and may not succeed, for example, if there is an issue that is broadly affecting the Availability Zone
Read Replica
Amazon Aurora flips the canonical name record (CNAME) for your DB Instance to point at the healthy replica, which in turn is promoted to become the new primary
Start-to-finish failover typically completes within 30 seconds
Aurora Serverless
Aurora will automatically recreate the DB instance in a different AZ
IAM DB Authentication
works with MySQL and PostgreSQL
An authentication token is a string of characters that you use instead of a password
it’s valid for 15 minutes before it expires
ElastiCache
to get managed Redis or Memcached
Redis: used for gaming leaderboards, application cache, geospatial data
Memcached: used for use cases like DB cache or user session store
Redis’s sorted set can be used for leaderboard ranking use cases
HIPAA-compatible
Have multi-AZ configuration
Can have up to 5 read replicas across multiple AZs
Neptune
Graph DB
DocumentDB
AWS service for MongoDB
Keyspaces
AWS service for Apache Cassandra
DNS
Route53
A highly available, scalable, fully managed and Authoritative DNS
The only AWS service which provides 100% availability SLA
Record Types
A - map to ipv4
AAAA - map to ipv6
CNAME - map to another domain name (can’t be root or top node namespace or zone apex)
Alias - can map root or top nodes to AWS resources (eg; alb endpoints) (extension of A or AAAA type)
NS - name servers for the hosted zones (for dns traffic routing)
Name Servers
Physical servers that resolve the DNS requests by looking at the records stored in hosted zones
NS record in a hosted zone route the DNS request traffic to name servers
Cost
$0.50 per month per hosted zone
Hosted Zones
Public
Private (within VPC)
Routing Policies
Simple
Weighted
Latency-based
Failover
Geolocation
Geoproximity
IP-based routing
Multi-value
Failover
active-active
active-passive
active-active
Both systems are running and serving traffic, and each can act as failover for the other
active-passive
Only one system serves traffic; the other stays on standby until a failover occurs
s3 static website routing
To route s3 static website using Route53, name of the s3 bucket must be the same as domain name
Containerization
ECS
Launch Types
EC2
Fargate
EC2 Launch Type
Must provision & maintain the infrastructure (the EC2 instances)
Each EC2 Instance must run the ECS Agent to register in the ECS Cluster
Fargate Launch Type
No need to provision the infrastructure (no EC2 instances to manage)
IAM Roles
EC2 Instance Profile
ECS Task Role
Data Volumes
EBS volumes of each EC2 instance
Can use EFS
Fargate+EFS = Serverless
AWS Application Auto Scaling
Automatically increase/decrease the desired number of ECS tasks
Scaling Methods
Target Tracking
Step Scaling
Scheduled Scaling
Cluster Capacity Auto Scaling
Use ECS Cluster Capacity Provider to automatically provision and scale the infrastructure for your ECS tasks
Capacity Provider paired with an Auto Scaling Group
ECR
Store and manage Docker images on AWS
Fully integrated with ECS, backed by Amazon S3
EKS
EKS supports EC2 if you want to deploy worker nodes or Fargate to deploy serverless containers
ECS Anywhere and EKS Anywhere
Extends AWS ECS and EKS functionality to run containers on any infrastructure, including on-premises servers, edge devices, or virtual machines outside AWS
Allows organizations to use ECS and EKS as the orchestration layer for hybrid or multi-cloud deployments
AWS App Runner
Fully managed service designed to automatically deploy and scale web applications and APIs from source code or a container image, with minimal configuration
No infrastructure experience required, just need source code or container image
Lambda disk capacity in the “function container” (in /tmp): 512 MB to 10 GB
Lambda concurrent executions: 1,000 per region (can be increased)
Deployment
Lambda function deployment size (compressed .zip): 50 MB
Size of uncompressed deployment (code + dependencies): 250 MB
Can use the /tmp directory to load other files at startup
Size of environment variables: 4 KB
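The deployment limits above can be checked before uploading a package. This is a sketch under assumptions: deployment_violations is a hypothetical helper, and the environment variable size is approximated as the UTF-8 byte length of all keys and values.

```python
from typing import Dict, List

MAX_ZIP_MB = 50          # compressed .zip deployment package
MAX_UNZIPPED_MB = 250    # uncompressed code + dependencies
MAX_ENV_BYTES = 4 * 1024  # environment variables, 4 KB in total


def deployment_violations(zip_mb: float, unzipped_mb: float,
                          env_vars: Dict[str, str]) -> List[str]:
    """Return a list of human-readable limit violations (empty if none)."""
    problems = []
    if zip_mb > MAX_ZIP_MB:
        problems.append(f"zip package is {zip_mb} MB, limit is {MAX_ZIP_MB} MB")
    if unzipped_mb > MAX_UNZIPPED_MB:
        problems.append(f"unzipped size is {unzipped_mb} MB, limit is {MAX_UNZIPPED_MB} MB")
    # Assumption: size is the sum of UTF-8 bytes of every key and value.
    env_bytes = sum(len(k.encode("utf-8")) + len(v.encode("utf-8"))
                    for k, v in env_vars.items())
    if env_bytes > MAX_ENV_BYTES:
        problems.append(f"environment variables use {env_bytes} bytes, limit is {MAX_ENV_BYTES}")
    return problems
```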
Lambda SnapStart for Java
Lambda initializes the function at publish time
Takes a snapshot of memory and disk state of the initialized function
Snapshot is cached for low-latency access
Running Container Images
Container image must be built using AWS provided base image tailored specifically for AWS Lambda
API Gateway
Endpoint Types
Edge-optimized
Regional
Private
Edge-optimized
Requests are routed through the CloudFront Edge locations (improves latency)
The API Gateway still lives in only one region
Regional
For clients within the same region
Could manually combine with CloudFront (more control over the caching strategies and the distribution)
Private
Can only be accessed from own VPC using an interface VPC endpoint (ENI)
Have to use a resource policy to define access
User Authentication
IAM Roles (useful for internal applications)
Cognito (identity for external users – example mobile users)
Custom Authorizer (your own logic)
Custom Domain Name HTTPS security through integration with AWS Certificate Manager (ACM)
Supports API Caching and Request Throttling too
Step Functions
Build serverless visual workflow to orchestrate your Lambda functions
AWS Cognito
Give users an identity to interact with the web or mobile application on AWS
Cognito User Pool
Sign in functionality for app users
Create a serverless database of user for the web & mobile apps
Integrate with API Gateway & Application Load Balancer
Cognito Identity Pool (Federated Identity)
Provide AWS credentials to users so they can access AWS resources directly
Integrate with Cognito User Pools as an identity provider
Get identities for “users” so they obtain temporary AWS credentials
Data Analytics
Amazon Athena
Serverless query service to analyze data stored in Amazon S3
Supports CSV, JSON, ORC, Avro, and Parquet
$5.00 per TB of data scanned
Commonly used with Amazon Quicksight for reporting/dashboards
Federated Query
To run SQL queries across data stored in relational, non-relational, object, and custom data sources (AWS or on-premises)
Uses Data Source Connectors that run on AWS Lambda to run Federated Queries
Store the results back in Amazon S3
Performance Improvement
Use columnar data (Apache Parquet or ORC) for cost-savings
Compress data for smaller retrievals
Partition datasets in S3 for easy querying on virtual columns
Use larger files (> 128 MB) to minimize overhead
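Because Athena charges per TB scanned, the tips above save money by reducing the bytes each query reads. A rough cost sketch, assuming 1 TB = 1024^4 bytes and ignoring any per-query minimum; athena_query_cost is a hypothetical helper.

```python
PRICE_PER_TB_USD = 5.00   # price per TB scanned, from the notes above
BYTES_PER_TB = 1024 ** 4  # assumption: binary terabyte


def athena_query_cost(bytes_scanned: int) -> float:
    """Estimate the cost in USD of a query that scans `bytes_scanned` bytes."""
    if bytes_scanned < 0:
        raise ValueError("bytes_scanned must be non-negative")
    return bytes_scanned / BYTES_PER_TB * PRICE_PER_TB_USD
```

Under this model, a query that reads only a tenth of a 1 TB dataset (for example, a few columns of a Parquet table) costs about $0.50 instead of $5.00.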
RedShift
based on PostgreSQL but used for OLAP: online analytical processing (analytics and data warehousing)
10x better performance than other data warehouses, scale to PBs of data
Columnar storage of data (instead of row based) & parallel query engine
Modes
Provisioned Cluster
Serverless Cluster
Provisioned Cluster
Choose instance types in advance
Can reserve instances for cost savings
Redshift Clusters
Leader Node
Compute Node
Leader Node
for query planning, results aggregation
Compute Node
for performing the queries, send results to leader
Snapshots and DR
Snapshots are point-in-time backups of a cluster, stored internally in S3
can restore a snapshot into a new cluster
Automated snapshots are taken every 8 hours, after every 5 GB of data changes, or on a custom schedule
Set retention between 1 to 35 days
Can manually take snapshots too
Can enable cross-region snapshots
Data Loading into RedShift
with Kinesis Data Firehose
s3 using copy command
without enhanced VPC routing
with enhanced VPC routing
EC2 Instance JDBC driver
RedShift Spectrum
to run query on data stored in s3 without loading the data
Amazon OpenSearch
Successor to ElasticSearch
common to use OpenSearch as a complement to another database as a database search API
Ingestion from Kinesis Data Firehose, AWS IoT, and CloudWatch Logs
Comes with OpenSearch Dashboards for visualization
Modes
Managed Cluster
Serverless Cluster
Amazon EMR
Amazon Elastic MapReduce
The clusters can be made of hundreds of EC2 instances with autoscaling and can be integrated with spot instances
EMR comes bundled with Apache Spark, HBase, Presto, Flink
EMR takes care of all the provisioning and configuration
Node Types
Master Node
Core Node
Task Node
Master Node
Manage the cluster, coordinate, manage health – long running
Core Node
Run tasks and store data – long running
Task Node
Just to run tasks – usually Spot
Purchasing Options
On demand
Reserved (min 1 yr)
Spot Instances
Modes
Long running cluster
Transient cluster
Amazon QuickSight
Serverless machine learning-powered BI service to create interactive dashboards
In-memory computation using SPICE engine if data is imported into QuickSight
Define Users and Groups (separate from IAM)
AWS Glue
managed ETL service
Glue Job Bookmarks
prevent re-processing old data
Glue Elastic Views
Combine and replicate data across multiple data stores using SQL
No custom code, Glue monitors for changes in the source data, serverless
Leverages a “virtual table” (materialized view)
Glue DataBrew
Prebuilt transformations
Glue Studio
GUI for ETL jobs
Glue Streaming ETL
for streaming data
built on Apache Spark Structured Streaming
compatible with Kinesis Data Streams, Kafka, MSK
AWS LakeFormation
To build data lake
Created data lakes are stored in s3
Built on top of AWS Glue
Can be used to consolidate data from multiple accounts into a single account as a central datalake
MSK (Amazon Managed Streaming for Kafka)
Alternative to Amazon Kinesis
MSK Serverless
Run Apache Kafka on MSK without managing the capacity
MSK automatically provisions resources and scales compute & storage
AWS Data Exchange
service that makes it easy to find, subscribe to, and use third-party data in the AWS cloud
AWS Data Pipeline
enables you to automate the movement, transformation, and processing of data across different AWS services and on-premises data sources
useful for creating complex data workflows that involve scheduling, dependency management, and data transformations
Monitoring
CloudWatch
CloudWatch Metrics
CloudWatch provides metrics for every service in AWS
Metrics belong to namespaces (eg: S3, ECS, EC2,…)
Dimension is an attribute of a metric (eg: instance id, environment, etc…)
Up to 30 dimensions per metric
Can create CloudWatch Custom Metrics
Metric Streams
Continually stream CloudWatch metrics to a destination of your choice, with near-real-time delivery and low latency (to Kinesis Data Firehose, 3rd party service providers)
Option to filter metrics to only stream a subset of them
Cloudwatch Logs
organized into log groups and log streams
Can define log expiration policies (never expire, 1 day to 10 years…)
Elastic Beanstalk: collection of logs from application
ECS: collection from containers
AWS Lambda: collection from function logs
VPC Flow Logs: VPC specific logs
API Gateway
CloudTrail based on filter
Route53: Log DNS queries
Log Insights
Search and analyze log data stored in CloudWatch Logs
S3 Export
Log data can take up to 12 hours to become available for export
The API call is CreateExportTask
For real-time delivery, use Log Subscriptions instead
Log Subscriptions
Get real-time log events from CloudWatch Logs for processing and analysis
Send to Kinesis Data Streams, Kinesis Data Firehose, or Lambda
Subscription Filter: filter which log events are delivered to the destination
Can do cross-account subscription
CloudWatch Agents
To collect logs from EC2 instances or on-premise servers
Log Agents
Older version
Can only collect logs
Unified Agents
Can collect logs and also the instance metrics (eg: CPU, RAM, Disk info, etc)
CloudWatch Alarms
Alarms are used to trigger notifications for any metric
Alarm States
OK
Insufficient Data
In Alarm
Alarm Target Actions
EC2 instances (stop, terminate, reboot, etc)
EC2 Auto Scaling
Amazon SNS
Composite Alarm
Monitors the states of multiple other alarms and triggers based on their combination
AND and OR conditions
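The AND/OR combination can be sketched as follows. This is a simplified model: composite_in_alarm is a hypothetical function, and any child state other than ALARM (including INSUFFICIENT_DATA) is treated as not alarming.

```python
from typing import Iterable


def composite_in_alarm(child_states: Iterable[str], mode: str) -> bool:
    """Combine child alarm states ("OK", "ALARM", "INSUFFICIENT_DATA")."""
    flags = [state == "ALARM" for state in child_states]
    if mode == "AND":
        # All children must be in ALARM; an empty list never alarms.
        return bool(flags) and all(flags)
    if mode == "OR":
        # Any single child in ALARM is enough.
        return any(flags)
    raise ValueError(f"unknown mode: {mode!r}")
```

An AND composite is useful for reducing alarm noise, since it fires only when every underlying condition is breached at the same time.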
EC2 Recovery
CloudWatch alarm can trigger the recovery of the Amazon EC2 instance, in case the instance fails.
The instance, however, should only be configured with an Amazon EBS volume
Recovered instance is identical to the original instance, including the instance ID, private IP addresses, Elastic IP addresses, and all instance metadata
CloudWatch Insights
CloudWatch Container Insights
CloudWatch Lambda Insights
CloudWatch Contributor Insights
CloudWatch Application Insights
CloudWatch Container Insights
ECS, EKS, Kubernetes on EC2, Fargate, needs agent for Kubernetes
CloudWatch Lambda Insights
Detailed metrics to troubleshoot serverless applications
CloudWatch Contributor Insights
Find “Top-N” Contributors through CloudWatch Logs
CloudWatch Application Insights
Automatic dashboard to troubleshoot your application and related AWS services
CloudTrail
Provides governance, compliance and audit for your AWS Account
Can be integrated with EventBridge to trigger AWS services based on CloudTrail events
Cloudtrail log files are encrypted by default
CloudTrail Events
Management Events
Data Events
CloudTrail Insights Events
Management Events
Operations that are performed on resources in your AWS account
By default, trails are configured to log management events.
Data Events
Granular data object activities like Amazon S3 object-level activity, AWS Lambda function execution activity
CloudTrail Insights Events
Analyze anomalies in write events to detect unusual patterns
Events retention
Events are stored for 90 days in CloudTrail
To keep events beyond this period, log them to S3 and use Athena
AWS Config
Helps with auditing and recording compliance of your AWS resources
Helps record configurations and changes over time
AWS Config is a per-region service
Can be aggregated across regions and accounts
Config Rules
Can use AWS managed config rules
Can make custom config rules
No free tier: $0.003 per configuration item recorded per region, $0.001 per config rule evaluation per region
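A quick arithmetic sketch of AWS Config charges in one region, using the prices of $0.003 per configuration item recorded and $0.001 per rule evaluation. config_cost is a hypothetical helper, and actual prices may differ by region or change over time.

```python
PRICE_PER_CONFIG_ITEM_USD = 0.003      # per configuration item recorded
PRICE_PER_RULE_EVALUATION_USD = 0.001  # per config rule evaluation


def config_cost(items_recorded: int, rule_evaluations: int) -> float:
    """Estimate AWS Config charges in USD for one region."""
    return (items_recorded * PRICE_PER_CONFIG_ITEM_USD
            + rule_evaluations * PRICE_PER_RULE_EVALUATION_USD)
```

For example, 1,000 recorded items and 2,000 rule evaluations come to roughly $3 + $2 = $5.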
Config Resource
View compliance of a resource over time
View configuration of a resource over time
View CloudTrail API calls of a resource over time
Remediation
Automate remediation of non-compliant resources using SSM Automation Documents
Use AWS-Managed Automation Documents or create custom Automation Documents
Can set Remediation Retries if the resource is still non-compliant after auto-remediation
Notification
Use EventBridge to trigger notifications when AWS resources are non-compliant
Ability to send configuration changes and compliance state notifications to SNS (all events – use SNS Filtering or filter at client-side)
AWS Trusted Advisor
optimize costs, increase performance, improve security and resilience, and operate at scale in the cloud
recommends actions to remediate any deviations from best practices
can do service quota checks by writing an AWS Lambda function that refreshes the AWS Trusted Advisor Service Limits checks and set it to run every 24 hours
AWS X-Ray
X-Ray collects data about the requests and responses, tracks latency, identifies performance bottlenecks, and detects errors, helping developers and operations teams understand how their applications behave in real-time
Service Map
X-Ray generates a service map that visualizes the relationships and interactions between the services in your application. This map highlights performance bottlenecks, latency issues, and error rates.
Disaster Recovery
RPO and RTO
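Recovery Point Objective (RPO): time between the last backup point and the disaster, i.e. how much data loss is acceptable
Recovery Time Objective (RTO): time between the disaster and full system recovery, i.e. how much downtime is acceptable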
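Both values are just differences between timestamps, as the sketch below shows. measure_recovery is a hypothetical helper for comparing an actual incident against the objectives.

```python
from datetime import datetime, timedelta
from typing import Tuple


def measure_recovery(last_backup: datetime, disaster: datetime,
                     recovered: datetime) -> Tuple[timedelta, timedelta]:
    """Return (data loss window, downtime) for one incident.

    The first value is compared against the RPO, the second against the RTO.
    """
    if not last_backup <= disaster <= recovered:
        raise ValueError("expected last_backup <= disaster <= recovered")
    return disaster - last_backup, recovered - disaster
```

If the last backup ran at 00:00, the disaster struck at 06:00, and service resumed at 08:30, then 6 hours of data were lost and downtime was 2.5 hours.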
DR Strategies
Backup and Restore
Pilot Light
Warm Standby
Hot Site / Multi Site Approach
Backup and Restore
Cheapest
High RPO, High RTO
Pilot Light
A minimal version of the app (only the critical core) is always running in the cloud
Warm Standby
A scaled-down version of the full system is always up and running
Hot Site/ Multi Site
Full Production Scale is running both on AWS and On Premise
AWS Database Migration Service (DMS)
Can migrate databases both heterogeneously and homogeneously from different sources to targets (eg: from on-premise Oracle to AWS Aurora)
Must create an EC2 instance to perform the replication tasks
If the source and target DBs use different DB engines (eg: Oracle and PostgreSQL), the Schema Conversion Tool (SCT) must be used
AWS DMS supports multi-AZ deployment
In addition to databases, s3 and kinesis can also be the source or target
full load and change data capture (CDC) replication task can be used to migrate and also track the on-going data changes
RDS and Aurora DB Migration
MySQL
PostgreSQL
MySQL
RDS to Aurora:
DB Snapshots from RDS MySQL restored as MySQL Aurora DB
Create an Aurora Read Replica from your RDS MySQL, and when the replication lag is 0, promote it as its own DB cluster
External to Aurora:
Backup onto s3 and import from s3 to Aurora
Use mysqldump utility to directly migrate into Aurora
Can also use DMS
PostgreSQL
RDS to Aurora:
DB Snapshots from RDS PostgreSQL restored as PostgreSQL Aurora DB
Create an Aurora Read Replica from your RDS PostgreSQL, and when the replication lag is 0, promote it as its own DB cluster
External to Aurora:
Create a backup, put it in Amazon S3 and import it using the aws_s3 Aurora extension
Can also use DMS
AWS Backup
Centrally manage and automate backups across AWS services
Amazon EFS / Amazon FSx (Lustre & Windows File Server)
AWS Storage Gateway (Volume Gateway)
Features
PITR for supported services
On-demand and scheduled backups
Tag based backup policies
Backup Plans
Backup Vault Lock
Backup Plans
Can configure:
Backup frequency
Backup window
Transition to cold storage
Retention period
Backup Vault Lock
WORM (Write Once Read Many)
Even the root user cannot delete backups inside the locked Vault
AWS ADS and MGN
Application Discovery Service (ADS)
Application Migration Service (MGN)
ADS
Plan migration projects by gathering information about on-premises data centers like server utilization data and dependency mapping
Resulting data can be viewed within AWS Migration Hub
Agentless Discovery
Uses AWS Agentless Discovery Connector
Discover VM inventory, configuration, and performance history such as CPU, memory, and disk usage
Agent-based Discovery
Uses AWS Application Discovery Agent
System configuration, system performance, running processes, and details of the network connections between systems
MGN
The “AWS evolution” of CloudEndure Migration, replacing AWS Server Migration Service (SMS)
Lift-and-shift (rehost) solution
Converts physical, virtual, and cloud-based servers to run natively on AWS
Migrate data by installing AWS Replication Agent on source servers
Compute
EC2
Storage
EBS
EFS
EC2 Instance Store
EBS
bound to specific AZs
by default, root volume is set to delete on termination
Only gp2/gp3 and io1/io2 can be used as boot volumes
EBS volumes support live configuration changes while in production which means that you can modify the volume type, volume size, and IOPS capacity without service interruptions
EBS Volume Types
gp2 (SSD)
gp3 (SSD)
io1 (SSD)
io2 block express (SSD)
st1 (HDD)
sc1 (HDD)
gp2
1 GiB - 16TiB
can burst IOPS to 3,000
Size of the volume and IOPS are linked
max IOPS is 16,000
if 3 IOPS per GB, max IOPS at 5,334 GB
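The link between gp2 size and IOPS can be written as a one-line formula. The sketch below uses the 3 IOPS per GiB rate and the 16,000 cap from the notes, plus gp2's 100 IOPS minimum baseline; gp2_baseline_iops is a hypothetical helper.

```python
def gp2_baseline_iops(size_gib: int) -> int:
    """Return the baseline IOPS of a gp2 volume of the given size."""
    if not 1 <= size_gib <= 16 * 1024:
        raise ValueError("gp2 volumes range from 1 GiB to 16 TiB")
    # 3 IOPS per GiB, floored at 100 and capped at 16,000.
    return min(16_000, max(100, 3 * size_gib))
```

At 5,334 GiB the formula gives 3 x 5,334 = 16,002, which is why the cap of 16,000 is reached at that size.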
gp3
1 GiB - 16TiB
Baseline of 3,000 IOPS and throughput of 125 MiB/s
Can increase IOPS up to 16,000 and throughput up to 1000 MiB/s independently
io1
4 GiB - 16TiB
Max IOPS: 64,000 for Nitro EC2 instances & 32,000 for other
Can increase IOPS independently from storage size
io2 Block Express
4 GiB - 64 TiB
Sub-millisecond latency
Max IOPS: 256,000 with an IOPS:GiB ratio of 1,000:1
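The 1,000:1 ratio means the IOPS ceiling depends on volume size until the 256,000 cap is reached. A minimal sketch, where io2_bx_max_iops is a hypothetical helper:

```python
def io2_bx_max_iops(size_gib: int) -> int:
    """Return the maximum provisionable IOPS for an io2 Block Express volume."""
    if not 4 <= size_gib <= 64 * 1024:
        raise ValueError("io2 Block Express volumes range from 4 GiB to 64 TiB")
    # Up to 1,000 IOPS per GiB, capped at 256,000 IOPS.
    return min(256_000, 1_000 * size_gib)
```

Under this ratio, a volume needs to be at least 256 GiB before the full 256,000 IOPS can be provisioned.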
Snapshots
snapshots can be used to restore a volume in another AZ and can be copied across regions
snapshots can be moved to snapshot archives which is 75% cheaper but can take 24 to 72 hrs to restore
snapshots can be moved to recycle bins and retention period can be set from 1 day to 1 year
fast snapshot restore: Force full initialization of snapshot to have no latency on the first use
snapshots can be created automatically using Amazon Data Lifecycle Manager (DLM)
The EBS volume can be used while the snapshot is in progress
EBS Encryption
Copying an unencrypted snapshot allows encryption
Snapshots of encrypted volumes are encrypted
Encrypt an Unencrypted EBS Volume
Create an EBS snapshot of the volume
Encrypt the EBS snapshot ( using copy )
Create new EBS volume from the snapshot ( the volume will also be encrypted )
Copying encrypted snapshots across regions
Take snapshot of the encrypted volume
Copy the snapshot and encrypt using key B in region B
Restore the volume
Copying encrypted snapshots cross accounts
Create snapshot encrypted with own KMS key
Attach KMS key policy to authorize cross account decrypt access
Share encrypted snapshot
Encrypt the snapshot using KMS key B in account B
Restore the volume
EBS Multi Attach
only io1/io2 volume types can support multi attach
one volume can be attached to multiple instances within same AZ
up to 16 instances at the same time
EFS
network file system (NFS) that can be mounted on many EC2 instances
EFS can be attached to EC2 instances in multiple AZs
have to use security group to control access to EFS
can only be used with linux based AMIs
pay per use, no capacity planning
Performance Modes
General purpose
Max I/O
Throughput Modes
Bursting
Provisioned
Elastic
Bursting
scales with storage
burst up to 100MiB/s
Provisioned
set the throughput regardless of storage size
Elastic
automatically scales throughput up or down based on the workloads
Up to 3GiB/s for reads and 1GiB/s for writes
Storage Tiers
Standard
IA
Archive
Storage Life Cycle
The maximum period that can be configured in a storage lifecycle policy is 365 days
Availability Modes
standard (Multi-AZ)
one zone (Single-AZ)
EFS One Zone IA
IA storage tier with one zone availability mode
Instance Store
closely attached to EC2 instance
better I/O than EBS
data is lost when the instance is stopped or terminated
RAID 0 vs RAID 1
EBS and Instance Store support RAID 0 configuration
RAID 0
Data are spread across multiple EBS or Instance store volumes and all volumes act as single storage
Increased throughput
RAID 1
Data are duplicated in all the EBS and Instance store volumes
For data redundancy
Instance Types
General Purpose (M, T)
Compute optimized (C)
Memory optimized (R)
Accelerated (G, P)
Storage optimized (I)
Compute Optimized (C)
Batch processing
HPC
Media transcoding
Scientific modeling
Dedicated gaming servers
Memory Optimized (R)
High performance databases
Cache stores
In memory BIs
In memory big data processing
Storage Optimized (I)
High performance OLTP
For high sequential I/O
Tenancy
default
dedicated
host
default
shared tenancy
dedicated
dedicated tenancy (eg: dedicated instances)
host
dedicated host
Security Group
Control ins/outs of the instance
VPC bound
Can attach to multiple instances
Only contains ‘Allow’ rules
Can reference by IP or by other SGs
Inbound traffic is blocked by default
Outbound traffic is allowed by default
Purchasing Options
On-demand Instances
Reserved Instances
Savings Plans
Spot Instances
Dedicated Hosts
Dedicated Instances
Capacity Reservation
On-demand Instances
Billed per second, with a 60-second minimum
Reserved Instances
Reserved for 1 or 3 years
Payments: upfront, no upfront, partial upfront
Convertible reserved instance: can change instance attributes
Savings Plans
Commit to a certain amount of usage ($/hr)
Commitment lasts for 1 or 3 years
Locked to an instance family and region
Usage beyond the Savings Plan is charged at the on-demand price
Spot Instances
Can get up to 90% discount
Can lose the instance when the current Spot price exceeds the maximum price you are willing to pay
have 2 mins grace period at termination time
Cancelling a spot request does not terminate the instances
First cancel the request and then terminate the instances
Capacity Reservation
Capacity Reservations enable you to reserve compute capacity for your EC2 instances in a specific AZ for any duration (can also be in hourly duration)
You pay for the reserved capacity whether or not you use the instances within the reserved period
Elastic IP
Can attach to one instance at a time
Can only have 5 Elastic IPs per region per account (can ask AWS to increase)
Placement Groups
Cluster
Spread
Partition
Cluster
Cluster instances into a low latency group within a single AZ
It is recommended that you launch the number of instances that you need in the placement group in a single launch request
use the same instance type for all instances in the placement group
If you try to add more instances to the placement group later, or if you try to launch more than one instance type in the placement group, you increase your chances of getting an insufficient capacity error
Need to re-launch the cluster when insufficient capacity error occurs
Spread
Spread instances across distinct underlying hardware, can span multiple AZs
Only 7 instances per group per AZ
Partition
Many instances can share a partition (a rack of hardware) and partitions are distributed across AZs
Only 7 partitions per AZ
Elastic Network Interface (ENI)
One instance can have multiple ENIs attached with one primary private IPv4 and many secondary private IPv4s
ENIs are bound to specific AZs
A public IPv4 address is assigned to an ENI according to the auto-assign public IP setting of the subnet that the ENI belongs to
One elastic IP address per one private IP
EC2 Instance States
Stop
Terminate
Hibernate
Stop
Data on EBS volumes (root and non-root) is preserved
All data on the attached instance-store devices will be lost
Underlying host can be changed when restarted
Elastic IP and ENIs are still attached
Terminate
If the EBS volume is set to be destroyed, all the data are lost
Hibernate
The contents of RAM are saved to the EBS root volume, and the instance resumes from the saved state when restarted
Instance RAM size must be less than 150 GB
Root volume must be EBS and encrypted
An instance cannot be hibernated for more than 60 days
It is not possible to enable or disable hibernation for an instance after it has been launched; Have to configure at launch time
AMI
AMIs can be accessed using:
AWS public AMIs
Custom made AMIs
AMIs found/sold on AWS marketplace
AMIs can be used to copy instances across AZs, Regions and Accounts
AMI includes one or more snapshots, so if AMI is copied, snapshots are copied along with it
Copying an AMI backed by an encrypted snapshot cannot result in an unencrypted target snapshot
EC2 Enhanced Networking
Elastic Network Adapter (ENA)
Elastic Fabric Adapter (EFA)
ENA
up to 100 Gbps
can support windows instances
EFA
Improved ENA for HPC
only works for Linux
Automation and Orchestration
AWS Batch
AWS ParallelCluster
AWS Batch
Managed service that helps you efficiently run batch processing jobs at scale
AWS Batch handles the provisioning, scaling, and management of compute resources required for batch jobs
AWS ParallelCluster
Open-source cluster management tool provided by AWS that simplifies the deployment, configuration, and management of high-performance computing (HPC) clusters on the AWS Cloud
There is a vCPU-based On-Demand Instance limit per region
EC2 Billing
Pending: will not be billed
Running: will be billed
Stopping: will not be billed
Terminated: will not be billed
Stopping (to hibernate): will be billed
Terminated (reserved instance): will be billed
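The billing rules above can be written down as a small lookup. This is a sketch that only encodes the states listed in these notes; the function name and its keyword arguments are hypothetical, and it says nothing about separate charges such as EBS storage.

```python
def is_billed(state: str, *, hibernating: bool = False, reserved: bool = False) -> bool:
    """Return True if instance usage is billed in the given state, per the list above."""
    if state == "pending":
        return False
    if state == "running":
        return True
    if state == "stopping":
        # Only billed while stopping in order to hibernate.
        return hibernating
    if state == "terminated":
        # A reserved instance keeps being billed for its term.
        return reserved
    raise ValueError(f"state not covered by the notes: {state!r}")


print(is_billed("running"))                     # True
print(is_billed("stopping"))                    # False
print(is_billed("stopping", hibernating=True))  # True
```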
AWS Outposts
Fully managed service that extends AWS infrastructure, services, APIs, and tools to your on-premises data center or edge location
Brings AWS infrastructure (hardware and software) to your physical data center or on-premises environment
Supports core AWS services like Amazon EC2, ECS/EKS, RDS, S3, and EBS locally
AWS Wavelength
Brings AWS compute and storage services to the edge of telecommunications (telco) 5G networks, enabling developers to build applications that require ultra-low latency for end users and devices
AWS Wavelength extends AWS infrastructure into Wavelength Zones, which are zones within telco provider data centers connected to 5G networks
Applications deployed in these zones process data close to users, reducing the latency introduced by routing to traditional AWS regions
Access Control
IAM
IAM users can be grouped into IAM groups
Permission policies can be assigned to IAM groups
(or)
Can be assigned to users directly by means of an inline policy
Least privilege permission
One user can belong to multiple different groups, thus can have multiple permission policies
Groups can only contain users (cannot contain other groups)
Admin can set password policy for IAM users
AWS cloudshell is not available in every region
AWS services can do actions on behalf of user by being assigned IAM roles which include one or more IAM policies
Access is allowed only if an explicit “Allow” permission is defined (and no explicit “Deny” applies)
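The allow/deny rule can be sketched as a tiny evaluator. This is a simplified model under stated assumptions: it only looks at identity policy statements and action names, the `evaluate` function is hypothetical, and real IAM evaluation also considers resources, conditions, SCPs, permission boundaries, and resource-based policies.

```python
from fnmatch import fnmatchcase


def evaluate(statements: list[dict], action: str) -> str:
    """Explicit Deny wins, then explicit Allow, otherwise the request is implicitly denied."""
    decision = "ImplicitDeny"
    for statement in statements:
        # Action names are compared case-insensitively; '*' acts as a wildcard.
        matches = any(
            fnmatchcase(action.lower(), pattern.lower())
            for pattern in statement["Action"]
        )
        if not matches:
            continue
        if statement["Effect"] == "Deny":
            return "ExplicitDeny"
        decision = "Allow"
    return decision


statements = [
    {"Effect": "Allow", "Action": ["s3:*"]},
    {"Effect": "Deny", "Action": ["s3:DeleteObject"]},
]
print(evaluate(statements, "s3:GetObject"))      # Allow
print(evaluate(statements, "s3:DeleteObject"))   # ExplicitDeny
print(evaluate(statements, "ec2:RunInstances"))  # ImplicitDeny
```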
MFA Options
Authenticator apps
Universal 2nd Factor (U2F) Security Key
Hardware key fob MFA device
Hardware key fob MFA device for AWS GovCloud
IAM security tools
Can generate IAM security credentials report of IAM users (account level)
IAM Access Advisor (user level)
AWS Organizations
Allows to manage multiple AWS accounts
The main account is the management account
Other accounts are member accounts
Member accounts can only be part of one organization
Organization Units (OUs)
Accounts in the organization are organized into OUs
OUs can be nested
Service Control Policy (SCP)
IAM policies applied to OU or Accounts to restrict Users and Roles
They do not apply to the management account (full admin power)
They do not affect the service-linked roles
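As an illustration, a region-restricting SCP might look roughly like the document built below. This is a sketch, not a production policy: the helper name and the region list are assumptions, and real-world versions of this policy usually exempt global services (for example with NotAction), which is omitted here for brevity.

```python
import json


def region_restriction_scp(allowed_regions: list[str]) -> dict:
    """Build an SCP document that denies every action outside the allowed regions."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyOutsideAllowedRegions",
                "Effect": "Deny",
                "Action": "*",
                "Resource": "*",
                "Condition": {
                    "StringNotEquals": {"aws:RequestedRegion": list(allowed_regions)}
                },
            }
        ],
    }


policy = region_restriction_scp(["eu-west-1", "eu-central-1"])
print(json.dumps(policy, indent=2))
```

Because SCPs only restrict, the users and roles in the affected accounts still need IAM policies that allow the actions inside the permitted regions.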
Resource-based Policy vs IAM Roles
Some services support resource-based policies, while others can only be accessed through IAM roles
Cross-account resource access can be done either by account A assuming role in account B or by defining resource-based policy for the resource in account B
Trust policy is also a type of resource-based policy
AWS Services with Resource-based Policy
Lambda
SNS
SQS
S3
API Gateway
KMS
AWS Services with IAM Roles
Kinesis streams
ECS tasks
…
IAM Permission Boundaries
Advanced feature to use a managed policy to set the maximum permissions an IAM entity can get
IAM Permission Boundaries are supported for users and roles only (not groups)
IAM Identity Center
One login (single sign-on) for all AWS accounts in AWS Organizations, business applications, and third-party applications (e.g., Salesforce, Office 365, etc.)
Users managed from the Identity Center management account can be assigned permission sets, which allow them to access accounts and also specific resources in OUs
Can manage users and groups directly within AWS Identity Center or integrate with external identity providers like Microsoft Active Directory, Okta, or Azure AD
AWS Control Tower
Easy way to set up and govern a secure and compliant multi-account AWS environment based on best practices
AWS Control Tower uses AWS Organizations to create accounts
Preventive Guardrail
using SCPs (e.g., Restrict Regions across all your accounts)
Detective Guardrail
using AWS Config (e.g., identify untagged resources)
AWS Resource Access Manager (RAM)
To easily and securely share your resources with other AWS accounts or within your organization
AWS Directory Service (Active Directory, AD)
AWS Managed Microsoft AD
AD Connector
Simple AD
AWS Managed Microsoft AD
Create your own AD in AWS to manage users
Establish “trust” connections with your on-premises AD
AD Connector
Proxy that redirects requests to the on-premises AD
Simple AD
AWS managed
Cannot be joined with on-prem ADs
AWS Federated Access
Federated Access in AWS refers to the ability to grant users from external identity providers (IdPs) access to AWS resources without having to create and manage AWS-specific IAM (Identity and Access Management) users for each individual
Types
Federation with IAM Identity Center
Federation with IAM
Federation with Amazon Cognito identity pools
Federation with IAM Identity Center
Users in IAM Identity Center are granted short-term credentials to your AWS resources
IAM Identity Center supports identity federation with SAML (Security Assertion Markup Language) 2.0 to provide federated single sign-on access for users who are authorized to use applications within the AWS access portal
Users can then single sign-on into services that support SAML, including the AWS Management Console and third-party applications, such as Microsoft 365, SAP Concur, and Salesforce
Federation with IAM
For single, standalone AWS account
User Logs In to IdP
IdP Sends Authentication Token to AWS
AWS Grants Temporary Credentials through STS
User Accesses AWS Services
CDN
Cloudfront
Cloudfront is a CDN service that caches content at points of presence (POPs) (216 when these notes were written; the number grows over time)
Cloudfront origin can be:
S3
EC2
ALB
any HTTP endpoint
Cloudfront can do geo restriction to allow or block users from specific countries using allowlist and blocklist
Should use in front of S3 if the file size is less than 1GB
Can use field level encryption to protect sensitive data for specific content
Can route to multiple origins based on the content type
Can use an origin group with primary and secondary origins to configure for high-availability and failover
Can generate Signed URL and Signed cookies
Global Accelerator
2 anycast IPs are created
anycast IPs send the traffic to the edge locations and edge locations send the traffic to the application endpoint
Uses internal AWS network
Can be used to distribute a portion of traffic to a particular deployment using endpoint weights
Good for gaming, IoT or voice over IP services
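Endpoint weights split traffic in proportion to each weight over the sum of the weights in the endpoint group. A minimal sketch of that arithmetic, with hypothetical endpoint names and a hypothetical helper function:

```python
def traffic_shares(weights: dict[str, int]) -> dict[str, float]:
    """Return the fraction of traffic each endpoint receives for the given weights."""
    total = sum(weights.values())
    if total == 0:
        raise ValueError("at least one endpoint needs a non-zero weight")
    return {name: weight / total for name, weight in weights.items()}


# Hypothetical blue/green rollout: send a quarter of the traffic to the new deployment.
shares = traffic_shares({"blue": 192, "green": 64})
print(shares)  # {'blue': 0.75, 'green': 0.25}
```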
Cloudfront vs Global Accelerator
Cloudfront caches content at the edge locations and serves it from there
Global Accelerator routes TCP or UDP traffic through the edge locations to the application
global accelerator doesn’t have cache service like cloudfront
both have DDoS protection using AWS shield
Storage
S3
max size of an object is 5TB
objects larger than 5GB must be uploaded using multi-part upload
blocking public access setting can be set at account level
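The size limits above translate into a simple upload plan. The sketch below uses the 5GB and 5TB figures from these notes (interpreted as binary units, which is an assumption) together with the S3 multipart limits of at most 10,000 parts and a 5 MiB minimum part size; `plan_upload` and the 100 MiB default part size are hypothetical.

```python
MIB, GIB, TIB = 1024**2, 1024**3, 1024**4

SINGLE_PUT_LIMIT = 5 * GIB   # above this, multipart upload is required
MAX_OBJECT_SIZE = 5 * TIB    # maximum object size per these notes
MAX_PARTS = 10_000           # multipart upload part-count limit
MIN_PART_SIZE = 5 * MIB      # minimum size of every part except the last


def plan_upload(size_bytes: int, part_size: int = 100 * MIB) -> tuple[str, int]:
    """Return (strategy, number_of_parts) for an object of the given size."""
    if size_bytes > MAX_OBJECT_SIZE:
        raise ValueError("object exceeds the maximum object size")
    if size_bytes <= SINGLE_PUT_LIMIT:
        return ("single_put", 1)
    # Grow the part size if needed so the upload fits in MAX_PARTS parts.
    smallest_allowed = -(-size_bytes // MAX_PARTS)  # ceiling division
    part_size = max(part_size, MIN_PART_SIZE, smallest_allowed)
    parts = -(-size_bytes // part_size)
    return ("multipart", parts)


print(plan_upload(1 * GIB))   # ('single_put', 1)
print(plan_upload(10 * GIB))  # ('multipart', 103)
```

In practice multipart upload is often used well below the 5GB threshold as well, since parts can be uploaded in parallel and retried individually.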
Versioning
if versioning is enabled for a bucket, previous versions of the object are preserved when overwritten
if an object is deleted, it is not truly deleted but marked with the delete marker and then previous versions can be restored by deleting the delete marker
Once versioning is enabled for a bucket, it cannot be disabled, can only be suspended
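The delete-marker behaviour can be modelled with a small in-memory class. This is a toy model for intuition only, not the S3 API: the class, its method names, and the version ID format are all hypothetical.

```python
import itertools

DELETE_MARKER = object()  # sentinel standing in for an S3 delete marker


class VersionedBucket:
    """Toy model of a bucket with versioning enabled."""

    def __init__(self) -> None:
        self._versions: dict[str, list[tuple[str, object]]] = {}
        self._ids = itertools.count(1)

    def _append(self, key: str, body: object) -> str:
        version_id = f"v{next(self._ids)}"
        self._versions.setdefault(key, []).append((version_id, body))
        return version_id

    def put(self, key: str, body: bytes) -> str:
        """Overwriting adds a new version; older versions are kept."""
        return self._append(key, body)

    def delete(self, key: str) -> str:
        """A plain delete only adds a delete marker on top of the version stack."""
        return self._append(key, DELETE_MARKER)

    def delete_version(self, key: str, version_id: str) -> None:
        """Permanently remove one specific version (or a delete marker)."""
        self._versions[key] = [v for v in self._versions[key] if v[0] != version_id]

    def get(self, key: str):
        """Return the current version's body, or None if missing or deleted."""
        versions = self._versions.get(key, [])
        if not versions or versions[-1][1] is DELETE_MARKER:
            return None
        return versions[-1][1]


bucket = VersionedBucket()
bucket.put("report.txt", b"first")
bucket.put("report.txt", b"second")
marker_id = bucket.delete("report.txt")
print(bucket.get("report.txt"))   # None: the delete marker hides the object
bucket.delete_version("report.txt", marker_id)
print(bucket.get("report.txt"))   # b'second': removing the marker restores it
```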
Replication
replication is done by creating replication rule at the source s3 bucket
both source and destination bucket have to enable bucket versioning
only new objects are replicated
have to use S3 Batch Replication to replicate existing objects and objects that failed replication
can replicate buckets in different regions
Storage Classes
standard
standard IA
good for once a month access
one-zone IA
good for once a month access
glacier instant retrieval
millisec retrieval
good for data accessed once a quarter
min storage duration of 90 days
glacier flexible retrieval
expedited (1-5 mins), standard (3-5 hrs), bulk (5-12 hrs)
min storage duration of 90 days
glacier deep archive
standard (12 hrs), bulk (48 hrs)
min storage duration of 180 days
intelligent tiering
frequent access
infrequent access: objects not accessed for 30 days
archive instant access: objects not accessed for 90 days
archive access (optional): configurable from 90 to 700+ days
deep archive access (optional): configurable from 180 to 700+ days
Provisioned capacity (for glacier expedited retrievals)
ensures that your retrieval capacity for expedited retrievals is available when you need it
each unit of capacity provides at least three expedited retrievals every five minutes and up to 150 MB/s of retrieval throughput
Lifecycle Rules / Lifecycle Policies
Transition rule: to move objects from one class to another
Expiration rule: to delete expired objects
Object level rules
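How transition and expiration rules combine can be sketched as a function of object age. The rule values below (30, 90, 180, and 365 days) are hypothetical, and the function only reports which class an object would be eligible for; actual lifecycle actions run asynchronously.

```python
from typing import Optional


def storage_class_for_age(
    age_days: int,
    transitions: list[tuple[int, str]],
    expiration_days: Optional[int] = None,
) -> Optional[str]:
    """Return the storage class an object is eligible for, or None once it has expired."""
    if expiration_days is not None and age_days >= expiration_days:
        return None  # expiration rule: the object is deleted
    current = "STANDARD"
    for days, storage_class in sorted(transitions):
        if age_days >= days:
            current = storage_class  # transition rule: move to the next class
    return current


# Hypothetical rule set for a log bucket.
transitions = [(30, "STANDARD_IA"), (90, "GLACIER_IR"), (180, "DEEP_ARCHIVE")]
for age in (10, 45, 200, 400):
    print(age, storage_class_for_age(age, transitions, expiration_days=365))
```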
Requester Pays
the requester of the object pays for the request and data download costs
the requester has to be authenticated with an AWS account (anonymous access is not allowed)
After a bucket is configured as a Requester Pays bucket, requesters must include x-amz-request-payer in the header of DELETE, GET, HEAD, POST, and PUT requests, or as a parameter in a REST request, to show that they understand they will be charged for the request and the data download
Event Notifications
send messages/events to SNS, SQS (only standard queue) or Lambda function when an object action is triggered (eg: ObjectCreated:Put, ObjectCreated:Post, …)
receiving services have to be configured with a resource-based access policy (SNS topic policy, SQS queue policy, or Lambda resource policy) that allows s3 to send the event notifications
Performance
each s3 prefix can achieve 3500 put/copy/post/delete requests/sec and 5500 get/head requests/sec
if objects are distributed across 4 prefix, user can have 22000 get/head requests/sec and 14000 put/copy/post/delete requests/sec
how to further optimize s3 performance:
multi-part upload
s3 transfer acceleration
s3 byte range fetches
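The per-prefix arithmetic and the idea behind byte-range fetches can be sketched as follows. The helper names are hypothetical; the per-prefix rates are the figures quoted above, and the range strings follow the standard HTTP Range header format.

```python
WRITE_RPS_PER_PREFIX = 3500  # put/copy/post/delete requests per second
READ_RPS_PER_PREFIX = 5500   # get/head requests per second


def max_request_rates(prefix_count: int) -> dict[str, int]:
    """Aggregate request rates when objects are spread evenly over several prefixes."""
    return {
        "write": WRITE_RPS_PER_PREFIX * prefix_count,
        "read": READ_RPS_PER_PREFIX * prefix_count,
    }


def byte_ranges(object_size: int, chunk_size: int) -> list[str]:
    """Split an object into HTTP Range header values that can be fetched in parallel."""
    return [
        f"bytes={start}-{min(start + chunk_size, object_size) - 1}"
        for start in range(0, object_size, chunk_size)
    ]


print(max_request_rates(4))  # {'write': 14000, 'read': 22000}
print(byte_ranges(10, 4))    # ['bytes=0-3', 'bytes=4-7', 'bytes=8-9']
```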
Batch Operations
to perform bulk operations on existing s3 objects with a single request
to get the list of objects:
use s3 inventory
filter using s3 select
and use s3 batch operation to do processings
Encryption
Server side encryption (SSE)
SSE-S3: encrypt with keys managed by S3
SSE-KMS: encrypt with KMS key
SSE-C: encrypt with customer provided key
Client side encryption (CSE)
CORS
Needs to be configured on the bucket to allow cross-origin requests for objects from web browsers
MFA Delete
Only root account can enable/disable MFA delete of a S3 bucket
Access Logs
To capture detailed records of requests made to the S3 bucket
Provide insights into who accessed the bucket, from where, and how they interacted with the objects
Presigned URLs
Time-limited URL that grants temporary access to an S3 object
Glacier Vault Lock
write once read many model
glacier vault lock has policy and that policy cannot be changed after set once
if an object is moved to glacier vault, it cannot be deleted anymore
S3 Object Lock
write once read many model
bucket versioning must be enabled
block an object version deletion for a period of time
Retention Modes
compliance - no one can delete the object or change the retention policy
governance - only users with special permissions (e.g. admins) can delete the object or change the retention policy
Legal Hold
protect the object indefinitely
independent from retention period
legal hold can be placed and removed on an object by using s3:PutObjectLegalHold IAM permission
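The interaction between retention modes and legal hold can be sketched as a single decision function. This is a simplified model, not the S3 API: the function and its parameters are hypothetical, and `bypass_governance` stands for a caller who holds the special permission needed to override governance mode.

```python
from datetime import datetime, timezone
from typing import Optional


def can_delete_version(
    now: datetime,
    retain_until: Optional[datetime],
    mode: Optional[str],
    legal_hold: bool = False,
    bypass_governance: bool = False,
) -> bool:
    """Decide whether an object version may be deleted under Object Lock."""
    if legal_hold:
        return False  # legal hold blocks deletion regardless of the retention period
    if retain_until is None or now >= retain_until:
        return True   # no retention, or the retention period has passed
    if mode == "GOVERNANCE":
        return bypass_governance  # only specially permitted users may override
    return False      # COMPLIANCE: nobody can delete before retain_until


now = datetime(2024, 6, 1, tzinfo=timezone.utc)
until = datetime(2025, 1, 1, tzinfo=timezone.utc)
print(can_delete_version(now, until, "COMPLIANCE", bypass_governance=True))  # False
print(can_delete_version(now, until, "GOVERNANCE", bypass_governance=True))  # True
```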
S3 Access Points
each access point (AP) is attached to a single bucket
s3 access points can have own DNS names
can be internet origin or vpc origin
each access point can have its own policy
so the bucket policy can stay simple
S3 Object Lambda Access Points
Object Lambda access points let users retrieve a modified version of an s3 object: the access point invokes a lambda function, which reads the original s3 object and modifies it before the result is returned through the Object Lambda access point
AWS Snow Family
snowcone and snowball edge are devices used for offline data migration
order the snowcone or snowball edge devices from AWS, load the devices with data, send back the devices to AWS and AWS will transfer the data from devices to s3 buckets
snowcone can handle 8TB hdd - 14TB ssd, migration size up to terabytes
snowball edge can handle 80TB - 210TB, migration size up to petabytes
snowball edge supports storage clustering
can do edge computing on snow devices by running lambda functions or ec2 instances at the edge
snowcone has 2 CPUs and 4GB of RAM
snowball edge, on the other hand, comes in compute-optimized and storage-optimized variants
snowball cannot transfer the data directly to s3 glacier
snowmobile is used to move petabytes to exabytes of data, transfer data with container-sized trucks
AWS FSx
fully-managed high performance file systems on AWS
Types
FSx for Lustre
FSx for Windows file server
FSx for NetApp ONTAP
FSx for OpenZFS
AWS Storage Gateway
Bridge between on-premises data and cloud data
Types
s3 file gateway
FSx file gateway
Volume gateway (cached or stored)
Tape gateway
Volume Gateway Cached Mode
Only a subset of the data (frequently accessed) is cached in the on-premises volume gateway, the full data lives in S3
Volume Gateway Stored Mode
The full dataset is stored in the on-premises volume gateway and backed up to S3 for redundancy
AWS Transfer Family
A fully-managed service for file transfers into and out of Amazon S3 or Amazon EFS using FTP-based protocols
Supported Protocols
AWS Transfer for FTP (File Transfer Protocol)
AWS Transfer for FTPS (File Transfer Protocol over SSL)
AWS Transfer for SFTP (SSH File Transfer Protocol)
AWS DataSync
Moves large amounts of data to and from AWS (transfer tasks can be scheduled, and on-premises sources need a DataSync agent)
On-premise/Other clouds to AWS
AWS to AWS
Only AWS data transfer service that can directly transfer the data to S3 Glacier
Supported Storage Services
S3
S3 Glacier
EFS
FSx
Application Integration/Messaging
SQS
Producer/Consumer Model
Standard Queue
Unlimited throughput, unlimited number of messages in queue
Default retention of messages: 4 days, maximum of 14 days
Low latency (<10 ms on publish and receive)
Limitation of 256KB per message sent
Can have duplicate messages
Can have out of order messages
Default visibility timeout of 30 sec
Cannot set priority value to each message
FIFO Queue
Limited throughput: 300 msg/s without batching, 3000 msg/s with batching
Security Groups for EC2, ALB and ENI resources in VPC
AWS Network Firewall (VPC Level)
Route 53 Resolver DNS Firewall
Policies are created at the region level
AWS GuardDuty
Managed threat detection service
Analyzes input data like CloudTrail events, VPC flow logs, etc. to detect threats
Notify the findings through EventBridge
Foundational Data Sources
CloudTrail Events Logs
VPC Flow Logs
DNS Logs
Other Data Sources
S3 data event logs
EKS audit logs
Lambda network activity logs
RDS login activity logs
EBS volume data
AWS Inspector
Automated Security Assessments for:
EC2
Container Images pushed to Amazon ECR
Lambda Functions
Reporting & integration with AWS Security Hub
Send findings to Amazon EventBridge
EC2
Leveraging the AWS Systems Manager (SSM) agent
Analyze against unintended network accessibility
Analyze the running OS against known vulnerabilities
Container Images pushed to Amazon ECR
Assessment of Container Images as they are pushed
Lambda Functions
Identifies software vulnerabilities in function code and package dependencies
Assessment of functions as they are deployed
AWS Macie
Find sensitive Personally Identifiable Information (PII) in data stored on S3
AWS Artifact
To view, assess and manage the security reports as well as other AWS compliance-related information
AWS Security Hub
Security service that provides a comprehensive view of your security posture across AWS accounts
Security Hub collects and aggregates security findings from multiple AWS services such as Amazon GuardDuty, Amazon Macie, Amazon Inspector, and AWS Config, as well as from third-party security solutions
AWS Security Token Service (STS)
Service that you can use to create and provide trusted users with temporary security credentials that can control access to your AWS resources
Temporary security credentials work almost identically to the long-term access key credentials that your IAM users can use
VPC
Default VPC
Default VPC has Internet connectivity through internet gateway and all EC2 instances inside it have public IPv4 addresses
Own VPC
Can create a maximum of 5 per region (soft limit, can be increased)
Max CIDR per VPC is 5
CIDR size
Min: /28 (16 IP addresses)
Max: /16 (65536 IP addresses)
Allowed CIDR ranges (private)
10.0.0.0 – 10.255.255.255 (10.0.0.0/8)
172.16.0.0 – 172.31.255.255 (172.16.0.0/12)
192.168.0.0 – 192.168.255.255 (192.168.0.0/16)
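The size and range rules above can be checked with the standard `ipaddress` module. This is a sketch that validates only what these notes state (a prefix between /16 and /28, inside one of the three private ranges); the function name is hypothetical.

```python
import ipaddress

PRIVATE_RANGES = [
    ipaddress.ip_network(cidr)
    for cidr in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]


def is_valid_private_vpc_cidr(cidr: str) -> bool:
    """Check the prefix-length and private-range rules listed above for an IPv4 CIDR."""
    network = ipaddress.ip_network(cidr)  # raises ValueError if host bits are set
    if network.version != 4:
        return False
    if not 16 <= network.prefixlen <= 28:
        return False
    return any(network.subnet_of(private) for private in PRIVATE_RANGES)


print(is_valid_private_vpc_cidr("10.0.0.0/16"))     # True
print(is_valid_private_vpc_cidr("10.0.0.0/8"))      # False: larger than /16
print(is_valid_private_vpc_cidr("192.168.1.0/29"))  # False: smaller than /28
print(ipaddress.ip_network("10.0.0.0/28").num_addresses)  # 16
```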
Subnets
AWS reserves 5 IP addresses (first 4 & last 1) in each subnet
x.x.x.0 – Network Address
x.x.x.1 – reserved by AWS for the VPC router
x.x.x.2 – reserved by AWS for mapping to Amazon-provided DNS
x.x.x.3 – reserved by AWS for future use
x.x.x.255 – Network Broadcast Address. AWS does not support broadcast in a VPC, therefore the address is reserved
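The reserved addresses and the resulting usable host count can be computed with the standard `ipaddress` module. A minimal sketch, with hypothetical helper names; note that the last reserved address is x.x.x.255 only for subnets that end on that boundary, such as a /24.

```python
import ipaddress


def reserved_addresses(subnet_cidr: str) -> list[ipaddress.IPv4Address]:
    """Return the 5 addresses AWS reserves in a subnet: the first four and the last one."""
    network = ipaddress.ip_network(subnet_cidr)
    return [network[0], network[1], network[2], network[3], network[-1]]


def usable_address_count(subnet_cidr: str) -> int:
    """Number of addresses left for instances after the 5 reserved ones."""
    return ipaddress.ip_network(subnet_cidr).num_addresses - 5


print([str(a) for a in reserved_addresses("10.0.1.0/24")])
print(usable_address_count("10.0.1.0/24"))  # 251
print(usable_address_count("10.0.1.0/28"))  # 11
```

This is why a /28 subnet, with 16 addresses in total, only has room for 11 instances.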
Each subnet maps to single AZ
Every subnet created is automatically associated with the main route table for the VPC.
IPv6-only Subnet
Can only support Nitro instances
Internet Gateway
Allows resources (e.g. EC2 instances) in a VPC to connect to the Internet
It scales horizontally and is highly available and redundant
Must be created separately from a VPC and attach to a VPC
Subnet route tables must be configured to route the traffic to internet gateway to access the internet
Subnet becomes public subnet when it is connected to and routed through an internet gateway
Bastion Host
A bastion host (BH) is an instance in a public subnet which has access to other instances in the private subnet
Used to ssh into private instances via the BH
The SG of the BH has to allow port 22 from the internet (ideally restricted to known source IPs) and the SG of the private instances must allow ssh from the SG of the bastion host
NAT Instance
An instance in the public subnet through which the private instances can access the internet
Must have Elastic IP attached to it
Must disable EC2 setting: Source / destination Check
An instance can be NAT instance by configuring using NAT AMIs
Route tables of private subnets must be configured to route traffic from private subnets to the NAT Instance
NAT Instance SG rules
Inbound:
Allow HTTP / HTTPS traffic coming from Private Subnets
Allow SSH from source network (access is provided through Internet Gateway)
Outbound:
Allow HTTP / HTTPS traffic to the Internet
NAT Gateway
AWS-managed NAT instance
Higher bandwidth, high availability, no administration
Pay per hour for usage and bandwidth
NAT GW is AZ-bound
Uses an Elastic IP
Can’t be used by EC2 instance in the same subnet (only from other subnets)
Private Subnet ⇒ NATGW ⇒ IGW
5 Gbps of bandwidth with automatic scaling up to 100 Gbps
SGs and NACLs
SGs
Operates at instance level
Stateful (always allow return traffic)
Only support ‘Allow’ rules
Evaluate all the rules before deciding to allow
Newly created SG will ‘Deny’ every inbound traffic and ‘Allow’ every outbound traffic
NACLs
Operates at subnet level
Stateless
Supports both ‘Allow’ and ‘Deny’ rules
One NACL per subnet, new subnets are assigned the Default NACL
NACLs and subnets are decoupled and NACLs live in VPC
Default NACL is “allow all”
Newly created NACLs will deny everything (inbound or outbound)
NACL have to be configured to allow inbound and outbound ephemeral ports since it is stateless
NACL Rules
Rules have a number (1-32766), higher precedence with a lower number
First rule match will drive the decision
The last rule is an asterisk (*) and denies a request in case of no rule match
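First-match evaluation by rule number can be sketched as below. This is a simplified model of inbound rules only (source CIDR and destination port, no protocol field); the `Rule` type and `evaluate_nacl` function are hypothetical, and the example addresses come from documentation ranges.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    number: int      # 1-32766, lower number = higher precedence
    cidr: str        # source CIDR the rule applies to
    port_from: int
    port_to: int
    action: str      # "allow" or "deny"


def evaluate_nacl(rules: list[Rule], source_ip: str, port: int) -> str:
    """Walk the rules in ascending number order; the first match decides."""
    address = ipaddress.ip_address(source_ip)
    for rule in sorted(rules, key=lambda r: r.number):
        in_cidr = address in ipaddress.ip_network(rule.cidr)
        if in_cidr and rule.port_from <= port <= rule.port_to:
            return rule.action
    return "deny"  # the final '*' rule denies anything that matched no rule


rules = [
    Rule(200, "0.0.0.0/0", 443, 443, "allow"),
    Rule(100, "203.0.113.0/24", 0, 65535, "deny"),
]
print(evaluate_nacl(rules, "203.0.113.5", 443))   # deny: rule 100 matches first
print(evaluate_nacl(rules, "198.51.100.7", 443))  # allow: rule 200
print(evaluate_nacl(rules, "198.51.100.7", 22))   # deny: no rule matched
```

A security group would behave differently here: it has no deny rules and no ordering, so any matching allow rule is enough.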
VPC Peering
Privately connect two VPCs using AWS network
Peer VPCs must not have overlapping CIDRs
VPC Peering connection is NOT transitive
Route tables of subnets in both VPC have to be updated to route the traffic to other VPC through peer connection
Can create VPC Peering connection between VPCs in different AWS accounts/regions
Can reference a security group in a peered VPC (cross accounts but same region)
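Two of the peering rules above, non-overlapping CIDRs and non-transitivity, can be sketched with the standard `ipaddress` module. The helper names and the VPC labels A, B, and C are hypothetical.

```python
import ipaddress


def can_peer(cidr_a: str, cidr_b: str) -> bool:
    """Two VPCs can only be peered if their CIDR blocks do not overlap."""
    return not ipaddress.ip_network(cidr_a).overlaps(ipaddress.ip_network(cidr_b))


def can_reach(peerings: set[frozenset[str]], src: str, dst: str) -> bool:
    """Peering is not transitive: only directly peered VPCs can talk to each other."""
    return frozenset((src, dst)) in peerings


print(can_peer("10.0.0.0/16", "10.1.0.0/16"))    # True
print(can_peer("10.0.0.0/16", "10.0.128.0/17"))  # False: the ranges overlap

peerings = {frozenset(("A", "B")), frozenset(("B", "C"))}
print(can_reach(peerings, "A", "B"))  # True
print(can_reach(peerings, "A", "C"))  # False: A and C need their own peering
```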
VPC Endpoints
VPC Endpoints (powered by AWS PrivateLink) allow you to connect to AWS services using a private network instead of using the public Internet
Remove the need of IGW, NATGW, … to access AWS Services
Types
Interface Endpoint
Gateway Endpoint
Interface Endpoint
Provisions an ENI (private IP address) as an entry point (must attach a Security Group)
Supports most AWS services
Billed per hour + per GB of data processed
Can be used to connect to another VPC
Uses AWS PrivateLink to connect the endpoint to services
Gateway Endpoint
Provisions a gateway and must be used as a target in a route table (does not use security groups)
Free
Supports S3 and DynamoDB
If S3 or DynamoDB is not in the same region as the subnet, Gateway Endpoint cannot be used since Gateway Endpoint is a regional service (use NAT gateway or Interface Endpoint instead)
can attach an endpoint policy that controls access to the service to which you are connecting
does not use AWS PrivateLink
Flow Logs
Capture information about IP traffic going to and from your network interfaces
Can query VPC flow logs using Athena on S3 or CloudWatch Logs Insights
Flow Logs data can go into:
S3
Cloudwatch logs
Kinesis Data Firehose
Site-to-site VPN Connection
To connect VPC with on-prem servers through private VPN connection over public network
Site-to-site VPN connection can be used as a backup connection to Dx connection
Need 2 things:
Virtual Private Gateway (VGW)
Customer Gateway (CGW)
VGW
VPN concentrator on the AWS side of the VPN connection
VGW is created and attached to the VPC from which you want to create the Site-to-Site VPN connection
Need to enable Route Propagation for the VGW in the route table that is associated with the subnets in the VPC
CGW
Software application or physical device on customer side of the VPN connection
Need public Internet-routable IP address for the Customer Gateway device
If CGW is private, need NAT device to enable public routing
VPN Cloudhub
Provide secure communication between multiple sites, if you have multiple VPN connections
To set it up, connect multiple VPN connections on the same VGW, setup dynamic routing and configure route tables
Direct Connect (Dx)
Provides a dedicated private connection from a remote network to your VPC
Dedicated connection must be setup between the data center and AWS Direct Connect locations
Need to setup a VGW at VPC side
Lead times are often longer than 1 month to establish a new connection
Connection Flows
Private VPC Connection
Public Resources Connection
Private Connection Flow
VGW ⇒ Dx Connector in Dx locations ⇒ Customer router in Dx locations ⇒ Customer router in customer network
Public Connection Flow
Public AWS resources (like s3) ⇒ Dx Connector in Dx locations ⇒ Customer router in Dx locations ⇒ Customer router in customer network
Direct Connect Gateway
If you want to setup a Direct Connect to one or more VPC in many different regions (same account), you must use a Direct Connect Gateway
Dx connection connects to Direct Connect Gateway and Direct Connect Gateway connects to multiple VGWs
Connection Types
Dedicated Connections
Hosted Connections
Dedicated Connections
1 Gbps, 10 Gbps and 100 Gbps capacity
Physical ethernet port dedicated to a customer
Request made to AWS first, then completed by “AWS Direct Connect Partners”
Hosted Connections
50 Mbps, 500 Mbps, up to 10 Gbps
Connection requests are made via “AWS Direct Connect Partners”
Capacity can be added or removed on demand
1, 2, 5, 10 Gbps available at select AWS Direct Connect Partners
Encryption
Data in transit is not encrypted but is private
AWS Direct Connect + VPN provides an IPsec-encrypted private connection
Resiliency
High resiliency
Max resiliency
High resiliency
One connection at multiple Dx locations
Max resiliency
Maximum resilience is achieved by separate connections terminating on separate devices in more than one location.
Transit Gateway
Transit Gateway sits in the middle to connect multiple VPCs transitively and can also connect to Dx Gateway and Site-to-site VPN connections
Regional resource
Share cross-account using Resource Access Manager (RAM)
You can peer Transit Gateways across regions
Route Tables: limit which VPC can talk with other VPC
Supports IP Multicast
Can peer multiple transit gateways in multiple regions
Site-to-site VPN ECMP (Equal Cost Multiple Paths)
Routing strategy that allows packets to be forwarded over multiple best paths
Use case: create multiple Site-to-Site VPN connections to increase the bandwidth of your connection to AWS
VPC Traffic Mirroring
Capture and mirror the traffic to send the mirrored traffic into own security appliances to analyze, monitor or troubleshoot
Source and Target can be in the same VPC or different VPCs (VPC Peering)
Egress-only Internet Gateway
Used for IPv6 only
Similar to a NAT Gateway but for IPv6
Must update the Route Tables
Allows instances in your VPC to make outbound connections over IPv6 while preventing the internet from initiating an IPv6 connection to your instances
AWS Network Firewall
Protect entire VPC
From Layer 3 to Layer 7 protection
Internally uses AWS Gateway Load Balancer
Rules can be centrally managed cross-account by AWS Firewall Manager to apply to many VPCs
Can send logs of rule matches to Amazon S3, CloudWatch Logs, Kinesis Data Firehose
Protect directions
VPC to VPC traffic
Outbound to internet
Inbound from internet
To/from Direct Connect & Site-to-Site VPN
Fine-grained Controls
IP & port - example: 10,000s of IPs filtering
Protocol – example: block the SMB protocol for outbound communications
Stateful domain list rule groups: only allow outbound traffic to *.mycorp.com or third-party software repo
General pattern matching using regex
etc
Cost
Cost Explorer
Visualize, understand, and manage AWS costs and usage over time
Create custom reports that analyze cost and usage data
Monthly, hourly, resource level granularity
Forecast usage up to 12 months based on previous usage
Have API support with pagination
Cost Anomaly Detection
Continuously monitor cost and usage using ML to detect unusual spends
Monitor AWS services, member accounts, cost allocation tags, or cost categories
Sends the anomaly detection report with root-cause analysis
Get notified with individual alerts or daily/weekly summary (using SNS)