AWS Notes

Introduction

Storage

AWS S3 - Simple Storage Service

Glacier

S3 Vs Glacier Cost Shootout

Assuming:

Then:

Notes:

AWS Storage Gateway

Columns: File Gateway | Volume Gateway (cached volume), aka gateway-cached volume | Volume Gateway (stored volume), aka gateway-stored volume | Tape Gateway, aka gateway-virtual tape library (VTL)

What:
- File Gateway: object store; S3 files accessible as NFS/SMB
- Volume Gateway (cached): block store
- Volume Gateway (stored): block store
- Tape Gateway: cloud tape backup to S3, Glacier, or Glacier Deep Archive

Why?:
- File Gateway: elastic network share, disaster recovery
- Volume Gateway (cached): SAN, migrate to cloud (32 TiB in cloud with only 300 GiB local). Cheap storage.
- Volume Gateway (stored): low-latency SAN, migrate to cloud, DR

Local gateway access:
- File Gateway: NFS, Samba (SMB)
- Volume Gateway (cached): iSCSI
- Volume Gateway (stored): iSCSI
- Tape Gateway: iSCSI

Local gateway has:
- File Gateway: cache and files yet to be written to S3
- Volume Gateway (cached): most recently used files (full volume in S3)
- Volume Gateway (stored): full volume
- Tape Gateway: data waiting to be written to AWS

Transfer to S3:
- File Gateway: async
- Volume Gateway (cached): async volume or schedule
- Volume Gateway (stored): async volume or schedule
- Tape Gateway: controlled by backup software. No dedicated local gateway storage so async???

Direct access to files in S3:
- File Gateway: yes
- Volume Gateway (cached): no, EBS snapshot to volume gateway or EBS volume
- Volume Gateway (stored): no, EBS snapshot to volume gateway or EBS volume
- Tape Gateway: no, virtual tape library not shown in S3. Use backup application to access data out of VTL (if in Glacier, need to request retrieval)

Consistency:
- File Gateway:
  - can have multiple readers/writers (i.e. other gateways) but uncoordinated = unreliable
  - single writer highly recommended
  - gateway needs polling, a RefreshCache API call, or a CloudWatch event to know whether an object is in S3 or a lifecycle rule deleted it
- Volume Gateway (cached): only via gateway or EBS snapshot
- Volume Gateway (stored): only via gateway or EBS snapshot

Disk allocation:
- File Gateway: cache
- Volume Gateway (cached): cache, upload buffer
- Volume Gateway (stored): upload buffer, stored volume
- Tape Gateway: cache, upload buffer

Limits:
- File Gateway:
  - 1 file share per bucket
  - 10 file shares per gateway
  - 1024 char path limit
  - 5 TB file size (same as S3)
- Volume Gateway (cached): 32 x 32 TiB volumes (more than 16 TiB can only restore back to gateway) = 1 PiB
- Volume Gateway (stored): 32 x 16 TiB volumes (EBS limit) = 0.5 PiB
- Tape Gateway:
  - 1500 tapes per VTL
  - 100 GiB to 5 TiB tapes
  - total 1 PiB per VTL

Gotchas:
- File Gateway:
  - no symlinks
  - rename is instant on the gateway but is a copy-put then delete in S3 (eventually consistent)
- Volume Gateway (stored):
  - need enough space locally for entire stored volume
  - restore from AWS needs to copy down to stored volume so takes time
- Tape Gateway: readable via tape application only
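The capacity limits quoted for Storage Gateway can be checked with a few lines of arithmetic. This is only a sketch of the numbers in these notes (32 volumes of 32 TiB or 16 TiB, 5 TiB tapes, 1 PiB per VTL); the function and constant names are made up for illustration.

```python
TIB_PER_PIB = 1024  # 1 PiB = 1024 TiB


def max_capacity_pib(volumes: int, tib_per_volume: int) -> float:
    """Total capacity in PiB for a gateway holding `volumes` volumes of equal size."""
    return volumes * tib_per_volume / TIB_PER_PIB


cached_pib = max_capacity_pib(32, 32)  # cached volume gateway
stored_pib = max_capacity_pib(32, 16)  # stored volume gateway

# Tape gateway: 1500 tapes x 5 TiB would exceed the 1 PiB total per VTL,
# so the 1 PiB cap is the binding limit for full-size tapes.
full_size_tapes = TIB_PER_PIB // 5

print(cached_pib, stored_pib, full_size_tapes)
```

In other words, a VTL can hold 1500 tapes only if most of them are well below the 5 TiB maximum.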

AWS RDS relational database service

Offering: MSSQL, Oracle, MySQL, PostgreSQL (object RDBMS, i.e. inheritance), Aurora (Amazon’s flavour of MySQL specifically tuned for AWS), MariaDB (fork of MySQL after Oracle bought it)

Columns: Aurora | MySQL | MariaDB | PostgreSQL | Oracle | SQL Server

Multi-AZ:
- Aurora: already replicates 2 copies per AZ x 3 AZs. Use Aurora replica for failover.
- MySQL: via AWS tech
- MariaDB: via AWS tech
- PostgreSQL: via AWS tech
- Oracle: via AWS tech. Still need license in both.
- SQL Server: via SQL Server Mirroring

Can multi-AZ after creation:
- Aurora: No
- MySQL: Yes
- MariaDB: Yes
- PostgreSQL: Yes
- Oracle: Yes
- SQL Server: No

Read replicas (max):
- Aurora: Yes (15 Aurora replicas across AZs within region only). Cross-region with MySQL. 2018 has Aurora Global Database.
- MySQL: Yes (5). Can delay replication for disaster recovery. Can replicate to non-RDS MySQL/MariaDB using native MySQL features.
- MariaDB: Yes (5). Use GTID replication to replicate non-RDS MariaDB into RDS.
- PostgreSQL: Yes (5)
- Oracle: Yes (5) but need a Data Guard license (BYOL only)
- SQL Server: No

Write to read replica (e.g. add index):
- Aurora: ???
- MySQL: logical replication so yes
- MariaDB: logical replication so yes
- PostgreSQL: physical replication so no
- Oracle: physical replication so no
- SQL Server: N/A

Read replica of a read replica:
- Aurora: Yes
- MySQL: Yes
- MariaDB: Yes
- PostgreSQL: No
- Oracle: No
- SQL Server: N/A

Read replica deletion of master:
- Aurora: replicas remain active, one is promoted to master
- MySQL: all replicas are promoted to independent master DBs
- MariaDB: replica remains active
- PostgreSQL: all replicas are promoted to independent master DBs
- Oracle: N/A
- SQL Server: N/A

Apply DDL to read replica (i.e. add index):
- Aurora: No???
- MySQL: Yes
- MariaDB: Yes
- PostgreSQL: No
- Oracle: N/A
- SQL Server: N/A

Create backup from read replica:
- Aurora: No???
- MySQL: Yes
- MariaDB: Yes
- PostgreSQL: Yes
- Oracle: No
- SQL Server: N/A

Size limits (may require latest EC2 instance type and gp2/io1):
- Aurora: 64 TB table
- MySQL: 64 TB max
- MariaDB: 64 TB max
- PostgreSQL: 64 TB max
- Oracle: 64 TB
- SQL Server: 16 TB

Gotchas:
- MySQL: MyISAM not crash-consistent so snapshots might be corrupted (makes read-replica and multi-AZ difficult)
- Oracle:
  - Enterprise only supported with BYOL
  - read replica needs BYOL
  - no cross-region replica support
- SQL Server: no read replicas

AWS Aurora

Amazon’s drop-in replacement for MySQL/PostgreSQL, available only in AWS. Claimed up to 5x the performance of MySQL at 1/10 the cost of commercial databases, with similar availability.

AWS DynamoDB

General No-SQL Partition Key Design

See Analytics And Storage Asides for more.

AWS ElastiCache

Sub-millisecond, in-memory cache for DB services in the cloud; makes reads and writes faster. Good for static DBs (faster than DynamoDB). Two different caching engines:

Columns: Memcached | Redis

Object complexity:
- Memcached: simple
- Redis: complex (sorted-sets, lists, hashes, bit-arrays, HyperLogLog, geospatial)

Horizontal scale-out:
- Memcached: yes, but the application has to decide which node to retrieve/store data on. Scaling while running not advised (will lose some cache even with consistent hashing; ASG not supported anyway so manual via console or API).
- Redis: uses read replicas (async replication) to scale reads. Online vertical scaling supported since 2019. For writes, cluster mode enabled with a Redis cluster client can shard (multiple primary nodes) and can scale in/out online without downtime (not ASG, invoked manually).

Multi-AZ:
- Memcached: yes, but the tradeoff is added latency for some retrievals
- Redis: yes for read replicas (async replication) to achieve high availability/durability

Replication/Durability:
- Memcached: no. If a node dies its data is lost, but remaining shards are probably OK.
- Redis:
  - master-slave for HA. Up to 5 read replicas (async replication) which can be in same/other AZ
  - failover detection and promotion of read replica to master (flips DNS)
  - snapshot backup/restore to disk. Useful to pre-warm cache, or scale up

Feature:
- Memcached:
  - Memcached-compliant
  - multi-threaded (so scaling CPU up helps)
- Redis:
  - encryption
  - HIPAA compliance
  - pub-sub
  - Lua scripts
  - transactions

Think of as:
- Memcached: big slab of memory in the cloud. Really just a pool of expendable nodes that grows/shrinks (like an auto-scaling group, i.e. replacement of failures, discovery, although Memcached doesn’t support ASG)
- Redis: relational database of stateful entities with failover (like RDS)

Good for:
- Memcached:
  - simplicity
  - object-caching
  - horizontal scaling
  - large multi-threaded cache nodes
- Redis: ranking/sorting (e.g. leaderboard)
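With Memcached, the application picks the node for each key, and the notes above warn that scaling a running cluster loses some cache. A minimal sketch of client-side consistent hashing shows why only part of the cache is lost. The `HashRing` class, the node names, and the choice of MD5 with 100 virtual nodes are illustrative assumptions, not how any particular Memcached client is implemented.

```python
import bisect
import hashlib


class HashRing:
    """Toy consistent-hash ring: each node owns many points on a circle of hashes."""

    def __init__(self, nodes, vnodes=100):
        self._vnodes = vnodes
        self._ring = []    # sorted list of (hash, node) points
        self._hashes = []  # the hashes alone, for bisect lookups
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(value):
        return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)

    def add_node(self, node):
        # Place `vnodes` points for this node, then keep the ring sorted.
        for i in range(self._vnodes):
            self._ring.append((self._hash(f"{node}#{i}"), node))
        self._ring.sort()
        self._hashes = [h for h, _ in self._ring]

    def node_for(self, key):
        if not self._ring:
            raise ValueError("ring has no nodes")
        # First point clockwise from the key's hash, wrapping around at the end.
        idx = bisect.bisect(self._hashes, self._hash(key)) % len(self._ring)
        return self._ring[idx][1]


ring = HashRing(["cache-a", "cache-b", "cache-c"])
keys = [f"user:{i}" for i in range(1000)]
before = {k: ring.node_for(k) for k in keys}

ring.add_node("cache-d")  # scale out by one node
after = {k: ring.node_for(k) for k in keys}

moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys now map to a different node")
```

Every key that moves lands on the new node, so those entries become cache misses until they are repopulated; keys that stay put keep their cached values. A plain `hash(key) % node_count` scheme would remap most keys instead.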

Other Storage

Block Storage

AWS provides a disk; it is up to the client/operating system to decide what to do with it.

EBS - Elastic Block Store

Columns: gp2 (general purpose SSD) | io1 (provisioned IOPS SSD) | st1 (throughput optimized HDD) | sc1 (cold storage HDD)

What:
- gp2: general purpose SSD balanced for price and performance
- io1: high performance (low latency, high throughput) SSD
- st1: low-cost HDD for throughput intensive workloads
- sc1: lowest cost HDD for less frequently accessed workloads

Uses:
- gp2: most workloads, desktops, non-PROD environments
- io1: critical apps requiring sustained IOPS or >16,000 IOPS or 250 MiB/s, e.g. RDBMS, No-SQL
- st1: streaming at high throughput and low cost, big data, data warehouse, log processing
- sc1: high throughput at lowest cost. Infrequently accessed.

Size:
- gp2: 1 GiB - 16 TiB
- io1: 4 GiB - 16 TiB
- st1: 500 GiB - 16 TiB
- sc1: 500 GiB - 16 TiB

Dominant performance characteristic:
- gp2: IOPS
- io1: IOPS
- st1: MiB/s
- sc1: MiB/s

Base IOPS (initial):
- gp2: 100 IOPS at <33.33 GiB
- io1: 100 IOPS

Base IOPS (scaling):
- gp2: 3 IOPS / GiB
- io1: max ratio 50 IOPS to 1 GiB (e.g. 100 GiB volume, max 5000 IOPS). Recommend >2 IOPS:GiB

Base IOPS (max):
- gp2: 16000 IOPS @ 5334 GiB
- io1: 64000 IOPS on Nitro (32000 IOPS non-Nitro). Min volume for max IOPS:
  - 1280 GiB (Nitro)
  - 640 GiB (non-Nitro)
- st1: 500 IOPS (1 MiB I/O)
- sc1: 250 IOPS (1 MiB I/O)

I/O size (can’t actually modify this??? EBS is network-attached):
- gp2: 16 KiB to 256 KiB
- io1: 16 KiB to 256 KiB. Max I/O:
  - 256 KiB I/O for 32000 IOPS (non-Nitro???)
  - 16 KiB I/O for Nitro
- st1: 1 MiB common
- sc1: 1 MiB common

Base MiB/s (initial):
- gp2: 128 MiB/s at <170 GiB
- io1: no initial??? To achieve 128 MiB/s:
  - @16 KiB I/O: 8000 IOPS
  - @256 KiB I/O: 500 IOPS
- st1: 20 MiB/s @ 0.5 TiB. Scaling exactly linear at 40 MiB/s per TiB (need 12.5 TiB for max)
- sc1: 6 MiB/s @ 0.5 TiB. Scaling exactly linear at 12 MiB/s per TiB (can’t achieve max, must burst)

Max MiB/s:
- gp2:
  - 128 MiB/s @ <170 GiB
  - 250 MiB/s @ 170-334 GiB if credits available (@3000 IOPS, need >85.3 KiB I/O)
  - 250 MiB/s @ >334 GiB
  - volumes for 250 MiB/s: @16 KiB I/O = 16000 IOPS (~5334 GiB volume); @256 KiB I/O = 1000 IOPS (~334 GiB volume)
- io1:
  - 1000 MiB/s (Nitro): @16 KiB I/O = 64000 IOPS; @256 KiB I/O = 4000 IOPS
  - 500 MiB/s (non-Nitro) @256 KiB I/O
- st1: 500 MiB/s
- sc1: 250 MiB/s

Burst:
- gp2: to 3000 IOPS. Initially 5.4M I/O credits (3000 IOPS for 30 minutes). Accumulates 3 IOPS/GiB up to 5.4M I/O credits.
- io1: N/A, IOPS are provisioned and MiB/s tied to IOPS
- st1: max credits = volume size. “Burst to” scales 250 MiB/s per TiB so:
  - min 125 MiB/s (0.5 TiB)
  - max 500 MiB/s (hits 500 MiB/s limit at 2 TiB)
- sc1: max credits = volume size. “Burst to” scales 80 MiB/s per TiB so:
  - min 40 MiB/s (0.5 TiB)
  - max 250 MiB/s (hits 250 MiB/s limit at 3.125 TiB)

Burst boundaries:
- gp2: baseline IOPS: >1000 GiB hits 3000 IOPS limit. Baseline MiB/s: see Max MiB/s row
- st1: irrelevant >12.5 TiB
- sc1: burst always relevant since baseline at max size 16 TiB is 192 MiB/s

Gotchas/Limits:
- io1: very expensive vs gp2 (for equivalent baseline IOPS, io1 is 2x price). Therefore io1 becomes useful when:
  - need >16,000 IOPS or >250 MiB/s throughput
  - need provisioned performance >99.9% of the time (vs gp2’s 99%)
- st1: can’t be boot drive
- sc1: can’t be boot drive
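The scaling rules in the EBS comparison reduce to a few formulas, sketched here using only the figures from these notes (3 IOPS/GiB with a 100 floor and 16,000 cap for gp2, a 5.4M credit bucket, and linear MiB/s per TiB for st1/sc1). The function names are made up, and the burst-duration formula (bucket divided by burst rate minus baseline refill) is my reading of how the credit bucket drains, so treat it as an assumption.

```python
GP2_MIN_IOPS = 100
GP2_MAX_IOPS = 16_000
GP2_BURST_IOPS = 3_000
GP2_CREDIT_BUCKET = 5_400_000  # I/O credits in a full bucket


def gp2_baseline_iops(size_gib):
    """Baseline is 3 IOPS per GiB, floored at 100 and capped at 16,000."""
    return min(max(GP2_MIN_IOPS, 3 * size_gib), GP2_MAX_IOPS)


def gp2_burst_seconds(size_gib):
    """Seconds a full bucket sustains 3000 IOPS, or None if bursting is irrelevant.

    Credits drain at the burst rate but refill at the baseline rate.
    """
    baseline = gp2_baseline_iops(size_gib)
    if baseline >= GP2_BURST_IOPS:
        return None  # baseline already meets or exceeds the burst ceiling
    return GP2_CREDIT_BUCKET / (GP2_BURST_IOPS - baseline)


def hdd_throughput(size_tib, mib_per_tib, cap_mib):
    """Linear MiB/s scaling for st1/sc1, capped at the volume type's maximum."""
    return min(size_tib * mib_per_tib, cap_mib)


print(gp2_baseline_iops(100))        # baseline for a 100 GiB gp2 volume
print(gp2_burst_seconds(100))        # burst duration for that volume
print(hdd_throughput(16, 12, 250))   # sc1 baseline at the 16 TiB maximum
print(hdd_throughput(2, 250, 500))   # st1 burst ceiling at 2 TiB
```

The "3000 IOPS for 30 minutes" figure corresponds to 5.4M / 3000 = 1800 seconds, i.e. draining the bucket with no refill; with refill included, small volumes burst slightly longer than that.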

AWS EFS - Elastic File System

Block-Store Performance shootout

Columns: EBS (io1) | EC2 instance store | EFS

Max Size:
- EBS (io1): 16 TiB
- EC2 instance store: up to 60 TB (8 x 7500 GB NVMe SSD)
- EFS: petabyte-scale

MB/s:
- EBS (io1): 1.7 GB/s, the EBS-optimized EC2 instance ceiling (e.g. c5 family)
- EC2 instance store: possibly 16 GB/s on i3en (this is either EBS network or all 8 NVMe SSDs driving)
- EFS: 10+ GB/s. Burst scales linearly at 100 MB/s per TB, so 10 GB/s = 100 TB.

Latency:
- EBS (io1): low
- EC2 instance store: lowest
- EFS: average

Storage Asides

Network

VPC Virtual Private Cloud

Internet Access Options

For an EC2 instance to be internet accessible, being inside a public subnet (a subnet with a route to the internet, e.g. via an internet gateway, IG) is not enough. The instance also needs a public IP or Elastic IP, and the route table must send 0.0.0.0/0 traffic to the IG.
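The two conditions above can be expressed as a small check. This is a sketch under stated assumptions: `Route` and `is_internet_reachable` are hypothetical names, route targets are assumed to use the `igw-` ID prefix for internet gateways, and security groups and network ACLs are deliberately not modelled.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Route:
    destination: str  # CIDR block, e.g. "0.0.0.0/0"
    target: str       # e.g. "local", "igw-0abc", "nat-0def" (illustrative IDs)


def is_internet_reachable(public_ip: Optional[str], routes: List[Route]) -> bool:
    """True only if the instance has a public/Elastic IP AND a default route to an IG."""
    has_default_route_to_ig = any(
        r.destination == "0.0.0.0/0" and r.target.startswith("igw-")
        for r in routes
    )
    return public_ip is not None and has_default_route_to_ig


public_routes = [Route("10.0.0.0/16", "local"), Route("0.0.0.0/0", "igw-0abc")]
private_routes = [Route("10.0.0.0/16", "local"), Route("0.0.0.0/0", "nat-0def")]

print(is_internet_reachable("203.0.113.10", public_routes))   # both conditions met
print(is_internet_reachable(None, public_routes))             # public subnet, no public IP
print(is_internet_reachable("203.0.113.10", private_routes))  # default route is not an IG
```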

Connecting to AWS VPC For Organisations

Table comparing these options:

Connecting VPC To VPC

Similar to Organisation to AWS VPC section above (except whitepaper has no Direct Connect plus VPN, Transit VPC, and CloudHub):

Route 53

DNS (domain name system) which runs on port 53. Not region-specific (i.e. global)

AWS CloudFront

Content delivery service (CDS) in a content delivery network (CDN) to distribute web pages/video/APIs to users based on user location, origin of page, and content.

Aside

Compute

EC2 - Elastic Compute Cloud

Virtual server, SLA of 99.95% available

AWS ELB - Elastic Load Balancer

Columns: Application (ALB) | Network (NLB) | Classic

Target:
- Application:
  - targets in target groups
  - IPs within VPC or VPC-peer, or on-prem via Direct Connect (RFC 1918, RFC 6598 100.64.0.0/10)
  - Lambda (for AWS migration)
- Network:
  - targets in target groups
  - IPs within VPC or on-prem via Direct Connect (RFC 1918, RFC 6598 100.64.0.0/10)
- Classic:
  - instances
  - not via IPs (so no on-prem)
  - no concept of “target group”

Load balancer address:
- Application:
  - public: DNS over public subnet
  - private: internal DNS
- Network:
  - public: can have one Elastic IP per subnet, providing static IPs, otherwise IP in public subnet per AZ
  - private: DNS to private IPs
- Classic:
  - public/internet facing needs DNS to public IP
  - private: internal DNS to private IP addresses

Traffic:
- Application:
  - layer 7: HTTP, HTTPS, HTTP/2
  - not layer 4 (TCP, UDP)
- Network: layer 4: TCP, UDP, TLS
- Classic:
  - layer 4/7: HTTP, HTTPS, SSL, TCP, SSL/TLS
  - can’t mix-and-match listener with backend target (e.g. HTTPS listener to TLS backend)

WebSockets:
- Application: Yes
- Network: Yes
- Classic: No

IPv6:
- Application: Yes
- Network: No?
- Classic:
  - ELB has IPv4/6 and dualstack DNS name
  - IPv6 addresses not supported in EC2-VPC (use Application) but IPv6 addresses supported in EC2-Classic

Proxy/Client IP:
- Application: X-Forwarded-For. Terminates the connection and opens a new connection to the target.
- Network: preserves client IP if target is an instance. Doesn’t terminate the connection. If the target is an IP, then the source IP is the load balancer IP (as the IP could be outside the VPC), so have to enable Proxy Protocol v2, which uses a HAProxy TCP header.
- Classic: X-Forwarded-For for HTTP, Proxy Protocol v1 for TCP

Security Groups:
- Application: Yes
- Network: No, which means:
  - can’t allow traffic to target by allowing SG of NLB
  - instead, the backend app has to allow by client IP if NLB target is instance ID (which preserves source IP) or ELB IP (i.e. subnet for targets as IPs)
  - backend app needs to allow healthcheck from ELB
- Classic: Yes, but EC2-Classic security groups are different (you can’t pick the ELB’s security group, it has its own)

AZs / subnets:
- Application: at least 2 AZs. 1 subnet per AZ per LB. Can change after creation.
- Network: single AZ allowed. 1 subnet per AZ per LB.
- Classic: one or more AZs. 1 subnet per AZ per LB. Need public subnets. Can change after creation.

Static IP:
- Application: No
- Network: automatically allocates 1 (or Elastic IP) per AZ. For firewall IP whitelisting.
- Classic: No

Routing Features:
- Application: simple rule-based routing (path/URL and host header)
- Network: basic protocol & port routing
- Classic: nothing. When you create a listener you specify backend protocol and port.

Port forwarding:
- Application: different ports per instance (useful for containers)
- Network: different ports per instance
- Classic: global port per instance

Sticky sessions:
- Application: yes via cookie (ALB-generated cookie only) or web sockets
- Network: No
- Classic: yes via cookie (generated by the ELB or the backend)

User authentication:
- Application: OpenID Connect compliant IdP, social (Amazon, Facebook, Google via Cognito), corporate (SAML, LDAP, Microsoft AD via Cognito)
- Network: No
- Classic: No

WAF:
- Application: Yes
- Network: No
- Classic: No

SSL termination:
- Application: Yes. Can re-encrypt to target but no client auth (AWS says already secure with VPC, unlike shared Classic network)
- Network: Yes (TLS). No re-encrypt or client auth; do TCP pass-through and terminate on backend target
- Classic: Yes. Can re-encrypt and do client auth against backend if backend target is HTTPS

SNI support:
- Application: Yes. You provide ELB a list of certs; default only used if client doesn’t support SNI.
- Network: Yes. You provide ELB a list of certs; default only used if client doesn’t support SNI.
- Classic: No. Either present one cert with Subject Alternative Name set, or handle TCP only and terminate at EC2 instance

Health check:
- Application: active only, at HTTP(S) level looking for HTTP 200-499 response
- Network: passive (can’t be disabled/configured, observes how target responds, not for UDP). Active at HTTP(S) looking for 200-399 response, or at TCP level
- Classic: active only, at TCP, HTTP(S), SSL
  - for TCP, must connect successfully
  - for HTTP(S), must return 200 OK
  - for SSL, must handshake successfully

EC2-Classic/VPC-Classic Support:
- Application: not via instance ID. Peer VPC with ClassicLink (see EC2-Classic instances in same VPC region) and use private IP
- Network: Yes
- Classic: Yes

Cost:
- Application: partial hour and LCU
- Network: partial hour and LCU
- Classic: partial hour and per GB processed

Gotchas:
- Application: must be at least 2 AZs
- Classic:
  - route to only one port per instance (if need more, need multiple Classic ELBs)
  - no IP targets (so within network only)
  - no ECS support

Limits (global: 20 LB per region):
- Application:
  - 3000 target groups per region (shared with Network)
  - 5 security groups per LB
  - 50 listeners per LB
  - 1 subnet per AZ per LB
  - 1000 targets per LB
  - 1000 targets per target group
  - 1 LB per target group
- Network:
  - 3000 target groups per region (shared with Application)
  - 50 listeners per LB
  - 1 subnet per AZ per LB
  - 200 targets per LB per AZ
  - 1 LB per target group
- Classic:
  - 100 listeners per LB
  - 5 security groups per LB
  - 1 subnet per AZ per LB
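The Proxy/Client IP row above mentions two ways a backend recovers the real client address: the X-Forwarded-For HTTP header and Proxy Protocol v1 for TCP. A minimal sketch of reading both follows. The v1 line format (`PROXY TCP4 <src> <dst> <srcport> <dstport>` terminated by CRLF) is from the HAProxy PROXY protocol specification; the function and class names are made up, and the binary v2 format used by NLB is not covered.

```python
from typing import NamedTuple, Optional


class ProxyInfo(NamedTuple):
    protocol: str     # "TCP4" or "TCP6"
    client_ip: str
    client_port: int
    dest_ip: str
    dest_port: int


def parse_proxy_v1(header: bytes) -> Optional[ProxyInfo]:
    """Parse one Proxy Protocol v1 line; returns None for 'PROXY UNKNOWN'."""
    if not header.startswith(b"PROXY ") or not header.endswith(b"\r\n"):
        raise ValueError("not a Proxy Protocol v1 header")
    parts = header[:-2].decode("ascii").split(" ")
    if parts[1] == "UNKNOWN":
        return None  # the proxy could not determine the original addresses
    if len(parts) != 6 or parts[1] not in ("TCP4", "TCP6"):
        raise ValueError("malformed Proxy Protocol v1 header")
    _, proto, src, dst, src_port, dst_port = parts
    return ProxyInfo(proto, src, int(src_port), dst, int(dst_port))


def client_ip_from_xff(x_forwarded_for: str) -> str:
    """Left-most entry of X-Forwarded-For is the originating client as reported."""
    return x_forwarded_for.split(",")[0].strip()


info = parse_proxy_v1(b"PROXY TCP4 198.51.100.22 203.0.113.7 35646 80\r\n")
print(info.client_ip, info.client_port)
print(client_ip_from_xff("198.51.100.22, 10.0.1.15"))
```

In a real TCP server this line arrives as the first bytes on the connection, before any application data, so it has to be read and stripped before the normal protocol handler runs.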

Containers

Columns: Lightsail | Beanstalk | ECS (EC2) | ECS (Fargate) | EKS

Summary:
- Lightsail: for students, small businesses that need virtual compute. Templates for Wordpress, LAMP
- Beanstalk: Docker, Apache Tomcat, Apache + PHP, IIS + .NET. AWS controls deployment, capacity provisioning, scaling, load balancing
- ECS (EC2): Docker. Deep AWS integration
- ECS (Fargate): task-level orchestration
- EKS: Kubernetes-compliant, non-AWS, AWS manages control plane for you. More AWS-independent. Own Route 53, ALB, CloudWatch equivalents

Pay:
- Beanstalk: EC2
- ECS (EC2): EC2
- ECS (Fargate): requested vCPU and memory
- EKS: master node (control plane) hours, EC2

Control/Flexibility:
- Lightsail: very low
  - dumbed down console, push-button add ELB (auto-scaling)
  - limited AMIs and RDS options. Simplified pricing.
- Beanstalk: low. Fewer control options than provisioning yourself (console has own screens and own API). Don’t control: security groups, ELB health checks, RDS (replicas). Manual config changes applied in the console take precedence over .ebextensions in the app bundle, which makes console changes impossible to manage. Some CloudFormation support (e.g. provision SQS)
- ECS (EC2): medium. Supports task placement: binpack, random, spread (so in theory, better resource utilisation). Can select EC2 GPU optimised tasks (not supported in Fargate)
- ECS (Fargate): low
- EKS: high, lots of 3rd party add-ons. Supports task placement: binpack, random, spread

Persistent storage:
- Lightsail: EBS
- Beanstalk: EBS, EFS
- ECS (EC2): EBS, EFS
- ECS (Fargate): EFS
- EKS: EBS, EFS

Access to underlying EC2 instances in console?:
- Lightsail: No. Can convert Lightsail to EC2
- Beanstalk: Yes
- ECS (EC2): Yes
- ECS (Fargate): No
- EKS: Yes

Who does EC2 patching?:
- Lightsail: you via SSH
- Beanstalk: either. Can opt in to AWS patching the platform for you. Does blue-green so no downtime.
- ECS (EC2): you
- ECS (Fargate): N/A
- EKS: you

Scaling?:
- Lightsail: AWS
- Beanstalk: AWS via auto-scaling group
- ECS (EC2): AWS at service level via auto-scaling groups you tweak
- ECS (Fargate): AWS at service level (multiple instances of tasks)
- EKS: AWS at pod level (co-located containers) via Horizontal Pod Auto-Scaler (HPA)

Load Balancer:
- AWS via ELB (classic, network, application)
- ALB, NLB, Classic. ALB recommended: a service can serve multiple ALBs and ports, supports dynamic port assignment (i.e. have >1 task on one node), path-based routing so one ALB port can redirect to different services
- ALB, NLB

Provision RDS?:
- Lightsail: Yes
- Beanstalk: Yes, but probably bad for PROD since it ties RDS to app lifecycle
- ECS (EC2): No, not about provisioning
- ECS (Fargate): No, not about provisioning
- EKS: No, not about provisioning

Network:
- ECS (EC2): ENI per container (could run into ENI limits per EC2 instance), allows security group per container
- ECS (Fargate): ENI per task (which means IP)
- EKS: ENI per pod (more efficient, but looser security control)

Security:
- ECS (EC2): IAM at container/task level
- ECS (Fargate): IAM at task level
- EKS: IAM at worker/EC2 level

Supports App Mesh?:
- Lightsail: not during create but could add agent later
- Beanstalk: not during create but could use custom AMI
- ECS (EC2): Yes
- ECS (Fargate): Yes
- EKS: Yes

Compute Asides

Security and Identity

Security Asides

Deployment

Command Line

General Deployments

AWS CloudFormation

Management Tools

Infrastructure As Code Landscape

Columns: Comments | Chef | Puppet | Ansible | CloudFormation | Terraform

Open Source?:
- Chef: Yes
- Puppet: Yes
- Ansible: Yes
- CloudFormation: No, AWS
- Terraform: Yes

Config Management vs Provisioning Focus?:
- Comments: if Docker/Packer, this encapsulates config so you need provisioning more
- Chef: config management
- Puppet: config management
- Ansible: config management
- CloudFormation: provisioning
- Terraform: provisioning

Mutable vs Immutable Infrastructure Focus?:
- Comments: Docker represents immutable (replace container with brand new container) vs mutable (in-place upgrade)
- Chef: mutable
- Puppet: mutable
- Ansible: mutable
- CloudFormation: immutable
- Terraform: immutable

Procedural vs Declarative?:
- Comments: procedural code doesn’t encapsulate the current state of infrastructure (it’s just changes, more complexity), and order matters. Declarative code represents the current state of infrastructure. Smaller code but less expressive (harder templating logic)
- Chef: procedural. Recipes are Ruby DSL
- Puppet: declarative
- Ansible: probably procedural (e.g. to scale in/out need to write code instead of just specifying the number needed). Modules are more declarative though
- CloudFormation: declarative
- Terraform: declarative

Master?:
- Comments: master centralises config, monitoring, can run continuously to enforce state
- Chef: master server by default
- Puppet: master server by default
- Ansible: no master by default. You SSH from whatever machine
- CloudFormation: no master by default. Connects to cloud provider’s APIs
- Terraform: no master by default. Connects to cloud provider’s APIs

Agents?:
- Comments: having agents: how to bootstrap/upgrade/secure agents? Purpose-built agents standardise libraries/environments
- Chef: Yes
- Puppet: Yes

State Management:
- Comments: how state is interrogated and whether it’s persisted
- Chef: Chef Infra Client runs a recipe which defines the desired state and how to transition to this state
- Puppet: Facter agent interrogates node, sending facts to puppet master
- Ansible: using SSH/WinRM, interrogates the system to determine facts. Nodes can define custom facts in a file for Ansible to discover
- CloudFormation: handled by AWS itself
- Terraform: has a .tfstate file (need a backend to share it with the team) which maps resources in configuration to the actually deployed resources, maintains dependencies and knows if a resource was deleted. Allows Terraform to be cross-provider
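The procedural vs declarative distinction above (writing code to scale versus just specifying the number needed) can be illustrated with a toy example. Neither function reflects how Chef, Ansible, or Terraform are implemented; the names and the server counts are invented for the sketch.

```python
def procedural_scale_out(current_count: int, add: int) -> int:
    """Procedural step: 'add N servers'. The result depends on the starting state."""
    return current_count + add


def declarative_plan(current_count: int, desired_count: int):
    """Declarative step: compare desired state to actual state and emit the difference."""
    diff = desired_count - current_count
    if diff > 0:
        return ("create", diff)
    if diff < 0:
        return ("destroy", -diff)
    return ("no-op", 0)


# Running the procedural script twice overshoots: 10 -> 15 -> 20 servers.
count = procedural_scale_out(10, 5)
count = procedural_scale_out(count, 5)
print(count)

# Re-applying the declarative spec 'desired = 15' is idempotent.
print(declarative_plan(10, 15))  # first apply creates the missing servers
print(declarative_plan(15, 15))  # second apply finds nothing to do
```

The declarative version needs to know the actual current count, which is the reason a tool in that style has to track state somewhere (for example the .tfstate file mentioned above).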

Combos:

AWS Config

Helps with ITIL configuration management compliance

AWS OpsWorks

Systems Manager

AWS Service Catalog

Licensing

Cost Management

Terms:

Cost Minimisation Strategies

Resource Tagging

Tools

Non-AWS, is public cloud actually cheaper?

Application Integration

AWS SQS - Simple Queue Service

Reliably stores messages on a queue while they wait for a consumer to process them, loosely couples applications together, and acts as a buffer between multiple producers and multiple consumers (which don’t need to coordinate* see limits below)

Columns: Standard Queue | FIFO Queue

Performance:
- Standard Queue: unlimited
- FIFO Queue: high, 3000 messages per second with batching (300 without batching)

Ordering:
- Standard Queue: best-effort
- FIFO Queue: FIFO

Delivery:
- Standard Queue: at-least-once
- FIFO Queue: exactly-once (consumer has to delete it)

Max messages in-flight:
- Standard Queue: 120,000
- FIFO Queue: 20,000

Message failure:
- Standard Queue: retries until retention period expires or sent to dead-letter queue
- FIFO Queue: bad message blocks consumers. Can use dead-letter queue.
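At-least-once delivery on a standard queue means the same message can arrive more than once, so consumers are usually written to be idempotent. The sketch below simulates that with plain Python objects rather than the SQS API; the `id`/`body` message shape and the in-memory `seen_ids` set are assumptions for illustration (a real consumer would persist the seen IDs somewhere durable).

```python
from typing import Callable, Dict, Iterable, Set


def consume(
    messages: Iterable[Dict[str, str]],
    handler: Callable[[str], None],
    seen_ids: Set[str],
) -> int:
    """Process each distinct message once, skipping redelivered duplicates."""
    processed = 0
    for msg in messages:
        if msg["id"] in seen_ids:
            continue  # duplicate delivery: already handled, safe to drop
        handler(msg["body"])
        seen_ids.add(msg["id"])  # record only after the handler succeeded
        processed += 1
    return processed


# Simulated deliveries: message "m1" is delivered twice.
deliveries = [
    {"id": "m1", "body": "charge order 1001"},
    {"id": "m2", "body": "charge order 1002"},
    {"id": "m1", "body": "charge order 1001"},
]

handled = []
seen = set()
count = consume(deliveries, handled.append, seen)
print(count, handled)
```

Recording the ID only after the handler returns means a crash mid-handler leads to a retry rather than a lost message, which matches the retry-until-retention-expires behaviour described for standard queues.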

AWS SWF - Simple Workflow Service

Manages parallel or sequential services (basically organizes tasks); a glorified status tracking system (no drag-and-drop).

AWS Step Functions

Managed workflow/orchestration. Create tasks, sequential steps, parallel steps, branches and timers.

AWS API Gateway

AWS SNS - Simple Notification Service

Web service to set up, operate, send push notifications, pub-sub

Other things:

Business Applications and End User Computing

Just need to know they exist and what they do. Troubleshooting likely not covered.

Machine Learning

Need to be able to select right service for the exam. Overview:

Analytics

AWS Kinesis

Collects, stores, and analyses streaming data from hundreds of thousands of producers (e.g. social media feeds mined for positive/negative views, stock prices, game data, geo data for maps, IoT sensors).

Columns: Data Streams | Firehose | Analytics

What:
- Data Streams: topic/stream
- Firehose: ETL
- Analytics: real-time analytics. Code runs in Kinesis.

Cost:
- Data Streams:
  - per shard-hour
  - PUT units
- Firehose:
  - per GB ingested
  - per GB transformed
  - per GB and hour processed in a VPC (optional)
- Analytics:
  - per KPU/hour (compute unit)
  - per GB-mo storage

Getting data in:
- Data Streams:
  - KPL (Java with C++ module, actually a separate process)
  - AWS SDK (low-level, KPL recommended over this)
  - Kinesis Agent (Java app monitors files and sends them)
- Firehose:
  - Kinesis Data Streams
  - Kinesis Analytics
  - Kinesis Agent
  - AWS SDK
  - CloudWatch logs
  - CloudWatch events
  - AWS IoT
  - S3???
- Analytics:
  - via SQL: Kinesis Data Streams, Kinesis Analytics
  - via Java app (Apache Flink), highlights: Kinesis Data Streams, Kafka, Twitter streaming API

Getting data out:
- Data Streams:
  - KCL (Java or Python (more languages in version 1.x), runs a Java daemon)
  - AWS SDK (low-level)
  - Kinesis Data Analytics
  - Kinesis Firehose
  - Lambda
- Firehose:
  - S3
  - Kinesis Analytics (configured in Analytics)
  - Redshift (via Firehose)
  - AWS Elasticsearch service
  - Splunk
- Analytics:
  - via SQL: Kinesis Data Streams, Kinesis Firehose
  - via Java app natively: S3, Kinesis Data Streams, Kinesis Firehose
  - via Apache Flink, highlights: Kinesis Data Streams, Kafka, Elasticsearch

Other

Analytics And Storage Asides

Front-End Web, Mobile, and IoT

Migration Services and Cloud Adoption Framework

What is a framework? Is: information to help organise thoughts, open for localisation/interpretation, should be adopted into organisational culture. Not a literal recipe for success.

Phases of Cloud Adoption:

Cloud Adoption Framework

Migration Tools

Network Migrations and Cutovers

Whitepaper and re:Invent

Migration whitepaper:

Migration readiness re:Invent 2017:

Whitepapers

Multi-Tenant SaaS Storage Strategies

Architecting For Scale

Business Continuity

Example Architectures

Asides

AWS Play Notes