1. Overview

EC2 Auto Scaling automatically adjusts the number of EC2 instances in your application based on demand. It ensures you have the right number of instances running at all times — not too many (cost waste) and not too few (poor performance).

Core Concept: Auto Scaling provides two key benefits:

  1. Elasticity: automatically scale out (add instances) when demand increases and scale in (remove instances) when demand decreases.
  2. High Availability: automatically replace unhealthy instances and maintain your desired instance count across AZs.

2. Auto Scaling Components


Component 1: Launch Template (or Launch Configuration)

Defines WHAT to launch. Contains the instance configuration:

  1. AMI ID
  2. Instance type
  3. Key pair
  4. Security Groups
  5. IAM Instance Profile (role)
  6. User Data (bootstrap script)
  7. EBS volume configuration
  8. Network settings

Launch Template vs Launch Configuration

  1. Launch Configuration: legacy, immutable, limited features. It cannot be edited; you must create a new one.
  2. Launch Template: modern, versioned, supports mixed instance types, Spot, and advanced features.
  3. AWS recommends Launch Templates for all new setups.
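
As a rough sketch, the instance settings listed above map onto the `LaunchTemplateData` structure accepted by boto3's `ec2.create_launch_template`. The helper name `launch_template_data` and every ID below are hypothetical placeholders; the block only builds the dictionary and does not call AWS.

```python
import base64
from typing import Dict, List


def launch_template_data(ami_id: str, instance_type: str, key_name: str,
                         security_group_ids: List[str],
                         instance_profile_name: str,
                         user_data_script: str) -> Dict:
    """Build a LaunchTemplateData dict (hypothetical helper, no AWS call)."""
    return {
        "ImageId": ami_id,
        "InstanceType": instance_type,
        "KeyName": key_name,
        "SecurityGroupIds": list(security_group_ids),
        "IamInstanceProfile": {"Name": instance_profile_name},
        # The launch template API expects user data as base64 text.
        "UserData": base64.b64encode(
            user_data_script.encode("utf-8")).decode("ascii"),
    }


# Placeholder values, assumed for illustration only.
data = launch_template_data(
    ami_id="ami-0123456789abcdef0",
    instance_type="t3.micro",
    key_name="example-key",
    security_group_ids=["sg-0123456789abcdef0"],
    instance_profile_name="example-instance-profile",
    user_data_script="#!/bin/bash\necho bootstrap\n",
)
print(sorted(data))
```

With credentials configured, a dictionary like this could be passed as `LaunchTemplateData=data` to `create_launch_template`, together with a `LaunchTemplateName`.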

Component 2: Auto Scaling Group (ASG)

Defines WHERE and HOW MANY instances to run:

  1. Minimum Capacity: The minimum number of instances that must always be running. ASG will never scale below this.
  2. Desired Capacity: The target number of instances. ASG will try to maintain this count at all times.
  3. Maximum Capacity: The maximum number of instances allowed. ASG will never scale above this.

Example Configuration:

Minimum:  2  (always at least 2 running)
Desired:  4  (ASG targets 4 instances)
Maximum: 10  (never more than 10)

     Min        Desired         Max
      2 --------- 4 ----------- 10
      |           |              |
  scale in    current state   scale out
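
The three capacity values above act like a clamp on whatever a scaling policy asks for. A minimal sketch, assuming a hypothetical helper `clamp_desired` (not an AWS API):

```python
def clamp_desired(requested: int, minimum: int, maximum: int) -> int:
    """Bound a requested capacity to the [minimum, maximum] range."""
    if minimum > maximum:
        raise ValueError("minimum must not exceed maximum")
    return max(minimum, min(requested, maximum))


# Using the example configuration: Min 2, Max 10.
print(clamp_desired(4, 2, 10))   # within range, unchanged
print(clamp_desired(15, 2, 10))  # capped at the maximum
print(clamp_desired(0, 2, 10))   # raised to the minimum
```

This models how policy-driven scaling stays inside the bounds; a manual request for a desired capacity outside Min/Max is rejected by the service instead of being clamped.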


Component 3: Scaling Policies

Defines WHEN to scale. See section 4 for details.

3. Auto Scaling Group Key Features


Multi-AZ Deployment

  1. ASG distributes instances evenly across the configured AZs
  2. If one AZ goes down, ASG launches replacement instances in healthy AZs
  3. Best practice: use at least 2 AZs (ideally 3) for high availability
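
To illustrate what "evenly" means when the desired capacity does not divide cleanly, here is a small sketch. The function `spread_across_azs` is hypothetical, and which AZ receives the extra instance is an assumption; the real service makes that choice itself.

```python
from typing import Dict, List


def spread_across_azs(desired: int, azs: List[str]) -> Dict[str, int]:
    """Split a desired count across AZs so counts differ by at most one."""
    if not azs:
        raise ValueError("at least one AZ is required")
    base, extra = divmod(desired, len(azs))
    # Assumption: the first `extra` AZs in the list get one more instance.
    return {az: base + (1 if i < extra else 0) for i, az in enumerate(azs)}


print(spread_across_azs(4, ["us-east-1a", "us-east-1b", "us-east-1c"]))
# {'us-east-1a': 2, 'us-east-1b': 1, 'us-east-1c': 1}
```

If one AZ fails, calling the same function with the remaining AZs shows how the same desired count is redistributed.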


Health Checks

  1. ASG monitors instance health (EC2 status checks by default, optionally ELB health checks) and automatically replaces unhealthy instances

Cooldown Period

  1. After a scaling activity, ASG waits for the cooldown period (default: 300 seconds / 5 minutes) before allowing another scaling action
  2. Prevents rapid fluctuations (scaling in and out repeatedly)
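  3. During cooldown, ASG does not start further scaling activities from simple scaling policies (target tracking and step scaling rely on instance warm-up instead)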
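
The cooldown behaviour can be pictured as a simple time gate. This is a simplified model, not the service's implementation; the class name `CooldownGate` is hypothetical and timestamps are plain seconds.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CooldownGate:
    cooldown_seconds: int = 300          # default cooldown from the text
    last_activity: Optional[float] = None

    def try_scale(self, now: float) -> bool:
        """Return True if a scaling action is allowed at time `now`."""
        if (self.last_activity is not None
                and now - self.last_activity < self.cooldown_seconds):
            return False                 # still cooling down, request ignored
        self.last_activity = now         # action runs, cooldown restarts
        return True


gate = CooldownGate()
print([gate.try_scale(t) for t in (0, 120, 299, 300, 450)])
# [True, False, False, True, False]
```

Requests at 120 s and 299 s are dropped because they fall inside the first 300-second window; the request at 300 s succeeds and starts a new window.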


Warm Pool

  1. A pool of pre-initialized instances kept in a Stopped, Running, or Hibernated state
  2. When ASG needs to scale out, it pulls from the warm pool instead of launching cold
  3. Reduces launch time significantly (instances are already initialized)
  4. Stopped instances in the pool cost less than running ones: you pay for their EBS storage but not for instance hours


Instance Refresh

  1. Replaces the group's instances in a rolling fashion so they pick up an updated Launch Template; you start the refresh, then ASG handles the replacements
  2. Rolls out changes gradually (configurable percentage at a time)
  3. Supports a minimum healthy percentage to maintain availability during updates
  4. Use for: OS patching, AMI updates, configuration changes
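
A rough sketch of how a minimum healthy percentage limits the size of each replacement batch. The function `refresh_batches` is hypothetical and the rounding rule is an assumption; the real service may size batches differently.

```python
from typing import List


def refresh_batches(instance_ids: List[str],
                    min_healthy_percent: int) -> List[List[str]]:
    """Group instances into replacement batches under a healthy floor."""
    if not 0 <= min_healthy_percent <= 100:
        raise ValueError("min_healthy_percent must be between 0 and 100")
    total = len(instance_ids)
    # Integer ceiling of total * percent / 100: instances that must stay up.
    must_stay = -(-total * min_healthy_percent // 100)
    # Always replace at least one at a time so the refresh can progress.
    batch_size = max(1, total - must_stay)
    return [instance_ids[i:i + batch_size]
            for i in range(0, total, batch_size)]


ids = [f"i-{n:02d}" for n in range(10)]
for batch in refresh_batches(ids, 80):
    print(batch)
```

With 10 instances and an 80% floor, two instances are replaced per batch; at 100% the sketch falls back to one instance at a time.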

4. Scaling Policies


1. Manual Scaling

  1. You manually change the desired capacity
  2. ASG adds or removes instances to match
  3. No automation — useful for planned events


2. Scheduled Scaling

  1. Scale based on a schedule (e.g., scale out at 8 AM, scale in at 6 PM)
  2. Uses cron-like expressions or specific dates/times
  3. Ideal for predictable traffic patterns

Example: Scale out for business hours.

Min: 10, Desired: 10 at 08:00 UTC Mon-Fri
Min:  2, Desired:  2 at 18:00 UTC Mon-Fri
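
The business-hours example can be sketched as two request payloads for boto3's `put_scheduled_update_group_action`. The group name `web-asg`, the action names, and the helper `scheduled_action` are assumptions for illustration; the block only builds dictionaries and makes no AWS call.

```python
from typing import Dict


def scheduled_action(asg_name: str, action_name: str, hour_utc: int,
                     minimum: int, desired: int) -> Dict:
    """Build parameters for one weekday scheduled action (no AWS call)."""
    if not 0 <= hour_utc <= 23:
        raise ValueError("hour_utc must be between 0 and 23")
    return {
        "AutoScalingGroupName": asg_name,
        "ScheduledActionName": action_name,
        # Cron fields: minute hour day-of-month month day-of-week (1-5 = Mon-Fri)
        "Recurrence": f"0 {hour_utc} * * 1-5",
        "MinSize": minimum,
        "DesiredCapacity": desired,
    }


actions = [
    scheduled_action("web-asg", "business-hours-start", 8, 10, 10),
    scheduled_action("web-asg", "business-hours-end", 18, 2, 2),
]
for action in actions:
    print(action["ScheduledActionName"], action["Recurrence"])
```

With credentials configured, each dictionary could be passed as keyword arguments to `boto3.client("autoscaling").put_scheduled_update_group_action`.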

3. Dynamic Scaling — Target Tracking (Recommended)

  1. You set a target metric, and ASG automatically adjusts to maintain it
  2. Simplest and most commonly used policy
  3. ASG creates and manages the CloudWatch alarms automatically

Example: "Keep average CPU utilization at 50%." If the CPU goes above 50%, ASG adds instances. If CPU drops below 50%, ASG removes instances.
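
For intuition only, target tracking can be approximated as proportional scaling: if the metric is 1.5 times the target, roughly 1.5 times the capacity is needed. This is an assumed simplification, not the algorithm AWS actually runs, and `target_tracking_capacity` is a hypothetical name.

```python
import math


def target_tracking_capacity(current: int, metric: float, target: float,
                             minimum: int, maximum: int) -> int:
    """Estimate the capacity that would bring `metric` back to `target`."""
    if target <= 0:
        raise ValueError("target must be positive")
    wanted = math.ceil(current * metric / target)
    # The result is always bounded by the group's Min and Max.
    return max(minimum, min(wanted, maximum))


# 4 instances, target of 50% average CPU, Min 2, Max 10.
print(target_tracking_capacity(4, 75.0, 50.0, 2, 10))  # CPU above target
print(target_tracking_capacity(4, 50.0, 50.0, 2, 10))  # on target
print(target_tracking_capacity(4, 25.0, 50.0, 2, 10))  # CPU below target
```

In practice the service scales in more conservatively than it scales out, which this sketch does not model.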


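Common Target Tracking Metrics

  1. ASGAverageCPUUtilization: average CPU utilization of the group
  2. ASGAverageNetworkIn: average bytes received per instance
  3. ASGAverageNetworkOut: average bytes sent per instance
  4. ALBRequestCountPerTarget: average request count per target in an ALB target group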

4. Dynamic Scaling — Step Scaling

  1. You define CloudWatch alarms and specify how many instances to add/remove at each step
  2. More granular control than Target Tracking
  3. Multiple steps: e.g., add 1 instance if CPU > 60%, add 3 instances if CPU > 80%
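
The two-step example above (add 1 instance if CPU > 60%, add 3 if CPU > 80%) can be written as a lookup. The names `STEPS` and `step_adjustment` are hypothetical; in AWS the steps are configured on the policy relative to the alarm threshold.

```python
# (CPU threshold in percent, instances to add), highest threshold first.
STEPS = [(80.0, 3), (60.0, 1)]


def step_adjustment(cpu_percent: float) -> int:
    """Return how many instances to add for a given CPU reading."""
    for threshold, instances_to_add in STEPS:
        if cpu_percent > threshold:
            return instances_to_add
    return 0  # below every threshold, no scale-out


for cpu in (55, 65, 80, 85):
    print(cpu, step_adjustment(cpu))
```

A reading of exactly 80 falls in the lower step because the comparison is strictly greater than, matching the wording of the example.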


5. Dynamic Scaling — Simple Scaling (Legacy)

  1. Triggered by a single CloudWatch alarm
  2. Waits for the entire cooldown period before responding to additional alarms
  3. Less responsive than Step Scaling — not recommended for new setups


6. Predictive Scaling

  1. Uses machine learning to analyze historical traffic patterns
  2. Predicts future demand and pre-scales BEFORE traffic arrives
  3. Works well with cyclical/predictable workloads
  4. Can be combined with dynamic scaling for comprehensive coverage

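5. Scaling Policies Comparison

  1. Manual: you change the desired capacity yourself; suits planned events
  2. Scheduled: time-based (cron-like or fixed dates/times); suits predictable traffic patterns
  3. Target Tracking: holds a target metric value; simplest option, alarms managed by ASG
  4. Step Scaling: your own CloudWatch alarms with sized steps; more granular control
  5. Simple Scaling (legacy): one alarm, waits out the cooldown; not recommended for new setups
  6. Predictive: ML forecast from historical traffic; pre-scales for cyclical workloads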

6. Scaling Termination Policy

When ASG needs to scale in (remove instances), it follows a termination policy to decide which instance to terminate:

Default Policy:

  1. Select the AZ with the most instances.
  2. Among those, terminate the instance with the oldest launch configuration/template.
  3. If tied, terminate the instance closest to the next billing hour.
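
The three steps of the default policy can be sketched as a selection function. This follows the simplified description in these notes; the `Instance` fields, the function name, and the alphabetical tie-break between equally loaded AZs are assumptions.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Instance:
    instance_id: str
    az: str
    template_version: int          # lower number = older template
    minutes_to_billing_hour: int


def pick_for_termination(instances: List[Instance]) -> Instance:
    """Choose one instance using the simplified default policy."""
    if not instances:
        raise ValueError("no instances to choose from")
    per_az = Counter(inst.az for inst in instances)
    # Step 1: AZ with the most instances (ties broken alphabetically).
    busiest_az = min(per_az, key=lambda az: (-per_az[az], az))
    candidates = [inst for inst in instances if inst.az == busiest_az]
    # Step 2: oldest template; step 3: closest to the next billing hour.
    return min(candidates, key=lambda inst: (inst.template_version,
                                             inst.minutes_to_billing_hour,
                                             inst.instance_id))


fleet = [
    Instance("i-a1", "us-east-1a", 1, 40),
    Instance("i-a2", "us-east-1a", 2, 5),
    Instance("i-a3", "us-east-1a", 1, 10),
    Instance("i-b1", "us-east-1b", 1, 2),
]
print(pick_for_termination(fleet).instance_id)  # i-a3
```

Here `us-east-1a` has the most instances, `i-a1` and `i-a3` share the oldest template, and `i-a3` is closer to its next billing hour.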


Other Termination Policies

  1. OldestInstance: Terminate the oldest instance (useful when upgrading instance types)
  2. NewestInstance: Terminate the newest instance
  3. OldestLaunchConfiguration: Terminate instance with the oldest launch configuration
  4. OldestLaunchTemplate: Terminate instance with the oldest launch template version
  5. ClosestToNextInstanceHour: Terminate instance closest to the next billing hour (cost optimization)
  6. AllocationStrategy: Used with mixed instance types / Spot (terminate based on allocation strategy)

7. Lifecycle Hooks

  1. Allow you to perform custom actions when instances launch or terminate
  2. Instance enters a "Pending:Wait" state on launch or "Terminating:Wait" state on termination
  3. During the wait state, you can: install extra software, pull logs before termination, register with external systems, run health checks
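  4. Default wait: 1 hour (the heartbeat timeout, settable from 30 to 7200 seconds); an instance can be held in the wait state for at most 48 hours or 100 times the heartbeat timeout, whichever is smaller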
  5. Integrates with EventBridge, SNS, and SQS for notifications
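
The longest time an instance can be held in a wait state depends on the hook's heartbeat timeout. A small sketch of that arithmetic, assuming the documented limits (timeout between 30 and 7200 seconds, overall cap of 48 hours or 100 times the timeout, whichever is smaller); the function name is hypothetical.

```python
MAX_WAIT_SECONDS = 48 * 3600  # overall 48-hour ceiling


def max_wait_seconds(heartbeat_timeout: int = 3600) -> int:
    """Longest total wait achievable by repeatedly recording heartbeats."""
    if not 30 <= heartbeat_timeout <= 7200:
        raise ValueError("heartbeat_timeout must be between 30 and 7200")
    return min(MAX_WAIT_SECONDS, 100 * heartbeat_timeout)


print(max_wait_seconds())      # default 1-hour timeout
print(max_wait_seconds(300))   # 5-minute timeout
```

With the default timeout the 48-hour ceiling applies (172800 seconds); with a 5-minute timeout the limit is 100 heartbeats, or 30000 seconds.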

8. When to use

Use EC2 Auto Scaling when you need to automatically add or remove EC2 instances based on demand, health, or schedule.

Common scenarios:

  1. Handle traffic spikes: Automatically scale out when CPU, memory, or request count increases (memory requires a custom CloudWatch metric).
  2. Cost optimization — Scale in during low-traffic periods to avoid paying for idle instances.
  3. High availability — Replace unhealthy instances automatically.
  4. Predictable workloads — Schedule scaling actions for known traffic patterns (e.g., business hours, weekends).
  5. Maintain minimum capacity — Always keep a minimum number of instances running.


Exam Tip: Auto Scaling questions

  1. "Maintain CPU at 40%" = Target Tracking policy
  2. "Scale up at 9 AM every weekday" = Scheduled Scaling
  3. "ML-based prediction" = Predictive Scaling
  4. "Replace instances gradually after AMI update" = Instance Refresh
  5. Launch Template is always preferred over Launch Configuration
  6. ASG + ALB = the most common HA architecture pattern
  7. Cooldown prevents rapid scale oscillation