
Spot instance economics: when interruption is fine

Spot at 60-90% off On-Demand looks irresistible, and interruptions are less random than they seem: diversify across instance types and the math works for far more workloads than people think.

May 21, 2026 cloudprice editorial ~4 min read

Spot Instances (AWS), Spot VMs (Azure), and Preemptible / Spot VMs (GCP) all do the same thing: they sell you unused capacity at a 60-90% discount, with the catch that the provider can take it back with little warning. AWS gives you 2 minutes; GCP and Azure each give you 30 seconds.

The instinctive reaction is "too risky for production". The actual data — once you understand how interruption works — is more nuanced.

The headline numbers

A spot snapshot from us-east-1 in 2026:

Instance	On-Demand	Spot	Discount
m6i.large	$0.096/hr	$0.029-0.041/hr	~55-70%
c6i.xlarge	$0.17/hr	$0.051-0.077/hr	~55-70%
m7g.xlarge	$0.1632/hr	$0.049-0.082/hr	~50-70%
g5.xlarge (GPU)	$1.006/hr	$0.30-0.50/hr	~50-70%
r6i.4xlarge (memory)	$1.008/hr	$0.28-0.45/hr	~55-72%

Discounts are biggest on less-popular instance generations and on the largest sizes. m4.16xlarge Spot is often 85-90% off, while m6i.large Spot is closer to 60% off because everyone wants the small modern instance.
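The discount column can be rechecked directly from the On-Demand price and the Spot band. The sketch below is only arithmetic on the figures in the table above; the function name `discount_range` is made up for illustration.

```python
def discount_range(on_demand: float, spot_low: float, spot_high: float) -> tuple:
    """Return (min_discount, max_discount) in percent off the On-Demand price."""
    if on_demand <= 0:
        raise ValueError("on_demand must be positive")
    if not 0 <= spot_low <= spot_high:
        raise ValueError("expected 0 <= spot_low <= spot_high")
    # The cheapest Spot price gives the largest discount, and vice versa.
    max_discount = (1 - spot_low / on_demand) * 100
    min_discount = (1 - spot_high / on_demand) * 100
    return round(min_discount, 1), round(max_discount, 1)


# (On-Demand $/hr, Spot low $/hr, Spot high $/hr), copied from the table above.
ROWS = {
    "m6i.large": (0.096, 0.029, 0.041),
    "c6i.xlarge": (0.17, 0.051, 0.077),
    "m7g.xlarge": (0.1632, 0.049, 0.082),
    "g5.xlarge": (1.006, 0.30, 0.50),
    "r6i.4xlarge": (1.008, 0.28, 0.45),
}

for name, (on_demand, low, high) in ROWS.items():
    lo_pct, hi_pct = discount_range(on_demand, low, high)
    print(f"{name}: {lo_pct}% to {hi_pct}% off")
```

Running the same function against a live price feed is a quick way to notice when a pool's discount has eroded.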

Interruption rates — actually measured

AWS publishes the Spot Instance Advisor, which buckets every (instance type, region) pair by how often it was interrupted over the trailing month: <5%, 5-10%, 10-15%, 15-20%, or >20%.

From recent snapshots:

Modern, popular general-purpose families in major regions: m6i.large and c6i.xlarge in us-east-1 both sit in the <5% bucket, as does m7g.large.
Older generations: m4.large in us-east-1 often <5%, but in smaller regions can hit 10-15%.
GPU instances: Highly variable. g5.xlarge sometimes <5%, sometimes 15-20%, depending on overall demand at that moment.
Newest generations on launch: Often 20%+ for the first few months, because there is not much spare capacity yet.
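To turn a bucket into something you can plan around, one rough approach is to read the monthly rate as a per-instance probability and assume interruptions arrive at a constant rate through the month. Both are simplifying assumptions, not how AWS defines the metric, and the function names below are illustrative.

```python
HOURS_PER_MONTH = 730  # average month length in hours (assumption for the conversion)


def interruption_probability(monthly_rate: float, hours: float) -> float:
    """Chance that one instance is interrupted within `hours`.

    Assumes a constant hazard rate, so survival over t hours is
    (1 - monthly_rate) ** (t / HOURS_PER_MONTH).
    """
    if not 0 <= monthly_rate < 1:
        raise ValueError("monthly_rate must be in [0, 1)")
    if hours < 0:
        raise ValueError("hours must be non-negative")
    return 1 - (1 - monthly_rate) ** (hours / HOURS_PER_MONTH)


def expected_monthly_interruptions(fleet_size: int, monthly_rate: float) -> float:
    """Expected number of interrupted instances per month across a fleet."""
    return fleet_size * monthly_rate


# A 6-hour job on a pool at the top of the <5% bucket, and a 40-instance fleet.
print(f"{interruption_probability(0.05, 6):.4%}")
print(expected_monthly_interruptions(40, 0.05))
```

Under these assumptions a 6-hour job in the <5% bucket has well under a 0.1% chance of being interrupted, while the same job in a 20% pool is several times more exposed.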

The diversification strategy

Single instance type, single AZ, single Spot pool = bad idea. Mix 5-10 instance types across 2-3 AZs and use the EC2 Auto Scaling group's "capacity-optimized" allocation strategy, which picks the pools with the most spare capacity at any moment, and the effective interruption rate drops well below that of any single pool.

A worked example: an ASG running stateless workers, mixed across m6i.large, m6a.large, m6i.xlarge (with weights so a larger instance handles 2 jobs), and m7g.large (for ARM-compatible builds), across 3 AZs. Effective interruption rate: well under 1%. Effective discount versus on-demand: about 65%.
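As a sketch of how the worked example might be expressed, the function below builds the `MixedInstancesPolicy` argument for boto3's `create_auto_scaling_group`. The launch template names, group name, and subnet IDs are hypothetical placeholders. The separate ARM launch template is an assumption that follows from m7g needing an arm64 AMI.

```python
def mixed_instances_policy(x86_template: str, arm_template: str) -> dict:
    """Build a MixedInstancesPolicy for a diversified, all-Spot worker group."""
    return {
        "LaunchTemplate": {
            "LaunchTemplateSpecification": {
                "LaunchTemplateName": x86_template,
                "Version": "$Latest",
            },
            "Overrides": [
                {"InstanceType": "m6i.large", "WeightedCapacity": "1"},
                {"InstanceType": "m6a.large", "WeightedCapacity": "1"},
                # The xlarge counts as two units because it handles two jobs.
                {"InstanceType": "m6i.xlarge", "WeightedCapacity": "2"},
                {
                    "InstanceType": "m7g.large",
                    "WeightedCapacity": "1",
                    # Graviton needs an arm64 AMI, hence its own template.
                    "LaunchTemplateSpecification": {
                        "LaunchTemplateName": arm_template,
                        "Version": "$Latest",
                    },
                },
            ],
        },
        "InstancesDistribution": {
            "OnDemandBaseCapacity": 0,
            "OnDemandPercentageAboveBaseCapacity": 0,  # 0 means all Spot
            "SpotAllocationStrategy": "capacity-optimized",
        },
    }


policy = mixed_instances_policy("workers-x86", "workers-arm64")

# Applying it needs AWS credentials, so the call is left commented out:
# import boto3
# boto3.client("autoscaling").create_auto_scaling_group(
#     AutoScalingGroupName="stateless-workers",
#     MinSize=0, MaxSize=20, DesiredCapacity=10,
#     VPCZoneIdentifier="subnet-aaa,subnet-bbb,subnet-ccc",  # one subnet per AZ
#     MixedInstancesPolicy=policy,
# )
```

Keeping the policy as a plain dictionary makes it easy to review in a pull request and to unit-test before anything touches AWS.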

Workloads where Spot is the right answer

Stateless web workers / API workers behind a load balancer. Worker disappears, load balancer routes around it, ASG replaces it.
CI/CD runners. A self-hosted GitHub Actions runner or GitLab Runner on Spot is the cheapest possible build farm. If a job fails due to interruption, just retry. Most CI systems handle this natively.
Batch processing. ETL jobs, video encoding, ML training (with checkpointing). Spot was literally designed for this. Some Karpenter / Kubernetes setups achieve 80-90% Spot usage while keeping application-visible interruptions under 1%.
Big-data / Spark / EMR. EMR has first-class Spot support, including capacity-optimized allocation and "task" nodes that aren't on the critical path.
Dev / staging environments. Even databases — if the dev DB dies, restart from a backup. It's dev.
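Workloads like these usually make good use of the two-minute warning: on AWS, a worker can poll the instance metadata service for a pending interruption and stop taking new jobs once one appears. The sketch below assumes IMDSv2 and the documented `spot/instance-action` document shape. The function names are illustrative, and the fetch only works when run on an EC2 instance.

```python
import json
import urllib.error
import urllib.request
from datetime import datetime, timezone
from typing import Optional

IMDS = "http://169.254.169.254/latest"


def parse_instance_action(body: str, now: datetime) -> tuple:
    """Return (action, seconds_left) from a spot/instance-action document."""
    doc = json.loads(body)
    deadline = datetime.strptime(doc["time"], "%Y-%m-%dT%H:%M:%SZ")
    deadline = deadline.replace(tzinfo=timezone.utc)
    return doc["action"], (deadline - now).total_seconds()


def fetch_instance_action(timeout: float = 1.0) -> Optional[str]:
    """Return the notice body, or None when no interruption is scheduled."""
    # IMDSv2: obtain a short-lived session token first.
    token_req = urllib.request.Request(
        f"{IMDS}/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"},
    )
    with urllib.request.urlopen(token_req, timeout=timeout) as resp:
        token = resp.read().decode()

    req = urllib.request.Request(
        f"{IMDS}/meta-data/spot/instance-action",
        headers={"X-aws-ec2-metadata-token": token},
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.read().decode()
    except urllib.error.HTTPError as err:
        if err.code == 404:  # no notice yet is the normal case
            return None
        raise
```

A worker loop would call `fetch_instance_action()` every few seconds and, on a non-`None` result, deregister from the load balancer or hand its job back to the queue before the deadline.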

Workloads where Spot is wrong

Stateful primary databases. An RDS-style primary failover is operationally painful and customer-visible. Run primaries on On-Demand or RIs.
Message brokers with persistent state. Running Kafka brokers on Spot is asking for split-brain.
Long-running compute that doesn't checkpoint. A 6-hour training job that can't resume from a checkpoint is going to lose a lot of work to a single interruption.
Real-time / WebSocket servers with sticky sessions. Customer-visible disconnects on every Spot reclaim.
Anything that bakes IP allocation into application logic. Spot replacements get new IPs.
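The long-running compute case above stops being a problem once the job can resume. A minimal checkpoint-and-resume loop might look like the following sketch. The state layout and the `stop_after` parameter (used only to simulate an interruption) are illustrative, and a real job would write the checkpoint to durable storage rather than the instance's local disk.

```python
import json
import os
import tempfile
from typing import Optional


def save_checkpoint(path: str, state: dict) -> None:
    """Write state atomically: temp file in the same directory, then rename."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as fh:
        json.dump(state, fh)
    os.replace(tmp_path, path)  # readers never see a half-written file


def load_checkpoint(path: str) -> dict:
    """Load the last checkpoint, or start fresh if none exists."""
    try:
        with open(path) as fh:
            return json.load(fh)
    except FileNotFoundError:
        return {"next_step": 0, "total": 0}


def run(path: str, steps: int, stop_after: Optional[int] = None) -> dict:
    """Run (or resume) a job of `steps` units, checkpointing after each one."""
    state = load_checkpoint(path)
    for step in range(state["next_step"], steps):
        if stop_after is not None and step >= stop_after:
            break  # simulated Spot interruption
        state["total"] += step  # stand-in for one unit of real work
        state["next_step"] = step + 1
        save_checkpoint(path, state)
    return state
```

Calling `run` again with the same path picks up at `next_step`, so an interruption costs at most one unit of work instead of the whole job.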

Karpenter is the right tool

For Kubernetes on AWS, Karpenter can mix Spot and On-Demand, respect PodDisruptionBudgets, drain nodes when AWS sends the interruption notice, and pick instance types from a large pool. Most production EKS clusters using Karpenter run 60-80% Spot with no operational pain.
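For a rough idea of what that looks like, the sketch below builds a NodePool manifest as a Python dictionary and prints it as JSON, which `kubectl apply -f` accepts. It assumes the `karpenter.sh/v1` API version and an existing EC2NodeClass named `default`. Both are assumptions to check against the Karpenter release you actually run.

```python
import json


def mixed_capacity_nodepool(name: str = "general", node_class: str = "default") -> dict:
    """Build a NodePool that allows both Spot and On-Demand capacity."""
    return {
        "apiVersion": "karpenter.sh/v1",  # assumed API version
        "kind": "NodePool",
        "metadata": {"name": name},
        "spec": {
            "template": {
                "spec": {
                    "nodeClassRef": {
                        "group": "karpenter.k8s.aws",
                        "kind": "EC2NodeClass",
                        "name": node_class,  # hypothetical, must already exist
                    },
                    "requirements": [
                        {
                            # Allow both capacity types in one pool.
                            "key": "karpenter.sh/capacity-type",
                            "operator": "In",
                            "values": ["spot", "on-demand"],
                        },
                        {
                            # Wider architecture choice means more Spot pools.
                            "key": "kubernetes.io/arch",
                            "operator": "In",
                            "values": ["amd64", "arm64"],
                        },
                    ],
                }
            }
        },
    }


print(json.dumps(mixed_capacity_nodepool(), indent=2))
```

Leaving instance types unconstrained beyond architecture is deliberate here: the larger the candidate pool, the more room there is to avoid thin Spot pools.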

What about GCP and Azure?

GCP Spot VMs come with a 60-91% discount, depending on machine type. Preemption can happen at any time, and the 30-second notice is shorter than AWS's 2 minutes, so design accordingly.

Azure Spot VMs have similar discounts. Eviction can be capacity-only or price-or-capacity, depending on whether you set a maximum price. Notice is also 30 seconds.

The hidden risk: capacity events

Once or twice a year, AWS has a regional capacity event (huge AI launch, big regional outage recovery) and Spot prices spike to On-Demand or near it. ASGs configured with the "lowest-price" allocation strategy silently fall over. The fix is to set a max-price ceiling and to keep an On-Demand backstop in the same ASG.
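In an EC2 Auto Scaling group, both parts of that fix live in the `InstancesDistribution` block of the mixed instances policy. The helper below is a sketch that builds such a block. The function name and the example values (2 base instances, 20% On-Demand above base, a $0.06/hr ceiling) are illustrative, not recommendations.

```python
def instances_distribution(
    on_demand_base: int,
    on_demand_pct_above_base: int,
    spot_max_price: str = "",
) -> dict:
    """Build an InstancesDistribution dict with an On-Demand backstop."""
    if on_demand_base < 0:
        raise ValueError("on_demand_base must be >= 0")
    if not 0 <= on_demand_pct_above_base <= 100:
        raise ValueError("on_demand_pct_above_base must be between 0 and 100")

    dist = {
        # This many capacity units are always On-Demand.
        "OnDemandBaseCapacity": on_demand_base,
        # Share of capacity beyond the base that is also On-Demand.
        "OnDemandPercentageAboveBaseCapacity": on_demand_pct_above_base,
        "SpotAllocationStrategy": "capacity-optimized",
    }
    if spot_max_price:
        # USD per hour; the API takes this value as a string.
        dist["SpotMaxPrice"] = spot_max_price
    return dist


backstop = instances_distribution(2, 20, spot_max_price="0.06")
print(backstop)
```

The base capacity keeps a floor of service alive during a capacity event, while the percentage above base controls how much of the remaining fleet depends on Spot.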

Effective savings, honest

A well-diversified Spot ASG covering 60-80% of a stateless compute footprint, with the rest on Compute Savings Plans, delivers an effective compute discount of 50-65% versus pure On-Demand. That's about as good as any commitment strategy gets.
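The blended figure is a weighted average, which is easy to sanity-check. The sketch below uses the roughly 65% Spot discount from the worked example earlier and a 30% Savings Plan discount. The 30% is a placeholder assumption, not a figure from this article, so substitute your own rate.

```python
def blended_discount(spot_share: float, spot_discount: float, committed_discount: float) -> float:
    """Weighted average discount versus pure On-Demand.

    spot_share: fraction of compute spend (at On-Demand rates) running on Spot.
    The remainder is assumed to be covered by a commitment discount.
    """
    for value in (spot_share, spot_discount, committed_discount):
        if not 0 <= value <= 1:
            raise ValueError("all inputs must be fractions between 0 and 1")
    return spot_share * spot_discount + (1 - spot_share) * committed_discount


SPOT_DISCOUNT = 0.65       # from the diversified ASG example
COMMITTED_DISCOUNT = 0.30  # placeholder Savings Plan rate (assumption)

for share in (0.6, 0.8):
    pct = blended_discount(share, SPOT_DISCOUNT, COMMITTED_DISCOUNT)
    print(f"{share:.0%} on Spot: {pct:.0%} off On-Demand")
```

With these inputs the result is roughly 51% at 60% Spot coverage and 58% at 80%, inside the 50-65% band quoted above.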

For instance-level pricing including Spot bands, see the cloudprice catalogue — every AWS row notes the spot range. Cross-check against AWS vs GCP if you're choosing between hyperscaler spot offerings.
