GPU pricing across providers in 2026 (H100, A100, L4 ranked)
Hourly H100, A100, L4, and consumer-grade GPU pricing across hyperscalers and specialists. Where ML training is actually cheapest in 2026, ranked.
GPU pricing is the most chaotic corner of cloud pricing. List prices vary by 3-5x between providers for what is nominally the same hardware. Spot/preemptible pricing varies another 2-3x. Multi-year reservations are negotiable.
Here's what H100, A100, L4, and consumer-grade GPU rentals actually cost in 2026. Each table is ranked roughly from cheapest to most expensive on-demand rate.
H100 80GB — the flagship training GPU
| Provider | $/hr on-demand | Notes |
|---|---|---|
| Lambda Labs | $2.49 (single H100 PCIe) | Most capacity-constrained provider |
| RunPod | $2.79 (community), $3.49 (secure) | Marketplace model |
| Vultr | $2.99 (H100 PCIe) | Available in EU and US |
| CoreWeave | $3.20-$4.25 | Enterprise contracts |
| GCP A3 High (H100 SXM) | $5.45 per H100 in an 8x | SXM with NVLink, premium for it |
| AWS p5.48xlarge (H100 SXM) | $98.32 / 8 = $12.29 per H100 | Full 8x SXM instance |
| Azure ND H100 v5 | ~$11-13 per H100 | 8x SXM with InfiniBand |
AWS and Azure charge roughly 4-5x more per H100 than the cheapest specialist providers; GCP is closer to 2x. Part of this gap is justified: the hyperscalers rent H100s as 8x SXM nodes with NVLink and a high-bandwidth cluster interconnect for distributed training (InfiniBand on Azure, EFA on AWS), which is genuinely more expensive hardware to provision than a single PCIe card. But part of it is just hyperscaler margin.
For ML training where you can manage your own networking, use Lambda, RunPod, or CoreWeave. For workloads that need to sit next to your existing data and managed services, absorb the cost and stay on the hyperscaler.
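A quick way to check the hyperscaler premium is to normalize every quote to dollars per GPU-hour and divide by the cheapest one. The sketch below does that with the on-demand figures from the H100 table. Two simplifications are assumptions, not data from the table: Azure's ~$11-13 range is collapsed to a $12 midpoint, and CoreWeave's range is left out for brevity.

```python
# Sketch: normalize H100 quotes to $/GPU-hour and express each one as a
# multiple of the cheapest rate. Figures are the on-demand prices from the
# table above; the Azure value is an assumed midpoint of its ~$11-13 range.

H100_GPU_HOUR = {
    "Lambda Labs (PCIe)": 2.49,
    "RunPod community (PCIe)": 2.79,
    "Vultr (PCIe)": 2.99,
    "GCP A3 High (SXM)": 5.45,
    "AWS p5.48xlarge (SXM)": 98.32 / 8,  # 8-GPU instance price split per GPU
    "Azure ND H100 v5 (SXM)": 12.00,     # assumed midpoint of ~$11-13
}


def premium_over(baseline: float, prices: dict[str, float]) -> dict[str, float]:
    """Return each provider's $/GPU-hour as a multiple of the baseline rate."""
    return {name: price / baseline for name, price in prices.items()}


if __name__ == "__main__":
    cheapest = min(H100_GPU_HOUR.values())
    for name, multiple in premium_over(cheapest, H100_GPU_HOUR).items():
        print(f"{name:26s} ${H100_GPU_HOUR[name]:6.2f}/hr  {multiple:4.1f}x")
```

Note that this compares PCIe cards against SXM nodes, so the multiple mixes hardware differences with margin; it is a price comparison, not a like-for-like performance one.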
A100 80GB — still the workhorse
| Provider | $/hr on-demand |
|---|---|
| RunPod | $1.19 (community) / $1.79 (secure) |
| Lambda Labs | $1.29 (PCIe) |
| Vultr | $1.50 |
| GCP A2 Ultra (A100 80GB SXM) | $5.07 per A100 in an 8x |
| AWS p4de.24xlarge (A100 80GB SXM) | $40.97 / 8 = $5.12 per A100 |
| Azure NDm A100 v4 | ~$5-6 per A100 |
A100 spot pricing on the hyperscalers can run 60-70% below on-demand, which sometimes lands at $1.50-2.00/hour: competitive with the specialist providers' on-demand rates.
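The $1.50-2.00 range follows directly from applying the 60-70% discount band to the per-A100 on-demand rates in the table. A minimal sketch of that arithmetic, assuming the discount is a flat fraction off on-demand (real spot prices fluctuate by region and over time):

```python
# Sketch: effective hourly rate after a fractional spot discount.
# On-demand inputs are the per-A100 figures from the table above; the
# 60-70% band is the range quoted in the text, not a guaranteed price.


def spot_rate(on_demand: float, discount: float) -> float:
    """Effective $/hour after a fractional discount (0.65 means 65% off)."""
    if not 0.0 <= discount < 1.0:
        raise ValueError("discount must be in [0, 1)")
    return on_demand * (1.0 - discount)


for label, on_demand in [("GCP A2 Ultra", 5.07), ("AWS p4de", 40.97 / 8)]:
    low, high = spot_rate(on_demand, 0.70), spot_rate(on_demand, 0.60)
    print(f"{label}: ${low:.2f}-${high:.2f}/hr at 60-70% off")
```

Spot capacity can be reclaimed at short notice, so a training job that relies on these rates needs frequent checkpointing to survive preemption.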
L4 — the inference workhorse for 2025-2026
L4 has become the default for cost-efficient inference. 24GB VRAM, lower power than A100, much cheaper per hour.
| Provider | $/hr on-demand |
|---|---|
| RunPod | $0.43 (community) / $0.64 (secure) |
| Vultr | $0.50 |
| GCP G2 (L4) | $0.71 per L4 in g2-standard-4 |
| AWS G6 (L4) | $0.81 per L4 in g6.xlarge |
L4 is the right choice for most inference workloads in 2026 that don't need 80GB of VRAM on a single GPU. For Llama 3 70B inference at INT4, four L4s (96GB combined) work well. For Stable Diffusion XL or smaller models, a single L4 is plenty.
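Why four L4s for a 70B model at INT4? A back-of-envelope VRAM check makes it visible. This sketch assumes weights take `params × bits / 8` bytes and that quantization metadata, KV cache, activations, and serving-framework overhead must all fit in whatever is left; real frameworks add overhead this ignores.

```python
# Back-of-envelope VRAM check for a model sharded across several GPUs.
# Assumption: weights need params * bits / 8 bytes; everything else
# (quantization scales, KV cache, activations, runtime) uses the remainder.


def weight_gb(params_billion: float, bits: int) -> float:
    """Approximate weight memory in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits / 8


def headroom_gb(params_billion: float, bits: int, gpus: int, vram_gb: float) -> float:
    """Memory left for KV cache and overhead after loading the weights."""
    return gpus * vram_gb - weight_gb(params_billion, bits)


# Llama 3 70B at INT4 on L4s (24 GB each)
for n in (2, 4):
    print(f"{n}x L4: {weight_gb(70, 4):.0f} GB weights, "
          f"{headroom_gb(70, 4, n, 24):.0f} GB left for KV cache and overhead")
```

Two L4s technically hold the ~35 GB of weights, but the roughly 13 GB left over is thin for KV cache at longer contexts or higher batch sizes; four cards leave about 61 GB.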
Consumer-grade GPUs for prototyping
RunPod, Vast.ai, and Paperspace all offer RTX 4090s, 3090s, and A6000s at a fraction of datacenter-GPU rates; an RTX 4090 runs around $0.40-0.70/hour. They're useful for prototyping, fine-tuning small models, or running inference on quantized models, but not for production workloads that require SLAs.
The reservation / commit story
AWS Capacity Blocks for ML let you reserve H100 capacity for 1-14 days at fixed prices. Useful for time-boxed training runs. Pricing similar to on-demand but with capacity certainty.
3-year commitments on H100 instances (reserved instances, savings plans, or committed-use discounts, depending on the provider) run around 35-50% off on AWS, GCP, and Azure. If you have a sustained training pipeline, the savings are real, but committing to one GPU generation for three years in 2026, with Blackwell (B100/B200) rolling out, is risky.
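Whether a commitment pays off comes down to utilization: a commitment bills every hour of the term at the discounted rate whether or not the GPUs are busy, so it only beats on-demand if you would otherwise be running for more than `1 - discount` of those hours. A minimal sketch, assuming a simple all-hours-billed commitment and ignoring upfront-payment timing and any resale options:

```python
# Sketch: break-even utilization for a GPU commitment versus on-demand.
# Assumes the commitment bills every hour of the term at
# on_demand * (1 - discount), and on-demand bills only hours actually used.

HOURS_PER_YEAR = 8760


def breakeven_utilization(discount: float) -> float:
    """Fraction of the term you must use the capacity for the commit to pay off."""
    return 1.0 - discount


def commit_savings(on_demand_rate: float, discount: float,
                   utilization: float, years: int = 3) -> float:
    """Dollars saved per GPU over the term (negative means the commit lost money)."""
    hours = HOURS_PER_YEAR * years
    on_demand_cost = on_demand_rate * hours * utilization
    committed_cost = on_demand_rate * (1.0 - discount) * hours
    return on_demand_cost - committed_cost


for d in (0.35, 0.50):
    print(f"{d:.0%} off: break-even at {breakeven_utilization(d):.0%} utilization")
```

At the 35-50% discounts quoted above, the commitment needs roughly 50-65% sustained utilization over three years before it beats simply paying on-demand.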
What this looks like for a training run
Training a 70B-parameter model from scratch takes roughly 1.4 million H100-equivalent GPU-hours (the exact figure depends on the token budget and achieved utilization). At hyperscaler pricing ($12/hour), that's $16.8 million. At specialist pricing ($2.50/hour), it's $3.5 million. That roughly 5x gap is the entire P&L difference between "this is a viable startup project" and "you need a major-cloud commitment".
This is why every serious ML startup has done one of three things:
- Built its own data centers (Cerebras, Tenstorrent),
- Signed a multi-year contract with a specialist (Lambda, CoreWeave, Crusoe), or
- Struck a co-investment or credits deal with Microsoft, AWS, or Google.
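The training-run arithmetic above is just GPU-hours times rate, but it is worth seeing what the 1.4M GPU-hour figure implies. The sketch below uses the common approximation that training costs about `6 × params × tokens` FLOPs. Two inputs are assumptions rather than figures from the text: an H100 dense BF16 peak of ~989 TFLOPS (NVIDIA's SXM figure without sparsity) and a 40% model FLOPs utilization (MFU).

```python
# Sketch: training cost and a sanity check of the GPU-hour estimate.
# Assumptions: training FLOPs ~= 6 * params * tokens; H100 dense BF16 peak
# of ~989 TFLOPS; MFU is a free parameter you pick (40% used below).

H100_BF16_DENSE_FLOPS = 989e12  # per GPU, no sparsity


def training_cost(gpu_hours: float, rate_per_gpu_hour: float) -> float:
    """Total compute cost in dollars."""
    return gpu_hours * rate_per_gpu_hour


def gpu_hours_from_flops(params: float, tokens: float, mfu: float,
                         peak_flops: float = H100_BF16_DENSE_FLOPS) -> float:
    """GPU-hours implied by 6*N*D training FLOPs at a given utilization."""
    total_flops = 6 * params * tokens
    return total_flops / (peak_flops * mfu) / 3600


gpu_hours = 1.4e6  # figure from the text
for label, rate in [("hyperscaler", 12.00), ("specialist", 2.50)]:
    print(f"{label}: ${training_cost(gpu_hours, rate) / 1e6:.1f}M")

# How many training tokens do 1.4M H100-hours buy for a 70B model at 40% MFU?
tokens = gpu_hours * 3600 * H100_BF16_DENSE_FLOPS * 0.40 / (6 * 70e9)
print(f"~{tokens / 1e12:.1f}T tokens")
```

Under these assumptions the 1.4M GPU-hour budget corresponds to a run of a few trillion tokens; a larger token budget or lower MFU scales the GPU-hours, and therefore the cost gap in dollars, proportionally.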
Beyond NVIDIA
AMD MI300X is starting to show up on AWS and Azure. It's roughly 2-3x cheaper per FLOP than H100 at on-demand prices. The software stack (ROCm) has improved enormously but still trails CUDA. Worth evaluating for inference; risky for training pipelines that depend on CUDA-specific optimizations (cuDNN, custom CUDA kernels) in their PyTorch stack.
Google TPU v5p is competitive on cost for workloads where JAX / TensorFlow are the primary frameworks. Locked to GCP.
AWS Trainium and Inferentia are the cheapest AWS-native options for inference at scale, but the framework-support gap versus NVIDIA is real.
What to put in your pricing model
The cloudprice catalogue lists the major GPU SKUs from each provider. For a real training cost model, layer the following on top of the hourly rate:
- Storage: training data and checkpoints need fast access. Don't pay for io2 when the instance's local NVMe, already included in the hourly rate, does the job; just copy checkpoints to durable storage, because local instance storage is ephemeral.
- Egress: moving model weights and checkpoints out of the cloud costs a non-trivial amount. See the egress trap.
- Inter-node bandwidth: InfiniBand vs ethernet matters enormously for distributed training. Sometimes a slightly more expensive instance with InfiniBand cuts total training time enough to pay for itself.
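The layers in that list can be folded into a single function. The sketch below is a hypothetical cost model, not the cloudprice catalogue's format: every rate in the usage line is a placeholder to be replaced with real numbers for your provider and region. The interconnect check encodes the payback rule from the last bullet. With the same GPU count on both options, compute cost scales with rate × wall-clock time, so the pricier instance wins when its speedup exceeds the price ratio.

```python
# Sketch of a training-run cost model layered on top of the hourly GPU rate.
# All rates in the usage example are hypothetical placeholders.
from dataclasses import dataclass

HOURS_PER_MONTH = 730  # ~8760 / 12


@dataclass
class RunCost:
    gpus: int
    gpu_rate: float      # $ per GPU-hour
    wall_hours: float    # wall-clock duration of the run
    storage_gb: float    # durable storage held during the run (data, checkpoints)
    storage_rate: float  # $ per GB-month
    egress_gb: float     # data moved out of the cloud (weights, checkpoints)
    egress_rate: float   # $ per GB

    def total(self) -> float:
        compute = self.gpus * self.gpu_rate * self.wall_hours
        storage = self.storage_gb * self.storage_rate * (self.wall_hours / HOURS_PER_MONTH)
        egress = self.egress_gb * self.egress_rate
        return compute + storage + egress


def interconnect_pays_back(rate_fast: float, rate_slow: float, speedup: float) -> bool:
    """True if the pricier, faster-interconnect option finishes the run cheaper.

    Assumes the same GPU count on both options.
    speedup = wall-clock time on the slow option / time on the fast option.
    """
    return speedup > rate_fast / rate_slow


# Hypothetical usage: every number here is a placeholder, not a quoted price.
run = RunCost(gpus=64, gpu_rate=2.50, wall_hours=500, storage_gb=20_000,
              storage_rate=0.02, egress_gb=500, egress_rate=0.09)
print(f"${run.total():,.0f}")
print(interconnect_pays_back(rate_fast=3.00, rate_slow=2.50, speedup=1.3))
```

Compute usually dominates the total, but storage and egress are the terms most often left out of a first estimate, which is why they are modeled separately here.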
External: Lambda GPU Cloud pricing, AWS P5, and SemiAnalysis for detailed GPU economics analysis.