Pricing built for growth.

Production inference that won't break your product or your bank.

Serverless

Per-token access to 2M+ open models. Call an endpoint and go — zero setup.

Pay as you go
2M+ open models
No minimums
Community support

Dedicated

Reserved GPUs sized to your roadmap, with negotiated SLAs.

Reserved capacity
Reserved GPUs, sized to you
Negotiated latency SLA
Dedicated support

Batch

High-throughput offline jobs at the lowest per-token rate, on spare fleet capacity.

Lowest rate
Lowest per-token rate
Millions of requests per job
Great for evals & embeddings

Serverless

Per-token model pricing

Pay only for what you use. No minimums, no rate limits.

Price per 1M tokens

| Model | Input | Output | Cache read |
| --- | --- | --- | --- |
| GLM-5.2 | $1.40 | $4.40 | $0.26 |
| GLM-5.1 | $1.40 | $4.40 | $0.26 |
| GLM-5 | $1.00 | $3.20 | $0.20 |
| MiniMax M3 | $0.30 | $1.20 | $0.06 |
| MiniMax M2.5 | $0.30 | $1.20 | $0.03 |
| DeepSeek V4 Pro | $1.74 | $3.48 | $0.10 |
| DeepSeek V4 Flash | $0.14 | $0.28 | $0.07 |
| Kimi K2.7 Code | $0.75 | $3.50 | $0.16 |
| Kimi K2.6 | $0.75 | $3.50 | $0.16 |
| Qwen3.6 35B-A3B | $0.15 | $1.00 | $0.05 |
| Qwen3.5 397B-A17B | $0.50 | $3.60 | $0.30 |
| Qwen3.5 35B-A3B | $0.15 | $1.00 | $0.05 |
| Qwen3-Coder-Next | $0.12 | $0.80 | $0.07 |
| Qwen3-VL 235B-A22B | $0.21 | $1.90 | $0.10 |
| Qwen3-VL 8B | $0.25 | $0.75 | $0.12 |
| Qwen3-Next 80B | $0.10 | $1.10 | $0.07 |
| Qwen3 235B-A22B (2507) | $0.14 | $0.80 | $0.05 |
| Qwen2.5-VL 72B | $0.80 | $1.00 | $0.40 |
| Mistral Small 3.2 24B | $0.09 | $0.30 | $0.05 |
| Llama 4 Maverick (FP8) | $0.35 | $1.00 | $0.17 |
| Llama 3.3 70B (FP8) | $0.22 | $0.50 | $0.11 |
| Nemotron 3 Ultra 550B (NVFP4) | $0.50 | $2.50 | $0.10 |
| MiMo v2.5 | $0.14 | $0.28 | $0.05 |
| Resemble TTS (English) | $18.50 | | |
| Trinity Large (Thinking) | $0.22 | $0.85 | $0.06 |
| Gemma 4 26B-A4B | $0.13 | $0.40 | $0.05 |
| Gemma 4 31B | $0.15 | $0.40 | $0.06 |
| Skyfall 31B v4.2 | $0.55 | $0.80 | $0.25 |
| Cydonia 24B v4.1 | $0.30 | $0.50 | $0.15 |
| gpt-oss-120b | $0.10 | $0.75 | $0.055 |
| gpt-oss-120b (Fast) | $0.15 | $0.60 | |
| gpt-oss-20b | $0.04 | $0.20 | $0.02 |
| UI-TARS 1.5 7B | $0.10 | $0.20 | $0.10 |
| Gemma 3 27B | $0.08 | $0.45 | $0.04 |
| Skyfall 36B v2 (FP8) | $0.55 | $0.80 | $0.25 |
| BGE-M3 | $0.01 | | |

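
To make the per-token arithmetic concrete, here is a minimal Python sketch that estimates the cost of one request from the rates listed above. The function name `estimate_cost` and the `RATES` dictionary are illustrative, not part of any official SDK, and the sketch assumes that cached tokens are a subset of input tokens billed at the cache-read rate instead of the input rate. Check your invoice or the provider's documentation before relying on that assumption.

```python
from decimal import Decimal

# USD per 1M tokens, copied from the serverless price list above.
# Only two models are included to keep the sketch short.
RATES = {
    "GLM-5": {
        "input": Decimal("1.00"),
        "output": Decimal("3.20"),
        "cache_read": Decimal("0.20"),
    },
    "gpt-oss-20b": {
        "input": Decimal("0.04"),
        "output": Decimal("0.20"),
        "cache_read": Decimal("0.02"),
    },
}

TOKENS_PER_PRICED_UNIT = Decimal(1_000_000)


def estimate_cost(model, input_tokens, output_tokens, cached_tokens=0):
    """Return the estimated USD cost of one request as a Decimal.

    Assumption (not confirmed by the price list): cached_tokens counts the
    input tokens served from cache, and those are billed at the cache-read
    rate instead of the regular input rate.
    """
    if cached_tokens > input_tokens:
        raise ValueError("cached_tokens cannot exceed input_tokens")
    rate = RATES[model]
    uncached_tokens = input_tokens - cached_tokens
    total = (
        uncached_tokens * rate["input"]
        + cached_tokens * rate["cache_read"]
        + output_tokens * rate["output"]
    )
    return total / TOKENS_PER_PRICED_UNIT
```

For example, `estimate_cost("GLM-5", 1_000_000, 500_000, cached_tokens=400_000)` comes to $2.28 under these assumptions: $0.60 for uncached input, $0.08 for cache reads, and $1.60 for output.
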
Dedicated deployments

Reserved GPU pricing

Hourly price per GPU, billed by the minute. Volume discounts available.

| Hardware | Memory |
| --- | --- |
| NVIDIA B300 | 288 GiB |
| NVIDIA B200 | 192 GiB |
| NVIDIA H200 | 141 GiB |
| NVIDIA H100 SXM | 80 GiB |
| NVIDIA RTX PRO 6000 | 96 GiB |
| NVIDIA RTX 5090 | 32 GiB |

Hourly rates available on request · volume discounts available
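
As a rough illustration of "hourly price per GPU, billed by the minute", the Python sketch below prorates an hourly rate over the minutes used. Hourly rates are not published here, so any rate you pass in is a placeholder. The function name `dedicated_cost` and the rule that a partial minute is rounded up to a full minute are assumptions, not documented billing behavior.

```python
from decimal import Decimal, ROUND_HALF_UP


def dedicated_cost(hourly_rate_per_gpu, gpu_count, seconds_used):
    """Estimate the USD cost of a reservation billed by the minute.

    hourly_rate_per_gpu: a string or Decimal such as "3.00" (placeholder
        value; actual rates are available on request).
    Assumption: any started minute is billed as a full minute.
    """
    billed_minutes = -(-seconds_used // 60)  # integer ceiling division
    per_minute_rate = Decimal(hourly_rate_per_gpu) / 60
    cost = per_minute_rate * gpu_count * billed_minutes
    # Round to whole cents for display.
    return cost.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
```

With a placeholder rate of $3.00 per GPU-hour, eight GPUs held for 90 minutes would come to `dedicated_cost("3.00", 8, 5400)`, or $36.00, before any volume discount.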

Batch

Self-service batch pricing

Best pricing and speed for your largest jobs. We assemble a fleet configuration for each job, including spot instances, to get the best value: a discount on serverless rates, plus an additional discount for cached tokens.

Precision
Price per 1M tokens

| Model size | Input | Output | Cache read |
| --- | --- | --- | --- |
| 0 – 4.1B params | $0.02 | $0.04 | $0.01 |
| 4.1B – 8.1B params | $0.03 | $0.06 | $0.01 |
| 8.1B – 16.1B params | $0.04 | $0.10 | $0.02 |
| 16.1B – 21.1B params | $0.05 | $0.15 | $0.02 |
| 21.1B – 41.1B params | $0.07 | $0.22 | $0.03 |
| 41.1B – 80.1B params | $0.13 | $0.35 | $0.06 |
| 80.1B – 150.1B params | $0.17 | $0.47 | $0.09 |
| 150.1B – 250.1B params | $0.22 | $0.60 | $0.11 |
| 250.1B – 500B params | $0.05 | $0.25 | $0.20 |
| 500B+ params | $0.52 | $1.35 | $0.26 |

| Model size | Input | Output | Cache read |
| --- | --- | --- | --- |
| 0 – 4.1B params | $0.03 | $0.05 | $0.01 |
| 4.1B – 8.1B params | $0.04 | $0.09 | $0.02 |
| 8.1B – 16.1B params | $0.05 | $0.14 | $0.03 |
| 16.1B – 21.1B params | $0.07 | $0.22 | $0.03 |
| 21.1B – 41.1B params | $0.10 | $0.32 | $0.05 |
| 41.1B – 80.1B params | $0.18 | $0.50 | $0.09 |
| 80.1B – 150.1B params | $0.25 | $0.68 | $0.12 |
| 150.1B – 250.1B params | $0.32 | $0.86 | $0.20 |
| 250.1B – 500B params | $0.57 | $1.55 | $0.29 |
| 500B+ params | $0.74 | $1.93 | $0.37 |

| Model size | Input | Output | Cache read |
| --- | --- | --- | --- |
| 0 – 4.1B params | $0.04 | $0.07 | $0.02 |
| 4.1B – 8.1B params | $0.05 | $0.12 | $0.02 |
| 8.1B – 16.1B params | $0.07 | $0.19 | $0.04 |
| 16.1B – 21.1B params | $0.09 | $0.28 | $0.04 |
| 21.1B – 41.1B params | $0.13 | $0.41 | $0.06 |
| 41.1B – 80.1B params | $0.23 | $0.64 | $0.12 |
| 80.1B – 150.1B params | $0.32 | $0.88 | $0.16 |
| 150.1B – 250.1B params | $0.41 | $1.11 | $0.20 |
| 250.1B – 500B params | $0.75 | $2.02 | $0.37 |
| 500B+ params | $0.97 | $2.51 | $0.48 |

Models with special batch pricing

| Model | Input | Output | Cache read |
| --- | --- | --- | --- |
| Qwen/Qwen3-235B-A22B-Instruct-2507-FP8 | $0.40 | $0.40 | |
| Qwen/Qwen3.5-397B-A17B-FP8 | $0.25 | $1.80 | |
| deepseek-ai/DeepSeek-R1-0528 | $2.00 | $2.00 | |
| google/gemma-4-31B-it | $0.07 | $0.20 | |

Priced per 1M tokens · discounted from serverless rates
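
Batch rates depend on model size, so estimating a job means first finding the tier a model falls into. The Python sketch below does that lookup using the second of the three size-tier rate lists above, then prices a job from its token counts. The names `batch_rates` and `batch_job_cost` are illustrative. Treating each tier's upper bound as inclusive is an assumption, since the listed ranges share their boundary values. Models with special batch pricing are not covered.

```python
from bisect import bisect_left
from decimal import Decimal

# (upper bound in billions of parameters, input, output, cache read),
# USD per 1M tokens, copied from the second size-tier rate list above.
TIERS = [
    (4.1, "0.03", "0.05", "0.01"),
    (8.1, "0.04", "0.09", "0.02"),
    (16.1, "0.05", "0.14", "0.03"),
    (21.1, "0.07", "0.22", "0.03"),
    (41.1, "0.10", "0.32", "0.05"),
    (80.1, "0.18", "0.50", "0.09"),
    (150.1, "0.25", "0.68", "0.12"),
    (250.1, "0.32", "0.86", "0.20"),
    (500.0, "0.57", "1.55", "0.29"),
    (float("inf"), "0.74", "1.93", "0.37"),
]
UPPER_BOUNDS = [tier[0] for tier in TIERS]


def batch_rates(params_billions):
    """Return (input, output, cache_read) rates for a model size.

    Assumption: a model sitting exactly on a boundary (for example 4.1B)
    belongs to the lower tier.
    """
    index = bisect_left(UPPER_BOUNDS, params_billions)
    _, input_rate, output_rate, cache_rate = TIERS[index]
    return Decimal(input_rate), Decimal(output_rate), Decimal(cache_rate)


def batch_job_cost(params_billions, input_tokens, output_tokens):
    """Estimate the USD cost of a batch job with no cached tokens."""
    input_rate, output_rate, _ = batch_rates(params_billions)
    total = input_tokens * input_rate + output_tokens * output_rate
    return total / Decimal(1_000_000)
```

Under these assumptions, an 8B-parameter model processing 10M input tokens and 2M output tokens would cost `batch_job_cost(8, 10_000_000, 2_000_000)`, or $0.58.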

Start building today

Instantly run any open model — popular or specialized.