Pricing built for growth.

Production inference that won't break your product or your bank.

Serverless

Per-token access to 2M+ open models. Call an endpoint and go — zero setup.

Pay as you go

2M+ open models

No minimums

Community support

Get started

Early Access

Dedicated serverless

Private, optimized endpoints priced per million tokens.

Per million tokens

Private optimized endpoints

Per-workload tuning

Dedicated support

Talk to sales

Dedicated

Reserved GPUs sized to your roadmap, with negotiated SLAs.

Reserved capacity

Reserved GPUs, sized to you

Negotiated latency SLA

Dedicated support

Talk to sales

Batch

High-throughput offline jobs at the lowest per-token rate, on spare fleet capacity.

Lowest rate

Lowest per-token rate

Millions of requests per job

Great for evals & embeddings

Get started

Serverless

Per-token model pricing

Pay only for what you use. No minimums, no rate limits.

Price per 1M tokens

Model                           Input    Output   Cache read
GLM-5.2                         $1.40    $4.40    $0.26
GLM-5.1                         $1.40    $4.40    $0.26
GLM-5                           $1.00    $3.20    $0.20
MiniMax M3                      $0.30    $1.20    $0.06
MiniMax M2.5                    $0.30    $1.20    $0.03
DeepSeek V4 Pro                 $1.74    $3.48    $0.10
DeepSeek V4 Flash               $0.14    $0.28    $0.07
Kimi K2.7 Code                  $0.75    $3.50    $0.16
Kimi K2.6                       $0.75    $3.50    $0.16
Qwen3.6 35B-A3B                 $0.15    $1.00    $0.05
Qwen3.5 397B-A17B               $0.50    $3.60    $0.30
Qwen3.5 35B-A3B                 $0.15    $1.00    $0.05
Qwen3-Coder-Next                $0.12    $0.80    $0.07
Qwen3-VL 235B-A22B              $0.21    $1.90    $0.10
Qwen3-VL 8B                     $0.25    $0.75    $0.12
Qwen3-Next 80B                  $0.10    $1.10    $0.07
Qwen3 235B-A22B (2507)          $0.14    $0.80    $0.05
Qwen2.5-VL 72B                  $0.80    $1.00    $0.40
Mistral Small 3.2 24B           $0.09    $0.30    $0.05
Llama 4 Maverick (FP8)          $0.35    $1.00    $0.17
Llama 3.3 70B (FP8)             $0.22    $0.50    $0.11
Nemotron 3 Ultra 550B (NVFP4)   $0.50    $2.50    $0.10
MiMo v2.5                       $0.14    $0.28    $0.05
Resemble TTS (English)          $18.50   —
Trinity Large (Thinking)        $0.22    $0.85    $0.06
Gemma 4 26B-A4B                 $0.13    $0.40    $0.05
Gemma 4 31B                     $0.15    $0.40    $0.06
Skyfall 31B v4.2                $0.55    $0.80    $0.25
Cydonia 24B v4.1                $0.30    $0.50    $0.15
gpt-oss-120b                    $0.10    $0.75    $0.055
gpt-oss-120b (Fast)             $0.15    $0.60    —
gpt-oss-20b                     $0.04    $0.20    $0.02
UI-TARS 1.5 7B                  $0.10    $0.20    $0.10
Gemma 3 27B                     $0.08    $0.45    $0.04
Skyfall 36B v2 (FP8)            $0.55    $0.80    $0.25
BGE-M3                          $0.01    —
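
To see how the per-token rates above add up, here is a minimal Python sketch that estimates the cost of a workload from its token counts. It assumes that cached input tokens are billed at the cache-read rate instead of the input rate, which this page does not spell out. The names `PRICES` and `estimate_cost` are illustrative only and are not part of any SDK, and just three rows of the table are copied in.

```python
from decimal import Decimal

# USD per 1M tokens, copied from three rows of the serverless table.
PRICES = {
    "GLM-5": {"input": Decimal("1.00"), "output": Decimal("3.20"), "cache_read": Decimal("0.20")},
    "DeepSeek V4 Flash": {"input": Decimal("0.14"), "output": Decimal("0.28"), "cache_read": Decimal("0.07")},
    "gpt-oss-20b": {"input": Decimal("0.04"), "output": Decimal("0.20"), "cache_read": Decimal("0.02")},
}

TOKENS_PER_UNIT = Decimal(1_000_000)  # prices are quoted per 1M tokens


def estimate_cost(model: str, input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> Decimal:
    """Estimate the USD cost of a workload.

    Assumption (not stated on the page): `cached_tokens` is the part of
    `input_tokens` served from cache, billed at the cache-read rate
    instead of the regular input rate.
    """
    if cached_tokens > input_tokens:
        raise ValueError("cached_tokens cannot exceed input_tokens")
    price = PRICES[model]
    fresh_tokens = input_tokens - cached_tokens
    total = (
        fresh_tokens * price["input"]
        + cached_tokens * price["cache_read"]
        + output_tokens * price["output"]
    )
    return total / TOKENS_PER_UNIT


if __name__ == "__main__":
    cost = estimate_cost("GLM-5", input_tokens=2_000_000, output_tokens=500_000, cached_tokens=1_000_000)
    print(f"Estimated cost: ${cost:.2f}")
```

Under these assumptions, 2M input tokens (half of them cache reads) plus 500K output tokens on GLM-5 come to $2.80.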

Dedicated deployments

Reserved GPU pricing

Hourly price per GPU, billed by the minute. Volume discounts available.

Hardware                Memory
NVIDIA B300             288 GiB    Get a quote
NVIDIA B200             192 GiB    Get a quote
NVIDIA H200             141 GiB    Get a quote
NVIDIA H100 SXM         80 GiB     Get a quote
NVIDIA RTX PRO 6000     96 GiB     Get a quote
NVIDIA RTX 5090         32 GiB     Get a quote

Hourly rates available on request · volume discounts available
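
Because hourly rates are quoted on request, the sketch below only illustrates how an hourly per-GPU price billed by the minute might translate into a bill. The $3.00 rate is a made-up placeholder, not a published price, and rounding a partial minute up to a full minute is an assumption; `reserved_gpu_cost` is a hypothetical helper name.

```python
from decimal import Decimal, ROUND_HALF_UP


def reserved_gpu_cost(hourly_rate_usd: str, gpu_count: int, runtime_seconds: int) -> Decimal:
    """Estimate a bill for reserved GPUs priced per GPU-hour, billed by the minute.

    Assumption: any started minute is billed as a full minute.
    """
    if gpu_count < 0 or runtime_seconds < 0:
        raise ValueError("gpu_count and runtime_seconds must be non-negative")
    # Integer ceiling division avoids floating-point rounding surprises.
    billed_minutes = -(-runtime_seconds // 60)
    cost = Decimal(hourly_rate_usd) * billed_minutes * gpu_count / 60
    return cost.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


if __name__ == "__main__":
    # Hypothetical quote: $3.00 per GPU-hour, 8 GPUs, 90.5 minutes of runtime.
    print(reserved_gpu_cost("3.00", gpu_count=8, runtime_seconds=5430))
```

With the placeholder rate, 8 GPUs running for 90.5 minutes are billed as 91 minutes each, or $36.40 in total.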

Batch

Self-service batch pricing

Best pricing and speed for your largest jobs. We find a unique fleet configuration for each job, including spot instances, that gives the best value: a discount on serverless rates, plus an additional discount for cached tokens.

Precision

Price per 1M tokens

Model size                Input    Output   Cache read
0 – 4.1B params           $0.02    $0.04    $0.01
4.1B – 8.1B params        $0.03    $0.06    $0.01
8.1B – 16.1B params       $0.04    $0.10    $0.02
16.1B – 21.1B params      $0.05    $0.15    $0.02
21.1B – 41.1B params      $0.07    $0.22    $0.03
41.1B – 80.1B params      $0.13    $0.35    $0.06
80.1B – 150.1B params     $0.17    $0.47    $0.09
150.1B – 250.1B params    $0.22    $0.60    $0.11
250.1B – 500B params      $0.05    $0.25    $0.20
500B+ params              $0.52    $1.35    $0.26

Model size                Input    Output   Cache read
0 – 4.1B params           $0.03    $0.05    $0.01
4.1B – 8.1B params        $0.04    $0.09    $0.02
8.1B – 16.1B params       $0.05    $0.14    $0.03
16.1B – 21.1B params      $0.07    $0.22    $0.03
21.1B – 41.1B params      $0.10    $0.32    $0.05
41.1B – 80.1B params      $0.18    $0.50    $0.09
80.1B – 150.1B params     $0.25    $0.68    $0.12
150.1B – 250.1B params    $0.32    $0.86    $0.20
250.1B – 500B params      $0.57    $1.55    $0.29
500B+ params              $0.74    $1.93    $0.37

Model size                Input    Output   Cache read
0 – 4.1B params           $0.04    $0.07    $0.02
4.1B – 8.1B params        $0.05    $0.12    $0.02
8.1B – 16.1B params       $0.07    $0.19    $0.04
16.1B – 21.1B params      $0.09    $0.28    $0.04
21.1B – 41.1B params      $0.13    $0.41    $0.06
41.1B – 80.1B params      $0.23    $0.64    $0.12
80.1B – 150.1B params     $0.32    $0.88    $0.16
150.1B – 250.1B params    $0.41    $1.11    $0.20
250.1B – 500B params      $0.75    $2.02    $0.37
500B+ params              $0.97    $2.51    $0.48
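
The size tiers above lend themselves to a simple lookup. The Python sketch below uses the rates from the second of the three size-tier tables; which precision that table corresponds to is not stated here, so treat the numbers as an example only. It also assumes that a model sitting exactly on a boundary (for example 8.1B) falls into the lower tier, and that cached tokens are counted separately from regular input tokens. `batch_rates` and `batch_job_cost` are hypothetical helper names.

```python
from bisect import bisect_left
from decimal import Decimal

# Upper bounds of each tier in billions of parameters; the last tier (500B+) is open-ended.
UPPER_BOUNDS_B = [4.1, 8.1, 16.1, 21.1, 41.1, 80.1, 150.1, 250.1, 500.0]

# (input, output, cache read) in USD per 1M tokens, copied from the second size-tier table.
RATES = [
    ("0.03", "0.05", "0.01"),
    ("0.04", "0.09", "0.02"),
    ("0.05", "0.14", "0.03"),
    ("0.07", "0.22", "0.03"),
    ("0.10", "0.32", "0.05"),
    ("0.18", "0.50", "0.09"),
    ("0.25", "0.68", "0.12"),
    ("0.32", "0.86", "0.20"),
    ("0.57", "1.55", "0.29"),
    ("0.74", "1.93", "0.37"),
]


def batch_rates(params_billions: float) -> dict:
    """Return the per-1M-token rates for a model of the given size.

    Assumption: a size equal to a tier's upper bound belongs to that tier.
    """
    if params_billions <= 0:
        raise ValueError("params_billions must be positive")
    tier = bisect_left(UPPER_BOUNDS_B, params_billions)
    input_rate, output_rate, cache_rate = RATES[tier]
    return {
        "input": Decimal(input_rate),
        "output": Decimal(output_rate),
        "cache_read": Decimal(cache_rate),
    }


def batch_job_cost(params_billions: float, input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> Decimal:
    """Estimate the USD cost of a batch job; cached tokens are billed separately from input tokens."""
    rates = batch_rates(params_billions)
    total = (
        input_tokens * rates["input"]
        + output_tokens * rates["output"]
        + cached_tokens * rates["cache_read"]
    )
    return total / Decimal(1_000_000)


if __name__ == "__main__":
    # An 8B-parameter model processing 50M input and 10M output tokens.
    print(batch_job_cost(8.0, input_tokens=50_000_000, output_tokens=10_000_000))
```

Under these assumptions, an 8B-parameter model lands in the 4.1B – 8.1B tier, and a job with 50M input and 10M output tokens costs $2.90.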

Models with special batch pricing

Model

Input

Output

Cache read

Qwen/Qwen3-235B-A22B-Instruct-2507-FP8

$0.40

—

Qwen/Qwen3.5-397B-A17B-FP8

$0.25

$1.80

—

deepseek-ai/DeepSeek-R1-0528

$2.00

—

google/gemma-4-31B-it

$0.07

$0.20

—

Priced per 1M tokens · discounted from serverless rates

Start building today