Pricing built for growth.
Production inference that won't break your product or your bank.
Serverless
Per-token access to 2M+ open models. Call an endpoint and go — zero setup.
Dedicated serverless
Private, optimized endpoints priced per million tokens.
Dedicated
Reserved GPUs sized to your roadmap, with negotiated SLAs.
Batch
High-throughput offline jobs at the lowest per-token rate, on spare fleet capacity.
Per-token model pricing
Pay only for what you use. No minimums, no rate limits.
Reserved GPU pricing
Hourly price per GPU, billed by the minute. Hourly rates are available on request, and volume discounts are available.
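To make per-minute billing concrete, here is a minimal sketch of how a reserved GPU charge could be estimated from an hourly rate. The function name, the example rate, and the rule that a partial minute rounds up to a full minute are all assumptions for illustration; actual rates and rounding rules are set in your quote.

```python
from decimal import Decimal, ROUND_HALF_UP


def reserved_gpu_cost(hourly_rate_per_gpu: str, gpu_count: int, seconds_used: int) -> Decimal:
    """Estimate a reserved GPU charge billed by the minute (hypothetical helper)."""
    # Assumption: any partial minute is billed as a whole minute.
    billed_minutes = (seconds_used + 59) // 60
    # The hourly price is spread evenly across 60 minutes.
    per_minute_rate = Decimal(hourly_rate_per_gpu) / 60
    cost = per_minute_rate * gpu_count * billed_minutes
    # Round to whole cents.
    return cost.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# Hypothetical example: 8 GPUs at a made-up $3.00/hour for 90.5 minutes.
print(reserved_gpu_cost("3.00", 8, 5430))
```

With these made-up inputs, 90.5 minutes is billed as 91 minutes, so you pay for the time used to the nearest minute instead of a full second hour.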
Self-service batch pricing
The best pricing and speed for your largest jobs. We select a fleet configuration for each job, including spot instances, to deliver the best value: a discount on serverless rates, plus an additional discount for cached tokens.
Models with special batch pricing
Priced per 1M tokens · discounted from serverless rates
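As a rough illustration of how per-1M-token batch pricing adds up, the sketch below estimates a job's cost from a serverless rate and two discounts. The function name, every number, and the way the cached-token discount stacks on top of the batch rate are assumptions; the actual discounts depend on the model and are not stated here.

```python
from decimal import Decimal

TOKENS_PER_UNIT = Decimal(1_000_000)  # prices are quoted per 1M tokens


def batch_job_cost(
    total_tokens: int,
    cached_tokens: int,
    serverless_price_per_1m: str,
    batch_discount: str,
    cached_discount: str,
) -> Decimal:
    """Estimate a batch job's cost (hypothetical helper, hypothetical discounts)."""
    # Batch rate = serverless rate reduced by the batch discount.
    batch_rate = Decimal(serverless_price_per_1m) * (1 - Decimal(batch_discount))
    # Assumption: cached tokens get a further discount applied to the batch rate.
    cached_rate = batch_rate * (1 - Decimal(cached_discount))
    uncached_tokens = total_tokens - cached_tokens
    return (
        Decimal(uncached_tokens) / TOKENS_PER_UNIT * batch_rate
        + Decimal(cached_tokens) / TOKENS_PER_UNIT * cached_rate
    )


# Made-up inputs: 10M tokens, 4M of them cached, $1.00 per 1M serverless,
# 50% batch discount, and a further 50% off cached tokens.
print(batch_job_cost(10_000_000, 4_000_000, "1.00", "0.50", "0.50"))
```

Swap in the rates from your own quote to compare a batch run against the same workload at serverless prices.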