Author

Gabriel Perácio

Gabriel Perácio is a staff engineer at Parasail, specializing in ML inference optimization. He has been fascinated by language models since a herd of unicorns spoke perfect English in 2019.

The idle GPU tax: What it is, why it’s getting worse, and how you can fix it

Learn what the idle GPU tax is, what it costs, and how usage-based billing on dedicated endpoints helps you avoid it altogether.

Gabriel Perácio · June 18, 2026

Engineering

How to choose the right managed inference architecture: Serverless, dedicated, dedicated serverless, or batch

Use this decision framework to choose the right managed inference mode based on latency requirements, GPU breakeven utilization, and whether your workload needs a dedicated endpoint.

Gabriel Perácio · June 9, 2026

Product

Serverless vs. Dedicated Inference: Why We Built Dedicated Serverless

With dedicated serverless, you get dedicated hardware on per-token pricing, with no idle-hour charges or long-term GPU commitment.

Gabriel Perácio · May 29, 2026

Product

Making an EAGLE fly: How We Got 2.6x Faster LLM Inference (Without Cheating)

We trained a custom EAGLE-3 speculative decoding head for OLMo-3.1-32B-Think and got 2.6x faster inference.

Gabriel Perácio · April 28, 2026

Engineering