Claude Fable 5$22.000/MClaude Opus 4.8$11.000/MClaude Opus 4.7$11.000/MClaude Opus 4.6$11.000/MClaude Opus 4.5$33.000/MClaude Sonnet 3.7$6.600/MClaude Opus 3$33.000/MClaude 2.1$12.800/MClaude 2$12.800/MGPT-5.5$12.500/MGPT-5.2$5.425/MGPT-5.2-Codex$5.425/MGPT-5$3.875/MGPT-4.5$97.500/MGPT-4 Turbo Preview$16.000/MGPT-4$39.000/MGPT-4-32k$78.000/Mo3$19.000/Mo3-mini$2.090/Mo4-mini$2.090/Mo1$28.500/Mo1-mini$5.700/Mo1-preview$28.500/MGemini 3.5 Pro$5.000/MGemini 3.1 Pro$5.000/MGemini 3 Pro$5.000/MGemini 2.5 Pro$3.875/MGemini 1.5 Pro$2.375/MGemini 1.0 Ultra$12.000/MGemini 1.0 Pro$0.800/MClaude Fable 5$22.000/MClaude Opus 4.8$11.000/MClaude Opus 4.7$11.000/MClaude Opus 4.6$11.000/MClaude Opus 4.5$33.000/MClaude Sonnet 3.7$6.600/MClaude Opus 3$33.000/MClaude 2.1$12.800/MClaude 2$12.800/MGPT-5.5$12.500/MGPT-5.2$5.425/MGPT-5.2-Codex$5.425/MGPT-5$3.875/MGPT-4.5$97.500/MGPT-4 Turbo Preview$16.000/MGPT-4$39.000/MGPT-4-32k$78.000/Mo3$19.000/Mo3-mini$2.090/Mo4-mini$2.090/Mo1$28.500/Mo1-mini$5.700/Mo1-preview$28.500/MGemini 3.5 Pro$5.000/MGemini 3.1 Pro$5.000/MGemini 3 Pro$5.000/MGemini 2.5 Pro$3.875/MGemini 1.5 Pro$2.375/MGemini 1.0 Ultra$12.000/MGemini 1.0 Pro$0.800/M

DeepInfra

DeepInfra is an AI model provider.Tokenando tracks 11 DeepInfra models, with input pricing from $0.060/M and an average blended cost of $0.459/M. Its flagship model is Qwen2.5-72B (DI).

GPU serverless for popular open-weights models at aggressive prices. OpenAI-compatible API surface.

Founded 2022HQ Menlo Park, USAWebsite ↗Official pricing ↗API docs ↗

MODELS TRACKED

3 categories

FLAGSHIP

Qwen2.5-72B (DI)

Live API

MIN INPUT

$0.060/M

cheapest model in family

AVG BLENDED

$0.459/M

across 11 priced models

MAX CONTEXT

128K

largest window in family

Frontier

1 model

Llama 3.1 405B (DI)profile

Live API · 128K ctx

in $0.800/Mout $0.800/M

405B serverless · manual-seed

Reasoning

1 model

DeepSeek-R1 (DI)profile

Live API · 64K ctx

in $0.550/Mout $2.190/M

R1 managed · manual-seed

Efficient

9 models

Llama 3.3 70B (DI)profile

Live API · 128K ctx

in $0.230/Mout $0.400/M

Cost-effective · manual-seed

Llama 3.1 70B (DI)profile

Live API · 128K ctx

in $0.350/Mout $0.400/M

70B managed · manual-seed

Mistral 7B (DI)profile

Live API · 32K ctx

in $0.070/Mout $0.070/M

Cheapest Mistral · manual-seed

Mixtral 8x7B (DI)profile

Live API · 32K ctx

in $0.240/Mout $0.240/M

MoE managed · manual-seed

Qwen2.5-72B (DI)profile

Live API · 128K ctx

in $0.350/Mout $0.400/M

Qwen on DeepInfra · manual-seed

Gemma 2 9B (DI)profile

Live API · 8K ctx

in $0.060/Mout $0.060/M

Cheapest Gemma · manual-seed

Yi-34B-Chat (DI)profile

Live API · 4K ctx

in $0.600/Mout $0.600/M

Yi on DeepInfra · manual-seed

WizardLM-2 8x22B (DI)profile

Live API · 64K ctx

in $0.630/Mout $0.630/M

Instruction tuned · manual-seed

Phind-CodeLlama-34B (DI)profile

Live API · 16K ctx

in $0.600/Mout $0.600/M

Code model · manual-seed

Frequently Asked Questions

How many models does DeepInfra offer?

Tokenando tracks 11 DeepInfra models.

How much do DeepInfra models cost?

DeepInfra model input pricing starts at $0.060 per million tokens, with an average blended cost of $0.459 per million across the 11 priced models we track.

What is DeepInfra's flagship model?

DeepInfra's flagship model is Qwen2.5-72B (DI). It's the highest-tier DeepInfra model we track, with input pricing of $0.350 per million tokens.

What model categories does DeepInfra cover?

DeepInfra covers 3 categories: frontier, reasoning and efficient.