Here's a number worth sitting with: among the production-grade models compared in this guide, the cheapest costs $0.10 per million input tokens and the most expensive costs $5.00. Same task, same prompt, wildly different bill. If you're running 100,000 API calls a day on the wrong model, you could be burning $350,000 a year more than necessary (depending on how long your prompts and responses are). That isn't because the expensive model is better for your use case; it's because you never ran the comparison. This guide gives you the actual numbers and helps you stop overpaying.
Every major AI provider charges per token, where a token is a chunk of roughly 3–4 characters of English text. Your bill depends on two things: how many tokens you send in (your prompt, system instructions, conversation history) and how many tokens the model sends back (its response).
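Before you have real usage data, a character count gives a ballpark token figure. This sketch assumes the 3–4 characters-per-token range above and returns a (low, high) estimate. `estimate_tokens` is a hypothetical helper; each provider's own tokenizer gives the count you are actually billed for.

```python
import math


def estimate_tokens(text: str) -> tuple[int, int]:
    """Rough (low, high) token estimate for English text.

    Assumes roughly 3-4 characters per token; real counts come from each
    provider's tokenizer and vary by model and language.
    """
    n_chars = len(text)
    low = math.ceil(n_chars / 4)   # 4 chars per token -> fewer tokens
    high = math.ceil(n_chars / 3)  # 3 chars per token -> more tokens
    return low, high


prompt = "Summarize the attached support ticket in two sentences."
low, high = estimate_tokens(prompt)
print(f"{len(prompt)} characters -> roughly {low}-{high} tokens")
```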
Output tokens are consistently more expensive than input tokens, roughly 4–8x across the models below, because the model generates output one token at a time while it can process the whole prompt in parallel. This means response length dramatically affects your bill.
The basic formula:
Cost = (input_tokens / 1,000,000 × input_price) + (output_tokens / 1,000,000 × output_price)
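The formula translates directly into code. A minimal Python sketch, where `request_cost` is an illustrative name and prices are in USD per million tokens, as in the table below:

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price: float, output_price: float) -> float:
    """USD cost of one request; prices are USD per 1M tokens."""
    return (input_tokens / 1_000_000 * input_price
            + output_tokens / 1_000_000 * output_price)


# Claude Sonnet 4.6 ($3.00 input / $15.00 output per 1M tokens),
# 500 input tokens and 300 output tokens:
cost = request_cost(500, 300, input_price=3.00, output_price=15.00)
print(f"${cost:.6f}")  # $0.006000
```

Multiplying the per-request cost by your monthly request count gives the monthly bill.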
| Model | Provider | Input / 1M | Output / 1M | Context |
|---|---|---|---|---|
| Claude Opus 4.7 | Anthropic | $5.00 | $25.00 | 1M |
| Claude Sonnet 4.6 | Anthropic | $3.00 | $15.00 | 200k |
| Claude Haiku 4.5 | Anthropic | $1.00 | $5.00 | 200k |
| GPT-5.4 | OpenAI | $2.50 | $15.00 | 270k |
| GPT-5.4 mini | OpenAI | $0.75 | $4.50 | 270k |
| GPT-5.4 nano | OpenAI | $0.20 | $1.25 | 270k |
| GPT-4.1 | OpenAI | $2.00 | $8.00 | 1M |
| GPT-4o | OpenAI | $2.50 | $10.00 | 128k |
| Gemini 3.1 Pro | Google | $2.00 | $12.00 | 1M |
| Gemini 3 Flash | Google | $0.50 | $3.00 | 1M |
| Gemini 2.5 Flash | Google | $0.30 | $2.50 | 1M |
| Gemini 2.5 Flash-Lite | Google | $0.10 | $0.40 | 1M |

Prices verified April 2026. AI pricing changes frequently — always verify current rates at anthropic.com, openai.com, and ai.google.dev.
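Abstract per-token prices don't mean much without context. Here are two realistic workloads, priced at the rates above.

Short requests: 500 input tokens and 300 output tokens per request, 30,000 requests a month (about 1,000 a day).
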
| Model | Cost per Request | Monthly Cost |
|---|---|---|
| Gemini 2.5 Flash-Lite | $0.000170 | $5.10 |
| GPT-5.4 nano | $0.000475 | $14.25 |
| Claude Haiku 4.5 | $0.002000 | $60.00 |
| GPT-5.4 mini | $0.001725 | $51.75 |
| Claude Sonnet 4.6 | $0.006000 | $180.00 |
| GPT-5.4 | $0.005750 | $172.50 |
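
Larger requests: 5,000 input tokens and 1,000 output tokens per request, 3,000 requests a month (about 100 a day).

| Model | Cost per Request | Monthly Cost |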
|---|---|---|
| Gemini 2.5 Flash-Lite | $0.000900 | $2.70 |
| Gemini 3.1 Pro | $0.022000 | $66.00 |
| Claude Sonnet 4.6 | $0.030000 | $90.00 |
| GPT-5.4 | $0.027500 | $82.50 |
| Claude Opus 4.7 | $0.050000 | $150.00 |
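Both tables follow directly from the cost formula. The sketch below reproduces them, assuming 500 input and 300 output tokens at 30,000 requests a month for the first table, and 5,000 input and 1,000 output tokens at 3,000 requests a month for the second; those are the volumes the listed per-request and monthly figures work out to. Prices are copied from the pricing table, and `cost_table` is an illustrative helper name.

```python
# (input, output) prices in USD per 1M tokens, from the pricing table above.
PRICES = {
    "Gemini 2.5 Flash-Lite": (0.10, 0.40),
    "GPT-5.4 nano": (0.20, 1.25),
    "GPT-5.4 mini": (0.75, 4.50),
    "Claude Haiku 4.5": (1.00, 5.00),
    "Gemini 3.1 Pro": (2.00, 12.00),
    "GPT-5.4": (2.50, 15.00),
    "Claude Sonnet 4.6": (3.00, 15.00),
    "Claude Opus 4.7": (5.00, 25.00),
}


def cost_table(models, input_tokens, output_tokens, requests_per_month):
    """Return (model, cost per request, monthly cost) rows, in USD."""
    rows = []
    for model in models:
        input_price, output_price = PRICES[model]
        per_request = (input_tokens * input_price
                       + output_tokens * output_price) / 1_000_000
        rows.append((model, per_request, per_request * requests_per_month))
    return rows


short = cost_table(["Gemini 2.5 Flash-Lite", "GPT-5.4 nano", "Claude Haiku 4.5",
                    "GPT-5.4 mini", "Claude Sonnet 4.6", "GPT-5.4"],
                   input_tokens=500, output_tokens=300, requests_per_month=30_000)
larger = cost_table(["Gemini 2.5 Flash-Lite", "Gemini 3.1 Pro", "Claude Sonnet 4.6",
                     "GPT-5.4", "Claude Opus 4.7"],
                    input_tokens=5_000, output_tokens=1_000, requests_per_month=3_000)

for title, rows in (("500 in / 300 out, 30,000/month", short),
                    ("5,000 in / 1,000 out, 3,000/month", larger)):
    print(title)
    for model, per_request, monthly in rows:
        print(f"  {model:<22} ${per_request:.6f}  ${monthly:,.2f}")
```

Swapping in your own token counts and request volume gives the same comparison for your workload.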
All three major providers offer prompt caching: if the same system prompt, document context, or conversation history is reused across requests, the cached tokens are billed at a fraction of the standard input rate. Anthropic charges 10% of the standard input price for cache hits (writing to the cache costs 25% more than standard input with the default five-minute cache lifetime). OpenAI and Google also discount cached input tokens, at rates that vary by model. For applications with consistent system prompts, this alone can cut input costs by 50–80%.
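A quick way to see where a saving in that range comes from is to price a shared prefix with and without caching. This sketch uses Anthropic's multipliers as defaults and makes the optimistic assumption that the first request writes the cache and every later request hits it; the workload (a 4,000-token system prompt, 1,000 unique tokens, 3,000 requests on Claude Sonnet 4.6) and both function names are hypothetical.

```python
def input_cost_without_cache(n_requests: int, prefix_tokens: int,
                             unique_tokens: int, input_price: float) -> float:
    """Total input cost (USD) when every token is billed at the standard rate."""
    return n_requests * (prefix_tokens + unique_tokens) * input_price / 1_000_000


def input_cost_with_cache(n_requests: int, prefix_tokens: int, unique_tokens: int,
                          input_price: float, read_multiplier: float = 0.10,
                          write_multiplier: float = 1.25) -> float:
    """Total input cost (USD) when all requests share one cached prefix.

    Assumes the first request writes the prefix to the cache and every later
    request is a cache hit (the cache never expires in between). The default
    multipliers are Anthropic's cache-hit (0.1x) and five-minute cache-write
    (1.25x) rates; other providers price caching differently.
    """
    if n_requests <= 0:
        return 0.0
    per_token = input_price / 1_000_000
    write = prefix_tokens * per_token * write_multiplier
    hits = (n_requests - 1) * prefix_tokens * per_token * read_multiplier
    uncached = n_requests * unique_tokens * per_token
    return write + hits + uncached


# Hypothetical: a 4,000-token system prompt plus 1,000 unique tokens per
# request, 3,000 requests, Claude Sonnet 4.6 at $3.00 per 1M input tokens.
baseline = input_cost_without_cache(3_000, 4_000, 1_000, 3.00)
cached = input_cost_with_cache(3_000, 4_000, 1_000, 3.00)
print(f"without cache ${baseline:.2f}, with cache ${cached:.2f}, "
      f"{1 - cached / baseline:.0%} lower")
```

In practice cache entries expire, so some requests pay the write premium again; the real saving lands below this best case.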
All three providers offer a Batch API that processes requests asynchronously (typically within 24 hours) at 50% off standard rates. For non-real-time workloads such as document processing, data enrichment, and offline evaluations, batch mode is a no-brainer.
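Usually only part of the traffic can wait for asynchronous results, so it helps to price the split. In this sketch `cost_with_batching` and the 60% batchable share are hypothetical; the $150.00 baseline is the Claude Opus 4.7 monthly figure from the second workload table above, and the discount is the 50% described here.

```python
def cost_with_batching(standard_monthly_cost: float, batchable_share: float,
                       batch_discount: float = 0.50) -> float:
    """Monthly cost (USD) when a share of requests moves to a batch API.

    batchable_share: fraction (0..1) of requests that can wait for async results.
    batch_discount: 0.50 models the 50%-off batch rate.
    """
    if not 0.0 <= batchable_share <= 1.0:
        raise ValueError("batchable_share must be between 0 and 1")
    realtime = standard_monthly_cost * (1 - batchable_share)
    batched = standard_monthly_cost * batchable_share * (1 - batch_discount)
    return realtime + batched


# Claude Opus 4.7 costs $150.00/month at standard rates for the larger workload;
# suppose, hypothetically, 60% of those requests are offline document processing.
print(f"${cost_with_batching(150.00, 0.60):.2f}")
```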
Not every request needs your most capable model. Route simple classification, extraction, or FAQ responses to a budget model (Gemini 2.5 Flash-Lite, GPT-5.4 nano) and reserve flagship models for complex reasoning, nuanced writing, or multi-step tasks. This "model cascade" approach can reduce costs by 60–80% without meaningful quality loss on simpler tasks.
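A routing layer can start as a simple lookup from task type to model. In this sketch the task types, the `ROUTES` table, the default model, and the traffic mix are all hypothetical (you would set them from your own evaluations); the prices come from the pricing table, and the workload is the 500-input, 300-output, 30,000-requests-a-month case.

```python
# (input, output) prices in USD per 1M tokens, from the pricing table above.
PRICES = {
    "Gemini 2.5 Flash-Lite": (0.10, 0.40),
    "GPT-5.4": (2.50, 15.00),
}
# Hypothetical routing table: simple task types go to the budget model.
ROUTES = {
    "classification": "Gemini 2.5 Flash-Lite",
    "extraction": "Gemini 2.5 Flash-Lite",
    "faq": "Gemini 2.5 Flash-Lite",
}
DEFAULT_MODEL = "GPT-5.4"  # flagship for reasoning, writing, multi-step tasks


def route(task_type: str) -> str:
    """Pick a model for a request based on its task type."""
    return ROUTES.get(task_type, DEFAULT_MODEL)


def blended_cost(task_mix, input_tokens, output_tokens, requests):
    """Monthly cost (USD) when each request is sent to route(task_type).

    task_mix maps task type -> share of traffic; shares should sum to 1.
    """
    total = 0.0
    for task_type, share in task_mix.items():
        input_price, output_price = PRICES[route(task_type)]
        per_request = (input_tokens * input_price
                       + output_tokens * output_price) / 1_000_000
        total += per_request * requests * share
    return total


mix = {"classification": 0.4, "faq": 0.3, "analysis": 0.3}  # hypothetical
all_flagship = blended_cost({"analysis": 1.0}, 500, 300, 30_000)
routed = blended_cost(mix, 500, 300, 30_000)
print(f"all GPT-5.4: ${all_flagship:.2f}, routed: ${routed:.2f}")
```

With this mix the routed bill is about 68% lower than sending everything to GPT-5.4; how much of your own traffic can safely take the cheap route is the number to measure.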
Since output tokens cost roughly 4–8x as much as input tokens, keeping responses concise pays off. Setting output-token limits (`max_tokens` or its equivalent), using structured output formats such as JSON, and being specific in your prompts about the desired response length all reduce costs.
Rule of thumb: on the models listed here, a 1,000-token response costs roughly 4–8x as much as a 1,000-token prompt. Writing prompts that elicit shorter but sufficient responses saves money on every request.
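Because the ratio depends only on the two prices, the rule of thumb can be checked against the pricing table directly. This sketch copies the April 2026 prices and compares a 1,000-token prompt with a 1,000-token response for each model:

```python
# (input, output) prices in USD per 1M tokens, copied from the pricing table.
PRICES = {
    "Claude Opus 4.7": (5.00, 25.00),
    "Claude Sonnet 4.6": (3.00, 15.00),
    "Claude Haiku 4.5": (1.00, 5.00),
    "GPT-5.4": (2.50, 15.00),
    "GPT-5.4 mini": (0.75, 4.50),
    "GPT-5.4 nano": (0.20, 1.25),
    "GPT-4.1": (2.00, 8.00),
    "GPT-4o": (2.50, 10.00),
    "Gemini 3.1 Pro": (2.00, 12.00),
    "Gemini 3 Flash": (0.50, 3.00),
    "Gemini 2.5 Flash": (0.30, 2.50),
    "Gemini 2.5 Flash-Lite": (0.10, 0.40),
}

ratios = {}
for model, (input_price, output_price) in PRICES.items():
    prompt_cost = 1_000 * input_price / 1_000_000     # 1,000-token prompt
    response_cost = 1_000 * output_price / 1_000_000  # 1,000-token response
    ratios[model] = response_cost / prompt_cost
    print(f"{model:<22} prompt ${prompt_cost:.5f}  "
          f"response ${response_cost:.5f}  ({ratios[model]:.1f}x)")

print(f"range: {min(ratios.values()):.1f}x to {max(ratios.values()):.1f}x")
```

Gemini 2.5 Flash sits at the top of the range (about 8.3x), while GPT-4.1, GPT-4o, and Gemini 2.5 Flash-Lite sit at the bottom (4x).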