The AI model you pick can cost 50 times more than a functionally equivalent alternative for the same task. That's not an exaggeration: comparing the models below, the cheapest and most expensive production-grade options differ by roughly 50× per input token and 60× per output token. For a company processing millions of requests monthly, that difference can reach six figures per year.
This guide breaks down the actual per-token pricing for every major model in 2026, shows what that means in real monthly cost estimates, and helps you think through how to pick the right model for your use case.
How AI API Pricing Works
All major AI APIs charge separately for input tokens (your prompt) and output tokens (the model's response), billed per million tokens (abbreviated MTok or 1M).
A token is roughly 4 characters of English text, or about 0.75 words. A 500-word prompt is approximately 660 tokens. A 200-word response is about 265 tokens.
Output tokens are almost always more expensive than input tokens — typically 3–10× higher — because generating text requires more computation than reading it.
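The arithmetic above is easy to script. Here is a minimal sketch of a per-request cost estimator; the ~0.75 words-per-token ratio is the rough English-text approximation from this section, and real tokenizers vary by model and language.

```python
WORDS_PER_TOKEN = 0.75  # rough approximation for English text

def estimate_tokens(word_count: int) -> int:
    """Convert a word count to an approximate token count."""
    return round(word_count / WORDS_PER_TOKEN)

def request_cost(input_tokens: int, output_tokens: int,
                 input_per_mtok: float, output_per_mtok: float) -> float:
    """Cost of one request given per-million-token rates."""
    return (input_tokens * input_per_mtok
            + output_tokens * output_per_mtok) / 1_000_000

# A 500-word prompt and 200-word response at $3/$15 per MTok:
tokens_in = estimate_tokens(500)   # ~667 tokens
tokens_out = estimate_tokens(200)  # ~267 tokens
print(f"${request_cost(tokens_in, tokens_out, 3.00, 15.00):.4f}")  # prints $0.0060
```

Note how the output side dominates: even at 267 tokens versus 667, the output tokens account for about two-thirds of the request cost at a 5× rate difference.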
Current Pricing: All Major Models (April 2026)
| Model | Provider | Input $/1M | Output $/1M | Context Window |
|---|---|---|---|---|
| Claude Opus 4.7 | Anthropic | $5.00 | $25.00 | 1M tokens |
| Claude Sonnet 4.6 | Anthropic | $3.00 | $15.00 | 200k tokens |
| Claude Haiku 4.5 | Anthropic | $1.00 | $5.00 | 200k tokens |
| GPT-5.4 | OpenAI | $2.50 | $15.00 | 270k tokens |
| GPT-5.4 mini | OpenAI | $0.75 | $4.50 | 270k tokens |
| GPT-5.4 nano | OpenAI | $0.20 | $1.25 | 270k tokens |
| GPT-4.1 | OpenAI | $2.00 | $8.00 | 1M tokens |
| Gemini 3.1 Pro | Google | $2.00 | $12.00 | 1M tokens |
| Gemini 3 Flash | Google | $0.50 | $3.00 | 1M tokens |
| Gemini 2.5 Flash-Lite | Google | $0.10 | $0.40 | 1M tokens |
Prices as of April 2026. Always verify current rates at provider websites before making production decisions.
Real Monthly Cost Examples
Let's make these numbers concrete. Suppose you're running a chatbot that handles 1,000 requests/day with an average of 500 input tokens and 300 output tokens per request.
| Model | Cost/Request | Daily Cost | Monthly Cost |
|---|---|---|---|
| Claude Opus 4.7 | $0.0100 | $10.00 | $300 |
| Claude Sonnet 4.6 | $0.0060 | $6.00 | $180 |
| GPT-5.4 | $0.0058 | $5.75 | $173 |
| Gemini 3.1 Pro | $0.0046 | $4.60 | $138 |
| Claude Haiku 4.5 | $0.0020 | $2.00 | $60 |
| GPT-5.4 mini | $0.0017 | $1.73 | $52 |
| Gemini 2.5 Flash-Lite | $0.00017 | $0.17 | $5 |
At 1,000 requests/day, the gap between Claude Opus 4.7 ($300/month) and Gemini 2.5 Flash-Lite ($5/month) is $295/month, or about $3,540/year. At 100,000 requests/day, that same gap is roughly $354,000/year.
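The table above is straightforward to reproduce; the per-MTok rates in this sketch are copied from the April 2026 pricing table earlier in the article (a subset shown for brevity).

```python
# (input $/MTok, output $/MTok) from the pricing table above
RATES = {
    "Claude Opus 4.7": (5.00, 25.00),
    "GPT-5.4 mini": (0.75, 4.50),
    "Gemini 2.5 Flash-Lite": (0.10, 0.40),
}

def monthly_cost(in_rate: float, out_rate: float, reqs_per_day: int = 1000,
                 in_tok: int = 500, out_tok: int = 300, days: int = 30) -> float:
    """Monthly bill for the chatbot scenario: 500 in / 300 out tokens per request."""
    per_request = (in_tok * in_rate + out_tok * out_rate) / 1_000_000
    return per_request * reqs_per_day * days

for model, (in_rate, out_rate) in RATES.items():
    print(f"{model}: ${monthly_cost(in_rate, out_rate):,.2f}/month")
```

Swapping in your own request volume and token averages takes one argument change, which makes this a convenient back-of-envelope check before committing to a model.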
How to Choose the Right Model
The right model is the cheapest one that reliably produces acceptable results for your specific task. This requires:
- Testing on your actual prompts: public benchmarks often don't reflect performance on your real-world tasks
- Defining "acceptable" — what quality bar do your users actually need?
- Cascading architecture — route simple tasks to cheap models, complex ones to expensive models
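A cascade can be as simple as trying the cheap model first and escalating only when the answer fails a quality check. The sketch below is illustrative: `call_model` and `passes_quality_bar` are hypothetical placeholders, not a real provider API, and you would implement them against your SDK and your own acceptance criteria.

```python
CHEAP, EXPENSIVE = "small-model", "large-model"  # placeholder model names

def call_model(model: str, prompt: str) -> str:
    # Placeholder: swap in a real SDK call to your provider here.
    return f"[{model} answer to: {prompt}]"

def passes_quality_bar(answer: str) -> bool:
    # Placeholder check: e.g. schema validation, length limits,
    # or a grader model scoring the response.
    return "answer" in answer

def cascaded_answer(prompt: str) -> str:
    answer = call_model(CHEAP, prompt)
    if passes_quality_bar(answer):
        return answer  # most traffic stops here, at the cheap model's rates
    return call_model(EXPENSIVE, prompt)  # escalate only the hard cases
```

The economics depend on the escalation rate: if the cheap model handles 80% of traffic acceptably, you pay the expensive model's rates on only the remaining 20%.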
Task-Based Recommendations
- Simple classification / extraction / routing: GPT-5.4 nano, Gemini 2.5 Flash-Lite ($0.10–0.20/MTok input)
- Customer support / FAQ / summarization: Claude Haiku 4.5, Gemini 3 Flash, GPT-5.4 mini
- Code generation / reasoning / analysis: Claude Sonnet 4.6, GPT-5.4, Gemini 3.1 Pro
- Most demanding agentic / complex tasks: Claude Opus 4.7 (highest quality ceiling)
- Long document processing (>200k tokens): Claude Opus 4.7, GPT-4.1, Gemini 3.1 Pro (1M context)
Cost Reduction Strategies
- Prompt caching: Most providers offer 75–90% discounts on cached input tokens. If your system prompt is consistent across requests, caching can cut input costs dramatically.
- Batch API: All major providers offer 50% discounts for async batch processing. For non-real-time workloads, this halves your bill immediately.
- Output optimization: Instruct models to be concise. Output tokens cost 3–10× more than input tokens. Reducing average response length from 500 to 300 tokens can cut your bill by 30–40%.
- Model routing: Classify request complexity and route accordingly. A classifier using a $0.10/MTok model to route between cheap and expensive models often pays for itself quickly.
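These levers compound. A rough estimator, using the discount figures from the bullets above (a 90% cache discount on cached input tokens, a 50% batch discount); the traffic-share and volume inputs are assumptions you should replace with your own measurements.

```python
def effective_input_rate(base_rate: float, cached_share: float = 0.8,
                         cache_discount: float = 0.9) -> float:
    """Blended input rate when cached_share of input tokens hit the cache."""
    return base_rate * (1 - cached_share * cache_discount)

def monthly_bill(in_tok: int, out_tok: int, in_rate: float,
                 out_rate: float, batch: bool = False) -> float:
    cost = (in_tok * in_rate + out_tok * out_rate) / 1_000_000
    return cost * 0.5 if batch else cost  # 50% batch discount when eligible

# Assumed workload: 1B input + 300M output tokens/month at $3/$15 per MTok.
base = monthly_bill(1_000_000_000, 300_000_000, 3.00, 15.00)
optimized = monthly_bill(1_000_000_000, 300_000_000,
                         effective_input_rate(3.00), 15.00, batch=True)
print(f"${base:,.0f} -> ${optimized:,.0f}")  # prints $7,500 -> $2,670
```

In this scenario caching plus batching cuts the bill by almost two-thirds before touching output length or model routing, which is why these two levers are usually worth evaluating first.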
AI API pricing has dropped dramatically since 2023 and continues to fall. The prices in this article reflect April 2026 rates. Always check official provider pricing pages before making production commitments, and consider setting up price alerts or reviewing costs quarterly.