Prices last updated: April 2026. Pricing may change — always verify current rates at anthropic.com/pricing, openai.com/pricing, and ai.google.dev/pricing. Prices shown are standard API rates and do not include batch discounts, enterprise agreements, or free-tier credits.

Token-based pricing: All major AI APIs charge separately for input tokens (your prompt) and output tokens (the model's response). Output tokens are typically 3–5× more expensive than input tokens.

Formula: Cost per request = (input_tokens ÷ 1,000,000 × input_price) + (output_tokens ÷ 1,000,000 × output_price). Then multiply by daily requests and days per month for totals.
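The formula above can be sketched in a few lines of Python. This is a minimal illustration, not the calculator's actual code: the function names, the 30-day default, and the sample workload are assumptions made for the example.

```python
def cost_per_request(input_tokens, output_tokens, input_price, output_price):
    """Return the USD cost of one request; prices are USD per 1M tokens."""
    input_cost = input_tokens / 1_000_000 * input_price
    output_cost = output_tokens / 1_000_000 * output_price
    return input_cost + output_cost


def monthly_cost(per_request, requests_per_day, days_per_month=30):
    """Scale a per-request cost to a monthly total (30 days is an assumption)."""
    return per_request * requests_per_day * days_per_month


# Hypothetical workload: 1,000 input and 500 output tokens per request,
# priced at $3.00 / $15.00 per 1M tokens, 10,000 requests per day.
per_request = cost_per_request(1_000, 500, 3.00, 15.00)
print(f"per request: ${per_request:.4f}")
print(f"monthly:     ${monthly_cost(per_request, 10_000):,.2f}")
```

Keeping the per-request step separate from the scaling step makes it easy to swap in a different request volume or billing period.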

What counts as a token? Roughly 4 characters of English text equals 1 token. A 500-word essay is approximately 650–700 tokens. System prompts, conversation history, and tool/function definitions all count toward input tokens.
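For quick budgeting, the two rules of thumb (about 4 characters per token, about 0.75 words per token) can be turned into a rough estimator. The sketch below is only a heuristic for English text and is not a real tokenizer; the function names are made up for this example, and rounding up is a choice made so that budgets err on the high side.

```python
import math

CHARS_PER_TOKEN = 4      # rule of thumb for English text
WORDS_PER_TOKEN = 0.75   # rule of thumb for English text


def tokens_from_chars(char_count):
    """Estimate tokens from a character count, rounded up."""
    return math.ceil(char_count / CHARS_PER_TOKEN)


def tokens_from_words(word_count):
    """Estimate tokens from a word count, rounded up."""
    return math.ceil(word_count / WORDS_PER_TOKEN)


def estimate_tokens(text):
    """Return a (low, high) pair from the two heuristics."""
    by_chars = tokens_from_chars(len(text))
    by_words = tokens_from_words(len(text.split()))
    return min(by_chars, by_words), max(by_chars, by_words)


print(tokens_from_words(500))  # a 500-word essay
print(estimate_tokens("System prompts and history count as input too."))
```

For anything beyond a rough budget, count tokens with the provider's own tokenizer, since results vary by model.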

Hidden costs to consider: Context caching (some models charge for cached tokens), fine-tuning, embeddings, image/audio inputs, and rate-limit overages are not included in this calculator. Always review your provider's full pricing page.

Reference Pricing — All Major Models

Updated: May 2026
| Model | Provider | Input $/1M | Output $/1M | Context Window | Best For |
|---|---|---|---|---|---|
| Claude Opus 4.7 | Anthropic | $5.00 | $25.00 | 1M tokens | Complex reasoning, agentic |
| Claude Sonnet 4.6 | Anthropic | $3.00 | $15.00 | 200k tokens | Coding, analysis, writing |
| Claude Haiku 4.5 | Anthropic | $1.00 | $5.00 | 200k tokens | Best value: classification, chat |
| GPT-5.4 | OpenAI | $2.50 | $15.00 | 270k tokens | Reasoning, broad tasks |
| GPT-5.4 mini | OpenAI | $0.75 | $4.50 | 270k tokens | High-volume, cost-sensitive |
| GPT-5.4 nano | OpenAI | $0.20 | $1.25 | 270k tokens | Best value: routing, tagging |
| GPT-4.1 | OpenAI | $2.00 | $8.00 | 1M tokens | Long docs, instruction follow |
| Gemini 3.1 Pro | Google | $2.00 | $12.00 | 1M tokens | Multimodal, long context |
| Gemini 3 Flash | Google | $0.50 | $3.00 | 1M tokens | Balanced speed & quality |
| Gemini 2.5 Flash-Lite | Google | $0.10 | $0.40 | 1M tokens | Best value: ultra-high volume |

Rates as of May 2026. Verify at anthropic.com/pricing, openai.com/pricing, ai.google.dev/pricing.
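To compare every model at the same usage parameters, the reference prices can be looped over and sorted by monthly cost. The sketch below hard-codes the rates from the table above, so it goes stale whenever prices change; the workload numbers and function names are assumptions chosen for illustration.

```python
# (input $/1M, output $/1M), copied from the reference table above.
PRICES = {
    "Claude Opus 4.7": (5.00, 25.00),
    "Claude Sonnet 4.6": (3.00, 15.00),
    "Claude Haiku 4.5": (1.00, 5.00),
    "GPT-5.4": (2.50, 15.00),
    "GPT-5.4 mini": (0.75, 4.50),
    "GPT-5.4 nano": (0.20, 1.25),
    "GPT-4.1": (2.00, 8.00),
    "Gemini 3.1 Pro": (2.00, 12.00),
    "Gemini 3 Flash": (0.50, 3.00),
    "Gemini 2.5 Flash-Lite": (0.10, 0.40),
}


def monthly_cost(input_tokens, output_tokens, requests_per_day, days, prices):
    """Monthly USD cost for one model at the given per-request token counts."""
    input_price, output_price = prices
    per_request = (input_tokens * input_price
                   + output_tokens * output_price) / 1_000_000
    return per_request * requests_per_day * days


def compare(input_tokens, output_tokens, requests_per_day, days=30):
    """Return (model, monthly cost) pairs sorted from cheapest to priciest."""
    rows = [
        (model, monthly_cost(input_tokens, output_tokens,
                             requests_per_day, days, prices))
        for model, prices in PRICES.items()
    ]
    return sorted(rows, key=lambda row: row[1])


# Hypothetical workload: 1,000 in / 500 out tokens, 10,000 requests per day.
for model, cost in compare(1_000, 500, 10_000):
    print(f"{model:<24} ${cost:>10,.2f}")
```

Sorting by cost only ranks price; it says nothing about whether a cheaper model is good enough for the task.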

How the AI API Cost Calculator Works

AI APIs charge per token - roughly 0.75 words, or about 4 characters of text. Every request has input tokens (your prompt and context) and output tokens (the model's response). These are priced separately, and input is almost always cheaper than output.

Cost = (Input Tokens ÷ 1,000,000 × Input Price) + (Output Tokens ÷ 1,000,000 × Output Price)

Worked example: 10,000 requests per day, each with 1,000 input tokens and 500 output tokens, using Claude Sonnet 4 at $3/M input and $15/M output. Daily input cost = 10M tokens × $3/M = $30. Daily output cost = 5M tokens × $15/M = $75. Total: $105/day, or approximately $3,150 over a 30-day month.

Prompt caching (available on Claude and GPT-4o) dramatically reduces input costs for repeated context. Cached tokens cost roughly 50-90% less than standard input tokens, depending on the provider. For RAG applications with long system prompts, enabling caching can cut your monthly bill by 60% or more.

Model prices shown reflect current published rates as of May 2026. Prices change frequently - verify against each provider's official pricing page before budgeting.

Frequently Asked Questions

What is a token in AI models?

A token is the basic unit of text that AI models process. One token is roughly 0.75 words in English - "calculator" is one token, "calculatorapp.io" is two or three. A typical paragraph of 100 words is about 130-140 tokens. The exact tokenization varies by model: OpenAI uses tiktoken, Anthropic uses its own tokenizer, and Google uses SentencePiece.

Why are output tokens more expensive than input tokens?

Output tokens require the model to generate text sequentially, which is computationally intensive and cannot be parallelized. Input tokens are processed in parallel, making them faster and cheaper to compute. For most models, output tokens cost 3-5x more than input tokens per million. This is why prompts that ask for concise responses can meaningfully reduce costs.

How does prompt caching reduce costs?

Prompt caching stores frequently repeated content such as a long system prompt or document so it does not need to be reprocessed on every request. Claude's cache hits cost 90% less than standard input tokens. If your app sends a 10,000-token system prompt with every request, enabling caching turns that from your biggest cost driver into a minor line item.
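The effect of a cache discount on a repeated system prompt can be estimated with a short calculation. The sketch below is deliberately simplified: it treats every request as a cache hit and ignores any cache-write or storage fees a provider may charge. The function name, the $3.00 input price, and the 1,000 requests per day are assumptions for the example.

```python
def daily_prompt_cost(prompt_tokens, requests_per_day, input_price,
                      cache_discount=0.0):
    """Daily USD cost of resending the same prompt prefix.

    cache_discount is the fraction taken off the input price on a cache hit
    (0.90 for a 90% discount). Simplification: every request is treated as
    a hit, and cache-write or storage fees are ignored.
    """
    tokens = prompt_tokens * requests_per_day
    return tokens / 1_000_000 * input_price * (1 - cache_discount)


# Hypothetical: 10,000-token system prompt, 1,000 requests/day, $3.00 per 1M.
uncached = daily_prompt_cost(10_000, 1_000, 3.00)
cached = daily_prompt_cost(10_000, 1_000, 3.00, cache_discount=0.90)
print(f"uncached: ${uncached:.2f}/day, cached: ${cached:.2f}/day")
```

Real savings will be lower than this upper bound whenever the cache expires between requests or the prompt prefix changes.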

Which model is cheapest for my use case?

For simple high-volume tasks like classification, summarization, or extraction: use smaller models like Claude Haiku, GPT-4o mini, or Gemini Flash. They are 10-20x cheaper than frontier models with comparable quality on simpler tasks. For complex reasoning, coding, or nuanced generation: frontier models justify the higher cost. Use this calculator to compare costs at your specific request volume.

LLM API Pricing Comparison (2025 per 1M Tokens)

LLM API pricing is denominated per million tokens, where one token is roughly 0.75 words in English. The key variables in any cost estimate are the ratio of input to output tokens in your workload (output is almost always more expensive), the model tier you select, and whether your architecture benefits from prompt caching. Prices below reflect published rates as of May 2025 and change frequently, so always verify against provider pricing pages.

Context window size matters for workloads with long documents or extended conversations. Models with 1M token context windows (Gemini) can process entire codebases in a single request; the tradeoff is that longer contexts cost proportionally more in input token fees.

| Model | Input Cost (per 1M) | Output Cost (per 1M) | Context Window | Best For |
|---|---|---|---|---|
| GPT-4o | $2.50 | $10.00 | 128k tokens | General purpose |
| GPT-4o mini | $0.15 | $0.60 | 128k tokens | Cost-efficient tasks |
| Claude 3.5 Sonnet | $3.00 | $15.00 | 200k tokens | Long docs/coding |
| Claude 3 Haiku | $0.25 | $1.25 | 200k tokens | Fast/cheap tasks |
| Gemini 1.5 Pro | $1.25 | $5.00 | 1M tokens | Huge context needs |
| Gemini 1.5 Flash | $0.075 | $0.30 | 1M tokens | High-volume workloads |
| Llama 3 70B (self-hosted) | ~$0.50–$1.00 | ~$0.50–$1.00 | 8k tokens | Privacy/cost control |

Worked Examples

Example 1: Customer support bot on GPT-4o mini

A customer support bot processes 10,000 tickets/month. Each ticket involves 500 input tokens (conversation history + question) and 300 output tokens (response). Total monthly tokens: 5,000,000 input + 3,000,000 output. Using GPT-4o mini: input cost = 5M × $0.15/1M = $0.75; output cost = 3M × $0.60/1M = $1.80. Total: $2.55/month for 10,000 tickets = $0.000255/ticket. At that cost, a user who files one ticket per month on a $1/month subscription yields a 99.97% gross margin on API costs alone.

Example 2: Same workload on GPT-4o

Running the identical 10,000 ticket/month workload on GPT-4o instead of GPT-4o mini: input cost = 5M × $2.50/1M = $12.50; output cost = 3M × $10.00/1M = $30.00. Total: $42.50/month, about 16.7× the GPT-4o mini cost. For simple support queries where answer quality is comparable between models, this difference is hard to justify. Reserve GPT-4o for complex reasoning, nuanced judgment, or tasks where quality measurably impacts outcomes.
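The per-ticket cost and margin figures in these examples can be reproduced with a short script. This is a sketch under stated assumptions: the function names are invented for illustration, and the margin calculation assumes one ticket per user per month on a $1/month plan, counting API fees as the only cost.

```python
def monthly_api_cost(tickets, input_tokens, output_tokens,
                     input_price, output_price):
    """Monthly USD API cost; prices are USD per 1M tokens."""
    total_input = tickets * input_tokens
    total_output = tickets * output_tokens
    return (total_input * input_price
            + total_output * output_price) / 1_000_000


def gross_margin(revenue, cost):
    """Fraction of revenue left after API cost (other costs ignored)."""
    return (revenue - cost) / revenue


TICKETS = 10_000
mini = monthly_api_cost(TICKETS, 500, 300, 0.15, 0.60)    # GPT-4o mini
full = monthly_api_cost(TICKETS, 500, 300, 2.50, 10.00)   # GPT-4o

per_ticket = mini / TICKETS
print(f"GPT-4o mini: ${mini:.2f}/month, ${per_ticket:.6f}/ticket")
print(f"GPT-4o:      ${full:.2f}/month, {full / mini:.1f}x the mini cost")
# Assumption: one ticket per user per month at a $1/month price.
print(f"margin on $1/user: {gross_margin(1.00, per_ticket):.2%}")
```

Changing the tickets-per-user assumption is the quickest way to see how sensitive the margin is to heavier usage.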

Frequently Asked Questions

What is a token in LLM APIs?

A token is the smallest unit of text an LLM processes. In English, one token is roughly 0.75 words, so "calculator" is one token, but "extraordinary" might be two. Punctuation, spaces, and special characters each consume tokens too. Most providers offer free token-counting tools: OpenAI's web Tokenizer (built on tiktoken) and Anthropic's token-counting API report how many tokens a text will use before you send a request. For cost estimation, a rule of thumb of 1.3 tokens per word (or 750 words per 1,000 tokens) is accurate enough for budgeting.

How many tokens is a typical message?

A short question like "What is the capital of France?" is about 10 tokens. A typical email is 150–250 tokens. A one-page document is roughly 500–700 tokens. A 10-page PDF is approximately 3,000–5,000 tokens. A full novel is 100,000–150,000 tokens, which can exceed a 128k-token context window but fits within 200k-token and 1M-token windows. For API cost estimation, model your average input as: system prompt tokens + conversation history tokens + current message tokens. Output is usually shorter than input for most task types except creative writing.
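Because conversation history is resent with each request, billed input grows with every turn. The sketch below adds up input tokens across a multi-turn chat, assuming a stateless API where each request carries the system prompt plus all earlier messages, with no caching or truncation. The function name and token counts are hypothetical.

```python
def conversation_input_tokens(system_tokens, turns):
    """Total billed input tokens over a multi-turn chat.

    turns is a list of (user_tokens, reply_tokens) pairs. Each request
    resends the system prompt plus every earlier message.
    """
    history = 0
    total = 0
    for user_tokens, reply_tokens in turns:
        # Input for this request: system prompt + history + current message.
        total += system_tokens + history + user_tokens
        # Both sides of the exchange become history for the next request.
        history += user_tokens + reply_tokens
    return total


# Hypothetical chat: 200-token system prompt, three turns of 50 in / 100 out.
print(conversation_input_tokens(200, [(50, 100)] * 3))  # 250 + 400 + 550 = 1200
```

The per-request input rises from 250 to 550 tokens over three turns, which is why long conversations cost more than the message sizes alone suggest.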

How do I reduce LLM API costs?

Six high-impact strategies: (1) Use a cheaper model: switching from GPT-4o to GPT-4o mini saves 94% on a typical workload. (2) Enable prompt caching: 90% discount on repeated system prompt tokens in Claude, 50% in GPT-4o. (3) Shorten system prompts: every token in your system prompt is billed on every request. (4) Set output token limits: use max_tokens to cap response length. (5) Cache responses for repeated queries. (6) Batch non-urgent requests: OpenAI's Batch API offers a 50% cost reduction for async workloads. Combined, these strategies can reduce costs by 80–95%; strategies 2 through 6 apply even if you keep the same model.
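Strategy 5, caching responses for repeated queries, can be sketched as a small in-memory wrapper around the model call. Everything here is illustrative: `ResponseCache` and `fake_model` are hypothetical names, `fake_model` stands in for a real billable API call, and treating prompts that differ only in case or surrounding whitespace as identical is an assumption that may not suit every application.

```python
import hashlib


class ResponseCache:
    """Memoize model responses so repeated prompts are billed only once."""

    def __init__(self, call_model):
        self._call_model = call_model
        self._store = {}
        self.hits = 0
        self.misses = 0

    def get(self, prompt):
        # Assumption: case and surrounding whitespace do not change the answer.
        normalized = prompt.strip().lower()
        key = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        response = self._call_model(prompt)
        self._store[key] = response
        return response


def fake_model(prompt):
    """Stand-in for a real, billable API call."""
    return f"answer to: {prompt}"


support_cache = ResponseCache(fake_model)
support_cache.get("Reset my password")
support_cache.get("  reset my password ")
print(f"hits={support_cache.hits} misses={support_cache.misses}")
```

A production version would also need an expiry policy and a bound on memory use, which this sketch leaves out.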

What is the difference between input and output token pricing?

Output tokens cost more than input tokens because generating text is computationally sequential — the model must produce one token at a time, which cannot be parallelized. Input tokens are processed in parallel, requiring less compute time per token. Most models price output at 3–6× the input rate. This means applications that generate long responses (creative writing, detailed analysis, code generation) have higher per-request costs than those that classify, extract, or give short answers. Designing prompts to elicit concise responses is one of the most effective cost reduction techniques.

How does context window size affect cost?

Every token in the context window, including the system prompt, conversation history, and retrieved documents, is billed as input on each request. A 10,000-token system prompt sent with 1,000 daily requests costs $25/day on GPT-4o without caching (10M tokens × $2.50/1M) vs. about $12.50/day at GPT-4o's 50% cached-input discount. Larger context windows enable more powerful applications (whole-document analysis, long conversations) but drive up costs if not managed carefully. Best practices: use the minimum context needed for the task, summarize older conversation history, and enable prompt caching for any long repeated context (providers generally require a minimum prefix length, around 1,024 tokens, before caching applies).
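One simple way to keep context to the minimum needed is to cap conversation history at a token budget. The sketch below drops the oldest messages first, which is plain truncation rather than summarization, and it uses the 4-characters-per-token heuristic instead of a real tokenizer. The function names and the budget are assumptions for illustration.

```python
import math


def estimate_tokens(text):
    """Rough estimate: about 4 characters per token for English text."""
    return math.ceil(len(text) / 4)


def trim_history(messages, budget_tokens):
    """Keep the most recent messages whose estimated tokens fit the budget."""
    kept = []
    used = 0
    # Walk backwards so the newest messages are kept first.
    for message in reversed(messages):
        cost = estimate_tokens(message)
        if used + cost > budget_tokens:
            break
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


# Hypothetical history: three messages of about 10 tokens each.
history = ["a" * 40, "b" * 40, "c" * 40]
print(len(trim_history(history, budget_tokens=25)))  # keeps the last 2
```

Truncation loses older context entirely; summarizing the dropped messages into a short note preserves more information at the cost of an extra model call.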