Last updated: May 2026
Calculate the true cost of running multi-step AI agents at scale — with realistic context accumulation, step-by-step breakdown, and model comparison.
Agent Configuration
Step-by-step cost breakdown
| Step | Input tokens | Output tokens | Step cost | Cumulative |
|---|
Model comparison — same agent configuration
| Model | Provider | Per run | Per day | Per month | Per year |
|---|
Prices last updated: May 2026. Always verify at openai.com/pricing, anthropic.com/pricing, and ai.google.dev/pricing. Rates are standard API prices and exclude batch discounts, caching, enterprise agreements, or free-tier credits.
Unlike a single API call, AI agents make many sequential calls to complete a task. Each step costs input tokens (the system prompt + prior context + new input) plus output tokens (the model's response). The key difference from a simple API calculator is context accumulation: in realistic mode, each step carries all previous outputs as additional input context.
In independent mode (accumulation OFF), every step is priced identically: system_prompt + input_tokens_per_step input, output_tokens_per_step output. This underestimates real agent costs.
Worked example: GPT-4o, 5-step agent, 500 system prompt tokens, 300 input tokens/step, 500 output tokens/step, accumulation ON. Step 1 input = 500 + 300 + 0 = 800 tokens. Step 3 input = 500 + 300 + 1000 = 1,800 tokens. Step 5 input = 500 + 300 + 2000 = 2,800 tokens. Total input across 5 steps = 800+1300+1800+2300+2800 = 9,000 tokens. Total output = 5 × 500 = 2,500 tokens. At GPT-4o rates ($2.50/$10 per 1M): cost per run = $0.0000225 + $0.000025 = ~$0.048. At 10 runs/day that's $0.48/day or ~$14.40/month.
AI agents make multiple sequential API calls to complete a task — each step incurs input and output token costs. With context accumulation (realistic mode), each step also carries the full conversation history as input, so token counts grow with every step. A 10-step agent run can cost 5–15x more than a single-turn call with the same model, even if the total task is the same length of work.
Context accumulation means each step in an agent run includes all previous steps' outputs in its input context. Step 1 sees only the system prompt + initial input; Step 2 sees all of that plus Step 1's output; and so on. This is how real LLM-based agents actually work — the model needs prior context to reason about next steps. It dramatically increases input token costs for longer agents, especially on expensive models. A 10-step agent with accumulation ON can have 4–5x the input tokens of a 10-step agent with accumulation OFF.
For high-volume, many-step agents, cheaper models like GPT-4o mini ($0.15/$0.60 per 1M tokens) or Gemini 1.5 Flash ($0.075/$0.30 per 1M tokens) can be 10–100x cheaper than frontier models like Claude 3 Opus. A common strategy is "model routing" — use a cheap model for routine steps and a more powerful model only for complex reasoning steps, significantly reducing overall agent costs without sacrificing quality where it matters.
Key strategies: (1) Reduce steps per run — every extra step multiplies cost super-linearly with accumulation. (2) Summarize context instead of passing full history — prevents unbounded input growth. (3) Use cheaper models for simple sub-tasks. (4) Enable prompt caching for repeated system prompts — saves up to 90% on cached tokens with Claude, 50% with GPT-4o. (5) Keep output tokens short by prompting for concise responses. (6) Batch non-urgent agent jobs using provider batch APIs for 50% discounts. These strategies together can cut costs by 80–95%.
The system prompt defines the agent's role, instructions, and available tools. It is sent with every step of every run. A 1,000-token system prompt sent across 5 steps and 100 runs/day equals 500,000 input tokens per day — just from the system prompt alone. Keeping system prompts concise, or using prompt caching to dramatically reduce their per-call cost, is one of the highest-leverage cost optimizations for agents. As a rule: every extra 100 tokens in your system prompt costs you (steps × runs/day) extra input tokens every single day.