Agentic Loop Cost Estimator

Last updated: May 2026

Multi-step AI agents don't cost 10× one step — they cost ~55× because context accumulates. See the real math, per-step breakdown, and what prompt caching saves.

💡

Why agentic loops are expensive — and non-obvious

Think of it like a meeting transcript. Every step, the AI re-reads everything that happened before. Step 8 reads 7 steps of history before responding. This means input tokens grow with every turn — and total cost grows roughly quadratically with the number of steps.

A 10-step agent: Step 1 processes ~1,400 tokens of input. Step 10 processes ~7,600 tokens of input. Sum it up and the loop consumes N×(N+1)/2 times the per-step input — about 55× for 10 steps. This calculator shows that in real dollars.

Model & Pricing

Context Configuration

Tokens that are constant every step vs. tokens that grow with the loop

Volume

1102030

Results

Total Tokens / Loop
input + output
Cost per Loop
full loop cost
Cost per Day
× loops/day
Cost per Month
30 days
Cost per step — watch it grow
Step Input Tokens Output Tokens Step Cost Cumulative Cost

Prompt Caching Savings

Assumes 90% cache hit rate; cached tokens cost 10% of normal input price

Cached Tokens/Step
sys + tool defs
Savings per Loop
with caching
Savings per Month
30 days
Cached Cost/Loop
after savings
ScenarioCost per LoopCost per DayCost per MonthSavings vs. No Cache

Model Comparison — Same Loop Configuration

Model Input Price Output Price Cost / Loop Cost / Day Cost / Month

Pricing based on publicly available API rates as of May 2026 and may change. Prompt caching availability and exact discount rates vary by provider. Use for estimation purposes only.

How Agentic Loop Costs Are Calculated

At each step N of an agentic loop, the model receives the entire conversation history as input context:

Input tokens at step N = system_prompt + tool_defs + (N−1) × output_per_step + N × input_per_step

Output tokens are the same every step. So total input across the full loop is the sum from N=1 to N=steps:

Total input = steps × (system_prompt + tool_defs) + output_per_step × (0+1+...+(steps−1)) + input_per_step × (1+2+...+steps) = steps × constant + output_per_step × steps×(steps−1)/2 + input_per_step × steps×(steps+1)/2

The quadratic term (steps² / 2) is what makes long agentic loops expensive. Doubling the number of steps roughly quadruples the input token cost — not doubles it.

Prompt caching lets you cache the KV state of the system prompt and tool definitions (the constant prefix). With a 90% cache hit rate and 10% cached price, savings at each step = 0.9 × 0.9 × (sys_prompt + tool_defs) × input_price_per_token.

Frequently Asked Questions

Why does a 10-step agent cost 55× more than one step, not 10×?

At step 1, the model reads only the system prompt + tool definitions + first user message. At step 10, it reads all of that plus 9 prior assistant responses and 9 additional user/tool messages. Summing input tokens from step 1 to step 10, you get roughly 1 + 2 + 3 + ... + 10 = 55 times the "unit" of per-step context. Output tokens are constant (one response per step), but input dominates cost for most models, especially at long context. This is called the N×(N+1)/2 accumulation pattern.

What does prompt caching actually save in practice?

Prompt caching saves the most when your system prompt and tool definitions are large (1,000+ tokens) and your loop has many steps. For a loop with a 1,200-token constant prefix (system prompt + tools), 10 steps, and Claude 3.5 Sonnet pricing: uncached cost for those constant tokens = 10 × 1,200 × $3.00/1M = $0.000036. With 90% cache hit at 10% price: cached cost = 10 × 1,200 × $3.00/1M × (0.1 × 0.9 + 0.1) = $0.0000072. That's an 80% reduction on the constant prefix portion of your loop cost.

How can I reduce the number of tokens my agent outputs?

Agent verbosity directly controls future input cost — every token the model outputs at step N becomes part of the input at step N+1. Strategies: (1) Add "be concise" instructions to your system prompt. (2) Use structured output (JSON) instead of prose — structured formats are typically 30–50% shorter. (3) Summarize completed sub-tasks rather than keeping full trace in context. (4) Use a "working memory" pattern: the agent maintains a compact state object rather than full conversational history.

Is it cheaper to run many short loops or one long loop?

Almost always cheaper to run shorter loops. A single 20-step loop has total input proportional to 20×21/2 = 210 "steps of context." Two 10-step loops have total input proportional to 2 × 10×11/2 = 110. So splitting a 20-step loop into two 10-step loops cuts input token cost by roughly 48%. The tradeoff is that splitting requires careful state handoff between loops, which adds engineering complexity.

Which models support prompt caching for agentic workloads?

As of mid-2026: Anthropic Claude 3.x and Claude 4.x series support prompt caching with explicit cache control headers — you mark the prefix to cache, and subsequent calls hitting that prefix are billed at 10% of normal input price. OpenAI GPT-4o models have automatic prompt caching that triggers for prompts over 1,024 tokens with a 50% discount on cached tokens. Google Gemini 1.5 and 2.0 models support "context caching" for prefixes over 32k tokens (minimum cache size). For short system prompts under 1k tokens, Anthropic's explicit caching is the most accessible option.

The Context Accumulation Problem

Most developers are surprised by agentic loop costs because they reason about them linearly: "10 steps = 10 API calls, so 10× the cost of one call." But this misses how context windows work. Each call in a loop carries forward all prior messages, so the input size — and therefore input cost — grows with every step.

The classic example: you build a research agent that calls 8 tools in sequence. You benchmark the cost of a single tool call ($0.002) and estimate your 8-step loop at $0.016. The real cost is closer to $0.072 — because step 8 processes 7 prior assistant responses plus 7 prior tool results as input context, not just one tool call. Steps 1–7 also accumulated context, making the total much higher than expected.

Worked example: Code review agent, 8 steps
System prompt: 800 tokens | Tool defs: 400 tokens | Input/step: 200 tokens | Output/step: 400 tokens | Model: Claude 3.5 Sonnet ($3/$15 per 1M)

Step 1 input: 800 + 400 + 200 = 1,400 tokens → $0.0000042
Step 4 input: 800 + 400 + 3×400 + 4×200 = 3,200 tokens → $0.0000096
Step 8 input: 800 + 400 + 7×400 + 8×200 = 5,800 tokens → $0.0000174
Total input across all steps: ~26,800 tokens | Total output: 3,200 tokens
Total loop cost: ~$0.000128 — vs $0.000034 if each step ran independently

Optimization Strategies

The most impactful lever is reducing output tokens per step, because these accumulate as future input. A 50% reduction in output verbosity cuts total input token cost by roughly 25% on a 10-step loop — a bigger effect than switching from GPT-4o to GPT-4o mini on the output side alone.

Prompt caching is the second most impactful optimization when your constant prefix (system prompt + tool definitions) is large. At 1,200 tokens of constant prefix across 10 steps with Claude 3.5 Sonnet, caching saves roughly $0.000032 per loop — not huge for one loop, but at 20 loops/day and 30 days/month that's ~$0.19/month per agent instance. At enterprise scale (10,000 loops/day), that's $960/month from just enabling caching.

What is the fastest way to cut agentic loop costs by 50%?

Switch to a cheaper model for non-critical steps. If your agent uses Claude 3.5 Sonnet for all 10 steps, using Claude 3.5 Haiku for steps 1–7 and Sonnet only for the final synthesis step cuts costs by roughly 70%. The key insight: most intermediate steps in a research or coding agent don't require the full capabilities of a frontier model — they're doing simple tool dispatching or formatting. Reserve the expensive model for synthesis, judgment, and final output.

How do token counts change if my agent uses function calling?

Function calling (tool use) typically adds 200–600 tokens per call to your context: the tool call JSON, the tool result, and any parsing overhead. These count as part of the accumulated context and grow with each step. If your agent makes one function call per step with a 300-token tool result, that adds to the input_per_step baseline — your effective input accumulation is higher than the raw "input per step" figure alone.