Why does an agentic loop cost so much more than a single LLM call?

In an agentic loop, every step re-reads the entire conversation history: the system prompt, tool definitions, all previous user messages, and all previous agent responses. A 10-step agent at step 10 processes roughly 10x the tokens of step 1 as input context. Summing across all steps, a 10-step loop can cost 55x more than one isolated step — because context accumulates quadratically as steps increase.

What is context accumulation in AI agents?

Context accumulation is the pattern where each new step in an agentic loop receives all prior messages in its input. Step 1 sees only the system prompt and first user message. Step 2 sees step 1's output plus a new message. Step N sees N-1 assistant turns plus N user/tool turns plus the constant system prompt and tool definitions. This means input token counts grow roughly linearly per step, and total input cost across all steps grows quadratically with the number of steps.

How does prompt caching reduce agentic loop costs?

Prompt caching (available on Claude and some other models) allows the model provider to cache the KV representation of a prefix of your prompt. For agentic loops, the system prompt and tool definitions are constant across all steps — caching them saves roughly 90% of those tokens' cost on every step after the first. With a large system prompt (1,000+ tokens), prompt caching can reduce total loop cost by 20–40% depending on how many steps the loop runs.

Which AI model is cheapest for running agentic loops?

For cost-sensitive agentic workloads, Gemini 3 Flash ($0.075/$0.30 per 1M tokens in/out) and GPT-5.4 mini ($0.15/$0.60 per 1M tokens) are the most affordable options. Claude Haiku 4.5 ($0.80/$4.00 per 1M) offers strong reasoning at moderate cost. For high-stakes agents requiring maximum capability, Claude Sonnet 4.6 and GPT-5.4 are common choices despite higher per-token costs. The best approach is to use a capable but affordable model for most steps and escalate to a frontier model only when needed.

How can I reduce the cost of my agentic loops?

Five high-impact strategies: (1) Reduce the number of steps — consolidate tool calls and minimize back-and-forth. (2) Enable prompt caching for your system prompt and tool definitions. (3) Trim output verbosity — shorter agent responses mean less context carried forward to future steps. (4) Use a cheaper model — a 10-step loop on Gemini Flash costs less than a 2-step loop on Claude Opus. (5) Summarize or prune history — instead of passing all prior messages, summarize completed sub-tasks to reduce context size at later steps.

Agentic Loop Cost Estimator

Last updated: May 2026

Multi-step AI agents don't cost 10× one step — they cost ~55× because context accumulates. See the real math, per-step breakdown, and what prompt caching saves.

💡

Why agentic loops are expensive — and non-obvious

Think of it like a meeting transcript. Every step, the AI re-reads everything that happened before. Step 8 reads 7 steps of history before responding. This means input tokens grow with every turn — and total cost grows roughly quadratically with the number of steps.

A 10-step agent: Step 1 processes ~1,400 tokens of input. Step 10 processes ~7,600 tokens of input. Sum it up and the loop consumes N×(N+1)/2 times the per-step input — about 55× for 10 steps. This calculator shows that in real dollars.

Model & Pricing

LLM Model

Context Configuration

Tokens that are constant every step vs. tokens that grow with the loop

System Prompt Tokens(constant, every step)

Tool Definitions Tokens(constant, every step)

Input Tokens per Step(new user/tool result each step)

Output Tokens per Step(agent response — grows context forward)

Volume

Number of Steps 8

1102030

Loops per Day(agent runs per day)

Results

Total Tokens / Loop

—

input + output

Cost per Loop

—

full loop cost

Cost per Day

—

× loops/day

Cost per Month

—

30 days

Cost per step — watch it grow

Step	Input Tokens	Output Tokens	Step Cost	Cumulative Cost

Prompt Caching Savings

Assumes 90% cache hit rate; cached tokens cost 10% of normal input price

Cached Tokens/Step

—

sys + tool defs

Savings per Loop

—

with caching

Savings per Month

—

30 days

Cached Cost/Loop

—

after savings

Scenario	Cost per Loop	Cost per Day	Cost per Month	Savings vs. No Cache

Model Comparison — Same Loop Configuration

Model	Input Price	Output Price	Cost / Loop	Cost / Day	Cost / Month

Pricing based on publicly available API rates as of May 2026 and may change. Prompt caching availability and exact discount rates vary by provider. Use for estimation purposes only.

How Agentic Loop Costs Are Calculated

At each step N of an agentic loop, the model receives the entire conversation history as input context:

Input tokens at step N = system_prompt + tool_defs + (N−1) × output_per_step + N × input_per_step

Output tokens are the same every step. So total input across the full loop is the sum from N=1 to N=steps:

Total input = steps × (system_prompt + tool_defs) + output_per_step × (0+1+...+(steps−1)) + input_per_step × (1+2+...+steps) = steps × constant + output_per_step × steps×(steps−1)/2 + input_per_step × steps×(steps+1)/2

The quadratic term (steps² / 2) is what makes long agentic loops expensive. Doubling the number of steps roughly quadruples the input token cost — not doubles it.

Prompt caching lets you cache the KV state of the system prompt and tool definitions (the constant prefix). With a 90% cache hit rate and 10% cached price, savings at each step = 0.9 × 0.9 × (sys_prompt + tool_defs) × input_price_per_token.

Frequently Asked Questions

Why does a 10-step agent cost 55× more than one step, not 10×?

At step 1, the model reads only the system prompt + tool definitions + first user message. At step 10, it reads all of that plus 9 prior assistant responses and 9 additional user/tool messages. Summing input tokens from step 1 to step 10, you get roughly 1 + 2 + 3 + ... + 10 = 55 times the "unit" of per-step context. Output tokens are constant (one response per step), but input dominates cost for most models, especially at long context. This is called the N×(N+1)/2 accumulation pattern.

What does prompt caching actually save in practice?

Prompt caching saves the most when your system prompt and tool definitions are large (1,000+ tokens) and your loop has many steps. For a loop with a 1,200-token constant prefix (system prompt + tools), 10 steps, and Claude Sonnet 4.6 pricing: uncached cost for those constant tokens = 10 × 1,200 × $3.00/1M = $0.000036. With 90% cache hit at 10% price: cached cost = 10 × 1,200 × $3.00/1M × (0.1 × 0.9 + 0.1) = $0.0000072. That's an 80% reduction on the constant prefix portion of your loop cost.

How can I reduce the number of tokens my agent outputs?

Agent verbosity directly controls future input cost — every token the model outputs at step N becomes part of the input at step N+1. Strategies: (1) Add "be concise" instructions to your system prompt. (2) Use structured output (JSON) instead of prose — structured formats are typically 30–50% shorter. (3) Summarize completed sub-tasks rather than keeping full trace in context. (4) Use a "working memory" pattern: the agent maintains a compact state object rather than full conversational history.

Is it cheaper to run many short loops or one long loop?

Almost always cheaper to run shorter loops. A single 20-step loop has total input proportional to 20×21/2 = 210 "steps of context." Two 10-step loops have total input proportional to 2 × 10×11/2 = 110. So splitting a 20-step loop into two 10-step loops cuts input token cost by roughly 48%. The tradeoff is that splitting requires careful state handoff between loops, which adds engineering complexity.

Which models support prompt caching for agentic workloads?

As of mid-2026: Anthropic Claude 3.x and Claude 4.x series support prompt caching with explicit cache control headers — you mark the prefix to cache, and subsequent calls hitting that prefix are billed at 10% of normal input price. OpenAI GPT-5.4 models have automatic prompt caching that triggers for prompts over 1,024 tokens with a 50% discount on cached tokens. Google Gemini 1.5 and 2.0 models support "context caching" for prefixes over 32k tokens (minimum cache size). For short system prompts under 1k tokens, Anthropic's explicit caching is the most accessible option.

The Context Accumulation Problem

Most developers are surprised by agentic loop costs because they reason about them linearly: "10 steps = 10 API calls, so 10× the cost of one call." But this misses how context windows work. Each call in a loop carries forward all prior messages, so the input size — and therefore input cost — grows with every step.

The classic example: you build a research agent that calls 8 tools in sequence. You benchmark the cost of a single tool call ($0.002) and estimate your 8-step loop at $0.016. The real cost is closer to $0.072 — because step 8 processes 7 prior assistant responses plus 7 prior tool results as input context, not just one tool call. Steps 1–7 also accumulated context, making the total much higher than expected.

Worked example: Code review agent, 8 steps
System prompt: 800 tokens | Tool defs: 400 tokens | Input/step: 200 tokens | Output/step: 400 tokens | Model: Claude Sonnet 4.6 ($3/$15 per 1M)

Step 1 input: 800 + 400 + 200 = 1,400 tokens → $0.0000042
Step 4 input: 800 + 400 + 3×400 + 4×200 = 3,200 tokens → $0.0000096
Step 8 input: 800 + 400 + 7×400 + 8×200 = 5,800 tokens → $0.0000174
Total input across all steps: ~26,800 tokens | Total output: 3,200 tokens
Total loop cost: ~$0.000128 — vs $0.000034 if each step ran independently

Optimization Strategies

The most impactful lever is reducing output tokens per step, because these accumulate as future input. A 50% reduction in output verbosity cuts total input token cost by roughly 25% on a 10-step loop — a bigger effect than switching from GPT-5.4 to GPT-5.4 mini on the output side alone.

Prompt caching is the second most impactful optimization when your constant prefix (system prompt + tool definitions) is large. At 1,200 tokens of constant prefix across 10 steps with Claude Sonnet 4.6, caching saves roughly $0.000032 per loop — not huge for one loop, but at 20 loops/day and 30 days/month that's ~$0.19/month per agent instance. At enterprise scale (10,000 loops/day), that's $960/month from just enabling caching.

What is the fastest way to cut agentic loop costs by 50%?

Switch to a cheaper model for non-critical steps. If your agent uses Claude Sonnet 4.6 for all 10 steps, using Claude Haiku 4.5 for steps 1–7 and Sonnet only for the final synthesis step cuts costs by roughly 70%. The key insight: most intermediate steps in a research or coding agent don't require the full capabilities of a frontier model — they're doing simple tool dispatching or formatting. Reserve the expensive model for synthesis, judgment, and final output.

How do token counts change if my agent uses function calling?

Function calling (tool use) typically adds 200–600 tokens per call to your context: the tool call JSON, the tool result, and any parsing overhead. These count as part of the accumulated context and grow with each step. If your agent makes one function call per step with a 300-token tool result, that adds to the input_per_step baseline — your effective input accumulation is higher than the raw "input per step" figure alone.