Orchestrator Agent — Top Tier

Sub-Agents — Worker Tier

Options

Context accumulation ON
When checked, each sub-agent call includes the cumulative output from all prior sub-agents in that pipeline run: sub-agent 2 receives sub-agent 1's output as extra input, sub-agent 3 receives the outputs of both, and so on. Many real pipelines pass context this way, and it increases sub-agent input costs.
Cost per pipeline run (orchestrator + all sub-agents)
Daily cost
Monthly cost (×30 days)
Cost vs single-agent (vs running all on orchestrator model)

Per-tier cost breakdown

Tier Model Calls/day Cost/day Cost/month

Model routing strategy — where the real savings come from

The cost advantage of multi-agent pipelines comes not from splitting work but from running different tiers on different models. Your orchestrator makes relatively few, high-value calls (it only needs to decompose and route), while your sub-agents do the volume. Run your orchestrator on a stronger model (Sonnet 4.6 / GPT-5.4) and route bulk sub-tasks to cheap models (Haiku 4.5 / Gemini 3 Flash / Gemini 2.5 Flash-Lite). Many pipelines can see a 60–80% cost reduction versus running everything on a frontier model, usually with little quality loss on the bulk tasks, which is where the cheap models do most of the work.

Model comparison — what would this pipeline cost on all-[X]?

Model Provider Per run Per day Per month

Prices as of May 2026. Verify at provider pricing pages before budgeting at scale. Rates shown are standard API prices; batch discounts, prompt caching, free-tier credits, and enterprise agreements may lower your actual costs further.

How the Multi-Agent Orchestration Cost Calculator Works

In a multi-agent pipeline, every tier makes its own API calls. The orchestrator receives the user's task and calls the LLM once to produce routing instructions; each sub-agent call is then a separate API request.

Orchestrator cost/run = (orchSysTokens + orchInputTokens) × inputRate + orchOutputTokens × outputRate
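As a rough illustration, the orchestrator formula above can be written as a small Python function. The function name, the parameter names, and the convention that rates are quoted in USD per 1M tokens are assumptions made for this sketch; the numbers in the usage line are placeholders, not real prices.

```python
def orchestrator_cost_per_run(
    sys_tokens: int,
    input_tokens: int,
    output_tokens: int,
    input_rate_per_m: float,
    output_rate_per_m: float,
) -> float:
    """Cost in USD of the single orchestrator call in one pipeline run.

    Assumes rates are quoted in USD per 1M tokens (an assumption of this sketch).
    """
    input_cost = (sys_tokens + input_tokens) * input_rate_per_m / 1_000_000
    output_cost = output_tokens * output_rate_per_m / 1_000_000
    return input_cost + output_cost


# Placeholder token counts and rates, for illustration only.
example_cost = orchestrator_cost_per_run(500, 1_500, 1_000, 2.0, 10.0)
print(f"${example_cost:.4f} per run")
```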

Sub-agent costs depend on the accumulation setting. With accumulation OFF, every sub-agent call is identical:

Sub-agent call cost = (subInputTokens) × subInputRate + subOutputTokens × subOutputRate

With accumulation ON, each successive sub-agent in a pipeline run also receives the output of all prior sub-agents, growing the input:

Sub-agent N input = subInputTokens + (N − 1) × subOutputTokens
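The growth in input size can be sketched in a few lines of Python. The function and parameter names are hypothetical, and the token counts in the usage line are placeholders chosen only to show the pattern.

```python
def sub_agent_input_tokens(
    n: int,
    sub_input_tokens: int,
    sub_output_tokens: int,
    accumulate: bool,
) -> int:
    """Input tokens for the n-th sub-agent (1-indexed) in one pipeline run."""
    if n < 1:
        raise ValueError("n is 1-indexed and must be >= 1")
    if not accumulate:
        # Accumulation OFF: every call has the same input size.
        return sub_input_tokens
    # Accumulation ON: add the output of each of the (n - 1) prior sub-agents.
    return sub_input_tokens + (n - 1) * sub_output_tokens


# Placeholder sizes: 800 input tokens and 400 output tokens per sub-agent.
inputs_on = [sub_agent_input_tokens(n, 800, 400, accumulate=True) for n in range(1, 4)]
inputs_off = [sub_agent_input_tokens(n, 800, 400, accumulate=False) for n in range(1, 4)]
print(inputs_on)   # [800, 1200, 1600]
print(inputs_off)  # [800, 800, 800]
```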

Total sub-agent cost per run sums across all sub-agent calls: subTasksPerMain × (average sub-agent call cost). Daily cost is the cost per pipeline run × tasksPerDay, and monthly cost is the daily cost × 30.
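A minimal sketch of the roll-up follows. It sums each sub-agent call explicitly rather than using an average, treats all sub-agent calls in a run as one sequence for accumulation purposes, and reads the daily figure as per-run cost × tasks per day and the monthly figure as the daily cost × 30 days. Those readings, the function names, and the per-1M-token rate convention are assumptions of this sketch; the numbers are placeholders, not real prices.

```python
def sub_agent_cost_per_run(
    num_calls: int,
    sub_input_tokens: int,
    sub_output_tokens: int,
    input_rate_per_m: float,
    output_rate_per_m: float,
    accumulate: bool,
) -> float:
    """Total USD cost of all sub-agent calls in one pipeline run."""
    total = 0.0
    for n in range(1, num_calls + 1):
        # With accumulation ON, call n also carries (n - 1) prior outputs.
        extra = (n - 1) * sub_output_tokens if accumulate else 0
        call_input = sub_input_tokens + extra
        total += (
            call_input * input_rate_per_m + sub_output_tokens * output_rate_per_m
        ) / 1_000_000
    return total


def daily_and_monthly(cost_per_run: float, tasks_per_day: int, days: int = 30):
    """Scale a per-run cost to daily and monthly totals."""
    daily = cost_per_run * tasks_per_day
    return daily, daily * days


# Placeholder inputs: 3 calls, 800 in / 400 out tokens, $1 and $5 per 1M tokens.
cost_on = sub_agent_cost_per_run(3, 800, 400, 1.0, 5.0, accumulate=True)
cost_off = sub_agent_cost_per_run(3, 800, 400, 1.0, 5.0, accumulate=False)
daily, monthly = daily_and_monthly(cost_on, tasks_per_day=100)
print(f"ON: ${cost_on:.4f}  OFF: ${cost_off:.4f}  daily: ${daily:.2f}  monthly: ${monthly:.2f}")
```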

The Cost vs single-agent stat compares your pipeline cost against the hypothetical of running everything on the orchestrator model alone — using the same total token budget (orchestrator tokens + all sub-agent tokens on the same model).
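One way to picture this comparison is the ratio below: the pipeline's cost divided by the cost of pushing the same tokens through the orchestrator model. This is a sketch of the idea rather than the calculator's actual code; the `Rates` class, the function names, and all numbers are hypothetical, and rates are assumed to be in USD per 1M tokens.

```python
from dataclasses import dataclass


@dataclass
class Rates:
    """Hypothetical per-model prices in USD per 1M tokens."""
    input_per_m: float
    output_per_m: float


def token_cost(input_tokens: int, output_tokens: int, rates: Rates) -> float:
    return (
        input_tokens * rates.input_per_m + output_tokens * rates.output_per_m
    ) / 1_000_000


def cost_vs_single_agent(
    orch_in: int,
    orch_out: int,
    sub_in_total: int,
    sub_out_total: int,
    orch_rates: Rates,
    sub_rates: Rates,
) -> float:
    """Pipeline cost divided by the all-on-orchestrator-model cost.

    A result below 1.0 means the mixed-model pipeline is cheaper.
    """
    pipeline = token_cost(orch_in, orch_out, orch_rates) + token_cost(
        sub_in_total, sub_out_total, sub_rates
    )
    # Same total token budget, but every token priced at orchestrator rates.
    single = token_cost(orch_in + sub_in_total, orch_out + sub_out_total, orch_rates)
    if single == 0:
        raise ValueError("single-agent cost is zero; ratio is undefined")
    return pipeline / single


# Placeholder token counts and rates, for illustration only.
ratio = cost_vs_single_agent(
    2_000, 1_000, 3_600, 1_200, Rates(2.0, 10.0), Rates(0.2, 1.0)
)
print(f"pipeline costs {ratio:.0%} of the single-model baseline")
```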

Frequently Asked Questions

What is multi-agent orchestration and why does it cost more?

Multi-agent orchestration is an architecture where a lead "orchestrator" agent breaks down a task and delegates subtasks to specialist "sub-agents". Each tier makes its own API calls, so total cost per run = orchestrator call cost + (sub-agents × sub-tasks per run) × cost per sub-agent call. A pipeline that looks like one task is actually many API calls under the hood. With context accumulation between sub-agents, costs can grow 2–5x beyond a naive estimate. The upside: you can route the bulk work to cheap models and reserve expensive models for complex reasoning, which can make the total cost lower than running everything on a frontier model.
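To make the "many API calls under the hood" point concrete, here is a small sketch that counts calls. It assumes one orchestrator call per run plus one call per sub-agent per sub-task; the function name and the example inputs are hypothetical.

```python
def api_calls_per_run(sub_agents: int, sub_tasks_per_run: int) -> int:
    """One orchestrator call plus one call per sub-agent per sub-task."""
    return 1 + sub_agents * sub_tasks_per_run


def api_calls_per_day(sub_agents: int, sub_tasks_per_run: int, tasks_per_day: int) -> int:
    return api_calls_per_run(sub_agents, sub_tasks_per_run) * tasks_per_day


# Placeholder inputs: 3 sub-agents, 3 sub-tasks per run, 100 tasks per day.
per_run = api_calls_per_run(3, 3)
per_day = api_calls_per_day(3, 3, 100)
print(per_run, per_day)  # 10 1000
```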

Should my orchestrator use a smarter or cheaper model?

It depends on what your orchestrator does. If it only routes tasks and writes brief instructions, a mid-tier model (GPT-5.4 mini, Claude Haiku) is fine and saves significant money. If it needs complex reasoning to decompose ambiguous tasks, a frontier model (Claude Sonnet 4.6, GPT-5.4) pays for itself in quality. A common pattern is to use Sonnet/GPT-5.4 for orchestration and Haiku/Flash for bulk sub-agent work; as a rough rule of thumb, this captures about 80% of the quality at 30–40% of the all-frontier cost.

How does context accumulation affect sub-agent costs?

When context accumulation is ON, each sub-agent call in a pipeline run includes the output of all prior sub-agents as additional input context. Sub-agent 1 sees only its own instructions; sub-agent 2 sees its instructions plus sub-agent 1's output; sub-agent 3 sees its instructions plus the outputs of sub-agents 1 and 2, and so on. For a pipeline with 5 sub-agents each producing 400 output tokens, the last sub-agent's input is (5 − 1) × 400 = 1,600 tokens heavier than the first's. Across thousands of daily runs, that accumulation drives significant extra cost. Mitigation: summarize intermediate outputs instead of passing raw text, or use a retrieval step to pass only relevant fragments.
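The 5 × 400 example can be checked with a short sketch, which also shows the effect of the summarization mitigation. The function name is hypothetical, and the 100-token summary size is an assumed figure chosen only for illustration.

```python
def extra_input_tokens(num_sub_agents: int, carried_tokens_per_output: int) -> list[int]:
    """Extra input tokens each sub-agent receives from prior sub-agents' outputs."""
    return [(n - 1) * carried_tokens_per_output for n in range(1, num_sub_agents + 1)]


# Raw outputs passed along: 400 tokens carried forward per prior sub-agent.
raw = extra_input_tokens(5, 400)
# Assumed mitigation: each prior output is summarized to 100 tokens first.
summarized = extra_input_tokens(5, 100)

print(raw, sum(raw))                # [0, 400, 800, 1200, 1600] 4000
print(summarized, sum(summarized))  # [0, 100, 200, 300, 400] 1000
```

Under these assumptions the last sub-agent carries 1,600 extra tokens, and a full run carries 4,000 extra input tokens in total, which drops to 1,000 when outputs are summarized before being passed on.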

What's the cheapest viable multi-agent setup?

The cheapest production-quality setup is typically Gemini 2.5 Flash-Lite or DeepSeek V3 for sub-agents ($0.10–$0.27/1M input) with a budget orchestrator like Hermes 3 70B or GPT-5.4 nano. For more reliable orchestration with cheap bulk work, try GPT-5.4 mini as orchestrator with Gemini 2.5 Flash-Lite sub-agents. This "Budget Pipeline" preset can be 10–20x cheaper than all-Claude Sonnet configurations while typically holding up well on structured tasks such as data extraction, classification, or summarization.

How does this compare to Claude's Managed Agents orchestration feature?

Claude's Managed Agents (available through the Anthropic API) provides built-in orchestration primitives — the orchestration logic runs inside Anthropic's infrastructure, meaning you pay for the token usage of each Claude model involved but don't build your own routing layer. The token costs are identical to what this calculator shows; what Managed Agents saves you is engineering time. Use this calculator to estimate your token budget before committing to any orchestration framework — whether that's Anthropic's Managed Agents, LangGraph, CrewAI, AutoGen, or a custom implementation. The model choice matters far more than the framework.