LLM Margin Calculator

Last updated: May 2026

Calculate the gross margin of your AI-powered product. See your cost per query, margin per user, recommended price points, and the exact volume needed to hit your target margin.

LLM Cost Inputs

$
$

Additional Costs & Revenue

$
$
$
%

Margin Results

LLM Cost / Query
API cost only
Total Cost / Query
all-in variable cost
Gross Margin
on query revenue
Net Monthly Profit
after fixed costs
Monthly Revenue
Break-Even Volume
queries/mo to cover fixed costs

Gross margin gauge

0%25% (thin)50% (ok)75% (healthy)100%

Cost vs Revenue per query

Pricing Sensitivity Table

How margin changes at different price points for your current usage

Charge/QueryMonthly RevenueGross MarginNet Profit/LossVerdict

LLM Cost Per Query = (Input Tokens × Input Price/MTok + Output Tokens × Output Price/MTok) ÷ 1,000,000

Total Variable Cost Per Query = LLM cost + other per-query costs (infra, embeddings, DB reads, etc.)

Gross Margin = (Revenue per query − Variable cost per query) ÷ Revenue per query × 100. This is the margin before fixed costs — the pure unit economics of your AI product.

Net Monthly Profit = (Revenue per query − Variable cost per query) × monthly queries − fixed monthly costs.

Break-Even Volume = Fixed monthly costs ÷ Gross profit per query. The minimum query volume needed to cover your infrastructure and operations before turning profitable.

Target Price for Margin = Variable cost per query ÷ (1 − target margin %). For a 70% margin target: price = cost ÷ 0.30.

⚠️ LLM pricing changes frequently. Verify current rates with your provider before making pricing decisions. Gross margin shown is on query revenue only and excludes R&D, sales, and marketing costs.

How the LLM Margin Calculator Works

Building AI products on top of LLM APIs means your COGS (Cost of Goods Sold) includes API costs. This calculator computes gross margin, break-even pricing, and pricing sensitivity for AI-powered products.

Gross Margin % = (Revenue per query - API cost per query) / Revenue per query x 100 Break-even price = API cost per query / (1 - target margin %) Monthly API cost = (Input tokens x input rate + Output tokens x output rate) x monthly queries / 1,000,000

Worked example: SaaS product charging $49/month with users averaging 500 queries/month. Each query: 800 input tokens + 400 output tokens using Claude Sonnet 4 ($3/$15 per M). API cost per query = (800 x $0.000003) + (400 x $0.000015) = $0.0024 + $0.006 = $0.0084. Cost per user/month = $0.0084 x 500 = $4.20. Revenue per user: $49. Gross margin: (49 - 4.20) / 49 = 91.4%.

Frequently Asked Questions

What gross margin should an AI SaaS product target?

Traditional SaaS targets 70-85% gross margin. AI SaaS with LLM inference costs typically achieves 60-80% depending on query volume, model choice, and pricing. The key risk is usage-based cost variability — a power user consuming 10x average tokens can erode margin significantly. Best practices: (1) Set usage limits or tiers. (2) Use cheaper models for simpler tasks. (3) Build prompt caching for repeated system prompts. (4) Price per seat + usage overages rather than flat per-seat to protect margins at high usage.

How does model choice affect my product's unit economics?

The difference between frontier and economy models is dramatic. Claude Haiku costs $0.80/$4 per M tokens; Claude Sonnet 4 costs $3/$15 — approximately 4x more expensive. For a product with $5 monthly API cost per user using Sonnet, switching to Haiku for routine queries reduces that to ~$1.25/user, adding $3.75/month in gross profit per user. At 1,000 users, that is $3,750/month in additional margin — a compelling reason to implement model tiering based on task complexity.

How do I account for prompt caching in my margin calculation?

Prompt caching (available on Claude and GPT-4o) stores frequently repeated content so it is not re-billed at full price. If your application has a 2,000-token system prompt sent with every request, enabling caching reduces those tokens from full price to 10% of full price on cache hits. For Claude Sonnet 4: uncached system prompt = $0.006 per request. Cached: $0.0006. At 10,000 daily requests, that saves $54/day ($1,620/month) on system prompt costs alone.

What pricing model works best for AI products?

Three models dominate: (1) Per-seat subscription — predictable revenue, usage variability risk on costs; works well when usage is consistent. (2) Usage-based — charges scale with API costs, no margin compression; but customers dislike unpredictable bills. (3) Hybrid — base seat fee + usage overages above a generous included limit. The hybrid model is increasingly common: a flat fee covers 80% of users; heavy users pay overages that protect your margin. Analyze your user usage distribution before committing to a pricing model.

LLM Margin Benchmarks by Product Type

Gross margin in AI products is primarily driven by how central LLM inference is to the core value prop and how intensively users engage with it. A product where AI is a minor enhancement (like AI-generated email subject line suggestions) will have very different margin dynamics than an AI-first product where every user interaction runs through a frontier model.

The benchmarks below are drawn from published SaaS industry data and investor reports. Actual margins depend heavily on model selection, usage limits, caching strategy, and volume discounts negotiated with providers.

Product TypeTypical API Cost % of RevenueGross Margin TargetNotes
B2B SaaS (AI feature add-on)5–15%70–80%AI is one of many features
Consumer AI app15–35%50–70%High usage intensity
AI-first workflow tool20–40%55–75%Core value from AI
API reseller/wrapper40–60%25–45%Thin margins without differentiation
Enterprise AI platform8–20%65–80%Volume discounts help significantly
Freemium AI tool30–50% on paid tier60–75% on paid tierFree users subsidized by paid

Worked Examples

Example 1 — AI writing tool with typical usage
An AI writing tool charges $29/month per user. The average user generates 50,000 tokens/month with GPT-4o mini, split roughly 60% input / 40% output. API cost = (30,000 × $0.15 + 20,000 × $0.60) / 1,000,000 = $0.0045 + $0.012 = $0.0165/user/month. On $29 revenue, the API cost is 0.057% — nearly negligible. Even 10× that usage ($0.165) is only 0.57% of revenue, leaving substantial room for infrastructure, support, and profit.
Example 2 — Heavy user margin impact
The same tool's top 5% of users generate 2,000,000 tokens/month each. Same model: cost = (1,200,000 × $0.15 + 800,000 × $0.60) / 1,000,000 = $0.18 + $0.48 = $0.66/user/month. Still only 2.3% of $29 revenue. However, if those heavy users represent 5% of users but consume 50% of total tokens, the blended API cost per user rises to roughly $0.20–$0.40/month — still healthy at 0.7–1.4% of revenue. Moral: even heavy usage rarely breaks margin on economy models at standard SaaS price points.

Frequently Asked Questions

What is a good gross margin for an AI product?

Traditional software SaaS targets 70–85% gross margin. AI-native products with meaningful LLM inference costs typically land at 55–80% gross margin, depending on model costs and usage intensity. Anything above 60% is considered healthy for an AI-first product. Below 50% warrants a review of model selection, caching strategy, and pricing — or signals a product where margins won't support long-term unit economics. Investor benchmarks for Series A AI companies typically expect 65%+ gross margin.

How do LLM costs scale with users?

LLM costs scale roughly linearly with active users and their usage intensity — unlike traditional software infrastructure, where costs scale sublinearly due to resource pooling. This means AI products can face margin compression as they scale if usage per user grows faster than revenue. The mitigation strategies are: usage-based pricing or limits, model tiering (use cheaper models for simple tasks), prompt caching for repeated system prompts, and response caching for common queries. Plan for this non-linearity in your financial model before raising prices becomes your only option.

How do I control API costs in production?

Five high-impact levers: (1) Model tiering — route simple tasks to cheaper models automatically. (2) Prompt caching — 90% cost reduction on repeated system prompt tokens. (3) Response caching — cache common query responses for 24–48 hours. (4) Usage limits — cap tokens per user per day/month and upsell overages. (5) Output length limits — set max_tokens to prevent runaway responses. Monitoring per-user API spend with real-time alerts is essential — one misconfigured prompt with a high-traffic endpoint can generate thousands in unexpected costs overnight.

What is the difference between gross margin and contribution margin?

Gross margin = (Revenue − Cost of Goods Sold) / Revenue, where COGS includes API costs, hosting, and direct support. Contribution margin = (Revenue − Variable Costs) / Revenue, which typically includes only the costs that vary directly with each additional unit sold (API calls, payment processing). For AI products, contribution margin is often higher than gross margin because it excludes semi-fixed hosting costs. Investors focus on gross margin for benchmarking; operators track contribution margin to understand per-user profitability at the transaction level.

How do volume discounts affect LLM pricing?

All major LLM providers offer volume discounts at significant scale — typically negotiated at $50,000+/month in API spend. OpenAI, Anthropic, and Google all have enterprise agreements that can reduce per-token costs by 20–40% compared to published rates. At lower volumes, committed-use contracts (paying upfront for a token allotment) offer 10–20% discounts. For early-stage startups, the AI startup credit programs (AWS Activate, Google for Startups, Anthropic's startup program) can provide $5,000–$100,000 in free credits to delay the moment when API costs become material.