Last updated: May 2026
Calculate the gross margin of your AI-powered product. See your cost per query, margin per user, recommended price points, and the exact volume needed to hit your target margin.
LLM Cost Inputs
Additional Costs & Revenue
Margin Results
Gross margin gauge
Cost vs Revenue per query
Pricing Sensitivity Table
How margin changes at different price points for your current usage
| Charge/Query | Monthly Revenue | Gross Margin | Net Profit/Loss | Verdict |
|---|
LLM Cost Per Query = (Input Tokens × Input Price/MTok + Output Tokens × Output Price/MTok) ÷ 1,000,000
Total Variable Cost Per Query = LLM cost + other per-query costs (infra, embeddings, DB reads, etc.)
Gross Margin = (Revenue per query − Variable cost per query) ÷ Revenue per query × 100. This is the margin before fixed costs — the pure unit economics of your AI product.
Net Monthly Profit = (Revenue per query − Variable cost per query) × monthly queries − fixed monthly costs.
Break-Even Volume = Fixed monthly costs ÷ Gross profit per query. The minimum query volume needed to cover your infrastructure and operations before turning profitable.
Target Price for Margin = Variable cost per query ÷ (1 − target margin %). For a 70% margin target: price = cost ÷ 0.30.
⚠️ LLM pricing changes frequently. Verify current rates with your provider before making pricing decisions. Gross margin shown is on query revenue only and excludes R&D, sales, and marketing costs.
Building AI products on top of LLM APIs means your COGS (Cost of Goods Sold) includes API costs. This calculator computes gross margin, break-even pricing, and pricing sensitivity for AI-powered products.
Worked example: SaaS product charging $49/month with users averaging 500 queries/month. Each query: 800 input tokens + 400 output tokens using Claude Sonnet 4 ($3/$15 per M). API cost per query = (800 x $0.000003) + (400 x $0.000015) = $0.0024 + $0.006 = $0.0084. Cost per user/month = $0.0084 x 500 = $4.20. Revenue per user: $49. Gross margin: (49 - 4.20) / 49 = 91.4%.
Traditional SaaS targets 70-85% gross margin. AI SaaS with LLM inference costs typically achieves 60-80% depending on query volume, model choice, and pricing. The key risk is usage-based cost variability — a power user consuming 10x average tokens can erode margin significantly. Best practices: (1) Set usage limits or tiers. (2) Use cheaper models for simpler tasks. (3) Build prompt caching for repeated system prompts. (4) Price per seat + usage overages rather than flat per-seat to protect margins at high usage.
The difference between frontier and economy models is dramatic. Claude Haiku costs $0.80/$4 per M tokens; Claude Sonnet 4 costs $3/$15 — approximately 4x more expensive. For a product with $5 monthly API cost per user using Sonnet, switching to Haiku for routine queries reduces that to ~$1.25/user, adding $3.75/month in gross profit per user. At 1,000 users, that is $3,750/month in additional margin — a compelling reason to implement model tiering based on task complexity.
Prompt caching (available on Claude and GPT-4o) stores frequently repeated content so it is not re-billed at full price. If your application has a 2,000-token system prompt sent with every request, enabling caching reduces those tokens from full price to 10% of full price on cache hits. For Claude Sonnet 4: uncached system prompt = $0.006 per request. Cached: $0.0006. At 10,000 daily requests, that saves $54/day ($1,620/month) on system prompt costs alone.
Three models dominate: (1) Per-seat subscription — predictable revenue, usage variability risk on costs; works well when usage is consistent. (2) Usage-based — charges scale with API costs, no margin compression; but customers dislike unpredictable bills. (3) Hybrid — base seat fee + usage overages above a generous included limit. The hybrid model is increasingly common: a flat fee covers 80% of users; heavy users pay overages that protect your margin. Analyze your user usage distribution before committing to a pricing model.
Gross margin in AI products is primarily driven by how central LLM inference is to the core value prop and how intensively users engage with it. A product where AI is a minor enhancement (like AI-generated email subject line suggestions) will have very different margin dynamics than an AI-first product where every user interaction runs through a frontier model.
The benchmarks below are drawn from published SaaS industry data and investor reports. Actual margins depend heavily on model selection, usage limits, caching strategy, and volume discounts negotiated with providers.
| Product Type | Typical API Cost % of Revenue | Gross Margin Target | Notes |
|---|---|---|---|
| B2B SaaS (AI feature add-on) | 5–15% | 70–80% | AI is one of many features |
| Consumer AI app | 15–35% | 50–70% | High usage intensity |
| AI-first workflow tool | 20–40% | 55–75% | Core value from AI |
| API reseller/wrapper | 40–60% | 25–45% | Thin margins without differentiation |
| Enterprise AI platform | 8–20% | 65–80% | Volume discounts help significantly |
| Freemium AI tool | 30–50% on paid tier | 60–75% on paid tier | Free users subsidized by paid |
What is a good gross margin for an AI product?
Traditional software SaaS targets 70–85% gross margin. AI-native products with meaningful LLM inference costs typically land at 55–80% gross margin, depending on model costs and usage intensity. Anything above 60% is considered healthy for an AI-first product. Below 50% warrants a review of model selection, caching strategy, and pricing — or signals a product where margins won't support long-term unit economics. Investor benchmarks for Series A AI companies typically expect 65%+ gross margin.
How do LLM costs scale with users?
LLM costs scale roughly linearly with active users and their usage intensity — unlike traditional software infrastructure, where costs scale sublinearly due to resource pooling. This means AI products can face margin compression as they scale if usage per user grows faster than revenue. The mitigation strategies are: usage-based pricing or limits, model tiering (use cheaper models for simple tasks), prompt caching for repeated system prompts, and response caching for common queries. Plan for this non-linearity in your financial model before raising prices becomes your only option.
How do I control API costs in production?
Five high-impact levers: (1) Model tiering — route simple tasks to cheaper models automatically. (2) Prompt caching — 90% cost reduction on repeated system prompt tokens. (3) Response caching — cache common query responses for 24–48 hours. (4) Usage limits — cap tokens per user per day/month and upsell overages. (5) Output length limits — set max_tokens to prevent runaway responses. Monitoring per-user API spend with real-time alerts is essential — one misconfigured prompt with a high-traffic endpoint can generate thousands in unexpected costs overnight.
What is the difference between gross margin and contribution margin?
Gross margin = (Revenue − Cost of Goods Sold) / Revenue, where COGS includes API costs, hosting, and direct support. Contribution margin = (Revenue − Variable Costs) / Revenue, which typically includes only the costs that vary directly with each additional unit sold (API calls, payment processing). For AI products, contribution margin is often higher than gross margin because it excludes semi-fixed hosting costs. Investors focus on gross margin for benchmarking; operators track contribution margin to understand per-user profitability at the transaction level.
How do volume discounts affect LLM pricing?
All major LLM providers offer volume discounts at significant scale — typically negotiated at $50,000+/month in API spend. OpenAI, Anthropic, and Google all have enterprise agreements that can reduce per-token costs by 20–40% compared to published rates. At lower volumes, committed-use contracts (paying upfront for a token allotment) offer 10–20% discounts. For early-stage startups, the AI startup credit programs (AWS Activate, Google for Startups, Anthropic's startup program) can provide $5,000–$100,000 in free credits to delay the moment when API costs become material.