RAG Pipeline Cost Calculator

Estimate the full monthly cost of your Retrieval-Augmented Generation pipeline — embedding, vector storage, retrieval, reranking, and LLM inference — broken down per query and per month.

Knowledge Base & Embeddings

One-time and ongoing ingestion costs

Vector Database


Query Volume & Retrieval

RAG Pipeline Cost Results

Total Monthly Cost (all components)
Cost Per Query (fully loaded)
LLM Share (% of total cost)
Annual Projection

Cost breakdown by pipeline stage

Stage | Monthly Cost | % of Total | Cost/Query

Embedding cost = Total chunks × tokens/chunk × embedding price/token. Monthly re-ingestion adds a fraction of this based on your update frequency.
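
The embedding formula above can be sketched in a few lines of Python. This is only an illustration: the function names, the per-million-token price unit, and all example numbers (chunk count, price, update fraction) are assumptions, not values taken from the calculator.

```python
def embedding_cost(total_chunks, tokens_per_chunk, price_per_million_tokens):
    """One-time cost of embedding the whole knowledge base."""
    total_tokens = total_chunks * tokens_per_chunk
    # Prices are assumed to be quoted per 1M tokens, so convert to per-token.
    return total_tokens * price_per_million_tokens / 1_000_000


def monthly_reingestion_cost(one_time_cost, monthly_update_fraction):
    """Ongoing cost: the fraction of the corpus re-embedded each month."""
    return one_time_cost * monthly_update_fraction


# Hypothetical inputs: 100k chunks of 500 tokens at $0.02 per 1M tokens,
# with 10% of the corpus re-embedded every month.
one_time = embedding_cost(100_000, 500, 0.02)
monthly = monthly_reingestion_cost(one_time, 0.10)
print(f"one-time: ${one_time:.2f}, monthly re-ingestion: ${monthly:.2f}")
```

With these made-up inputs the one-time cost is about $1.00 and the monthly re-ingestion about $0.10, which shows why embedding is rarely the dominant line item.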

Vector DB cost = A function of the number of vectors stored (Serverless) or the compute hours used (Pod/dedicated). Self-hosting eliminates the API cost but adds server infrastructure cost.
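
One way to model the three hosting options described above is a single function that switches on the deployment mode. The mode names, parameter names, and prices below are hypothetical placeholders; real providers bill on more dimensions (reads, writes, storage tiers) than this sketch covers.

```python
def vector_db_monthly_cost(mode, *, num_vectors=0, price_per_million_vectors=0.0,
                           compute_hours=0.0, price_per_hour=0.0,
                           server_monthly_cost=0.0):
    """Rough monthly vector DB cost for one of three assumed billing modes."""
    if mode == "serverless":
        # Billed on the number of vectors stored.
        return num_vectors / 1_000_000 * price_per_million_vectors
    if mode == "pod":
        # Billed on compute hours for a dedicated instance.
        return compute_hours * price_per_hour
    if mode == "self_hosted":
        # No API fee; the cost is the server infrastructure itself.
        return server_monthly_cost
    raise ValueError(f"unknown mode: {mode!r}")


# Hypothetical prices, for illustration only.
serverless = vector_db_monthly_cost("serverless", num_vectors=2_000_000,
                                    price_per_million_vectors=0.50)
pod = vector_db_monthly_cost("pod", compute_hours=730, price_per_hour=0.10)
self_hosted = vector_db_monthly_cost("self_hosted", server_monthly_cost=150.0)
```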

Retrieval query cost = Cost of embedding each user query (query tokens × embedding price/token), multiplied by queries/month. If you use a reranker, add the reranker API cost per search for every query.
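
As a sketch of the retrieval formula above, the function below adds the query-embedding cost and an optional per-search reranker fee. The parameter names and example prices are assumptions, and it assumes one reranker call per query.

```python
def retrieval_monthly_cost(queries_per_month, query_tokens,
                           embed_price_per_million_tokens,
                           rerank_price_per_search=0.0):
    """Monthly cost of embedding user queries plus optional reranking."""
    embed_cost = (queries_per_month * query_tokens
                  * embed_price_per_million_tokens / 1_000_000)
    # Assumes exactly one reranker call per query.
    rerank_cost = queries_per_month * rerank_price_per_search
    return embed_cost + rerank_cost


# Hypothetical inputs: 50k queries/month, 30 tokens each, $0.02 per 1M
# embedding tokens, $0.002 per reranker search.
retrieval = retrieval_monthly_cost(50_000, 30, 0.02, rerank_price_per_search=0.002)
print(f"retrieval: ${retrieval:.2f}/month")
```

With these illustrative numbers the reranker fee is far larger than the query-embedding cost, so it is worth checking whether reranking is needed on every query.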

LLM inference cost = (input tokens × input price + output tokens × output price) × queries/month. This is typically the largest cost driver at scale.
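
The LLM inference formula above translates directly into code. The token counts and per-million-token prices in this sketch are hypothetical; substitute the rates of the model you actually use.

```python
def llm_monthly_cost(queries_per_month, input_tokens, output_tokens,
                     input_price_per_million, output_price_per_million):
    """(input tokens x input price + output tokens x output price) x queries."""
    per_query = (input_tokens * input_price_per_million
                 + output_tokens * output_price_per_million) / 1_000_000
    return per_query * queries_per_month


# Hypothetical inputs: 3,000 prompt tokens (question plus retrieved chunks)
# and 400 completion tokens per query, at $3 and $15 per 1M tokens.
llm = llm_monthly_cost(50_000, 3_000, 400, 3.0, 15.0)
print(f"LLM inference: ${llm:.2f}/month")
```

Because retrieved chunks are counted as input tokens, the number of chunks passed to the model is usually the easiest lever for reducing this figure.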

Cost per query = Total monthly cost ÷ number of queries per month. This figure is useful for pricing decisions if you are building a product on top of this RAG system.
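
The summary figures can be derived from the per-stage totals as sketched below. The stage names and amounts are made up, and treating the annual projection as 12 × the monthly total is an assumption; the page does not state how that value is computed.

```python
def summarize(stage_costs, queries_per_month):
    """Roll per-stage monthly costs up into the headline metrics."""
    total = sum(stage_costs.values())
    return {
        "total_monthly": total,
        "cost_per_query": total / queries_per_month if queries_per_month else 0.0,
        "llm_share_pct": 100 * stage_costs.get("llm", 0.0) / total if total else 0.0,
        "annual": total * 12,  # assumption: flat volume and prices all year
    }


# Hypothetical monthly stage costs in dollars.
stages = {"embedding": 0.10, "vector_db": 70.0, "retrieval": 100.0, "llm": 750.0}
summary = summarize(stages, queries_per_month=50_000)
for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```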

⚠️ Pricing is based on publicly available API rates as of April 2026 and may change. Self-hosting costs (GPU, bandwidth, ops) are not fully modeled. Use for estimation only.