Estimate the full monthly cost of your Retrieval-Augmented Generation pipeline — embedding, vector storage, retrieval, reranking, and LLM inference — broken down per query and per month.
Knowledge Base & Embeddings
One-time and ongoing ingestion costs
Vector Database
Query Volume & Retrieval
RAG Pipeline Cost Results
Cost breakdown by pipeline stage
| Stage | Monthly Cost | % of Total | Cost/Query |
|---|---|---|---|
Embedding cost = Total chunks × tokens/chunk × embedding price/token. Monthly re-ingestion adds a fraction of this based on your update frequency.
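As a rough illustration of the embedding formula above, here is a minimal Python sketch. The function names, the per-million-token price unit, and every number in the example are assumptions made for illustration, not actual provider rates.

```python
def embedding_cost(total_chunks: int, tokens_per_chunk: int,
                   price_per_million_tokens: float) -> float:
    """One-time cost to embed the whole knowledge base."""
    total_tokens = total_chunks * tokens_per_chunk
    return total_tokens * price_per_million_tokens / 1_000_000


def monthly_reingestion_cost(initial_cost: float,
                             monthly_update_fraction: float) -> float:
    """Ongoing cost: the fraction of the corpus re-embedded each month."""
    return initial_cost * monthly_update_fraction


# Hypothetical inputs: 100k chunks of 500 tokens, $0.02 per 1M tokens,
# and 10% of the knowledge base updated every month.
initial = embedding_cost(total_chunks=100_000, tokens_per_chunk=500,
                         price_per_million_tokens=0.02)
monthly = monthly_reingestion_cost(initial, monthly_update_fraction=0.10)
print(f"Initial embedding: ${initial:.2f}, monthly re-ingestion: ${monthly:.2f}")
```

The price is expressed per million tokens because that is how most providers quote it; divide accordingly if your rate is per token.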
Vector DB cost = Billed by the number of vectors stored (Serverless) or by compute hours (Pod/dedicated). Self-hosting eliminates the API cost but adds server infrastructure cost.
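The two billing models can be compared with a small sketch like the one below. The rate units (price per million vectors per month, price per pod-hour), the 730-hour month, and all example values are hypothetical; real providers often bill extra for reads, writes, and storage size.

```python
HOURS_PER_MONTH = 730  # assumed average month length in hours


def serverless_cost(num_vectors: int,
                    price_per_million_vectors_month: float) -> float:
    """Monthly cost when billing scales with vectors stored."""
    return num_vectors / 1_000_000 * price_per_million_vectors_month


def dedicated_cost(num_pods: int, price_per_pod_hour: float,
                   hours: float = HOURS_PER_MONTH) -> float:
    """Monthly cost when billing scales with compute hours."""
    return num_pods * price_per_pod_hour * hours


# Hypothetical rates, for illustration only.
serverless = serverless_cost(num_vectors=5_000_000,
                             price_per_million_vectors_month=0.50)
dedicated = dedicated_cost(num_pods=1, price_per_pod_hour=0.10)
print(f"Serverless: ${serverless:.2f}/month, dedicated: ${dedicated:.2f}/month")
```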
Retrieval query cost = query tokens × embedding price/token, the cost of embedding the user query. If you use a reranker, add the reranker API cost per search. Multiply by queries/month for the monthly figure.
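A minimal sketch of the retrieval cost, assuming one embedding call and at most one reranker call per query. The function name, parameter names, and prices are illustrative assumptions.

```python
def retrieval_cost_per_query(query_tokens: int,
                             embedding_price_per_million_tokens: float,
                             reranker_price_per_search: float = 0.0) -> float:
    """Cost of embedding one user query, plus an optional reranker call."""
    embed = query_tokens * embedding_price_per_million_tokens / 1_000_000
    return embed + reranker_price_per_search


# Hypothetical inputs: 50-token queries, $0.02 per 1M embedding tokens,
# $0.002 per rerank search, 100k queries per month.
per_query = retrieval_cost_per_query(query_tokens=50,
                                     embedding_price_per_million_tokens=0.02,
                                     reranker_price_per_search=0.002)
monthly_retrieval = per_query * 100_000
print(f"Per query: ${per_query:.6f}, monthly: ${monthly_retrieval:.2f}")
```

With numbers in this range the reranker fee dominates the query embedding cost by several orders of magnitude, which is why it is modeled as a separate term.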
LLM inference cost = (input tokens × input price + output tokens × output price) × queries/month. This is typically the largest cost driver at scale.
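The inference formula above translates directly into code. In this sketch the token counts and per-million-token prices are made-up placeholders; input tokens are assumed to include the system prompt, the retrieved chunks, and the user question.

```python
def llm_inference_cost(input_tokens: int, output_tokens: int,
                       input_price_per_million: float,
                       output_price_per_million: float,
                       queries_per_month: int) -> float:
    """Monthly LLM cost: per-query token cost times query volume."""
    per_query = (input_tokens * input_price_per_million
                 + output_tokens * output_price_per_million) / 1_000_000
    return per_query * queries_per_month


# Hypothetical inputs: 2,000 input and 500 output tokens per query,
# $3 / $15 per 1M input / output tokens, 100k queries per month.
llm_monthly = llm_inference_cost(input_tokens=2_000, output_tokens=500,
                                 input_price_per_million=3.0,
                                 output_price_per_million=15.0,
                                 queries_per_month=100_000)
print(f"LLM inference: ${llm_monthly:,.2f}/month")
```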
Cost per query = Total monthly cost ÷ number of queries per month. Useful for pricing decisions if you are building a product on top of this RAG system.
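To show how the per-stage figures roll up into the breakdown columns (Stage, Monthly Cost, % of Total, Cost/Query), here is a short sketch. The `cost_breakdown` helper and the stage costs are hypothetical, and USD is assumed as the currency.

```python
def cost_breakdown(stage_costs: dict[str, float],
                   queries_per_month: int) -> list[tuple[str, float, float, float]]:
    """Return (stage, monthly cost, % of total, cost per query) rows plus a total row."""
    total = sum(stage_costs.values())
    rows = []
    for stage, cost in stage_costs.items():
        share = 100 * cost / total if total else 0.0
        per_query = cost / queries_per_month if queries_per_month else 0.0
        rows.append((stage, cost, share, per_query))
    total_per_query = total / queries_per_month if queries_per_month else 0.0
    rows.append(("Total", total, 100.0 if total else 0.0, total_per_query))
    return rows


# Hypothetical monthly costs per stage, in USD.
stages = {"Embedding": 5.0, "Vector DB": 70.0,
          "Retrieval": 25.0, "LLM inference": 400.0}
rows = cost_breakdown(stages, queries_per_month=100_000)
for stage, cost, share, per_query in rows:
    print(f"| {stage} | ${cost:,.2f} | {share:.1f}% | ${per_query:.5f} |")
```

The last row gives the blended cost per query, which is the number to compare against whatever you plan to charge per request.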
⚠️ Pricing is based on publicly available API rates as of April 2026 and may change. Self-hosting costs (GPU, bandwidth, ops) are not fully modeled. Use for estimation only.