Token Counter — Estimate Tokens & Context Window for Claude, GPT & Gemini

Input

Paste your text or prompt

0 characters

Select model

Results

✦

Start typing or paste text above to see token estimates

All Models Comparison

Paste text above to compare across all models

What is a token? Language models don't read text character by character — they break it into "tokens," which are chunks of roughly 3–4 characters for common English words. For example, "calculator" might be 2–3 tokens, while a single letter like "a" is 1 token.

Estimation method: This calculator uses the widely-cited rule of thumb of approximately 4 characters per token for typical English text. This gives a reasonable estimate for planning purposes. Non-English text, code, and special characters may tokenize differently (code is often more tokens per character; some languages like Chinese may be fewer characters per token).

Why this matters: API pricing is based on token count. Context windows have hard limits — if your input exceeds the limit, the API will reject the request or truncate the text. Knowing your usage upfront helps you design prompts that fit within budget and limits.

For exact counts: Use the official tokenizer tools — OpenAI's Tokenizer playground, Anthropic's token counting API endpoint, or Google's Vertex AI token counters — before production use.

Frequently Asked Questions

How many tokens is a typical ChatGPT conversation?

A typical back-and-forth ChatGPT message exchange uses 200-500 tokens per turn (user message + response). A detailed technical conversation with long responses might use 1,000-3,000 tokens per turn. A full 1-hour work session might consume 10,000-50,000 tokens total. At GPT-4o pricing ($2.50/M input, $10/M output), even intensive daily use costs under $1/day for most users — which is why the $20/month ChatGPT Plus subscription is cost-effective for anyone making more than a few hundred thousand tokens of requests monthly.

Do different languages use different token counts?

Yes — significantly. English is the most token-efficient language for current models (which were trained predominantly on English text). Chinese, Japanese, and Korean typically use 2-3x more tokens per word. Arabic and other right-to-left scripts are also less efficient. This means processing a Chinese document uses 2-3x more context and costs 2-3x more per character than English. Developers building multilingual applications should factor this into cost and context window planning.

What happens when I exceed the context window?

Each model handles context overflow differently. Most APIs return an error (context_length_exceeded) if your input exceeds the limit. ChatGPT and Claude in their consumer apps silently truncate or summarize older context to fit. For API applications, you must handle this explicitly — common strategies include: chunking long documents, using summarization to compress earlier conversation history, or using a model with a larger context window (Claude's 200k or Gemini's 1M token context).

What is the difference between context window and training data cutoff?

Context window is how much text a model can process in one request right now. Training cutoff is the date after which the model has no knowledge of world events. A model with a 200,000 token context window but a training cutoff of April 2024 can process a very long document but will not know about events after April 2024 unless that information is included in the context. Retrieval-Augmented Generation (RAG) solves the knowledge cutoff problem by injecting current information into the context window.

Token Count Reference by Content Type

Estimating token counts before sending requests helps you predict costs, avoid context window overflows, and design efficient prompts. The table below provides reference benchmarks for common content types based on English text. Non-English languages typically use more tokens per word — Chinese and Japanese often 2–3× more than English.

The ~1.3 tokens per word rule of thumb works for most English prose. Technical content with lots of numbers, code, or special characters may tokenize differently. Use an official tokenizer tool (OpenAI's tiktoken, Anthropic's token counter) to get exact counts for your specific content before finalizing cost estimates.

Content Type	Approx Words	Approx Tokens	Notes
Tweet	20 words	~27 tokens	Short text ~1.3 tokens/word
Email	150 words	~200 tokens	Common business use case
Blog post	800 words	~1,067 tokens	Standard article length
Short story	2,000 words	~2,667 tokens	Fits easily in all context windows
Novel chapter	5,000 words	~6,667 tokens	Fits in all modern context windows
Full book	80,000 words	~107,000 tokens	Exceeds most models' context windows
Code file (200 lines)	~200 lines	~500–2,000 tokens	Varies by language and density

Worked Examples

Example 1 — Document Q&A cost estimation
You're building a document Q&A app. A user uploads a 10-page PDF (~3,000 words = ~4,000 tokens). Each question adds ~50 tokens and gets a ~200-token answer. Using Claude 3 Haiku at $0.25/$1.25 per 1M tokens: cost per query = (4,050 input × $0.25 + 200 output × $1.25) / 1,000,000 = $0.0010125 + $0.00025 = $0.00126/query. At 10,000 monthly queries: $12.60/month. At 100,000 queries: $126/month — still very manageable unit economics.

Example 2 — Chat conversation cost
A 20-message chat conversation where each message is ~100 tokens. Because context grows with each turn, the model sees an accumulating history: message 1 sees 100 tokens, message 2 sees 200, etc. Total input tokens across all 20 turns ≈ 100 + 200 + ... + 2,000 = 21,000 tokens. Total output tokens ≈ 20 × 100 = 2,000. On GPT-4o at $2.50/$10 per 1M: cost = ($21,000 × $2.50 + $2,000 × $10) / 1,000,000 = $0.0525 + $0.02 = $0.0725 per full conversation. At 1,000 conversations/month: $72.50.

Frequently Asked Questions

What is a token in AI models?

A token is the basic unit of text that an LLM reads and generates. It is roughly equivalent to a word fragment — common short words like "the" or "is" are one token each, while longer words like "tokenization" may be split into two or three tokens. Punctuation, spaces, and special characters also consume tokens. Modern LLMs use Byte Pair Encoding (BPE) or similar algorithms to determine token boundaries, which is why the exact count can seem counterintuitive. Every token in your input and output contributes to your API bill, which is why token efficiency matters at scale.

How many tokens is a word?

In English, one word is approximately 1.3 tokens on average. Short common words (1 token each): "the," "is," "a," "of." Medium words (1–2 tokens): "running," "calculator," "because." Long or uncommon words (2–4 tokens): "extraordinary," "tokenization," "cryptocurrency." Numbers and dates vary: "2024" is often 1 token; "2024-05-19" might be 4–6. Code and URLs are typically less efficient — a URL like "https://example.com/path" might be 8–12 tokens. For budgeting, use 1.3 tokens/word for prose and 2–3× that for code or URLs.

Why do input and output tokens cost different amounts?

Input (prompt) tokens are processed in parallel — the model reads all input tokens simultaneously, which is computationally efficient. Output (completion) tokens are generated sequentially — the model must produce one token at a time, each depending on the previous, which is much more computationally intensive and cannot be parallelized. This sequential generation is why output tokens typically cost 4–6× more than input tokens across most major models. Practically, this means minimizing response length (via max_tokens and concise prompting) is one of the most effective ways to reduce API costs.

How do I estimate token count before sending a request?

The most accurate method is to use the provider's official tokenizer: OpenAI's tiktoken library (open source, available in Python/JavaScript), Anthropic's token counting API endpoint, or Google's tokenize method in the Gemini SDK. For quick estimates without code: paste your text into OpenAI's tiktoken playground at platform.openai.com/tokenizer. For budgeting before building: use the 1.3 tokens/word rule for English prose, add ~500 tokens buffer for system prompt overhead, and multiply by your expected monthly request volume. Add 20% contingency for real-world variation.

What happens when you exceed the context window?

When your total input (system prompt + conversation history + current message) exceeds the model's context limit, most APIs return a context_length_exceeded error and process nothing. Consumer apps (ChatGPT, Claude.ai) handle this automatically by silently summarizing or truncating older messages. In production API applications, you must handle this explicitly with strategies like: sliding window (drop oldest messages), conversation summarization (compress history into a summary), document chunking (split large inputs), or upgrading to a model with a larger context window. Always build context management into any application with multi-turn conversations or long document inputs.

Token Utilization Calculator

How the Token Calculator Works

Frequently Asked Questions

Token Count Reference by Content Type

Worked Examples

Frequently Asked Questions