🧮 Claude API Pricing Calculator
💡 What Is Claude API Pricing?
Anthropic charges for API access based on token consumption. Because input and output tokens carry different rates — and output tokens typically cost more per million — understanding your prompt-to-response ratio is the most important variable when projecting costs. The model family you choose (Haiku, Sonnet, or Opus) and the generation (3, 3.5, 4, 4.5, 4.6) both significantly affect your bill.
Newer model generations like Claude Sonnet 4.6 and Opus 4.6 offer larger 1M-token context windows alongside competitive pricing, making them suitable for long-document analysis, agentic workflows, and complex multi-step tasks that older generations could not handle cost-effectively.
Source: Anthropic, "Claude API Pricing," anthropic.com/pricing, accessed July 2025.
📊 All Claude Models & Prices
The table below lists every Claude model currently documented by Anthropic. Models marked "N/A" for price are deprecated or legacy versions no longer available via the standard API; they are included for reference. Use the search and filter controls to find specific models. Click any column header to sort the table.
| Model ↕ | Tier ↕ | Input $/M ↕ | Output $/M ↕ | Context ↕ | Released ↕ | Weekly Usage ↕ |
|---|
Sources: Anthropic, "Claude API Pricing," anthropic.com/pricing; Anthropic, "Models Overview," docs.anthropic.com. Weekly token usage figures sourced from OpenRouter usage data. Prices and availability subject to change.
🔤 Understanding Tokens
A token is the smallest unit of text that Claude processes. Tokens do not map one-to-one with words — they are vocabulary fragments defined by the model's tokenizer. Understanding approximate token counts helps you predict costs accurately before running any API calls.
The most reliable approach is to instrument your code to log the
usage object returned in every API response. It contains
input_tokens and output_tokens for each call. Average
20–30 representative requests, then use those real numbers in the calculator above.
Source: Anthropic, "Token counting," docs.anthropic.com, accessed July 2025.
📐 How to Estimate Your Claude API Costs
The formula the calculator above uses is straightforward. You can verify any result manually:
Monthly Cost = (Input Tokens/Req × Requests/Month × Input Price ÷ 1,000,000) + (Output Tokens/Req × Requests/Month × Output Price ÷ 1,000,000) Cost per Request = Monthly Cost ÷ Requests per Month Annual Cost = Monthly Cost × 12Source: Anthropic per-million-token pricing model — anthropic.com/pricing
A practical approach: run 20–30 representative API calls in development and log the
usage field from each response. Average the input and output token counts,
then multiply by your expected monthly request volume. If you are in the planning phase
with no usage data, use these benchmarks: a typical chatbot turn uses 500–1,500 input
tokens and 200–500 output tokens; a document summarisation task might use 3,000–8,000
input tokens and 300–800 output tokens.
💾 Prompt Caching Explained
To use prompt caching, add a cache_control parameter to the content blocks
you want cached. The first request that creates the cache is billed at approximately 125%
of the standard input rate (the write cost); all subsequent requests hitting that cache
are billed at approximately 10% (the read cost). Anthropic stores the cache for up to
5 minutes, extended by additional cache hits.
Prompt caching is most valuable when: your system prompt exceeds 1,024 tokens; you pass the same large document or knowledge base with every request; or you run multi-turn conversations where earlier turns are prepended to each new call. Use the caching toggle in the calculator above to model the potential saving for your specific usage pattern.
Source: Anthropic, "Prompt Caching," docs.anthropic.com, accessed July 2025.
💰 Cost Reduction Tips
-
🎯
Route tasks to the right model Build a routing layer that sends simple tasks to Claude Haiku 4.5 ($1/$5 per M tokens) and reserves Sonnet or Opus variants for tasks requiring higher capability. Haiku handles most classification, extraction, and short-form generation tasks at a fraction of the cost of flagship models.
-
✂️
Compress your system prompt Every token in your system prompt is billed on every request. Audit yours regularly: remove redundant instructions, rewrite verbose directives concisely, and delete examples Claude does not need. A reduction from 800 to 250 tokens saves 550 input tokens per call. At 50,000 calls/month on Sonnet 4 ($3/M), that saves $82.50 per month.
-
🔒
Set explicit max_tokens limits Pass
max_tokensin every request. Audit a sample of real responses, find the 95th percentile output length for your use case, and set max_tokens roughly 20% above that value. This prevents occasional runaway responses from inflating your output token costs without cutting off legitimate answers. -
📦
Use the Batch API for non-real-time work Anthropic's Message Batches API processes requests asynchronously at a 50% discount on both input and output token prices. If your workload includes tasks that do not require an immediate response — bulk analysis, overnight processing, report generation — the Batch API is the single largest cost lever available.
-
🧹
Trim conversation history in multi-turn apps In chatbot or agent applications, every previous message is included as input on subsequent calls. Implement a sliding window keeping only the N most recent turns, or summarise older turns into a compact block. Log token counts per session to identify conversations with unusually high input costs.
🔍 Price Source & Calculator Accuracy
All model prices are sourced from Anthropic's official pricing page and model documentation, last verified July 2025. Anthropic may update prices, introduce new models, or deprecate older ones at any time without prior notice. Models listed as N/A for pricing are either deprecated or available only through enterprise arrangements.
The calculator's arithmetic is exact: it applies published per-million-token rates to the
values you enter. Accuracy of your estimate depends on how closely your entered token counts
and request volumes match real usage. The best way to improve accuracy is to instrument
production code to log the usage field from each API response, then use real
averages as inputs.
Non-USD currency amounts are indicative conversions using approximate exchange rates. Anthropic invoices exclusively in US Dollars.
Primary source: Anthropic, "Claude API Pricing," anthropic.com/pricing; Anthropic, "Models Overview," docs.anthropic.com/en/docs/about-claude/models. Weekly usage data: OpenRouter model stats. Accessed July 2025.