🤖 OpenAI API Token Cost Calculator
💡What Is OpenAI API Pricing?
A token is the basic unit of text that OpenAI models process. In English, one token is roughly 4 characters or 0.75 words. Every API call consumes input tokens (your prompt, system message, and conversation history) and produces output tokens (the model's response). Both are billed at the rates published on OpenAI's pricing page, measured per million tokens.
OpenAI also offers two cost-reduction features on supported models: Prompt Caching, which reuses the KV cache for repeated prompt prefixes at a 50% input discount, and the Batch API, which processes requests asynchronously within 24 hours at a 50% discount on both input and output tokens.
Source: OpenAI, "API Pricing," openai.com/api/pricing, accessed July 2025.
📊All OpenAI Models & Prices
Published prices from OpenAI as of July 2025. Prices are in USD per million tokens unless otherwise noted. Use search and filters to find specific models.
| Model | Family | Input $/M | Output $/M | Context | Cache Discount | Batch API |
|---|---|---|---|---|---|---|
Source: OpenAI, "API Pricing," openai.com/api/pricing. Accessed July 2025. Prices subject to change.
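To make the table concrete in code, here is a minimal Python price lookup for two widely used models, using the rates published in July 2025 for GPT-4o ($2.50/$10.00 per million tokens) and GPT-4o mini ($0.15/$0.60). This is a sketch, not an official table: verify current rates on the pricing page before relying on these numbers.

```python
# Per-million-token prices (USD) as listed on openai.com/api/pricing in
# July 2025. Illustrative snapshot only -- verify before use.
PRICES = {
    "gpt-4o":      {"input": 2.50, "output": 10.00},
    "gpt-4o-mini": {"input": 0.15, "output": 0.60},
}

def price_per_request(model: str, input_tokens: int, output_tokens: int) -> float:
    """USD cost of a single request at standard (non-batch, uncached) rates."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000
```

For example, a request with 1,000 input and 500 output tokens costs $0.0075 on GPT-4o but only $0.00045 on GPT-4o mini.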
🔤Understanding Tokens
Tokens are the fundamental unit of text that OpenAI models process. They do not correspond exactly to words — the tokenizer splits text into subword fragments based on a vocabulary trained alongside the model. Understanding approximate token counts helps you predict costs and stay within context limits.
The most reliable way to count tokens is to use OpenAI's tiktoken library, which implements the exact tokenizer used by each model. For production cost monitoring, log the usage.prompt_tokens and usage.completion_tokens fields returned in every API response.
Source: OpenAI, "Counting tokens," platform.openai.com, accessed July 2025.
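The counting approach above can be sketched in a few lines: exact counts via the tiktoken library when it is installed, falling back to the roughly-4-characters-per-token heuristic otherwise. The function name and fallback behaviour are illustrative, not part of any official API.

```python
def count_tokens(text: str, model: str = "gpt-4o") -> int:
    """Exact token count via tiktoken if available, else a rough estimate."""
    try:
        import tiktoken
        enc = tiktoken.encoding_for_model(model)
        return len(enc.encode(text))
    except Exception:
        # Heuristic from the section above: ~1 token per 4 characters of English.
        return max(1, round(len(text) / 4))
```

For billing purposes, prefer the `usage` fields returned by the API itself; a local count is best treated as a pre-flight estimate.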
📐How Costs Are Calculated
The calculator uses the following formulas. All prices are in USD per million tokens as published by OpenAI.
── STANDARD COST ────────────────────────────────────────────────

Monthly Cost = (Input Tokens/Req × Requests/Month × Input Price / 1,000,000)
             + (Output Tokens/Req × Requests/Month × Output Price / 1,000,000)
Cost per Request = Monthly Cost ÷ Requests per Month
Annual Cost = Monthly Cost × 12

── WITH PROMPT CACHING (50% off cached input tokens) ─────────────

Cached Input Tokens   = Input Tokens × Cache %
Uncached Input Tokens = Input Tokens × (1 – Cache %)
Monthly Cost (cached) = [ Uncached Tokens × Input Price
                        + Cached Tokens × (Input Price × 0.50) ] × Requests / 1,000,000
                        + Output Tokens × Requests × Output Price / 1,000,000
Saving = Standard Monthly Cost – Cached Monthly Cost

── WITH BATCH API (50% off input + output) ───────────────────────

Batch Monthly Cost = Standard Monthly Cost × 0.50
Batch Saving = Standard Monthly Cost × 0.50

Source: OpenAI API Pricing — openai.com/api/pricing, July 2025.
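The formulas above translate directly into code. A minimal sketch (the function and parameter names are mine, not the calculator's):

```python
def monthly_cost(
    input_tokens: int,       # input tokens per request
    output_tokens: int,      # output tokens per request
    requests: int,           # requests per month
    input_price: float,      # USD per million input tokens
    output_price: float,     # USD per million output tokens
    cache_pct: float = 0.0,  # fraction of input tokens served from cache (0..1)
    batch: bool = False,     # Batch API: 50% off both input and output
) -> float:
    """Monthly USD cost per the formulas above."""
    # Cached input tokens are billed at 50% of the normal input rate.
    effective_input_price = input_price * ((1 - cache_pct) + cache_pct * 0.5)
    cost = (input_tokens * effective_input_price
            + output_tokens * output_price) * requests / 1_000_000
    return cost * 0.5 if batch else cost
```

For example, 1,000 input and 500 output tokens per request at GPT-4o's July 2025 rates ($2.50/$10.00 per million), over 10,000 requests/month: $75.00 standard, $65.00 with 80% of the prompt cached, and $37.50 via the Batch API.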
⚡Prompt Caching & Batch API
OpenAI offers two built-in cost reduction features on most GPT-4o and o-series models. Both can substantially lower your monthly bill with minimal code changes.
**Prompt Caching**
- Automatic on supported models
- Cache prefix ≥ 1,024 tokens
- Cache lasts ~5–10 min (extended by hits)
- Best for long, repeated system prompts
- No code change required

**Batch API**
- Async processing (≤ 24 hours)
- Submit a .jsonl file of requests
- Best for offline/non-realtime work
- Supported on GPT-4o, GPT-4, o1
- Full response in one result file

**Combining Both**
- Use Batch API + Prompt Caching together
- Batch applies 50% to all tokens
- Caching applies on top for repeated prefixes
- Ideal for bulk document processing
- Check model support before implementing
Toggle both features in the calculator above to see how they affect your specific usage. As a rule: if your system prompt is longer than 1,024 tokens and you make repeated calls, prompt caching is essentially free money. If your workload can tolerate asynchronous processing, the Batch API is the single highest-impact cost lever available.
Sources: OpenAI, "Prompt Caching," platform.openai.com; OpenAI, "Batch API," platform.openai.com. Accessed July 2025.
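As an illustration of the Batch API workflow described above, here is how a requests file might be assembled. The request shape follows OpenAI's documented batch format (`custom_id`, `method`, `url`, `body`), but the helper name and defaults are mine; check the current Batch API docs before relying on the field names.

```python
import json

def build_batch_file(prompts, path="batch_requests.jsonl", model="gpt-4o-mini"):
    """Write one JSON request per line in the Batch API's .jsonl format."""
    with open(path, "w") as f:
        for i, prompt in enumerate(prompts):
            request = {
                "custom_id": f"req-{i}",  # your key for matching results later
                "method": "POST",
                "url": "/v1/chat/completions",
                "body": {
                    "model": model,
                    "messages": [{"role": "user", "content": prompt}],
                    "max_tokens": 256,
                },
            }
            f.write(json.dumps(request) + "\n")
    return path
```

The resulting file is then uploaded with the SDK's files endpoint (purpose `"batch"`) and submitted via the batches endpoint; results arrive in a single output file keyed by `custom_id`.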
💰Cost Reduction Tips
- 🎯 **Choose the smallest model that meets your quality bar.** GPT-4o mini is 15–30× cheaper than GPT-4o on input tokens and handles a surprisingly broad range of tasks well — classification, extraction, summarisation, Q&A over retrieved context, and code generation for common patterns. Always benchmark GPT-4o mini against your real test cases before defaulting to the full GPT-4o. The cost difference at scale is dramatic.
- ✂️ **Compress your system prompt aggressively.** Every token in your system prompt is billed on every request. Audit yours line by line. Remove redundant instructions, rewrite verbose guidance into concise directives, and eliminate example-heavy sections the model does not need. Reducing a system prompt from 1,200 to 400 tokens saves 800 input tokens per call. At 100,000 calls/month on GPT-4o ($2.50/M input), that saves $200 per month.
- 🔒 **Always set `max_tokens`.** Without a `max_tokens` limit, a model can generate unexpectedly long completions that inflate your output token costs. Audit a sample of real completions, find the 95th percentile response length, and set `max_tokens` roughly 20% above that threshold. This prevents runaway responses without truncating legitimate answers.
- 🗜️ **Trim conversation history in multi-turn apps.** In chat applications, every previous message in the conversation is re-sent as input on each turn. Implement a sliding window that retains only the N most recent turns, or summarise older turns into a compact "conversation memory" block. For long-running sessions, this can reduce your input token count by 60–80% without meaningfully degrading response quality.
- 📦 **Use the Batch API for offline workloads.** If any part of your workload does not require an instant response — bulk data enrichment, overnight analysis, pre-generating embeddings, report generation — move it to the Batch API. The 50% discount is one of the largest single cost levers OpenAI offers, and implementing it requires only reformatting your requests as a `.jsonl` file.
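The sliding-window approach from the history-trimming tip can be sketched in a few lines. This is a simple illustration: the message shape and `max_turns` default are assumptions, and production code might summarise the dropped turns instead of discarding them.

```python
def trim_history(messages, max_turns=6):
    """Keep the system prompt plus only the last `max_turns` chat messages."""
    system = [m for m in messages if m["role"] == "system"]  # always retained
    rest = [m for m in messages if m["role"] != "system"]
    return system + rest[-max_turns:]  # sliding window over recent turns
```

Calling this before each API request caps the input-token cost of a long-running session at a constant, regardless of how long the conversation grows.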
🔍Source & Calculator Accuracy
All prices are sourced from OpenAI's official API pricing page, last verified July 2025. OpenAI updates prices periodically and may introduce new models, retire older ones, or change discount structures without advance notice.
The calculator's arithmetic is exact given the prices in its data tables. The accuracy of your estimate depends on how closely your entered token counts and request volumes reflect your real usage. Use OpenAI's tiktoken library or log the usage object from live API calls to get precise numbers, then re-enter them here.
Non-USD currency amounts are indicative conversions using approximate exchange rates. OpenAI bills exclusively in US Dollars.
Primary source: OpenAI, "API Pricing," openai.com/api/pricing; OpenAI, "Platform Documentation," platform.openai.com/docs. Accessed July 2025.