OpenAI API Token Cost Calculator

OpenAI charges per million tokens — billed separately for input (prompt) and output (completion). Select a model, enter your usage, and get an instant cost breakdown including Prompt Caching and Batch API discounts. Official pricing →
Note: Estimates use OpenAI's published list prices (July 2025). Actual bills may vary due to tiered discounts, fine-tuned model pricing, or future rate changes. Always verify at openai.com/api/pricing.

What Is OpenAI API Pricing?

OpenAI charges for API access on a pay-per-token basis — you pay separately for the tokens you send (input) and the tokens the model generates (output). There is no monthly subscription for API access. Costs scale directly with usage and vary significantly by model, from well under a dollar per million tokens for GPT-4o mini and GPT-3.5 Turbo to several dollars per million for GPT-4o and the o-series reasoning models.

A token is the basic unit of text that OpenAI models process. In English, one token is roughly 4 characters or 0.75 words. Every API call consumes input tokens (your prompt, system message, and conversation history) and produces output tokens (the model's response). Both are billed at the rates published on OpenAI's pricing page, measured per million tokens.

OpenAI also offers two cost-reduction features on supported models: Prompt Caching, which reuses the KV cache for repeated prompt prefixes at a 50% input discount, and the Batch API, which processes requests asynchronously within 24 hours at a 50% discount on both input and output tokens.

Source: OpenAI, "API Pricing," openai.com/api/pricing, accessed July 2025.

All OpenAI Models & Prices

Published prices from OpenAI as of July 2025. Prices are in USD per million tokens unless otherwise noted. Use search and filters to find specific models.

Model | Family | Input $/M | Output $/M | Context | Cache Discount | Batch API

Source: OpenAI, "API Pricing," openai.com/api/pricing. Accessed July 2025. Prices subject to change.

Understanding Tokens

Tokens are the fundamental unit of text that OpenAI models process. They do not correspond exactly to words — the tokenizer splits text into subword fragments based on a vocabulary trained alongside the model. Understanding approximate token counts helps you predict costs and stay within context limits.

  • ≈ 4 English characters per token on average
  • ≈ 0.75 English words per token
  • 1,333 tokens in a typical 1,000-word document
  • More tokens per word for code, math & non-Latin scripts
  • usage{}: API field reporting exact prompt_tokens and completion_tokens per call
  • tiktoken: OpenAI's open-source library for counting tokens before sending a request

The most reliable way to count tokens is to use OpenAI's tiktoken library, which implements the exact tokenizer used by each model. For production cost monitoring, log the usage.prompt_tokens and usage.completion_tokens fields returned in every API response.
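For a quick pre-flight estimate before you have tiktoken wired in, the ~4-characters-per-token heuristic above can be turned into a few lines of plain Python. This is a sketch using only the rule of thumb from this page; the exact tiktoken call is shown in the docstring for comparison, and the function names here are illustrative, not part of any OpenAI library.

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate using the ~4 chars/token English heuristic.

    For exact counts, use OpenAI's tiktoken library instead:
        import tiktoken
        enc = tiktoken.encoding_for_model("gpt-4o")
        exact = len(enc.encode(text))
    """
    return max(1, round(len(text) / 4))


def estimate_cost_usd(text: str, price_per_million: float) -> float:
    """Estimated input cost for one request at a given $/M-token list price."""
    return estimate_tokens(text) * price_per_million / 1_000_000
```

Remember that this heuristic undercounts for code, math, and non-Latin scripts; treat the result as a ballpark and reconcile against the usage object from real responses.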

Source: OpenAI, "Counting tokens," platform.openai.com, accessed July 2025.

How Costs Are Calculated

The calculator uses the following formulas. All prices are in USD per million tokens as published by OpenAI.

── STANDARD COST ────────────────────────────────────────────────
Monthly Cost =
  (Input Tokens/Req × Requests/Month × Input Price  / 1,000,000)
+ (Output Tokens/Req × Requests/Month × Output Price / 1,000,000)

Cost per Request = Monthly Cost ÷ Requests per Month
Annual Cost      = Monthly Cost × 12

── WITH PROMPT CACHING (50% off cached input tokens) ─────────────
Cached Input Tokens   = Input Tokens × Cache %
Uncached Input Tokens = Input Tokens × (1 – Cache %)

Monthly Cost (cached) =
  [ Uncached Tokens × Input Price
  + Cached Tokens   × (Input Price × 0.50) ] × Requests / 1,000,000
  + Output Tokens × Requests × Output Price / 1,000,000

Saving = Standard Monthly Cost – Cached Monthly Cost

── WITH BATCH API (50% off input + output) ───────────────────────
Batch Monthly Cost = Standard Monthly Cost × 0.50
Batch Saving       = Standard Monthly Cost × 0.50
Source: OpenAI API Pricing — openai.com/api/pricing, July 2025.
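The formulas above translate directly into code. This is a minimal sketch of the same arithmetic; the GPT-4o prices used in the example ($2.50/M input, $10.00/M output) are the July 2025 list prices quoted elsewhere on this page, not values fetched live from OpenAI.

```python
def monthly_cost(in_tok: int, out_tok: int, reqs: int,
                 in_price: float, out_price: float,
                 cache_pct: float = 0.0, batch: bool = False) -> float:
    """Monthly cost in USD; in_price/out_price are per million tokens.

    cache_pct: fraction of input tokens served from the prompt cache
               (cached input tokens are billed at 50% of list price).
    batch:     apply the Batch API's 50% discount to all tokens.
    """
    input_cost = (in_tok * (1 - cache_pct) * in_price
                  + in_tok * cache_pct * in_price * 0.5) * reqs / 1_000_000
    output_cost = out_tok * reqs * out_price / 1_000_000
    total = input_cost + output_cost
    return total * 0.5 if batch else total

# Example: 2,000 input + 500 output tokens, 100,000 requests/month on GPT-4o
standard = monthly_cost(2_000, 500, 100_000, 2.50, 10.00)                 # $1,000
cached   = monthly_cost(2_000, 500, 100_000, 2.50, 10.00, cache_pct=0.8)  # $800
batched  = monthly_cost(2_000, 500, 100_000, 2.50, 10.00, batch=True)     # $500
```

With 80% of the prompt cached, the $500 input portion of the bill drops to $300; the Batch API simply halves the whole total.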

Prompt Caching & Batch API

OpenAI offers two built-in cost reduction features on most GPT-4o and o-series models. Both can substantially lower your monthly bill with minimal code changes.

💾 Prompt Caching
50% off cached input tokens
  • Automatic on supported models
  • Cache prefix ≥ 1,024 tokens
  • Cache lasts ~5–10 min (extended by hits)
  • Best for long, repeated system prompts
  • No code change required
⚡ Batch API
50% off input + output
  • Async processing (≤ 24 hours)
  • Submit .jsonl file of requests
  • Best for offline/non-realtime work
  • Supported on GPT-4o, GPT-4, o1
  • Full response in one result file
🔀 Combined Strategy
Up to ~75% off
  • Use Batch API + Prompt Caching together
  • Batch applies 50% to all tokens
  • Caching applies on top for repeated prefixes
  • Ideal for bulk document processing
  • Check model support before implementing

Toggle both features in the calculator above to see how they affect your specific usage. As a rule: if your system prompt is longer than 1,024 tokens and you make repeated calls, prompt caching is essentially free money. If your workload can tolerate asynchronous processing, the Batch API is the single highest-impact cost lever available.
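The "up to ~75% off" ceiling follows from stacking the two discounts multiplicatively on input tokens, which is how this page presents the combined strategy (verify against your own invoices, since stacking behaviour depends on model support):

```python
list_price = 1.00                    # any input list price, $/M tokens
batch_rate = list_price * 0.50       # Batch API: 50% off all tokens
combined   = batch_rate * 0.50       # prompt caching: 50% off cached input, on top

# Fully cached input under both discounts costs 25% of list, i.e. 75% off.
assert combined == 0.25 * list_price
```

Output tokens never benefit from caching, so the blended discount on a whole request sits between 50% and 75% depending on your input/output mix and cache hit rate.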

Sources: OpenAI, "Prompt Caching," platform.openai.com; OpenAI, "Batch API," platform.openai.com. Accessed July 2025.

Cost Reduction Tips

  • Choose the smallest model that meets your quality bar. GPT-4o mini is over 15× cheaper than GPT-4o on input tokens and handles a surprisingly broad range of tasks well: classification, extraction, summarisation, Q&A over retrieved context, and code generation for common patterns. Always benchmark GPT-4o mini against your real test cases before defaulting to the full GPT-4o; the cost difference at scale is dramatic.
  • Compress your system prompt aggressively. Every token in your system prompt is billed on every request. Audit it line by line: remove redundant instructions, rewrite verbose guidance into concise directives, and cut example-heavy sections the model does not need. Reducing a system prompt from 1,200 to 400 tokens saves 800 input tokens per call; at 100,000 calls/month on GPT-4o ($2.50/M input), that saves $200 per month.
  • Always set max_tokens. Without a max_tokens limit, a model can generate unexpectedly long completions that inflate your output token costs. Audit a sample of real completions, find the 95th-percentile response length, and set max_tokens roughly 20% above that threshold. This prevents runaway responses without truncating legitimate answers.
  • Trim conversation history in multi-turn apps. In chat applications, every previous message in the conversation is re-sent as input on each turn. Implement a sliding window that retains only the N most recent turns, or summarise older turns into a compact "conversation memory" block. For long-running sessions, this can reduce your input token count by 60–80% without meaningfully degrading response quality.
  • Use the Batch API for offline workloads. If any part of your workload does not require an instant response (bulk data enrichment, overnight analysis, pre-generating embeddings, report generation), move it to the Batch API. The 50% discount is one of the largest single cost levers OpenAI offers, and implementing it requires only reformatting your requests as a .jsonl file.

Source & Calculator Accuracy

All prices are sourced from OpenAI's official API pricing page, last verified July 2025. OpenAI updates prices periodically and may introduce new models, retire older ones, or change discount structures without advance notice.

The calculator's arithmetic is exact given the prices in its data tables. The accuracy of your estimate depends on how closely your entered token counts and request volumes reflect your real usage. Use OpenAI's tiktoken library or log the usage object from live API calls to get precise numbers, then re-enter them here.

Non-USD currency amounts are indicative conversions using approximate exchange rates. OpenAI bills exclusively in US Dollars.

Primary source: OpenAI, "API Pricing," openai.com/api/pricing; OpenAI, "Platform Documentation," platform.openai.com/docs. Accessed July 2025.