🤖 OpenAI API Token Cost Calculator
💡What Is OpenAI API Pricing?
A token is the basic unit of text that OpenAI models process. In English, one token is roughly 4 characters or 0.75 words. Every API call consumes input tokens (your prompt, system message, and conversation history) and produces output tokens (the model's response). Both are billed at the rates published on OpenAI's pricing page, measured per million tokens.
OpenAI also offers two cost-reduction features on supported models: Prompt Caching, which reuses the KV cache for repeated prompt prefixes at a 50% input discount, and the Batch API, which processes requests asynchronously within 24 hours at a 50% discount on both input and output tokens.
Source: OpenAI, "API Pricing," openai.com/api/pricing, accessed July 2025.
📊All OpenAI Models & Prices
Published prices from OpenAI as of July 2025. Prices are in USD per million tokens unless otherwise noted. Use search and filters to find specific models.
| Model | Family | Input $/M | Output $/M | Context | Cache Discount | Batch API |
|---|---|---|---|---|---|---|
Source: OpenAI, "API Pricing," openai.com/api/pricing. Accessed July 2025. Prices subject to change.
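To make the table concrete in code, here is a minimal Python price lookup for two widely used models, using the rates published in July 2025 for GPT-4o ($2.50/$10.00 per million tokens) and GPT-4o mini ($0.15/$0.60). This is a sketch, not an official table: verify current rates on the pricing page before relying on these numbers.

```python
# Per-million-token prices (USD) as listed on openai.com/api/pricing in
# July 2025. Illustrative snapshot only -- verify before use.
PRICES = {
    "gpt-4o":      {"input": 2.50, "output": 10.00},
    "gpt-4o-mini": {"input": 0.15, "output": 0.60},
}

def price_per_request(model: str, input_tokens: int, output_tokens: int) -> float:
    """USD cost of a single request at standard (non-batch, uncached) rates."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000
```

For example, a request with 1,000 input and 500 output tokens costs $0.0075 on GPT-4o but only $0.00045 on GPT-4o mini.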
🔤Understanding Tokens
Tokens are the fundamental unit of text that OpenAI models process. They do not correspond exactly to words — the tokenizer splits text into subword fragments based on a vocabulary trained alongside the model. Understanding approximate token counts helps you predict costs and stay within context limits.
The most reliable way to count tokens is to use OpenAI's tiktoken library, which implements the exact tokenizer used by each model. For production cost monitoring, log the usage.prompt_tokens and usage.completion_tokens fields returned in every API response.
Source: OpenAI, "Counting tokens," platform.openai.com, accessed July 2025.
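The counting approach above can be sketched in a few lines: exact counts via the tiktoken library when it is installed, falling back to the roughly-4-characters-per-token heuristic otherwise. The function name and fallback behaviour are illustrative, not part of any official API.

```python
def count_tokens(text: str, model: str = "gpt-4o") -> int:
    """Exact token count via tiktoken if available, else a rough estimate."""
    try:
        import tiktoken
        enc = tiktoken.encoding_for_model(model)
        return len(enc.encode(text))
    except Exception:
        # Heuristic from the section above: ~1 token per 4 characters of English.
        return max(1, round(len(text) / 4))
```

For billing purposes, prefer the `usage` fields returned by the API itself; a local count is best treated as a pre-flight estimate.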
📐How Costs Are Calculated
The calculator uses the following formulas. All prices are in USD per million tokens as published by OpenAI.
── STANDARD COST ────────────────────────────────────────────────

Monthly Cost = (Input Tokens/Req × Requests/Month × Input Price / 1,000,000)
             + (Output Tokens/Req × Requests/Month × Output Price / 1,000,000)
Cost per Request = Monthly Cost ÷ Requests per Month
Annual Cost = Monthly Cost × 12

── WITH PROMPT CACHING (50% off cached input tokens) ─────────────

Cached Input Tokens   = Input Tokens × Cache %
Uncached Input Tokens = Input Tokens × (1 – Cache %)
Monthly Cost (cached) = [ Uncached Tokens × Input Price
                        + Cached Tokens × (Input Price × 0.50) ] × Requests / 1,000,000
                        + Output Tokens × Requests × Output Price / 1,000,000
Saving = Standard Monthly Cost – Cached Monthly Cost

── WITH BATCH API (50% off input + output) ───────────────────────

Batch Monthly Cost = Standard Monthly Cost × 0.50
Batch Saving = Standard Monthly Cost × 0.50

Source: OpenAI API Pricing — openai.com/api/pricing, July 2025.
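The formulas above translate directly into code. A minimal sketch (the function and parameter names are mine, not the calculator's):

```python
def monthly_cost(
    input_tokens: int,       # input tokens per request
    output_tokens: int,      # output tokens per request
    requests: int,           # requests per month
    input_price: float,      # USD per million input tokens
    output_price: float,     # USD per million output tokens
    cache_pct: float = 0.0,  # fraction of input tokens served from cache (0..1)
    batch: bool = False,     # Batch API: 50% off both input and output
) -> float:
    """Monthly USD cost per the formulas above."""
    # Cached input tokens are billed at 50% of the normal input rate.
    effective_input_price = input_price * ((1 - cache_pct) + cache_pct * 0.5)
    cost = (input_tokens * effective_input_price
            + output_tokens * output_price) * requests / 1_000_000
    return cost * 0.5 if batch else cost
```

For example, 1,000 input and 500 output tokens per request at GPT-4o's July 2025 rates ($2.50/$10.00 per million), over 10,000 requests/month: $75.00 standard, $65.00 with 80% of the prompt cached, and $37.50 via the Batch API.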
⚡Prompt Caching & Batch API
OpenAI offers two built-in cost reduction features on most GPT-4o and o-series models. Both can substantially lower your monthly bill with minimal code changes.
**Prompt Caching**
- Automatic on supported models
- Cache prefix ≥ 1,024 tokens
- Cache lasts ~5–10 min (extended by hits)
- Best for long, repeated system prompts
- No code change required

**Batch API**
- Async processing (≤ 24 hours)
- Submit a .jsonl file of requests
- Best for offline/non-realtime work
- Supported on GPT-4o, GPT-4, o1
- Full response in one result file

**Combining Both**
- Use Batch API + Prompt Caching together
- Batch applies 50% to all tokens
- Caching applies on top for repeated prefixes
- Ideal for bulk document processing
- Check model support before implementing
Toggle both features in the calculator above to see how they affect your specific usage. As a rule: if your system prompt is longer than 1,024 tokens and you make repeated calls, prompt caching is essentially free money. If your workload can tolerate asynchronous processing, the Batch API is the single highest-impact cost lever available.
Sources: OpenAI, "Prompt Caching," platform.openai.com; OpenAI, "Batch API," platform.openai.com. Accessed July 2025.
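As an illustration of the Batch API workflow described above, here is how a requests file might be assembled. The request shape follows OpenAI's documented batch format (`custom_id`, `method`, `url`, `body`), but the helper name and defaults are mine; check the current Batch API docs before relying on the field names.

```python
import json

def build_batch_file(prompts, path="batch_requests.jsonl", model="gpt-4o-mini"):
    """Write one JSON request per line in the Batch API's .jsonl format."""
    with open(path, "w") as f:
        for i, prompt in enumerate(prompts):
            request = {
                "custom_id": f"req-{i}",  # your key for matching results later
                "method": "POST",
                "url": "/v1/chat/completions",
                "body": {
                    "model": model,
                    "messages": [{"role": "user", "content": prompt}],
                    "max_tokens": 256,
                },
            }
            f.write(json.dumps(request) + "\n")
    return path
```

The resulting file is then uploaded with the SDK's files endpoint (purpose `"batch"`) and submitted via the batches endpoint; results arrive in a single output file keyed by `custom_id`.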
💰Cost Reduction Tips
- 🎯 **Choose the smallest model that meets your quality bar.** GPT-4o mini is 15–30× cheaper than GPT-4o on input tokens and handles a surprisingly broad range of tasks well — classification, extraction, summarisation, Q&A over retrieved context, and code generation for common patterns. Always benchmark GPT-4o mini against your real test cases before defaulting to the full GPT-4o. The cost difference at scale is dramatic.
- ✂️ **Compress your system prompt aggressively.** Every token in your system prompt is billed on every request. Audit yours line by line. Remove redundant instructions, rewrite verbose guidance into concise directives, and eliminate example-heavy sections the model does not need. Reducing a system prompt from 1,200 to 400 tokens saves 800 input tokens per call. At 100,000 calls/month on GPT-4o ($2.50/M input), that saves $200 per month.
- 🔒 **Always set `max_tokens`.** Without a `max_tokens` limit, a model can generate unexpectedly long completions that inflate your output token costs. Audit a sample of real completions, find the 95th percentile response length, and set `max_tokens` roughly 20% above that threshold. This prevents runaway responses without truncating legitimate answers.
- 🗜️ **Trim conversation history in multi-turn apps.** In chat applications, every previous message in the conversation is re-sent as input on each turn. Implement a sliding window that retains only the N most recent turns, or summarise older turns into a compact "conversation memory" block. For long-running sessions, this can reduce your input token count by 60–80% without meaningfully degrading response quality.
- 📦 **Use the Batch API for offline workloads.** If any part of your workload does not require an instant response — bulk data enrichment, overnight analysis, pre-generating embeddings, report generation — move it to the Batch API. The 50% discount is one of the largest single cost levers OpenAI offers, and implementing it requires only reformatting your requests as a `.jsonl` file.
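The sliding-window approach from the history-trimming tip can be sketched in a few lines. This is a simple illustration: the message shape and `max_turns` default are assumptions, and production code might summarise the dropped turns instead of discarding them.

```python
def trim_history(messages, max_turns=6):
    """Keep the system prompt plus only the last `max_turns` chat messages."""
    system = [m for m in messages if m["role"] == "system"]  # always retained
    rest = [m for m in messages if m["role"] != "system"]
    return system + rest[-max_turns:]  # sliding window over recent turns
```

Calling this before each API request caps the input-token cost of a long-running session at a constant, regardless of how long the conversation grows.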
🔍Source & Calculator Accuracy
All prices are sourced from OpenAI's official API pricing page, last verified July 2025. OpenAI updates prices periodically and may introduce new models, retire older ones, or change discount structures without advance notice.
The calculator's arithmetic is exact given the prices in its data tables. The accuracy of your estimate depends on how closely your entered token counts and request volumes reflect your real usage. Use OpenAI's tiktoken library or log the usage object from live API calls to get precise numbers, then re-enter them here.
Non-USD currency amounts are indicative conversions using approximate exchange rates. OpenAI bills exclusively in US Dollars.
Primary source: OpenAI, "API Pricing," openai.com/api/pricing; OpenAI, "Platform Documentation," platform.openai.com/docs. Accessed July 2025.