Pricing guide
Input and output token prices are still the baseline, but repeated context, deferrable jobs, and optional processing tiers can shift the real cost to the cached-input, cache-write, batch, priority, or flex rows.
Decision path
A lower headline token price is not always the cheapest production shape. First decide whether the workload is interactive, repeated, offline, or tied to a processing tier.
| Workload pattern | Pricing rows to inspect | Best next step |
|---|---|---|
| One-at-a-time product requests | Input and output | Estimate the monthly request shape in the calculator. |
| Repeated instructions, documents, or system context | Cached input and cache write | Start from the cache reuse preset, then change the hit rate. |
| Offline analysis or large jobs that can wait | Batch input and batch output | Use the compare page to check which models expose batch rows. |
| Latency or capacity tier decisions | Priority and flex input/output | Compare the optional processing rows beside standard prices. |
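The decision table can be read as a lookup from workload pattern to pricing rows. The following is a minimal Python sketch. The enum, the row identifiers such as `cached_input`, and the rule that one workload may match several patterns are illustrative assumptions, not part of any calculator or provider API.

```python
from enum import Enum


class Workload(Enum):
    """Workload patterns from the decision table (hypothetical labels)."""
    INTERACTIVE = "one-at-a-time product requests"
    REPEATED_CONTEXT = "repeated instructions, documents, or system context"
    OFFLINE = "offline analysis or large jobs that can wait"
    PROCESSING_TIER = "latency or capacity tier decisions"


# Extra pricing rows to inspect for each pattern, mirroring the table.
ROWS_TO_INSPECT: dict[Workload, list[str]] = {
    Workload.INTERACTIVE: ["input", "output"],
    Workload.REPEATED_CONTEXT: ["cached_input", "cache_write"],
    Workload.OFFLINE: ["batch_input", "batch_output"],
    Workload.PROCESSING_TIER: ["priority_input", "priority_output", "flex_input", "flex_output"],
}


def rows_for(patterns: set[Workload]) -> list[str]:
    """Return the pricing rows to check; standard input/output is always the baseline."""
    rows = ["input", "output"]
    for pattern in Workload:  # fixed enum order keeps the result deterministic
        if pattern in patterns:
            for row in ROWS_TO_INSPECT[pattern]:
                if row not in rows:
                    rows.append(row)
    return rows
```

For example, `rows_for({Workload.REPEATED_CONTEXT, Workload.OFFLINE})` returns the baseline input and output rows followed by the cache and batch rows. Standard input and output appear in every result because they remain the baseline.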
Calculator
The calculator includes a cache reuse preset with 8k input tokens and 1k output tokens per request, 5k requests per month, and a 50% cache hit rate. Treat that as a starting point, not a promise from any provider.
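Worked through, the preset implies a monthly token volume that the cache hit rate then splits into cached and uncached input. This sketch assumes "8k" and "5k" mean 8,000 and 5,000 and that the hit rate applies to input tokens only. The `Scenario` class and `monthly_volumes` helper are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    input_tokens_per_request: int
    output_tokens_per_request: int
    monthly_requests: int
    cache_hit_rate: float  # assumed: fraction of input tokens served from cache


def monthly_volumes(s: Scenario) -> dict[str, int]:
    """Split monthly token volume into cached input, uncached input, and output."""
    total_input = s.input_tokens_per_request * s.monthly_requests
    cached_input = round(total_input * s.cache_hit_rate)
    return {
        "cached_input": cached_input,
        "uncached_input": total_input - cached_input,
        "output": s.output_tokens_per_request * s.monthly_requests,
    }


# Cache reuse preset, reading "8k" as 8,000 and "5k" as 5,000 (assumption).
PRESET = Scenario(8_000, 1_000, 5_000, 0.50)
```

`monthly_volumes(PRESET)` gives 20,000,000 cached input tokens, 20,000,000 uncached input tokens, and 5,000,000 output tokens, which is the shape to price against each model's rows.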
If a selected model exposes cached-input pricing, the calculator uses that row for the cached share. If it does not, the estimate falls back to ordinary input pricing so the scenario still has a conservative first-pass cost.
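The fallback rule can be expressed as a small cost function. This is a sketch under stated assumptions: prices are in USD per 1M tokens, the function name and signature are hypothetical, and the example rates are invented for illustration rather than taken from any provider.

```python
from typing import Optional


def estimate_monthly_cost(
    cached_input_tokens: int,
    uncached_input_tokens: int,
    output_tokens: int,
    input_price: float,    # USD per 1M input tokens
    output_price: float,   # USD per 1M output tokens
    cached_input_price: Optional[float] = None,  # USD per 1M cached tokens, if the row exists
) -> float:
    """Price the cached share at the cached-input row, else fall back to ordinary input."""
    # Without a cached-input row, cached tokens are billed like ordinary input,
    # which keeps the first-pass estimate conservative.
    cached_rate = cached_input_price if cached_input_price is not None else input_price
    return (
        cached_input_tokens * cached_rate
        + uncached_input_tokens * input_price
        + output_tokens * output_price
    ) / 1_000_000
```

With the preset volumes and illustrative rates of 2.00 input, 8.00 output, and 0.50 cached input, `estimate_monthly_cost(20_000_000, 20_000_000, 5_000_000, 2.0, 8.0, 0.5)` returns 90.0. Omitting the cached-input price makes it fall back to 120.0, which shows how much of the estimate depends on the cached row actually existing.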
Cache check
Compare
The compare page shows more pricing dimensions than a single monthly calculator scenario can express. Use it to inspect cached input, cache write, batch input, batch output, priority input/output, and flex input/output beside ordinary token prices.
This is especially useful when two models look similar on standard input/output cost, but one route exposes a cheaper asynchronous or cache-aware path for the workload.
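To see how a cheaper asynchronous path can change the ranking, the sketch below prices each route a model exposes and picks the lowest. It only models standard and batch rows to stay short; cached, priority, and flex rows would be added the same way. The `PriceRows` fields, model names, and prices are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PriceRows:
    """USD per 1M tokens for one model; None means the row is not exposed."""
    name: str
    input_rate: float
    output_rate: float
    batch_input_rate: Optional[float] = None
    batch_output_rate: Optional[float] = None


def route_costs(m: PriceRows, input_tokens: int, output_tokens: int) -> dict[str, float]:
    """Cost of every route the model exposes for the given monthly volume."""
    costs = {"standard": (input_tokens * m.input_rate + output_tokens * m.output_rate) / 1_000_000}
    if m.batch_input_rate is not None and m.batch_output_rate is not None:
        costs["batch"] = (
            input_tokens * m.batch_input_rate + output_tokens * m.batch_output_rate
        ) / 1_000_000
    return costs


def cheapest_route(
    models: list[PriceRows], input_tokens: int, output_tokens: int
) -> tuple[str, str, float]:
    """Return (model name, route, cost) for the lowest-cost route across all models."""
    options = [
        (m.name, route, cost)
        for m in models
        for route, cost in route_costs(m, input_tokens, output_tokens).items()
    ]
    return min(options, key=lambda option: option[2])
```

With `model-a` at 1.00 input and 4.00 output and no batch rows, and `model-b` at 1.10 and 4.40 standard plus 0.55 and 2.20 batch, 40M input and 5M output tokens cost about 60 on `model-a` and 66 on `model-b` standard, but about 33 on `model-b` batch. `cheapest_route` therefore picks `model-b` on the batch route, even though it is the pricier model on standard rows.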
Do not skip
Batch and cache rows do not replace provider documentation. Confirm eligibility, rate limits, retention, latency, and billing behavior before treating a route as production-ready.
Provider docs
For OpenAI, check the pricing page for cached-input, Batch API, priority, and flex rows, then use the prompt caching guide to confirm eligibility and cached-token reporting.
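Cached-token reporting can be checked from the `usage` object returned with each response. The sketch below assumes a plain dict shaped like OpenAI's Chat Completions usage (`prompt_tokens`, `prompt_tokens_details.cached_tokens`) or Responses API usage (`input_tokens`, `input_tokens_details.cached_tokens`). Confirm the current field names against the prompt caching guide; the helper name is hypothetical.

```python
def cached_input_share(usage: dict) -> float:
    """Fraction of input tokens reported as cached in an OpenAI-style usage dict.

    Accepts the Chat Completions shape (prompt_tokens / prompt_tokens_details)
    or the Responses API shape (input_tokens / input_tokens_details).
    """
    if "prompt_tokens" in usage:
        total = usage["prompt_tokens"]
        details = usage.get("prompt_tokens_details") or {}
    else:
        total = usage.get("input_tokens", 0)
        details = usage.get("input_tokens_details") or {}
    # A missing details object or cached_tokens field is treated as zero cache hits.
    cached = details.get("cached_tokens") or 0
    return cached / total if total else 0.0
```

For a usage dict with `prompt_tokens` of 8000 and `cached_tokens` of 4096, the helper returns 0.512. Averaged over real traffic, this share gives a measured value to use in place of the preset's 50% hit rate.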
Check cache write, cache hit, batch, and long-context notes before treating a Claude workload as a simple input/output estimate.
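Because cache writes and cache hits are priced on different rows from plain input, the cost of a single Claude request depends on which kind of cached token it reported. This sketch assumes an Anthropic Messages API usage dict with `input_tokens`, `cache_creation_input_tokens`, `cache_read_input_tokens`, and `output_tokens`, where `input_tokens` counts only uncached input. The function name is hypothetical and the rates passed in are placeholders to replace with values from the pricing page.

```python
def claude_request_cost(usage: dict, prices: dict) -> float:
    """USD cost of one request from an Anthropic-style usage dict.

    `prices` maps row names ("input", "cache_write", "cache_read", "output")
    to USD per 1M tokens; all values here are caller-supplied assumptions.
    """
    return (
        usage.get("input_tokens", 0) * prices["input"]  # uncached input
        + (usage.get("cache_creation_input_tokens") or 0) * prices["cache_write"]
        + (usage.get("cache_read_input_tokens") or 0) * prices["cache_read"]
        + usage.get("output_tokens", 0) * prices["output"]
    ) / 1_000_000
```

With illustrative rates of 3.00 input, 3.75 cache write, 0.30 cache read, and 15.00 output per 1M tokens, a first request that writes 8,000 tokens to the cache costs about 0.0456, while a later request that reads those 8,000 tokens costs about 0.018. A hit rate alone therefore does not settle the estimate; how often the cache is rewritten matters too.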
Check context caching token prices, storage prices, and cache TTL behavior before estimating repeated-context workloads.
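When repeated context is billed for storage as well as for cached reads, the cache only pays off if enough requests reuse it within its lifetime. The sketch compares storage plus cached reads against resending the same context as ordinary input. It assumes storage is billed linearly per 1M tokens per hour over the TTL and ignores any one-time cache creation charge or minimum cache size; the function names and all rates are hypothetical.

```python
def cached_context_cost(
    context_tokens: int,
    ttl_hours: float,
    requests: int,
    storage_rate: float,      # USD per 1M cached tokens per hour (hypothetical)
    cached_read_rate: float,  # USD per 1M cached input tokens (hypothetical)
) -> float:
    """Storage for the cache's lifetime plus cached reads by every reusing request."""
    millions = context_tokens / 1_000_000
    return millions * storage_rate * ttl_hours + millions * cached_read_rate * requests


def uncached_context_cost(context_tokens: int, requests: int, input_rate: float) -> float:
    """Resend the same context as ordinary input on every request."""
    return context_tokens / 1_000_000 * input_rate * requests


def cache_pays_off(
    context_tokens: int,
    ttl_hours: float,
    requests: int,
    storage_rate: float,
    cached_read_rate: float,
    input_rate: float,
) -> bool:
    """True when caching the context is cheaper than resending it."""
    cached = cached_context_cost(context_tokens, ttl_hours, requests, storage_rate, cached_read_rate)
    return cached < uncached_context_cost(context_tokens, requests, input_rate)
```

For 100,000 context tokens kept for one hour and reused by 50 requests, with illustrative rates of 1.00 storage per 1M tokens per hour, 0.10 cached read, and 1.00 ordinary input, caching costs about 0.60 against about 5.00 uncached. With a single request in that hour the result flips, since storage alone already matches the uncached cost.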
Limits
This guide does not rank model quality, latency, quotas, regional availability, tool charges, account discounts, or final invoices. It also does not guarantee cache eligibility for any prompt. Use it to choose the right pricing rows, then verify the production contract with the provider.