LLM Pricing
← Back to articles

Data-led article

GPT-5.4 Mini vs Gemini 3.1 Flash Lite vs Claude Haiku 4.5

These are three current compact models teams are likely to shortlist before they move up to flagship pricing. They overlap enough to compete in the same product conversation, but they diverge in where they spend money: Gemini on cheaper output, Azure's GPT-5.4 route on newer GPT-5 capability, and Claude on Anthropic-specific agent features.

Updated May 31, 2026 Compact model comparison Built from site data

At a glance

Three current models, three different cost curves

These figures are based on the site's current model data as checked on May 31, 2026. Extra tool charges, cache writes, grounded requests, and computer-use fees are not included in the sample monthly cost below.

Azure AI

GPT-5.4 Mini

Model details

A newer GPT-5 line option when you want strong tool support, reasoning, prompt caching, and a full 128k context window.

Input
$0.75
Output
$4.50
Context
128,000
Sample monthly
$255.00
VisionFunction callingPrompt cachingReasoningWeb search

What to watch: It is more expensive than Gemini and close to Claude in this write-heavy sample, so the GPT-5.4 capability surface needs to matter.

Google

Gemini 3.1 Flash Lite

Model details

Low-cost multimodal apps that need broad input support, lower output spend, and Google-native retrieval options.

Input
$0.25
Output
$1.50
Context
65,536
Sample monthly
$85.00
VisionAudio inputVideo inputURL context

What to watch: The output price is attractive, but the context window is smaller than GPT-5.4 Mini and the model surface is less conventional than Claude for agent workflows.

Anthropic

Claude Haiku 4.5

Model details

Teams already leaning on Anthropic workflows such as prompt caching, computer use, and PDF-heavy assistants.

Input
$1.00
Output
$5.00
Context
64,000
Sample monthly
$300.00
VisionPrompt cachingComputer usePDF input

What to watch: It is materially more expensive than the other two in a write-heavy app, so it needs feature fit to justify the bill.

Open the live comparison

Jump into the compare tool with these three models already selected, or drill into the detail page for a single model.

Scenario

What the same workload costs

Sample workload: 100,000 requests per month, 1,000 input tokens, and 400 output tokens per request.

Model Input / 1M Output / 1M Monthly cost Read
GPT-5.4 Mini
Azure AI
$0.75 $4.50 $255.00 Newest GPT-5 family option here, with the strongest OpenAI-style tool and reasoning surface.
Gemini 3.1 Flash Lite
Google
$0.25 $1.50 $85.00 Cheapest output cost here, with broad multimodal inputs and URL context.
Claude Haiku 4.5
Anthropic
$1.00 $5.00 $300.00 Most expensive in this workload, but strongest Anthropic-native agent surface.
Cheapest in this scenario
Gemini 3.1 Flash Lite

At $85.00, it stays ahead because its output price is lower than GPT-5.4 Mini and much lower than Claude Haiku 4.5.

Largest context
GPT-5.4 Mini

If your app keeps long threads or large retrieved chunks in one call, the extra headroom matters more than a small delta on input price.

Lowest input cost
Gemini 3.1 Flash Lite

Gemini 3.1 Flash Lite has the lowest input rate here; GPT-5.4 Mini costs more, but buys a newer GPT-5 route and a larger context window.

How to choose

Pick by the pressure in your product

Choose GPT-5.4 Mini

When you want the newer GPT-5.4 line with strong tool calling, prompt caching, web search, image input, reasoning support, and a 128k context window without moving up to a pro-tier route.

Choose Gemini 3.1 Flash Lite

When your app writes a lot, or when multimodal input matters more than raw context size. It is the most economical writer here and the broadest on audio, video, and URL input.

Choose Claude Haiku 4.5

When you specifically want Anthropic's workflow surface, especially prompt caching, computer use, and PDF-heavy agent patterns. It needs that fit because the token bill is higher.

Sources

Where to verify the details

Use the provider docs for final pricing checks. This comparison keeps to the clean shared baseline of input and output token prices, but each provider adds its own rules for caching, tools, and grounded requests.