GPT-5.4 Mini vs Gemini 3.1 Flash Lite vs Claude Haiku 4.5
These are three current compact models teams are likely to shortlist before they move up to flagship pricing. They overlap enough to compete in the same product conversation, but they diverge in where they spend money: Gemini on cheaper output, Azure's GPT-5.4 route on newer GPT-5 capability, and Claude on Anthropic-specific agent features.
Updated May 31, 2026Compact model comparisonBuilt from site data
At a glance
Three current models, three different cost curves
These figures are based on the site's current model data as checked on May 31, 2026. Extra tool charges, cache writes, grounded requests, and computer-use fees are not included in the sample monthly cost below.
Low-cost multimodal apps that need broad input support, lower output spend, and Google-native retrieval options.
Input
$0.25
Output
$1.50
Context
65,536
Sample monthly
$85.00
VisionAudio inputVideo inputURL context
What to watch: The output price is attractive, but the context window is smaller than GPT-5.4 Mini and the model surface is less conventional than Claude for agent workflows.
Sample workload: 100,000 requests per month, 1,000 input tokens, and 400 output tokens per request.
Model
Input / 1M
Output / 1M
Monthly cost
Read
GPT-5.4 Mini
Azure AI
$0.75
$4.50
$255.00
Newest GPT-5 family option here, with the strongest OpenAI-style tool and reasoning surface.
Gemini 3.1 Flash Lite
Google
$0.25
$1.50
$85.00
Cheapest output cost here, with broad multimodal inputs and URL context.
Claude Haiku 4.5
Anthropic
$1.00
$5.00
$300.00
Most expensive in this workload, but strongest Anthropic-native agent surface.
Cheapest in this scenario
Gemini 3.1 Flash Lite
At $85.00, it stays ahead because its output price is lower than GPT-5.4 Mini and much lower than Claude Haiku 4.5.
Largest context
GPT-5.4 Mini
If your app keeps long threads or large retrieved chunks in one call, the extra headroom matters more than a small delta on input price.
Lowest input cost
Gemini 3.1 Flash Lite
Gemini 3.1 Flash Lite has the lowest input rate here; GPT-5.4 Mini costs more, but buys a newer GPT-5 route and a larger context window.
How to choose
Pick by the pressure in your product
Choose GPT-5.4 Mini
When you want the newer GPT-5.4 line with strong tool calling, prompt caching, web search, image input, reasoning support, and a 128k context window without moving up to a pro-tier route.
Choose Gemini 3.1 Flash Lite
When your app writes a lot, or when multimodal input matters more than raw context size. It is the most economical writer here and the broadest on audio, video, and URL input.
Choose Claude Haiku 4.5
When you specifically want Anthropic's workflow surface, especially prompt caching, computer use, and PDF-heavy agent patterns. It needs that fit because the token bill is higher.
Sources
Where to verify the details
Use the provider docs for final pricing checks. This comparison keeps to the clean shared baseline of input and output token prices, but each provider adds its own rules for caching, tools, and grounded requests.