RAG cost screen

Cheapest LLM API models for RAG answer workloads

Retrieval-augmented generation can become input-heavy because every answer reads the user question plus retrieved context. This page prices one narrow scenario from the RAG use-case page: 100 question tokens, 2,000 retrieved context tokens, 500 answer tokens, and 10,000 questions per month.

Updated June 5, 2026. Per question: 2.1K input tokens, 500 output tokens. Volume: 10.0K questions / month.

Scenario

The workload being priced

This is a screen for cost and context-window fit only. It does not evaluate answer quality, retrieval quality, or grounding behavior.

Question

100 tokens

A short user request.

Retrieved context

2.0K tokens

The source text that drives most input cost.

Answer

500 tokens

The generated response length.

Volume

10.0K / month

A request-style product workload.
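The four scenario values above reduce to two monthly token totals, which are what the per-1M-token prices are applied to. The following is a minimal sketch of that arithmetic; the `Scenario` class and its field names are illustrative and not part of the page or any calculator API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    """Hypothetical container for the workload values listed on this page."""
    question_tokens: int = 100
    context_tokens: int = 2_000
    answer_tokens: int = 500
    questions_per_month: int = 10_000

    @property
    def input_tokens_per_question(self) -> int:
        # Every answer reads the question plus the retrieved context.
        return self.question_tokens + self.context_tokens

    @property
    def monthly_input_tokens(self) -> int:
        return self.input_tokens_per_question * self.questions_per_month

    @property
    def monthly_output_tokens(self) -> int:
        return self.answer_tokens * self.questions_per_month


scenario = Scenario()
print(scenario.input_tokens_per_question)  # 2100
print(scenario.monthly_input_tokens)       # 21000000
print(scenario.monthly_output_tokens)      # 5000000
```

Under these numbers the workload reads 21M input tokens and writes 5M output tokens per month, so input volume is a bit over four times output volume.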

Cost screen

Lowest-cost context-ready chat models

Only priced chat models with a context window of at least 2.1K tokens are shown. Check each model's source and metadata before production use.

Model	Context window (tokens)	Input ($ / 1M tokens)	Output ($ / 1M tokens)	Monthly cost
Qwen2.5-Coder-7B	32.8K	$0.0100	$0.0300	$0.36
llama3.2-11b-vision-instruct	131.1K	$0.0150	$0.0250	$0.44
llama3.2-3b-instruct	131.1K	$0.0150	$0.0250	$0.44
Llama-3.2-3B-Instruct	131.1K	$0.0200	$0.0200	$0.52
paddleocr-vl	16.4K	$0.0200	$0.0200	$0.52
Meta-Llama-3.1-8B-Instruct-Turbo	131.1K	$0.0200	$0.0300	$0.57
Mistral-Nemo-Instruct-2407	131.1K	$0.0200	$0.0400	$0.62
gpt-oss-20b	32.8K	$0.0145	$0.0700	$0.65
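The monthly cost column appears to follow from the listed prices and the scenario token counts. The sketch below assumes the simplest formula (monthly input tokens times input price, plus monthly output tokens times output price, with no caching, discounts, or add-on charges). The `monthly_cost` function is a hypothetical helper, not the page's actual calculator.

```python
def monthly_cost(input_price_per_1m: float,
                 output_price_per_1m: float,
                 input_tokens: int = 2_100,
                 output_tokens: int = 500,
                 questions: int = 10_000) -> float:
    """Estimated monthly cost in USD under the assumed linear pricing formula."""
    input_millions = input_tokens * questions / 1_000_000    # 21.0 for this scenario
    output_millions = output_tokens * questions / 1_000_000  # 5.0 for this scenario
    return input_millions * input_price_per_1m + output_millions * output_price_per_1m


# (model, input $ per 1M tokens, output $ per 1M tokens), copied from the table.
PRICES = [
    ("Qwen2.5-Coder-7B", 0.0100, 0.0300),
    ("llama3.2-11b-vision-instruct", 0.0150, 0.0250),
    ("llama3.2-3b-instruct", 0.0150, 0.0250),
    ("Llama-3.2-3B-Instruct", 0.0200, 0.0200),
    ("paddleocr-vl", 0.0200, 0.0200),
    ("Meta-Llama-3.1-8B-Instruct-Turbo", 0.0200, 0.0300),
    ("Mistral-Nemo-Instruct-2407", 0.0200, 0.0400),
    ("gpt-oss-20b", 0.0145, 0.0700),
]

for name, input_price, output_price in PRICES:
    print(f"{name}: ${monthly_cost(input_price, output_price):.2f}")
```

For example, Qwen2.5-Coder-7B works out to 21 × $0.0100 + 5 × $0.0300 = $0.21 + $0.15 = $0.36, which matches the table.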

Shortlist

Three models to test first

Qwen2.5-Coder-7B

$0.36 / month

32.8K context window; $0.0100 input and $0.0300 output per 1M tokens.

llama3.2-11b-vision-instruct

$0.44 / month

131.1K context window; $0.0150 input and $0.0250 output per 1M tokens.

llama3.2-3b-instruct

$0.44 / month

131.1K context window; $0.0150 input and $0.0250 output per 1M tokens.

Change retrieval size before choosing a route

RAG winners can change when retrieved context grows from a short passage to a long document bundle. Use the RAG page to change question, context, answer, and monthly volume before comparing routes.
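One way to see how the winner can change is to re-run the screen with a larger retrieved context. The sketch below is illustrative only: it assumes the same linear cost formula, uses the rounded context-window figures shown in the table (exact provider limits may differ), and assumes the filter compares the window against input tokens, as the 2.1K threshold suggests. The `cheapest` function is a hypothetical helper.

```python
# (model, context window in tokens, input $ per 1M, output $ per 1M), from the table.
# Window sizes are the page's rounded display values (e.g. 32.8K -> 32_800).
MODELS = [
    ("Qwen2.5-Coder-7B", 32_800, 0.0100, 0.0300),
    ("llama3.2-11b-vision-instruct", 131_100, 0.0150, 0.0250),
    ("llama3.2-3b-instruct", 131_100, 0.0150, 0.0250),
    ("Llama-3.2-3B-Instruct", 131_100, 0.0200, 0.0200),
    ("paddleocr-vl", 16_400, 0.0200, 0.0200),
    ("Meta-Llama-3.1-8B-Instruct-Turbo", 131_100, 0.0200, 0.0300),
    ("Mistral-Nemo-Instruct-2407", 131_100, 0.0200, 0.0400),
    ("gpt-oss-20b", 32_800, 0.0145, 0.0700),
]


def cheapest(context_tokens: int,
             question_tokens: int = 100,
             answer_tokens: int = 500,
             questions: int = 10_000) -> tuple[str, float]:
    """Return (model name, monthly cost in USD) for the cheapest eligible model."""
    input_tokens = question_tokens + context_tokens

    def cost(model: tuple[str, int, float, float]) -> float:
        _, _, input_price, output_price = model
        return (input_tokens * input_price + answer_tokens * output_price) \
            * questions / 1_000_000

    # Assumed filter: the context window must hold at least the input tokens.
    eligible = [m for m in MODELS if m[1] >= input_tokens]
    if not eligible:
        raise ValueError("no listed model fits this input size")
    best = min(eligible, key=cost)  # on a tie, the first listed model is returned
    return best[0], cost(best)


for context in (2_000, 40_000):
    name, usd = cheapest(context)
    print(f"{context} context tokens -> {name} at ${usd:.2f} / month")
```

With 2,000 context tokens the sketch returns Qwen2.5-Coder-7B at $0.36. With 40,000 context tokens the input no longer fits a 32.8K window, so the cheapest eligible route becomes llama3.2-11b-vision-instruct (tied with llama3.2-3b-instruct) at about $6.14 per month.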


Caveats

What this page does not decide

This page does not rank retrieval quality, answer grounding, citation behavior, latency, safety behavior, rate limits, prompt caching, discounts, regional pricing, or provider-specific add-on charges. Treat it as a cost-first shortlist, then test your own retrieved passages and confirm final pricing with the provider.