
RAG cost screen

Cheapest LLM API models for RAG answer workloads

Retrieval-augmented generation (RAG) workloads tend to be input-heavy because every answer requires the model to read the user question plus the retrieved context. This page prices one narrow scenario from the RAG use-case page: 100 question tokens and 2,000 retrieved context tokens in, 500 answer tokens out, at 10,000 questions per month.

Updated June 5, 2026 | 2.1K input tokens | 500 output tokens | 10.0K questions / month

Scenario

The workload being priced

This is a cost and context-window screen. It does not decide answer quality, retrieval quality, or grounding behavior.

Question

100 tokens

A short user request.

Retrieved context

2.0K tokens

The source text that drives most input cost.

Answer

500 tokens

The generated response length.

Volume

10.0K / month

A request-style product workload.
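The monthly cost column in the table below follows from this scenario by plain arithmetic. A sketch of that calculation, using symbols introduced here for illustration: q, c, and a are the question, context, and answer tokens per request, N is the monthly volume, and p_in and p_out are a model's listed USD prices per 1M input and output tokens.

```latex
\begin{aligned}
T_{\text{in}}  &= N\,(q + c) = 10{,}000 \times (100 + 2{,}000) = 21{,}000{,}000 \text{ tokens} \\
T_{\text{out}} &= N\,a = 10{,}000 \times 500 = 5{,}000{,}000 \text{ tokens} \\
C &= \frac{T_{\text{in}}\,p_{\text{in}} + T_{\text{out}}\,p_{\text{out}}}{10^{6}} = 21\,p_{\text{in}} + 5\,p_{\text{out}}
\end{aligned}
```

For example, at Qwen2.5-Coder-7B's listed prices this gives 21 × $0.0100 + 5 × $0.0300 = $0.21 + $0.15 = $0.36 per month.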

Cost screen

Lowest-cost context-ready chat models

Only priced chat models with a context window of at least 2.1K tokens (the per-question input size) are shown. Check each model's source and metadata before production use.

| Model | Context window | Input ($ / 1M tokens) | Output ($ / 1M tokens) | Monthly cost |
| --- | --- | --- | --- | --- |
| Qwen2.5-Coder-7B | 32.8K | $0.0100 | $0.0300 | $0.36 |
| llama3.2-11b-vision-instruct | 131.1K | $0.0150 | $0.0250 | $0.44 |
| llama3.2-3b-instruct | 131.1K | $0.0150 | $0.0250 | $0.44 |
| Llama-3.2-3B-Instruct | 131.1K | $0.0200 | $0.0200 | $0.52 |
| paddleocr-vl | 16.4K | $0.0200 | $0.0200 | $0.52 |
| Meta-Llama-3.1-8B-Instruct-Turbo | 131.1K | $0.0200 | $0.0300 | $0.57 |
| Mistral-Nemo-Instruct-2407 | 131.1K | $0.0200 | $0.0400 | $0.62 |
| llama-3.1-8b-instruct | 16.4K | $0.0200 | $0.0500 | $0.67 |
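The screen behind this table can be reproduced in a few lines. This is a minimal sketch, not the page's actual implementation: the rows are copied from the table, context windows are stored at the rounded values shown (32.8K read as 32,800 tokens), and `Model`, `monthly_cost`, and `cost_screen` are hypothetical names. Like the page, it only checks that the per-request input fits the window; many providers also count generated tokens against the same window, so a stricter check would add the answer tokens.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    context_window: int   # tokens, rounded as displayed in the table (32.8K -> 32_800)
    input_per_1m: float   # USD per 1M input tokens
    output_per_1m: float  # USD per 1M output tokens


# Rows copied from the cost-screen table; prices change, so re-check before use.
MODELS = [
    Model("Qwen2.5-Coder-7B", 32_800, 0.0100, 0.0300),
    Model("llama3.2-11b-vision-instruct", 131_100, 0.0150, 0.0250),
    Model("llama3.2-3b-instruct", 131_100, 0.0150, 0.0250),
    Model("Llama-3.2-3B-Instruct", 131_100, 0.0200, 0.0200),
    Model("paddleocr-vl", 16_400, 0.0200, 0.0200),
    Model("Meta-Llama-3.1-8B-Instruct-Turbo", 131_100, 0.0200, 0.0300),
    Model("Mistral-Nemo-Instruct-2407", 131_100, 0.0200, 0.0400),
    Model("llama-3.1-8b-instruct", 16_400, 0.0200, 0.0500),
]


def monthly_cost(model: Model, input_tokens: int, output_tokens: int, requests: int) -> float:
    """USD per month for a fixed per-request token shape."""
    monthly_in = input_tokens * requests
    monthly_out = output_tokens * requests
    return (monthly_in * model.input_per_1m + monthly_out * model.output_per_1m) / 1_000_000


def cost_screen(models, question=100, context=2_000, answer=500, requests=10_000):
    """Keep models whose window fits the per-request input, ranked by monthly cost."""
    input_tokens = question + context
    eligible = [m for m in models if m.context_window >= input_tokens]
    rows = [(monthly_cost(m, input_tokens, answer, requests), m.name) for m in eligible]
    # Cheapest first; equal costs fall back to name order (an arbitrary tie-break).
    return sorted(rows)


if __name__ == "__main__":
    for cost, name in cost_screen(MODELS):
        print(f"{name:<34} ${cost:.2f}")
```

With the default arguments, the printed costs match the Monthly cost column above.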

Shortlist

Three models to test first

#1

Qwen2.5-Coder-7B

$0.36 / month

32.8K context window; $0.0100 input and $0.0300 output per 1M tokens.

#2

llama3.2-11b-vision-instruct

$0.44 / month

131.1K context window; $0.0150 input and $0.0250 output per 1M tokens.

#3

llama3.2-3b-instruct

$0.44 / month

131.1K context window; $0.0150 input and $0.0250 output per 1M tokens.

Change retrieval size before choosing a route

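The cheapest RAG route can change when retrieved context grows from a short passage to a long document bundle: input cost scales with context size, and models with smaller context windows drop out of the screen. Use the RAG page to adjust question, context, answer, and monthly volume before comparing routes.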
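A quick sweep shows how this plays out with the listed prices. This is a sketch under stated assumptions: the rows are the table's (windows at their rounded displayed values), `cheapest` is a hypothetical helper, and the 20,000, 60,000, and 200,000 token context sizes are illustrative inputs, not scenarios the page prices. Only the input is checked against the window, matching the page's filter.

```python
# (name, context window in tokens as displayed, USD per 1M input, USD per 1M output)
MODELS = [
    ("Qwen2.5-Coder-7B", 32_800, 0.0100, 0.0300),
    ("llama3.2-11b-vision-instruct", 131_100, 0.0150, 0.0250),
    ("llama3.2-3b-instruct", 131_100, 0.0150, 0.0250),
    ("Llama-3.2-3B-Instruct", 131_100, 0.0200, 0.0200),
    ("paddleocr-vl", 16_400, 0.0200, 0.0200),
    ("Meta-Llama-3.1-8B-Instruct-Turbo", 131_100, 0.0200, 0.0300),
    ("Mistral-Nemo-Instruct-2407", 131_100, 0.0200, 0.0400),
    ("llama-3.1-8b-instruct", 16_400, 0.0200, 0.0500),
]


def cheapest(context_tokens, question=100, answer=500, requests=10_000):
    """Return (monthly_cost_usd, name) of the cheapest model whose window fits the input.

    Returns None when no listed model fits. Equal costs are broken by name order.
    """
    input_tokens = question + context_tokens
    candidates = [
        ((input_tokens * p_in + answer * p_out) * requests / 1_000_000, name)
        for name, window, p_in, p_out in MODELS
        if window >= input_tokens
    ]
    return min(candidates) if candidates else None


for context in (2_000, 20_000, 60_000, 200_000):
    result = cheapest(context)
    if result is None:
        print(f"{context:>7} context tokens -> no listed model fits")
    else:
        cost, name = result
        print(f"{context:>7} context tokens -> {name} at ${cost:.2f}/month")
```

Under these prices, Qwen2.5-Coder-7B stays cheapest until the per-question input exceeds its 32.8K window; beyond that, the 131.1K-window llama3.2 routes lead, and past 131.1K none of the listed models fit.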

Caveats

What this page does not decide

This page does not rank retrieval quality, answer grounding, citation behavior, latency, safety behavior, rate limits, prompt caching, discounts, regional pricing, or provider-specific add-on charges. Treat it as a cost-first shortlist, then test your own retrieved passages and confirm final pricing with the provider.