RAG

Compare model costs for retrieval-heavy workloads, where each request contains the user's question plus retrieved document context.

What matters most

A RAG workload needs a context window large enough for the prompt and the retrieved text. Because each request carries far more input tokens than output tokens, input price often matters more than output price.

The default example assumes 100 question tokens, 2,000 retrieved tokens, and 500 answer tokens per question, at 10,000 questions per month. That comes to 21M input tokens and 5M output tokens per month.
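As a rough sketch of the arithmetic behind a monthly figure, the snippet below assumes cost is simply input tokens times input price plus output tokens times output price. The page does not spell out its formula, so the function name `monthly_cost` and the output price used in the example are hypothetical.

```python
def monthly_cost(question_tokens, retrieved_tokens, answer_tokens,
                 questions_per_month, input_price_per_m, output_price_per_m):
    """Estimate monthly spend in dollars; both prices are per 1M tokens."""
    input_tokens = (question_tokens + retrieved_tokens) * questions_per_month
    output_tokens = answer_tokens * questions_per_month
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000


# Example workload from this page: 21M input and 5M output tokens per month.
# The $0.05 output price is a made-up placeholder, not a quoted rate.
cost = monthly_cost(100, 2_000, 500, 10_000,
                    input_price_per_m=0.01, output_price_per_m=0.05)
print(f"${cost:.2f}")
```

Swapping in a model's real input and output prices gives an estimate for that model under the same workload.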

Only models whose context window is large enough for this example are shown here.
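One way to picture this filter is the minimal sketch below. It assumes a model qualifies when question, retrieved, and answer tokens together fit in its context window; the page does not state its exact rule. The window sizes are the rounded values from the table, and `hypothetical-2k-model` is an invented entry included only to show an exclusion.

```python
def fits_context(context_window, question_tokens, retrieved_tokens, answer_tokens):
    """Return True if the whole request (input plus answer) fits in the window."""
    return question_tokens + retrieved_tokens + answer_tokens <= context_window


# Context windows in tokens (rounded, as listed in the table).
models = {
    "Qwen2.5-Coder-7B": 32_800,
    "paddleocr-vl": 16_400,
    "hypothetical-2k-model": 2_048,  # invented entry, not from the page
}

eligible = [name for name, window in models.items()
            if fits_context(window, 100, 2_000, 500)]
print(eligible)
```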

Model	Context window (tokens)	Input price	Monthly cost
Qwen2.5-Coder-7B	32.8K	$0.0100 / 1M tokens	$0.36
llama3.2-11b-vision-instruct	131.1K	$0.0150 / 1M tokens	$0.44
llama3.2-3b-instruct	131.1K	$0.0150 / 1M tokens	$0.44
Llama-3.2-3B-Instruct	131.1K	$0.0200 / 1M tokens	$0.52
paddleocr-vl	16.4K	$0.0200 / 1M tokens	$0.52
Meta-Llama-3.1-8B-Instruct-Turbo	131.1K	$0.0200 / 1M tokens	$0.57
Mistral-Nemo-Instruct-2407	131.1K	$0.0200 / 1M tokens	$0.62
llama-3.1-8b-instruct	16.4K	$0.0200 / 1M tokens	$0.67
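To see how much of each monthly figure comes from input tokens, the sketch below takes a few rows from the table and computes the input share. It assumes the monthly cost is made up only of input and output token charges, which the page does not confirm; the helper name `input_share` is hypothetical.

```python
# Input tokens per month for the example workload: (100 + 2,000) * 10,000 = 21M.
INPUT_TOKENS_PER_MONTH = (100 + 2_000) * 10_000

# (model, input price per 1M tokens, monthly cost) copied from the table.
rows = [
    ("Qwen2.5-Coder-7B", 0.0100, 0.36),
    ("llama3.2-3b-instruct", 0.0150, 0.44),
    ("llama-3.1-8b-instruct", 0.0200, 0.67),
]


def input_share(input_price_per_m, monthly_cost):
    """Fraction of the monthly cost attributable to input tokens."""
    input_cost = INPUT_TOKENS_PER_MONTH * input_price_per_m / 1_000_000
    return input_cost / monthly_cost


shares = {name: input_share(price, cost) for name, price, cost in rows}
for name, share in shares.items():
    print(f"{name}: {share:.0%} of monthly cost is input")
```

Under that assumption, input tokens account for more than half of the monthly cost in each of these rows, which is consistent with input price being the main driver for this workload.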
