RAG

Compare the cost of retrieval-heavy answers, where the model reads the user's question plus retrieved document context.

What matters most

RAG needs a context window large enough to hold the prompt plus the retrieved text. Because retrieved text usually makes up most of the tokens, input price often matters more than output price.
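A minimal sketch of the context-window check. It assumes, conservatively, that the window must also leave room for the answer tokens, and that a label like 16.4K means 16,384 tokens. The function name `fits_context` is hypothetical.

```python
# Hypothetical helper: does one RAG request fit in a model's context window?
# Assumption: the window must hold question + retrieved text + answer.

def fits_context(context_window: int, question: int, retrieved: int, answer: int) -> bool:
    """Return True if question + retrieved + answer tokens fit in the window."""
    return question + retrieved + answer <= context_window


# A 16,384-token window (shown as 16.4K) holds 100 + 2,000 + 500 tokens,
# but not 100 + 20,000 + 500 if retrieval pulls in ten times more text.
print(fits_context(16_384, 100, 2_000, 500))   # True
print(fits_context(16_384, 100, 20_000, 500))  # False
```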

Base example

The base example assumes 100 question tokens, 2,000 retrieved tokens, and 500 answer tokens per question, with 10,000 questions per month.
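The monthly token totals for the base example follow from simple multiplication. This sketch makes the arithmetic explicit; the variable names are illustrative.

```python
# Base-example figures from this page
QUESTION_TOKENS = 100
RETRIEVED_TOKENS = 2_000
ANSWER_TOKENS = 500
QUESTIONS_PER_MONTH = 10_000

# Input = question + retrieved text; output = the answer
input_tokens = (QUESTION_TOKENS + RETRIEVED_TOKENS) * QUESTIONS_PER_MONTH
output_tokens = ANSWER_TOKENS * QUESTIONS_PER_MONTH

print(input_tokens)                  # 21000000
print(output_tokens)                 # 5000000
print(input_tokens / output_tokens)  # 4.2
```

Input tokens outnumber output tokens 4.2 to 1 here, which is why the input price weighs more heavily in the monthly total.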

Low-cost RAG models in the base example

Only models whose context window is large enough for the base example are shown.

Model                              Context window   Input price           Monthly cost
Qwen2.5-Coder-7B                   32.8K            $0.0100 / 1M tokens   $0.36
llama3.2-11b-vision-instruct       131.1K           $0.0150 / 1M tokens   $0.44
llama3.2-3b-instruct               131.1K           $0.0150 / 1M tokens   $0.44
Llama-3.2-3B-Instruct              131.1K           $0.0200 / 1M tokens   $0.52
paddleocr-vl                       16.4K            $0.0200 / 1M tokens   $0.52
Meta-Llama-3.1-8B-Instruct-Turbo   131.1K           $0.0200 / 1M tokens   $0.57
Mistral-Nemo-Instruct-2407         131.1K           $0.0200 / 1M tokens   $0.62
llama-3.1-8b-instruct              16.4K            $0.0200 / 1M tokens   $0.67
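The table lists only input prices, yet each monthly cost is higher than the input-only cost (for Qwen2.5-Coder-7B, 21M input tokens × $0.01 / 1M = $0.21, against $0.36 listed). A plausible reading, assumed here rather than confirmed by this page, is that the remainder is the 5M monthly output tokens at each model's own output price. This sketch computes the input share and backs out that implied output price; `monthly_cost` and `implied_output_price` are hypothetical helpers.

```python
# Assumption: monthly cost = input tokens x input price + output tokens x output price,
# with prices in dollars per 1M tokens.
INPUT_MTOK = 21.0   # 2,100 input tokens x 10,000 questions = 21M tokens
OUTPUT_MTOK = 5.0   # 500 answer tokens x 10,000 questions = 5M tokens


def monthly_cost(input_price: float, output_price: float) -> float:
    """Dollars per month for the base example, given $/1M-token prices."""
    return INPUT_MTOK * input_price + OUTPUT_MTOK * output_price


def implied_output_price(input_price: float, listed_monthly: float) -> float:
    """Output price ($/1M) that would make monthly_cost equal the listed cost."""
    return (listed_monthly - INPUT_MTOK * input_price) / OUTPUT_MTOK


# (model, input price $/1M, listed monthly cost $) copied from the table
rows = [
    ("Qwen2.5-Coder-7B", 0.0100, 0.36),
    ("llama3.2-11b-vision-instruct", 0.0150, 0.44),
    ("llama3.2-3b-instruct", 0.0150, 0.44),
    ("Llama-3.2-3B-Instruct", 0.0200, 0.52),
    ("paddleocr-vl", 0.0200, 0.52),
    ("Meta-Llama-3.1-8B-Instruct-Turbo", 0.0200, 0.57),
    ("Mistral-Nemo-Instruct-2407", 0.0200, 0.62),
    ("llama-3.1-8b-instruct", 0.0200, 0.67),
]

for name, input_price, listed in rows:
    input_part = INPUT_MTOK * input_price
    output_price = implied_output_price(input_price, listed)
    print(f"{name}: input ${input_part:.3f}/month, "
          f"implied output ${output_price:.3f} / 1M tokens")
```

Under this assumption the implied output prices range from about $0.02 to $0.05 per 1M tokens, which would explain why models sharing the same $0.0200 input price still differ in monthly cost.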