Qwen2.5-Coder-7B
$0.36 / month
32.8K context window, $0.0100 per 1M input tokens and $0.0300 per 1M output tokens.
RAG cost screen
Retrieval-augmented generation can become input-heavy because every answer reads the user question plus retrieved context. This page prices one narrow scenario from the RAG use-case page: 100 question tokens, 2,000 retrieved context tokens, 500 answer tokens, and 10,000 questions per month.
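The monthly figures on this page follow from a plain token count. Here is a worked check for Qwen2.5-Coder-7B. It assumes simple per-token billing with no caching or discounts. $N$ is questions per month, $q$, $c$, and $a$ are question, context, and answer tokens, and $p_{\text{in}}$, $p_{\text{out}}$ are prices per 1M tokens:

$$
\begin{aligned}
T_{\text{in}} &= N\,(q + c) = 10{,}000 \times (100 + 2{,}000) = 21{,}000{,}000 \\
T_{\text{out}} &= N\,a = 10{,}000 \times 500 = 5{,}000{,}000 \\
C &= \frac{T_{\text{in}}\,p_{\text{in}} + T_{\text{out}}\,p_{\text{out}}}{10^{6}} = 21 \times 0.01 + 5 \times 0.03 = 0.21 + 0.15 = \$0.36
\end{aligned}
$$

The input term ($0.21) is the larger share of the total, which is why retrieved context dominates RAG cost.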
Scenario
This is a cost and context-window screen. It does not assess answer quality, retrieval quality, or grounding behavior.
- Question: 100 tokens, a short user request.
- Retrieved context: 2,000 tokens, the source text that drives most of the input cost.
- Answer: 500 tokens, the generated response.
- Volume: 10,000 questions per month, a request-style product workload.
Cost screen
Only priced chat models with a context window of at least 2.1K tokens (enough for the 2,100-token question plus retrieved context) are shown. Check each model's source and metadata before production use.
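A minimal sketch of the context-window filter follows. It assumes the page compares only input tokens (question plus retrieved context) against each window, as the 2.1K threshold suggests. The window sizes are the rounded display values from the table, not exact provider limits. The `fits` and `eligible` helpers are hypothetical names, and only a subset of models is listed:

```python
# Context windows as displayed on this page (rounded; exact provider limits may differ).
# Hypothetical subset of the table, one model per window size plus one extra.
CONTEXT_WINDOWS = {
    "Qwen2.5-Coder-7B": 32_800,
    "paddleocr-vl": 16_400,
    "llama-3.1-8b-instruct": 16_400,
    "Mistral-Nemo-Instruct-2407": 131_100,
}


def fits(window: int, question_tokens: int, context_tokens: int) -> bool:
    """True if the prompt (question + retrieved context) fits in the window.

    Assumption: like the 2.1K threshold on this page, only input tokens are
    counted. A stricter check would also reserve room for the answer tokens,
    since many providers count output against the same window.
    """
    return question_tokens + context_tokens <= window


def eligible(question_tokens: int, context_tokens: int) -> list[str]:
    """Models whose window fits the prompt, in table order."""
    return [name for name, window in CONTEXT_WINDOWS.items()
            if fits(window, question_tokens, context_tokens)]


print(eligible(100, 2_000))   # all four fit a 2,100-token prompt
print(eligible(100, 20_000))  # the 16.4K-window models drop out
```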
| Model | Context window | Input price / 1M tokens | Output price / 1M tokens | Monthly cost |
|---|---|---|---|---|
| Qwen2.5-Coder-7B | 32.8K | $0.0100 | $0.0300 | $0.36 |
| llama3.2-11b-vision-instruct | 131.1K | $0.0150 | $0.0250 | $0.44 |
| llama3.2-3b-instruct | 131.1K | $0.0150 | $0.0250 | $0.44 |
| Llama-3.2-3B-Instruct | 131.1K | $0.0200 | $0.0200 | $0.52 |
| paddleocr-vl | 16.4K | $0.0200 | $0.0200 | $0.52 |
| Meta-Llama-3.1-8B-Instruct-Turbo | 131.1K | $0.0200 | $0.0300 | $0.57 |
| Mistral-Nemo-Instruct-2407 | 131.1K | $0.0200 | $0.0400 | $0.62 |
| llama-3.1-8b-instruct | 16.4K | $0.0200 | $0.0500 | $0.67 |
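The monthly column can be reproduced from the per-token prices. The sketch below assumes plain per-token billing with no caching or discounts. The `Model` class and `monthly_cost` helper are illustrative names, not part of any provider API:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    input_per_m: float   # USD per 1M input tokens
    output_per_m: float  # USD per 1M output tokens


# Prices copied from the table above.
MODELS = [
    Model("Qwen2.5-Coder-7B", 0.0100, 0.0300),
    Model("llama3.2-11b-vision-instruct", 0.0150, 0.0250),
    Model("llama3.2-3b-instruct", 0.0150, 0.0250),
    Model("Llama-3.2-3B-Instruct", 0.0200, 0.0200),
    Model("paddleocr-vl", 0.0200, 0.0200),
    Model("Meta-Llama-3.1-8B-Instruct-Turbo", 0.0200, 0.0300),
    Model("Mistral-Nemo-Instruct-2407", 0.0200, 0.0400),
    Model("llama-3.1-8b-instruct", 0.0200, 0.0500),
]


def monthly_cost(m: Model, question: int = 100, context: int = 2_000,
                 answer: int = 500, volume: int = 10_000) -> float:
    """Monthly USD cost for the scenario on this page (defaults match it)."""
    input_tokens = (question + context) * volume
    output_tokens = answer * volume
    return (input_tokens * m.input_per_m + output_tokens * m.output_per_m) / 1_000_000


# sorted() is stable, so tied rows (the two $0.44 models) keep table order.
ranked = sorted(MODELS, key=monthly_cost)
for m in ranked[:3]:
    print(f"{m.name}: ${monthly_cost(m):.2f} / month")
```

Taking the first three entries of `ranked` gives the same three models as the shortlist below.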
Shortlist
- Qwen2.5-Coder-7B, $0.36 / month: 32.8K context window, $0.0100 per 1M input tokens and $0.0300 per 1M output tokens.
- llama3.2-11b-vision-instruct, $0.44 / month: 131.1K context window, $0.0150 per 1M input tokens and $0.0250 per 1M output tokens.
- llama3.2-3b-instruct, $0.44 / month: 131.1K context window, $0.0150 per 1M input tokens and $0.0250 per 1M output tokens.
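The cheapest RAG route can change when the retrieved context grows from a short passage to a long document bundle. Input cost scales with context length, and models with smaller context windows drop out of the screen. Use the RAG use-case page to adjust the question, context, answer, and monthly volume before comparing routes.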
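The sketch below reruns the screen with a longer context. The 40,000-token bundle is a hypothetical value. Prices and rounded window sizes are copied from the table, and the window check counts input tokens only, matching the 2.1K filter. The `screen` helper is an illustrative name:

```python
# (name, context window as displayed, USD per 1M input, USD per 1M output)
MODELS = [
    ("Qwen2.5-Coder-7B", 32_800, 0.0100, 0.0300),
    ("llama3.2-11b-vision-instruct", 131_100, 0.0150, 0.0250),
    ("llama3.2-3b-instruct", 131_100, 0.0150, 0.0250),
    ("Llama-3.2-3B-Instruct", 131_100, 0.0200, 0.0200),
    ("paddleocr-vl", 16_400, 0.0200, 0.0200),
    ("Meta-Llama-3.1-8B-Instruct-Turbo", 131_100, 0.0200, 0.0300),
    ("Mistral-Nemo-Instruct-2407", 131_100, 0.0200, 0.0400),
    ("llama-3.1-8b-instruct", 16_400, 0.0200, 0.0500),
]


def screen(question=100, context=2_000, answer=500, volume=10_000):
    """Return (name, monthly USD) for models whose window fits the prompt, cheapest first."""
    prompt = question + context
    rows = []
    for name, window, p_in, p_out in MODELS:
        if prompt > window:  # input-only check, like the 2.1K filter
            continue
        cost = (prompt * volume * p_in + answer * volume * p_out) / 1_000_000
        rows.append((name, cost))
    return sorted(rows, key=lambda row: row[1])  # stable: ties keep table order


for label, ctx in [("baseline", 2_000), ("40K bundle", 40_000)]:
    name, cost = screen(context=ctx)[0]
    print(f"{label}: {name} at ${cost:.2f} / month")
```

With the 40K bundle, Qwen2.5-Coder-7B no longer fits its 32.8K window, so the cheapest eligible route moves to a 131.1K-window model even though the monthly bill rises for every model.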
Caveats
This page does not rank retrieval quality, answer grounding, citation behavior, latency, safety behavior, rate limits, prompt caching, discounts, regional pricing, or provider-specific add-on charges. Treat it as a cost-first shortlist, then test your own retrieved passages and confirm final pricing with the provider.