Qwen2.5-Coder-3B-Instruct (nscale)
In this workload, the estimated monthly API cost is $1.40. The route's listed context window is N/A.
Data-led article
A support chatbot has a different cost shape from a coding agent or a long-context RAG answer. This screen prices one narrow workload: 500 input tokens and 300 output tokens per message, 100 messages per user per month, and 1,000 monthly users.
Scenario
The numbers are generated from the site's current checked-in data. Provider bills can differ because of cache behavior, discounts, regions, tiers, or provider-specific billing rules.
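Input, 500 tokens per message: a short user message plus recent chat context.
Output, 300 tokens per message: a concise support or product answer.
Volume, 100 messages per user across 1,000 monthly users: a repeated monthly support workload.
Total: 100,000 messages per month.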
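The arithmetic behind the scenario can be sketched in a few lines of Python. This is a minimal illustration, not the site's calculator: the function name `monthly_cost` and its parameters are assumptions, and the prices used ($0.0100 input and $0.0300 output per 1M tokens) are the ones listed for the first rows of the cost screen.

```python
def monthly_cost(input_tokens, output_tokens, messages_per_user, users,
                 input_price_per_m, output_price_per_m):
    """Estimate monthly API cost in dollars (hypothetical helper).

    Prices are in dollars per 1M tokens; token counts are per message.
    """
    messages = messages_per_user * users
    # Convert total tokens to millions before applying per-1M prices.
    input_millions = messages * input_tokens / 1_000_000
    output_millions = messages * output_tokens / 1_000_000
    return input_millions * input_price_per_m + output_millions * output_price_per_m


# Scenario from this page: 500 in, 300 out, 100 messages/user, 1,000 users.
cost = monthly_cost(500, 300, 100, 1_000, 0.01, 0.03)
print(f"${cost:.2f}")
```

The workload comes to 50M input tokens and 30M output tokens per month, so at these prices the input side contributes $0.50 and the output side $0.90. The sketch ignores caching, discounts, and tiering, which the page already notes can change a real bill.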
Cost screen
Rows are sorted by estimated monthly API cost. Open each model page before treating any route as production-ready.
| Model | Provider | Context | Input / 1M tokens | Output / 1M tokens | Monthly cost |
|---|---|---|---|---|---|
| Qwen2.5-Coder-3B-Instruct | nscale | N/A | $0.0100 | $0.0300 | $1.40 |
| Qwen2.5-Coder-7B-Instruct | nscale | N/A | $0.0100 | $0.0300 | $1.40 |
| Qwen2.5-Coder-7B | nebius | 32.8K | $0.0100 | $0.0300 | $1.40 |
| llama3.2-11b-vision-instruct | lambda_ai | 131.1K | $0.0150 | $0.0250 | $1.50 |
| llama3.2-3b-instruct | lambda_ai | 131.1K | $0.0150 | $0.0250 | $1.50 |
| Llama-3.2-3B-Instruct | deepinfra | 131.1K | $0.0200 | $0.0200 | $1.60 |
| paddleocr-vl | novita | 16.4K | $0.0200 | $0.0200 | $1.60 |
| Meta-Llama-3.1-8B-Instruct-Turbo | deepinfra | 131.1K | $0.0200 | $0.0300 | $1.90 |
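To check the ordering, the Monthly cost column can be recomputed from the listed prices and re-sorted. The sketch below is only an illustration under stated assumptions: it copies four rows from the table above in deliberately shuffled order, hard-codes the scenario's token volumes, and uses hypothetical names (`ROUTES`, `cost`, `ranked`).

```python
# (model, provider, input $ per 1M, output $ per 1M), copied from the table.
ROUTES = [
    ("Meta-Llama-3.1-8B-Instruct-Turbo", "deepinfra", 0.02, 0.03),
    ("Llama-3.2-3B-Instruct", "deepinfra", 0.02, 0.02),
    ("llama3.2-3b-instruct", "lambda_ai", 0.015, 0.025),
    ("Qwen2.5-Coder-7B", "nebius", 0.01, 0.03),
]

# Monthly token volumes for this page's scenario, in millions of tokens:
# 100,000 messages x 500 input tokens and x 300 output tokens.
INPUT_MILLIONS = 50
OUTPUT_MILLIONS = 30


def cost(route):
    """Monthly cost in dollars for one route tuple."""
    _, _, input_price, output_price = route
    return INPUT_MILLIONS * input_price + OUTPUT_MILLIONS * output_price


# Sort ascending by estimated monthly cost, as the table does.
ranked = sorted(ROUTES, key=cost)
for model, provider, *_ in ranked:
    print(f"{model} ({provider})")
```

Because `sorted` is stable, routes with identical cost keep their input order, so ties such as the three $1.40 rows carry no ranking signal on price alone.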
The three lowest-cost routes in this workload:
Qwen2.5-Coder-3B-Instruct (nscale): estimated monthly API cost $1.40; listed context window N/A.
Qwen2.5-Coder-7B-Instruct (nscale): estimated monthly API cost $1.40; listed context window N/A.
Qwen2.5-Coder-7B (nebius): estimated monthly API cost $1.40; listed context window 32.8K.
If your chatbot sends longer context, longer answers, or fewer repeated sessions, rerun the calculator with your own token counts.
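As a rough illustration of why rerunning matters, the sketch below compares the page's baseline with a longer-context variant and reports how much of the bill comes from input tokens. The 4,000-token variant is a made-up example, not data from this site; the prices are those listed for Meta-Llama-3.1-8B-Instruct-Turbo on deepinfra, and the names `cost_breakdown` and `SCENARIOS` are hypothetical.

```python
MESSAGES = 100_000          # monthly messages, as in this page's scenario
INPUT_PRICE_PER_M = 0.02    # dollars per 1M input tokens (listed price)
OUTPUT_PRICE_PER_M = 0.03   # dollars per 1M output tokens (listed price)

# name -> (input tokens per message, output tokens per message)
SCENARIOS = {
    "baseline": (500, 300),
    "longer context (hypothetical)": (4_000, 300),
}


def cost_breakdown(input_tokens, output_tokens):
    """Return (total dollars, fraction of total spent on input tokens)."""
    input_cost = MESSAGES * input_tokens / 1_000_000 * INPUT_PRICE_PER_M
    output_cost = MESSAGES * output_tokens / 1_000_000 * OUTPUT_PRICE_PER_M
    total = input_cost + output_cost
    return total, input_cost / total


for name, (tokens_in, tokens_out) in SCENARIOS.items():
    total, input_share = cost_breakdown(tokens_in, tokens_out)
    print(f"{name}: ${total:.2f}, input share {input_share:.0%}")
```

In the hypothetical variant the total rises from $1.90 to $8.90 and most of it comes from input tokens, so the input price becomes the figure to compare across routes once context grows.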
Caveats
This page does not rank answer quality, latency, safety behavior, tool calling, multilingual coverage, or rate limits. Some low-cost chat routes may be specialized, gated, or inappropriate for a general support chatbot. Use this as a pricing shortlist, then test the exact model route and verify final pricing with the provider.