Cheapest LLM API Chat Models for a 500-Token Chatbot

Scenario

The chatbot workload being priced

The numbers are generated from the site's current pricing data. Provider bills can differ because of cache behavior, discounts, regions, tiers, or provider-specific billing rules.

Parameter	Value	Notes
Input / message	500 tokens	A short user message plus recent chat context.
Output / message	300 tokens	A concise support or product answer.
Messages / user	100 per month	A repeated monthly support workload.
Monthly users	1,000	100,000 total messages per month (1,000 users × 100 messages each).
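As a rough check on how this workload turns into a dollar figure, the sketch below multiplies the per-message token counts by the message volume and applies per-million-token prices. The function name `monthly_cost` and its parameters are illustrative, not part of the site's calculator, and the prices passed in are the $0.0100 input and $0.0300 output rates listed for the lowest-cost routes on this page. It ignores caching, discounts, and tiers.

```python
def monthly_cost(input_tokens, output_tokens, messages_per_user, users,
                 input_price_per_m, output_price_per_m):
    """Estimated monthly API cost in dollars for a fixed per-message token shape.

    Hypothetical helper: prices are dollars per 1M tokens, and every
    message is assumed to use exactly the given token counts.
    """
    messages = messages_per_user * users                      # 100 * 1,000 = 100,000
    input_millions = input_tokens * messages / 1_000_000      # 50M input tokens
    output_millions = output_tokens * messages / 1_000_000    # 30M output tokens
    return (input_millions * input_price_per_m
            + output_millions * output_price_per_m)


cost = monthly_cost(500, 300, 100, 1_000, 0.01, 0.03)
print(f"${cost:.2f}")  # $1.40
```

At this volume the workload totals 50M input tokens and 30M output tokens per month, so the output price carries more weight than the raw token counts suggest whenever it is higher than the input price.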

Cost screen

Lowest-cost priced chat routes in this workload

Rows are sorted by estimated monthly API cost, lowest first. Open each model page before treating any route as production-ready.

Model	Provider	Context	Input / 1M tokens	Output / 1M tokens	Monthly cost
Qwen2.5-Coder-3B-Instruct	nscale	N/A	$0.0100	$0.0300	$1.40
Qwen2.5-Coder-7B-Instruct	nscale	N/A	$0.0100	$0.0300	$1.40
Qwen2.5-Coder-7B	nebius	32.8K	$0.0100	$0.0300	$1.40
llama3.2-11b-vision-instruct	lambda_ai	131.1K	$0.0150	$0.0250	$1.50
llama3.2-3b-instruct	lambda_ai	131.1K	$0.0150	$0.0250	$1.50
Llama-3.2-3B-Instruct	deepinfra	131.1K	$0.0200	$0.0200	$1.60
paddleocr-vl	novita	16.4K	$0.0200	$0.0200	$1.60
Meta-Llama-3.1-8B-Instruct-Turbo	deepinfra	131.1K	$0.0200	$0.0300	$1.90
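The ordering in the table can be reproduced from the listed prices alone. The sketch below is one way to do it, assuming the workload's monthly totals of 50M input tokens and 30M output tokens (500 and 300 tokens per message across 100,000 messages). The `ROUTES` list and `route_cost` helper are illustrative names, with prices copied from the table.

```python
# (model, provider, input $ per 1M tokens, output $ per 1M tokens), from the table
ROUTES = [
    ("Qwen2.5-Coder-3B-Instruct", "nscale", 0.0100, 0.0300),
    ("Qwen2.5-Coder-7B-Instruct", "nscale", 0.0100, 0.0300),
    ("Qwen2.5-Coder-7B", "nebius", 0.0100, 0.0300),
    ("llama3.2-11b-vision-instruct", "lambda_ai", 0.0150, 0.0250),
    ("llama3.2-3b-instruct", "lambda_ai", 0.0150, 0.0250),
    ("Llama-3.2-3B-Instruct", "deepinfra", 0.0200, 0.0200),
    ("paddleocr-vl", "novita", 0.0200, 0.0200),
    ("Meta-Llama-3.1-8B-Instruct-Turbo", "deepinfra", 0.0200, 0.0300),
]

INPUT_MILLIONS = 50.0   # 500 tokens * 100,000 messages
OUTPUT_MILLIONS = 30.0  # 300 tokens * 100,000 messages


def route_cost(input_price, output_price):
    """Monthly cost in dollars, rounded to cents as displayed in the table."""
    return round(INPUT_MILLIONS * input_price + OUTPUT_MILLIONS * output_price, 2)


# sorted() is stable, so routes with equal cost keep their listed order.
ranked = sorted(ROUTES, key=lambda r: route_cost(r[2], r[3]))

for model, provider, input_price, output_price in ranked:
    print(f"{model} ({provider}): ${route_cost(input_price, output_price):.2f}")
```

Several routes tie on cost, so price alone does not separate them. Context window and model specialization are the remaining differences visible in the table.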

nscale

Qwen2.5-Coder-3B-Instruct

In this workload, the estimated monthly API cost is $1.40. No context window is listed for this route.

nscale

Qwen2.5-Coder-7B-Instruct

In this workload, the estimated monthly API cost is $1.40. No context window is listed for this route.

nebius

Qwen2.5-Coder-7B

In this workload, the estimated monthly API cost is $1.40. The route's listed context window is 32.8K.

Related workloads

Compare adjacent workload guides

Chatbot workloads are short-context and repetitive, so cost is driven by message volume rather than by any single large request. The related guides show how the model shortlist changes when context, document length, or agent turns dominate.

Change the workload before choosing a route

If your chatbot sends longer context, longer answers, or fewer repeated sessions, rerun the calculator with your own token counts.
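To see why the token mix matters, the sketch below compares two of the listed price pairs under different per-message shapes. The `PRICES` mapping, the helper names, and the second workload (300 input tokens, 800 output tokens) are hypothetical and chosen only for illustration. The prices themselves are the listed $0.0100 / $0.0300 and $0.0150 / $0.0250 rates.

```python
# Listed prices in dollars per 1M tokens: (input, output)
PRICES = {
    "nscale Qwen2.5-Coder-3B-Instruct": (0.0100, 0.0300),
    "lambda_ai llama3.2-3b-instruct": (0.0150, 0.0250),
}


def cost_per_message(input_tokens, output_tokens, input_price, output_price):
    """Dollar cost of one message at the given per-1M-token prices."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


def cheapest(input_tokens, output_tokens):
    """Name of the lower-cost route for one message shape."""
    return min(
        PRICES,
        key=lambda name: cost_per_message(input_tokens, output_tokens, *PRICES[name]),
    )


print(cheapest(500, 300))  # nscale Qwen2.5-Coder-3B-Instruct (this page's workload)
print(cheapest(300, 800))  # lambda_ai llama3.2-3b-instruct (hypothetical longer answers)
```

For these two price pairs the costs are equal when input and output token counts match. The route with the cheaper input price wins when input dominates, and the route with the cheaper output price wins when answers are longer than prompts.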

Caveats

What this comparison does not prove

This page does not rank answer quality, latency, safety behavior, tool calling, multilingual coverage, or rate limits. Some low-cost chat routes may be specialized, gated, or inappropriate for a general support chatbot. Use this as a pricing shortlist, then test the exact model route and verify final pricing with the provider.

Compare adjacent workload guides

RAG answer workload

10k-token document summary

7k-input coding-agent workload