Summarization cost screen

Cheapest LLM API models for document summarization workloads

Summarization workloads are usually input-heavy: the model reads a long document and writes a short summary. This page prices one narrow scenario from the summarization use-case page: a 10,000-token document, a 500-token summary, and 1,000 documents per month.

Updated June 5, 2026. Scenario: 10.0K input tokens, 500 output tokens, 1.0K documents / month.

Scenario

The workload being priced

This is a cost screen, not a quality benchmark. It is meant to help you find models worth testing in the calculator and compare pages.

Document

10.0K input tokens

Because the document is long, the input price dominates the monthly bill.

Summary

500 output tokens

Output price still matters, but less than the cost of reading the document.

Volume

1.0K documents / month

Monthly cost is the per-document cost multiplied by a modest batch of 1,000 documents.
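The arithmetic behind the scenario can be sketched in a few lines. This is a minimal illustration, not the site's calculator: the function name `monthly_cost` and its parameter names are assumptions made for this example, and the prices passed in are the $0.0100 input and $0.0300 output rates listed for the Qwen2.5-Coder models on this page.

```python
def monthly_cost(input_price_per_m, output_price_per_m,
                 input_tokens=10_000, output_tokens=500, documents=1_000):
    """Return the monthly cost in dollars for one model.

    Prices are in dollars per 1M tokens. The defaults are this page's
    scenario: 10,000 input tokens, 500 output tokens, 1,000 documents.
    """
    # Total tokens per month, expressed in millions of tokens.
    input_millions = input_tokens * documents / 1_000_000
    output_millions = output_tokens * documents / 1_000_000
    return input_millions * input_price_per_m + output_millions * output_price_per_m


# 10M input tokens at $0.01 plus 0.5M output tokens at $0.03.
cost = monthly_cost(0.0100, 0.0300)
print(f"${cost:.3f} per month")
```

Under these assumptions the result is roughly $0.115, which the page displays rounded to $0.12.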

Cost screen

Lowest-cost priced chat models for this scenario

Prices come from the current model database. Click a model to inspect its source and metadata before production use.

Model	Input ($ / 1M tokens)	Output ($ / 1M tokens)	Monthly cost
Qwen2.5-Coder-3B-Instruct	$0.0100	$0.0300	$0.12
Qwen2.5-Coder-7B-Instruct	$0.0100	$0.0300	$0.12
Qwen2.5-Coder-7B	$0.0100	$0.0300	$0.12
llama3.2-11b-vision-instruct	$0.0150	$0.0250	$0.16
llama3.2-3b-instruct	$0.0150	$0.0250	$0.16
gpt-oss-20b	$0.0145	$0.0700	$0.18
Llama-3.2-3B-Instruct	$0.0200	$0.0200	$0.21
paddleocr-vl	$0.0200	$0.0200	$0.21
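To check the table by hand, the ranking can be recomputed from the listed prices. The sketch below is an illustration only: `PRICES`, `monthly_cost`, and `input_share` are hypothetical names, and the prices are copied from the rows above rather than fetched from any model database.

```python
# (input, output) prices in dollars per 1M tokens, copied from the table.
PRICES = {
    "Qwen2.5-Coder-3B-Instruct": (0.0100, 0.0300),
    "Qwen2.5-Coder-7B-Instruct": (0.0100, 0.0300),
    "Qwen2.5-Coder-7B": (0.0100, 0.0300),
    "llama3.2-11b-vision-instruct": (0.0150, 0.0250),
    "llama3.2-3b-instruct": (0.0150, 0.0250),
    "gpt-oss-20b": (0.0145, 0.0700),
    "Llama-3.2-3B-Instruct": (0.0200, 0.0200),
    "paddleocr-vl": (0.0200, 0.0200),
}

INPUT_MILLIONS = 10_000 * 1_000 / 1_000_000   # 10M input tokens per month
OUTPUT_MILLIONS = 500 * 1_000 / 1_000_000     # 0.5M output tokens per month


def monthly_cost(input_price, output_price):
    """Monthly cost in dollars for this page's scenario."""
    return INPUT_MILLIONS * input_price + OUTPUT_MILLIONS * output_price


def input_share(input_price, output_price):
    """Fraction of the monthly bill that comes from input tokens."""
    return INPUT_MILLIONS * input_price / monthly_cost(input_price, output_price)


# sorted() is stable, so models with identical cost keep the table's order.
ranked = sorted(PRICES.items(), key=lambda item: monthly_cost(*item[1]))

for name, (inp, out) in ranked:
    print(f"{name:30s} ${monthly_cost(inp, out):.4f}  "
          f"input share {input_share(inp, out):.0%}")
```

For every row the input share comes out above 80%, which is why the input price column drives the ordering in this scenario.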

Shortlist

Three models to test first

Qwen2.5-Coder-3B-Instruct

$0.12 / month

$0.0100 / 1M tokens input and $0.0300 / 1M tokens output.

Qwen2.5-Coder-7B-Instruct

$0.12 / month

$0.0100 / 1M tokens input and $0.0300 / 1M tokens output.

Qwen2.5-Coder-7B

$0.12 / month

$0.0100 / 1M tokens input and $0.0300 / 1M tokens output.

Change document length before choosing a route

A 2,000-token note and a 50,000-token transcript can produce very different winners. Use the summarization page to change token counts, then compare the shortlisted models side by side.
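A rough way to see the effect of document length is to re-rank a few of the listed price points at different input sizes. The sketch below assumes the summary stays at 500 tokens and the volume at 1,000 documents; the names `PRICES`, `monthly_cost`, and `rank_at` are hypothetical, and only four rows from this page's table are included. Within these rows the cheapest model stays the same as long as the input is longer than the output, so the sketch shows a shift further down the list; a different overall winner would have to come from the wider model database.

```python
# (input, output) prices in dollars per 1M tokens, taken from this page's table.
PRICES = {
    "Qwen2.5-Coder-3B-Instruct": (0.0100, 0.0300),
    "llama3.2-3b-instruct": (0.0150, 0.0250),
    "gpt-oss-20b": (0.0145, 0.0700),
    "Llama-3.2-3B-Instruct": (0.0200, 0.0200),
}


def monthly_cost(prices, input_tokens, output_tokens=500, documents=1_000):
    """Monthly cost in dollars for one (input, output) price pair."""
    input_price, output_price = prices
    return (input_tokens * input_price + output_tokens * output_price) \
        * documents / 1_000_000


def rank_at(input_tokens):
    """Model names ordered from cheapest to most expensive."""
    return sorted(PRICES, key=lambda name: monthly_cost(PRICES[name], input_tokens))


for length in (2_000, 10_000, 50_000):
    print(f"{length:>6} input tokens: {rank_at(length)}")
```

With these numbers, gpt-oss-20b is the most expensive of the four at 2,000 input tokens because of its higher output price, but moves up to second place at 50,000 input tokens, where its slightly lower input price outweighs the output cost.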


Caveats

What this page does not decide

This page does not rank summary quality, factuality, citation behavior, latency, rate limits, prompt caching, discounts, regional pricing, or provider-specific add-on charges. Treat it as a cost-first shortlist, then run your own documents through candidate models and confirm final pricing with the provider.

Compare adjacent workload guides

500-token chatbot workload

RAG answer workload

7k-input coding-agent workload
