
Summarization cost screen

Cheapest LLM API models for document summarization workloads

Summarization workloads are usually input-heavy: the model reads a long document and writes a short summary. This page prices one narrow scenario from the summarization use-case page: a 10,000-token document, a 500-token summary, and 1,000 documents per month.

Updated June 5, 2026. Scenario: 10.0K input tokens, 500 output tokens, 1.0K documents / month.

Scenario

The workload being priced

This is a cost screen, not a quality benchmark. It is meant to help you find models worth testing in the calculator and compare pages.

Document

10.0K input tokens

Input price dominates the monthly bill because every request reads the full document.

Summary

500 output tokens

Output price still matters, but it contributes less than reading the document.

Volume

1.0K documents / month

Multiply per-document cost by a modest monthly batch.
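The monthly figure is simple arithmetic: tokens per document times documents per month, divided by 1M, times the per-1M price, summed over input and output. The sketch below prices this scenario. It assumes linear per-token pricing with no caching or discounts, and it assumes half-up rounding to the cent, because the page does not say how it rounds. It uses `Decimal` to avoid binary floating-point surprises when rounding to cents.

```python
from decimal import Decimal, ROUND_HALF_UP

TOKENS_PER_PRICE_UNIT = Decimal(1_000_000)  # prices are quoted per 1M tokens


def monthly_cost(input_price: Decimal, output_price: Decimal,
                 input_tokens: int = 10_000, output_tokens: int = 500,
                 docs_per_month: int = 1_000) -> tuple[Decimal, Decimal]:
    """Return (input cost, output cost) in dollars per month, unrounded.

    Assumes linear per-token pricing: no caching, discounts, or add-ons.
    """
    input_cost = Decimal(input_tokens * docs_per_month) / TOKENS_PER_PRICE_UNIT * input_price
    output_cost = Decimal(output_tokens * docs_per_month) / TOKENS_PER_PRICE_UNIT * output_price
    return input_cost, output_cost


# Qwen2.5-Coder-7B prices from the table: $0.0100 input, $0.0300 output per 1M tokens
inp, out = monthly_cost(Decimal("0.0100"), Decimal("0.0300"))
total = inp + out
# Half-up rounding to the cent is an assumption about how the page displays costs
print(total.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP))  # 0.12
print(f"input share: {inp / total:.0%}")  # 87%
```

With these prices, input accounts for about 87% of the unrounded $0.115 bill. This is why the document read dominates.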

Cost screen

Lowest-cost priced chat models for this scenario

Prices come from the checked-in model database. Check each model's source and metadata before production use.

Model | Input ($ / 1M tokens) | Output ($ / 1M tokens) | Monthly cost
--- | --- | --- | ---
Qwen2.5-Coder-3B-Instruct | $0.0100 | $0.0300 | $0.12
Qwen2.5-Coder-7B-Instruct | $0.0100 | $0.0300 | $0.12
Qwen2.5-Coder-7B | $0.0100 | $0.0300 | $0.12
llama3.2-11b-vision-instruct | $0.0150 | $0.0250 | $0.16
llama3.2-3b-instruct | $0.0150 | $0.0250 | $0.16
Llama-3.2-3B-Instruct | $0.0200 | $0.0200 | $0.21
paddleocr-vl | $0.0200 | $0.0200 | $0.21
Meta-Llama-3.1-8B-Instruct-Turbo | $0.0200 | $0.0300 | $0.22
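Every monthly figure in the table can be recomputed from its two prices. The sketch below reproduces the ordering. It assumes linear per-token pricing and half-up rounding to the cent. Python's `sorted` is stable, so tied models keep the table's order. How the page itself breaks ties is not stated, so that part is an assumption.

```python
from decimal import Decimal, ROUND_HALF_UP

# (model, input $ per 1M tokens, output $ per 1M tokens), copied from the table above
PRICES = [
    ("Qwen2.5-Coder-3B-Instruct", "0.0100", "0.0300"),
    ("Qwen2.5-Coder-7B-Instruct", "0.0100", "0.0300"),
    ("Qwen2.5-Coder-7B", "0.0100", "0.0300"),
    ("llama3.2-11b-vision-instruct", "0.0150", "0.0250"),
    ("llama3.2-3b-instruct", "0.0150", "0.0250"),
    ("Llama-3.2-3B-Instruct", "0.0200", "0.0200"),
    ("paddleocr-vl", "0.0200", "0.0200"),
    ("Meta-Llama-3.1-8B-Instruct-Turbo", "0.0200", "0.0300"),
]


def monthly_cost(input_price: str, output_price: str,
                 input_tokens: int = 10_000, output_tokens: int = 500,
                 docs: int = 1_000) -> Decimal:
    """Unrounded monthly cost in dollars under linear per-token pricing."""
    per_doc = (input_tokens * Decimal(input_price)
               + output_tokens * Decimal(output_price)) / 1_000_000
    return per_doc * docs


# sorted() is stable, so tied models keep their table order
ranked = sorted(PRICES, key=lambda row: monthly_cost(row[1], row[2]))
for model, inp, out in ranked:
    # Half-up rounding to the cent is an assumption about the page's display
    cents = monthly_cost(inp, out).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
    print(f"{model:34} ${cents}")
```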

Shortlist

Three models to test first

#3

Qwen2.5-Coder-7B

$0.12 / month

$0.0100 per 1M input tokens and $0.0300 per 1M output tokens.

Change document length before choosing a route

A 2,000-token note and a 50,000-token transcript can produce very different winners. Use the summarization page to change token counts, then compare the shortlisted models side by side.
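Every price here is linear in tokens, so you can re-rank for a different length yourself. The sketch below keeps the 500-token summary and 1,000 documents per month. It recomputes a three-model subset of the table at 2K, 10K, and 50K input tokens. A longer document shifts more of the bill onto input price, so models with different input/output price ratios can swap places. With this particular subset the order holds, but the gap between the cheapest and most expensive row widens from about 1.6× to about 2×. The rows above are only the cheapest at 10K tokens, so a real re-ranking should load the full price list instead of this illustrative subset.

```python
from decimal import Decimal

# Illustrative subset of the table above: model -> (input $/1M, output $/1M).
# In practice, load the full price list, since other models may enter the ranking.
PRICES = {
    "Qwen2.5-Coder-7B": ("0.0100", "0.0300"),
    "llama3.2-3b-instruct": ("0.0150", "0.0250"),
    "Meta-Llama-3.1-8B-Instruct-Turbo": ("0.0200", "0.0300"),
}


def monthly_cost(model: str, input_tokens: int, output_tokens: int, docs: int = 1_000) -> Decimal:
    """Linear per-token pricing; ignores caching, discounts, and add-on charges."""
    input_price, output_price = (Decimal(p) for p in PRICES[model])
    per_doc = (input_tokens * input_price + output_tokens * output_price) / 1_000_000
    return per_doc * docs


def rank(input_tokens: int, output_tokens: int = 500) -> list[tuple[str, Decimal]]:
    """Models sorted by monthly cost, cheapest first (stable for ties)."""
    costs = [(m, monthly_cost(m, input_tokens, output_tokens)) for m in PRICES]
    return sorted(costs, key=lambda pair: pair[1])


for label, tokens in [("2K note", 2_000), ("10K document", 10_000), ("50K transcript", 50_000)]:
    print(label)
    for model, cost in rank(tokens):
        print(f"  {model:34} ${cost:.4f}")
```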

Caveats

What this page does not decide

This page does not rank summary quality, factuality, citation behavior, latency, rate limits, prompt caching, discounts, regional pricing, or provider-specific add-on charges. Treat it as a cost-first shortlist, then run your own documents through candidate models and confirm final pricing with the provider.