Summarization

Compare model costs for workloads with long inputs and short outputs, such as summarizing reports, articles, notes, and other text-heavy material.

What matters most

Summarization is usually input-heavy: the document being summarized is far longer than the summary. Input price therefore drives most of the cost, and output price matters less as long as the summary stays short.

Base example

This page starts with a 10,000-token document, a 500-token summary, and 1,000 documents per month. That works out to 10 million input tokens and 500,000 output tokens per month.

Low-cost summarization models in the base example

| Model | Input price (per 1M tokens) | Output price (per 1M tokens) | Monthly cost |
| --- | --- | --- | --- |
| Qwen2.5-Coder-3B-Instruct | $0.0100 | $0.0300 | $0.12 |
| Qwen2.5-Coder-7B-Instruct | $0.0100 | $0.0300 | $0.12 |
| Qwen2.5-Coder-7B | $0.0100 | $0.0300 | $0.12 |
| llama3.2-11b-vision-instruct | $0.0150 | $0.0250 | $0.16 |
| llama3.2-3b-instruct | $0.0150 | $0.0250 | $0.16 |
| titan-embed-text-v2 | $0.0200 | N/A | $0.20 |
| Llama-3.2-3B-Instruct | $0.0200 | $0.0200 | $0.21 |
| paddleocr-vl | $0.0200 | $0.0200 | $0.21 |
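
The monthly cost figures above can be reproduced with a short calculation. The sketch below is illustrative only: the function name `monthly_cost` and the constants are hypothetical, and rounding half-up to the nearest cent is an assumption that happens to match the listed figures, not a documented rule of this page. A model with no output price (listed as N/A) is treated as incurring input cost only.

```python
from decimal import Decimal, ROUND_HALF_UP

# Base example from this page.
INPUT_TOKENS_PER_DOC = 10_000
OUTPUT_TOKENS_PER_DOC = 500
DOCS_PER_MONTH = 1_000
MILLION = Decimal(1_000_000)


def monthly_cost(input_price, output_price=None):
    """Return the monthly cost in USD, rounded to cents.

    Prices are strings in USD per 1M tokens, e.g. "0.0100".
    Pass output_price=None for models listed with N/A output price.
    Rounding half-up is an assumption, not a documented rule.
    """
    input_tokens = Decimal(INPUT_TOKENS_PER_DOC * DOCS_PER_MONTH)
    output_tokens = Decimal(OUTPUT_TOKENS_PER_DOC * DOCS_PER_MONTH)

    cost = input_tokens / MILLION * Decimal(input_price)
    if output_price is not None:
        cost += output_tokens / MILLION * Decimal(output_price)

    return cost.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


if __name__ == "__main__":
    # Prices copied from three rows of the table above.
    print(monthly_cost("0.0100", "0.0300"))  # Qwen2.5-Coder-3B-Instruct
    print(monthly_cost("0.0150", "0.0250"))  # llama3.2-3b-instruct
    print(monthly_cost("0.0200"))            # titan-embed-text-v2 (no output price)
```

`Decimal` is used instead of floats so that a value such as $0.115 rounds predictably to cents. With the first row's prices, input accounts for $0.10 of the unrounded $0.115 total, which is why input price dominates in this workload.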