Summarization
Compare model costs for workloads with long inputs and short outputs, such as summarizing reports, articles, notes, and other text-heavy material.
What matters most
Summarization is usually input-heavy: the source document is much longer than the summary. Input price therefore drives most of the cost, and output price matters less as long as the summary stays short.
Base example
The base example assumes a 10,000-token document, a 500-token summary, and 1,000 documents per month. That works out to 10M input tokens and 0.5M output tokens per month.
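The monthly figure for any model can be reproduced from its per-1M-token prices. The following is a minimal sketch, not the page's own calculator: the function name `monthly_cost` and its parameters are hypothetical, a missing output price is assumed to contribute nothing, and rounding half up to the cent is an assumption chosen because it matches the displayed totals.

```python
from decimal import Decimal, ROUND_HALF_UP

def monthly_cost(input_price, output_price,
                 input_tokens=10_000, output_tokens=500, docs_per_month=1_000):
    """Return the monthly cost in USD, rounded to the cent.

    Prices are USD per 1M tokens, passed as strings to avoid float error.
    output_price may be None for models that list no output price (assumed free).
    """
    million = Decimal(1_000_000)
    input_cost = Decimal(input_price) * input_tokens * docs_per_month / million
    if output_price is None:
        output_cost = Decimal(0)
    else:
        output_cost = Decimal(output_price) * output_tokens * docs_per_month / million
    # Assumption: totals are rounded half up to two decimal places.
    return (input_cost + output_cost).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

# Base example at $0.0100 input and $0.0300 output per 1M tokens:
# 10M x 0.01 + 0.5M x 0.03 = 0.10 + 0.015 = 0.115, shown as 0.12.
print(monthly_cost("0.0100", "0.0300"))
```

Changing `input_tokens`, `output_tokens`, or `docs_per_month` shows how the total moves for a different workload.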
Low-cost summarization models in the base example
| Model | Input price | Output price | Monthly cost |
|---|---|---|---|
| Qwen2.5-Coder-3B-Instruct | $0.0100 / 1M tokens | $0.0300 / 1M tokens | $0.12 |
| Qwen2.5-Coder-7B-Instruct | $0.0100 / 1M tokens | $0.0300 / 1M tokens | $0.12 |
| Qwen2.5-Coder-7B | $0.0100 / 1M tokens | $0.0300 / 1M tokens | $0.12 |
| llama3.2-11b-vision-instruct | $0.0150 / 1M tokens | $0.0250 / 1M tokens | $0.16 |
| llama3.2-3b-instruct | $0.0150 / 1M tokens | $0.0250 / 1M tokens | $0.16 |
| titan-embed-text-v2 | $0.0200 / 1M tokens | N/A | $0.20 |
| Llama-3.2-3B-Instruct | $0.0200 / 1M tokens | $0.0200 / 1M tokens | $0.21 |
| paddleocr-vl | $0.0200 / 1M tokens | $0.0200 / 1M tokens | $0.21 |
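To illustrate how input-heavy these totals are, the sketch below splits the monthly cost into its input and output parts for three rows of the table. It assumes the base workload of 10M input tokens and 0.5M output tokens per month; the names `ROWS` and `input_share` are hypothetical helpers for illustration only.

```python
from decimal import Decimal

# (model, input USD per 1M tokens, output USD per 1M tokens), copied from the table
ROWS = [
    ("Qwen2.5-Coder-3B-Instruct", "0.0100", "0.0300"),
    ("llama3.2-3b-instruct", "0.0150", "0.0250"),
    ("Llama-3.2-3B-Instruct", "0.0200", "0.0200"),
]

# Assumed base workload, in millions of tokens per month
INPUT_TOKENS_M = Decimal("10")    # 10,000 tokens x 1,000 documents
OUTPUT_TOKENS_M = Decimal("0.5")  # 500 tokens x 1,000 documents

def input_share(input_price, output_price):
    """Fraction of the monthly cost that comes from input tokens."""
    input_cost = Decimal(input_price) * INPUT_TOKENS_M
    output_cost = Decimal(output_price) * OUTPUT_TOKENS_M
    return input_cost / (input_cost + output_cost)

for model, inp, out in ROWS:
    share = float(input_share(inp, out))
    print(f"{model}: {share:.1%} of monthly cost is input")
```

For these rows the input share ranges from roughly 87% to 95%, even where the output price per token is three times the input price.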