Benchmark explorer

Benchmarks

Select one benchmark, then compare base model families within that benchmark only. Scores are source-linked evidence rows, not a universal leaderboard.

154 Benchmarks

1,323 Model routes

13,050 Route rows

This page intentionally avoids cross-benchmark ranking. Pick a benchmark first; the chart and table below only compare rows with that same benchmark label. Rows may come from official model cards, launch posts, papers, or benchmark operators. Benchmark rows use the generated catalog last built on Jul 13, 2026.
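As a rough illustration of the within-benchmark rule described above, the following minimal sketch filters a set of rows down to a single benchmark label before any comparison is made. The `Row` type, its field names, the `rows_for_benchmark` helper, and the sample values are all hypothetical and are not taken from the page's actual catalog or code. Sorting by descending score is also an assumption, since some metrics may be lower-is-better.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    # Hypothetical row shape; the real catalog schema is not shown on the page.
    benchmark: str
    family: str
    score: float
    source: str


def rows_for_benchmark(rows, benchmark):
    """Keep only rows whose benchmark label matches, highest score first.

    Assumes a higher score is better, which may not hold for every metric.
    """
    selected = [r for r in rows if r.benchmark == benchmark]
    return sorted(selected, key=lambda r: r.score, reverse=True)


# Made-up rows for illustration only; these are not real results.
rows = [
    Row("Benchmark A", "family-x", 71.0, "model card"),
    Row("Benchmark B", "family-x", 88.5, "paper"),
    Row("Benchmark A", "family-y", 64.2, "launch post"),
]

selected = rows_for_benchmark(rows, "Benchmark A")
for r in selected:
    print(r.family, r.score, r.source)
```

Because the filter runs before any sorting, a score on one benchmark is never ranked against a score on another.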

SWE-bench Verified, MMLU, GPQA Diamond, Aider Polyglot, Mistral 7B comparison table, HumanEval, Artificial Analysis Coding Index, Artificial Analysis Intelligence Index

Benchmark results

Results are grouped by the selected benchmark.

Family	Score	Metric	Category	Scope	Routes	Source