Are AI models getting better over time?

We plotted every model we've tested by its release date and how it scores on real business tasks. Newest releases sit on the right.

Models dated: 35
Newest: Kimi K2.7 Code (17 days ago)
Oldest: GPT-4o (2.1 years ago)

Every model, by age

| Model | Provider | Released | Date |
|---|---|---|---|
| Kimi K2.7 Code | MoonshotAI | 17 days ago | 2026-06-12 |
| Claude Opus 4.8 | Anthropic | 1 month ago | 2026-05-27 |
| Qwen3.7 Max | Qwen | 1 month ago | 2026-05-21 |
| Gemini 3.5 Flash | Google | 1 month ago | 2026-05-19 |
| Gemini 3.1 Flash Lite | Google | 2 months ago | 2026-05-07 |
| GPT-5.5 | OpenAI | 2 months ago | 2026-04-24 |
| GPT-5.5 Pro | OpenAI | 2 months ago | 2026-04-24 |
| Claude Opus 4.7 | Anthropic | 2 months ago | 2026-04-16 |
| GLM 5.1 | Z.ai | 3 months ago | 2026-04-07 |
| Grok 4.20 | xAI | 3 months ago | 2026-03-31 |
| MiniMax M2.7 | MiniMax | 3 months ago | 2026-03-18 |
| GPT-5.4 Mini | OpenAI | 3 months ago | 2026-03-17 |
| GPT-5.4 Nano | OpenAI | 3 months ago | 2026-03-17 |
| GPT-5.4 | OpenAI | 4 months ago | 2026-03-05 |
| Gemini 3.1 Flash Lite Preview | Google | 4 months ago | 2026-03-03 |
| Gemini 3.1 Pro Preview | Google | 4 months ago | 2026-02-19 |
| Claude Sonnet 4.6 | Anthropic | 4 months ago | 2026-02-17 |
| Qwen3.5 Plus 2026-02-15 | Qwen | 4 months ago | 2026-02-16 |
| GLM 5 | Z.ai | 5 months ago | 2026-02-11 |
| Claude Opus 4.6 | Anthropic | 5 months ago | 2026-02-04 |
| Kimi K2.5 | MoonshotAI | 5 months ago | 2026-01-27 |
| Gemini 3 Flash Preview | Google | 6 months ago | 2025-12-17 |
| DeepSeek V3.2 | DeepSeek | 7 months ago | 2025-12-01 |
| Claude Opus 4.5 | Anthropic | 7 months ago | 2025-11-24 |
| Claude Haiku 4.5 | Anthropic | 8 months ago | 2025-10-15 |
| Claude Sonnet 4.5 | Anthropic | 9 months ago | 2025-09-29 |
| DeepSeek V3.1 Terminus | DeepSeek | 9 months ago | 2025-09-22 |
| Qwen3 Coder Flash | Qwen | 9 months ago | 2025-09-17 |
| Mistral Medium 3.1 | Mistral | 11 months ago | 2025-08-13 |
| GPT-5 Mini | OpenAI | 11 months ago | 2025-08-07 |
| Gemini 2.5 Pro | Google | 1.0 years ago | 2025-06-17 |
| Gemini 2.5 Flash | Google | 1.0 years ago | 2025-06-17 |
| o1 | OpenAI | 1.5 years ago | 2024-12-17 |
| GPT-4o Mini | OpenAI | 1.9 years ago | 2024-07-18 |
| GPT-4o | OpenAI | 2.1 years ago | 2024-05-13 |
| Grok 4.20 Beta | xAI | | |
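
The relative ages shown with each release date ("17 days ago", "1 month ago", "1.0 years ago") can be reproduced from the dates alone. The page does not say how it computes them, so the sketch below is an assumption: the function name `age_label`, the 30-day cutoff, the 30.44-day average month, and the 365.25-day year are all illustrative choices picked to match the labels listed here. The reference date 2026-06-29 is inferred from the newest entry (2026-06-12, "17 days ago").

```python
from datetime import date


def age_label(released: date, today: date) -> str:
    """Return a relative-age label for a release date.

    Assumed scheme (not confirmed by the page):
      under 30 days    -> whole days
      under 12 months  -> whole months, rounded (30.44 days per month)
      otherwise        -> years with one decimal (365.25 days per year)
    """
    days = (today - released).days
    if days < 30:
        return f"{days} day{'s' if days != 1 else ''} ago"
    months = round(days / 30.44)
    if months < 12:
        return f"{months} month{'s' if months != 1 else ''} ago"
    return f"{days / 365.25:.1f} years ago"


# Reference date inferred from the newest entry in the list.
TODAY = date(2026, 6, 29)

print(age_label(date(2026, 6, 12), TODAY))  # Kimi K2.7 Code
print(age_label(date(2025, 8, 13), TODAY))  # Mistral Medium 3.1
print(age_label(date(2024, 5, 13), TODAY))  # GPT-4o
```

Passing `today` explicitly rather than calling `date.today()` keeps the labels reproducible for a fixed snapshot of the list.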

Release dates via OpenRouter (model availability date). “Our standing” is each model's average percentile across the task areas we test.
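
As a rough illustration of an "average percentile across task areas", the sketch below ranks each model within every task area and then averages those ranks per model. The exact percentile definition used on this page is not stated, so the "share of models scoring at or below" rule is an assumption, as are the function names `percentile_ranks` and `standing`. The model names, task areas, and scores in the example are made up.

```python
from statistics import mean


def percentile_ranks(scores: dict[str, float]) -> dict[str, float]:
    """Percentile of each model within one task area.

    Assumed definition: percent of models scoring at or below this model.
    """
    n = len(scores)
    return {
        model: 100.0 * sum(other <= s for other in scores.values()) / n
        for model, s in scores.items()
    }


def standing(scores_by_area: dict[str, dict[str, float]]) -> dict[str, float]:
    """Average each model's percentile over the task areas it appears in."""
    per_model: dict[str, list[float]] = {}
    for area_scores in scores_by_area.values():
        for model, pct in percentile_ranks(area_scores).items():
            per_model.setdefault(model, []).append(pct)
    return {model: mean(pcts) for model, pcts in per_model.items()}


# Hypothetical scores for two placeholder models in two placeholder areas.
example = {
    "area_1": {"model_a": 10.0, "model_b": 20.0},
    "area_2": {"model_a": 30.0, "model_b": 5.0},
}
print(standing(example))
```

Averaging percentiles instead of raw scores puts task areas with different score scales on an equal footing, at the cost of discarding how large the gaps between models are.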