Claude Sonnet 4.6 vs Claude Sonnet 5: which wins at real work?
22 task areas · same graded test runs · rank comparison only, so 0–100 and Elo collections never mix raw scores.
Claude Sonnet 4.6 wins 15 of the 22 task areas we tested; Claude Sonnet 5 takes 7. Claude Sonnet 5 is the cheaper model at $12 vs $18 per 1M tokens, so Claude Sonnet 4.6 costs 1.5× as much per token.
Task by task
| Task area | Claude Sonnet 4.6 | Claude Sonnet 5 | Winner |
|---|---|---|---|
| Summarization & Meeting Notes | #34 / 107 (Excellent) | #105 / 107 (Excellent) | Claude Sonnet 4.6 |
| Executive Assistant | #1 / 109 (Strong) | #66 / 109 (Strong) | Claude Sonnet 4.6 |
| Creative & Comedy | #18 / 107 | #76 / 107 | Claude Sonnet 4.6 |
| Legal & HR | #3 / 107 (Excellent) | #55 / 107 (Excellent) | Claude Sonnet 4.6 |
| Training & Education | #9 / 107 (Excellent) | #49 / 107 (Excellent) | Claude Sonnet 4.6 |
| Frontend & Landing Pages | #20 / 106 (Needs editing) | #54 / 106 (Needs editing) | Claude Sonnet 4.6 |
| Structured Output | #98 / 110 (Usable) | #64 / 110 (Strong) | Claude Sonnet 5 |
| Investor & Pitch | #2 / 63 (Strong) | #34 / 63 (Usable) | Claude Sonnet 4.6 |
| Content & Brand | #38 / 124 (Strong) | #68 / 124 (Strong) | Claude Sonnet 4.6 |
| Translation & Localization | #13 / 107 (Excellent) | #40 / 107 (Excellent) | Claude Sonnet 4.6 |
| Coding | #25 / 115 (Strong) | #6 / 115 (Excellent) | Claude Sonnet 5 |
| Research & Competitive Analysis | #33 / 107 (Strong) | #15 / 107 (Excellent) | Claude Sonnet 5 |
| Knowledge & Docs | #34 / 107 (Strong) | #18 / 107 (Strong) | Claude Sonnet 5 |
| Presentations & Decks | #33 / 107 (Excellent) | #20 / 107 (Excellent) | Claude Sonnet 5 |
| RAG, Safety & Grounding | #65 / 110 (Excellent) | #77 / 110 (Excellent) | Claude Sonnet 4.6 |
| Product & Project Management | #48 / 107 (Strong) | #59 / 107 (Strong) | Claude Sonnet 4.6 |
| Data & Analytics | #6 / 110 (Excellent) | #15 / 110 (Excellent) | Claude Sonnet 4.6 |
| Landing Pages | #14 / 69 (Strong) | #23 / 69 (Strong) | Claude Sonnet 4.6 |
| Chef / Home Cooking | #19 / 126 (Usable) | #27 / 126 (Usable) | Claude Sonnet 4.6 |
| Customer Support | #57 / 113 (Strong) | #52 / 113 (Strong) | Claude Sonnet 5 |
| Sales | #13 / 107 (Strong) | #15 / 107 (Strong) | Claude Sonnet 4.6 |
| AI Strategy | #2 / 126 (Strong) | #1 / 126 (Strong) | Claude Sonnet 5 |
Rank = position among every model config we tested in that task area (lower is better). Sorted by biggest gap first.
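To make the rank rule concrete, here is a minimal Python sketch of how a winner could be picked per task area and how rows could be ordered by rank gap. It is an illustration under stated assumptions, not the site's actual pipeline: the `Row` type, the `winner` and `sort_by_gap` helpers, and the tie handling are hypothetical, and only four rows from the table are included as sample data.

```python
from typing import NamedTuple


class Row(NamedTuple):
    """One task area with each model's rank (lower is better)."""
    area: str
    rank_46: int  # Claude Sonnet 4.6 rank in this task area
    rank_5: int   # Claude Sonnet 5 rank in this task area


# Sample rows copied from the table above (ranks only).
ROWS = [
    Row("Coding", 25, 6),
    Row("Summarization & Meeting Notes", 34, 105),
    Row("AI Strategy", 2, 1),
    Row("Structured Output", 98, 64),
]


def winner(row: Row) -> str:
    """Return the model with the lower (better) rank.

    The "Tie" branch is an assumption; the table has no tied rows.
    """
    if row.rank_46 == row.rank_5:
        return "Tie"
    return "Claude Sonnet 4.6" if row.rank_46 < row.rank_5 else "Claude Sonnet 5"


def sort_by_gap(rows: list[Row]) -> list[Row]:
    """Order rows by absolute rank difference, biggest gap first."""
    return sorted(rows, key=lambda r: abs(r.rank_46 - r.rank_5), reverse=True)


if __name__ == "__main__":
    for row in sort_by_gap(ROWS):
        gap = abs(row.rank_46 - row.rank_5)
        print(f"{row.area}: gap {gap}, winner {winner(row)}")
```

Because only rank positions are compared, the same logic works whether a task area was originally scored on a 0–100 scale or with Elo.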
Frequently asked
Is Claude Sonnet 4.6 better than Claude Sonnet 5?
Across the 22 task areas we benchmarked, Claude Sonnet 4.6 ranks higher in 15 and Claude Sonnet 5 ranks higher in 7.
Which is cheaper, Claude Sonnet 4.6 or Claude Sonnet 5?
Claude Sonnet 5 is cheaper: $12 vs $18 per 1M tokens, so Claude Sonnet 4.6 costs 1.5× as much per token.
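The price comparison is simple arithmetic on the two listed prices, and a short sketch may help when plugging in other numbers. The function name `price_comparison` is hypothetical, and the sketch assumes both prices are quoted per 1M tokens in the same currency.

```python
def price_comparison(cheaper: float, pricier: float) -> tuple[float, float]:
    """Return (price ratio, fractional saving) for two per-token prices.

    ratio  = how many times more the pricier model costs
    saving = fraction of the pricier model's cost saved by switching
    """
    if cheaper <= 0 or pricier <= 0:
        raise ValueError("prices must be positive")
    ratio = pricier / cheaper
    saving = 1 - cheaper / pricier
    return ratio, saving


if __name__ == "__main__":
    # $12 (Claude Sonnet 5) vs $18 (Claude Sonnet 4.6) per 1M tokens
    ratio, saving = price_comparison(12.0, 18.0)
    print(f"ratio: {ratio:.2f}x, saving: {saving:.0%}")
```

A 1.5× price ratio corresponds to a one-third saving, which is why "1.5× the cost" and "about 33% cheaper" describe the same pair of prices.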
What is Claude Sonnet 4.6 better at?
Claude Sonnet 4.6 out-ranks Claude Sonnet 5 by the widest margins in Summarization & Meeting Notes, Executive Assistant, and Creative & Comedy.
What is Claude Sonnet 5 better at?
Claude Sonnet 5 out-ranks Claude Sonnet 4.6 by the widest margins in Structured Output, Coding, and Research & Competitive Analysis.
This page is Spring Prompt, running
We just did this for every model. Do it for your prompt.
The rankings above come from running real tasks through real models and scoring every output. Spring Prompt is that same engine — pointed at your prompt, your test cases, and your definition of good.
- Generate test cases from your prompt — no eval set required to start.
- Compare models side by side with quality, cost and latency in one matrix.
- Optimise the winner until the scores say it's ready to ship.