Claude Sonnet 4.6 vs Claude Sonnet 5: which wins at real work?

22 task areas · same graded test runs · compared by rank only, so collections scored 0–100 and collections scored by Elo never have their raw scores mixed.

Claude Sonnet 4.6 wins 15 of the 22 task areas we tested; Claude Sonnet 5 takes the other 7. Claude Sonnet 5 costs one-third less per token ($12 vs $18 per 1M tokens).

Metric	Claude Sonnet 4.6	Claude Sonnet 5
Task areas won	15	7
Price / 1M tokens	$18.0	$12.0
Provider	Anthropic	Anthropic

Task by task

Task area	Claude Sonnet 4.6	Claude Sonnet 5	Winner
Summarization & Meeting Notes	#34 / 107 Excellent	#105 / 107 Excellent	Claude Sonnet 4.6
Executive Assistant	#1 / 109 Strong	#66 / 109 Strong	Claude Sonnet 4.6
Creative & Comedy	#18 / 107	#76 / 107	Claude Sonnet 4.6
Legal & HR	#3 / 107 Excellent	#55 / 107 Excellent	Claude Sonnet 4.6
Training & Education	#9 / 107 Excellent	#49 / 107 Excellent	Claude Sonnet 4.6
Frontend & Landing Pages	#20 / 106 Needs editing	#54 / 106 Needs editing	Claude Sonnet 4.6
Structured Output	#98 / 110 Usable	#64 / 110 Strong	Claude Sonnet 5
Investor & Pitch	#2 / 63 Strong	#34 / 63 Usable	Claude Sonnet 4.6
Content & Brand	#38 / 124 Strong	#68 / 124 Strong	Claude Sonnet 4.6
Translation & Localization	#13 / 107 Excellent	#40 / 107 Excellent	Claude Sonnet 4.6
Coding	#25 / 115 Strong	#6 / 115 Excellent	Claude Sonnet 5
Research & Competitive Analysis	#33 / 107 Strong	#15 / 107 Excellent	Claude Sonnet 5
Knowledge & Docs	#34 / 107 Strong	#18 / 107 Strong	Claude Sonnet 5
Presentations & Decks	#33 / 107 Excellent	#20 / 107 Excellent	Claude Sonnet 5
RAG, Safety & Grounding	#65 / 110 Excellent	#77 / 110 Excellent	Claude Sonnet 4.6
Product & Project Management	#48 / 107 Strong	#59 / 107 Strong	Claude Sonnet 4.6
Data & Analytics	#6 / 110 Excellent	#15 / 110 Excellent	Claude Sonnet 4.6
Landing Pages	#14 / 69 Strong	#23 / 69 Strong	Claude Sonnet 4.6
Chef / Home Cooking	#19 / 126 Usable	#27 / 126 Usable	Claude Sonnet 4.6
Customer Support	#57 / 113 Strong	#52 / 113 Strong	Claude Sonnet 5
Sales	#13 / 107 Strong	#15 / 107 Strong	Claude Sonnet 4.6
AI Strategy	#2 / 126 Strong	#1 / 126 Strong	Claude Sonnet 5

Rank = position among every model config we tested in that task area (lower is better). Sorted by biggest gap first.
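The winner and the sort order can be reproduced from the two rank columns alone. The sketch below is a minimal illustration, assuming the "gap" is the absolute difference between the two ranks (the page does not define it, but the table order is consistent with that reading). The function name `compare` and the five-row sample are illustrative; the ranks are copied from the table above.

```python
# (task area, Claude Sonnet 4.6 rank, Claude Sonnet 5 rank), copied from the table above.
ROWS = [
    ("Coding", 25, 6),
    ("AI Strategy", 2, 1),
    ("Summarization & Meeting Notes", 34, 105),
    ("Structured Output", 98, 64),
    ("Sales", 13, 15),
]


def compare(rows):
    """Return (task, winner, gap) tuples, largest rank gap first.

    Lower rank is better. The gap is assumed to be the absolute rank
    difference. Ties do not occur in the table; this sketch would hand
    a tie to Claude Sonnet 5, which is an arbitrary choice.
    """
    results = []
    for task, rank_46, rank_5 in rows:
        winner = "Claude Sonnet 4.6" if rank_46 < rank_5 else "Claude Sonnet 5"
        results.append((task, winner, abs(rank_46 - rank_5)))
    return sorted(results, key=lambda r: r[2], reverse=True)


for task, winner, gap in compare(ROWS):
    print(f"{task}: {winner} leads by {gap} places")
```

Because each task area has its own pool size (63 to 126 configs), a gap of 30 places means more in a pool of 63 than in a pool of 126; the raw gap is only a sorting aid.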

Frequently asked

Is Claude Sonnet 4.6 better than Claude Sonnet 5?

By task areas won, yes: across the 22 task areas we benchmarked, Claude Sonnet 4.6 ranks higher in 15 and Claude Sonnet 5 in 7.

Which is cheaper, Claude Sonnet 4.6 or Claude Sonnet 5?

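Claude Sonnet 5 is cheaper: $12 per 1M tokens versus $18 for Claude Sonnet 4.6. That is one-third less per token; put the other way, Claude Sonnet 4.6 costs 1.5× as much.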
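The price relationship is easy to check by hand. The sketch below works through the arithmetic, assuming the page's figures are a single flat rate per 1M tokens (the page does not split input and output pricing). The 250,000-token workload and the helper `cost_usd` are hypothetical, chosen only for illustration.

```python
PRICE_SONNET_46 = 18.0  # USD per 1M tokens, as listed on this page
PRICE_SONNET_5 = 12.0   # USD per 1M tokens, as listed on this page


def cost_usd(tokens, price_per_million):
    """Cost of a workload at a flat per-1M-token rate (assumed flat)."""
    return tokens / 1_000_000 * price_per_million


# How many times more Claude Sonnet 4.6 costs than Claude Sonnet 5.
ratio = PRICE_SONNET_46 / PRICE_SONNET_5

# Fraction saved by choosing Claude Sonnet 5 instead.
saving = 1 - PRICE_SONNET_5 / PRICE_SONNET_46

print(f"price ratio: {ratio:.1f}x")
print(f"saving: {saving:.0%}")

# Hypothetical workload of 250,000 tokens.
print(cost_usd(250_000, PRICE_SONNET_46))
print(cost_usd(250_000, PRICE_SONNET_5))
```

A ratio of 1.5 and a saving of one-third describe the same gap, which is why "1.5× the price" and "a third cheaper" are both accurate while "1.5× less" is ambiguous.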

What is Claude Sonnet 4.6 better at?

Claude Sonnet 4.6 out-ranks Claude Sonnet 5 in 15 task areas, by the widest margins in Summarization & Meeting Notes (#34 vs #105), Executive Assistant (#1 vs #66), and Creative & Comedy (#18 vs #76).

What is Claude Sonnet 5 better at?

Claude Sonnet 5 out-ranks Claude Sonnet 4.6 in 7 task areas, by the widest margins in Structured Output (#64 vs #98), Coding (#6 vs #25), and Research & Competitive Analysis (#15 vs #33).

This page is Spring Prompt, running

We just did this for every model. Do it for your prompt.

The rankings above come from running real tasks through real models and scoring every output. Spring Prompt is that same engine — pointed at your prompt, your test cases, and your definition of good.

Generate test cases from your prompt — no eval set required to start.
Compare models side by side with quality, cost and latency in one matrix.
Optimise the winner until the scores say it's ready to ship.

Experiment · Cold outreach email

Prompt × model results (12 test cases · 3 evals)

Prompt	Claude Opus	GPT-5	Gemini
v1	7.1	6.8	7.4
v2	8.3	7.9	8.0
v3	9.2 ★	8.6	8.4

Best combo: v3 × Claude Opus (9.2 quality · $0.004/run · 1.8s)