Claude Opus 4.8 vs Gemini 3.1 Pro Preview: which wins at real work?
22 task areas · same graded test runs · rank comparison only, so 0–100 and Elo collections never mix raw scores.
Claude Opus 4.8 and Gemini 3.1 Pro Preview split the 22 task areas we tested 11–11. Gemini 3.1 Pro Preview is about 2.1× cheaper per token ($14 vs $30 per 1M tokens).
Task by task
| Task area | Claude Opus 4.8 | Gemini 3.1 Pro Preview | Winner |
|---|---|---|---|
| Coding | #3 / 115 · Excellent | #55 / 115 · Strong | Claude Opus 4.8 |
| Structured Output | #59 / 113 · Excellent | #15 / 113 · Excellent | Gemini 3.1 Pro Preview |
| Research & Competitive Analysis | #5 / 110 · Excellent | #42 / 110 · Strong | Claude Opus 4.8 |
| Data & Analytics | #2 / 110 · Excellent | #37 / 110 · Excellent | Claude Opus 4.8 |
| Sales | #1 / 110 · Strong | #32 / 110 · Strong | Claude Opus 4.8 |
| Summarization & Meeting Notes | #35 / 110 · Excellent | #4 / 110 · Excellent | Gemini 3.1 Pro Preview |
| AI Strategy | #3 / 126 · Strong | #30 / 126 · Strong | Claude Opus 4.8 |
| Landing Pages | #28 / 72 · Strong | #2 / 72 · Strong | Gemini 3.1 Pro Preview |
| Product & Project Management | #2 / 110 · Excellent | #25 / 110 · Excellent | Claude Opus 4.8 |
| Translation & Localization | #35 / 110 · Excellent | #14 / 110 · Excellent | Gemini 3.1 Pro Preview |
| Knowledge & Docs | #2 / 110 · Excellent | #19 / 110 · Strong | Claude Opus 4.8 |
| Creative & Comedy | #17 / 110 | #4 / 110 | Gemini 3.1 Pro Preview |
| Frontend & Landing Pages | #19 / 109 · Needs editing | #32 / 109 · Needs editing | Claude Opus 4.8 |
| Investor & Pitch | #25 / 66 · Strong | #12 / 66 · Strong | Gemini 3.1 Pro Preview |
| Legal & HR | #5 / 110 · Excellent | #17 / 110 · Excellent | Claude Opus 4.8 |
| Content & Brand | #15 / 124 · Strong | #5 / 124 · Strong | Gemini 3.1 Pro Preview |
| Presentations & Decks | #32 / 110 · Excellent | #24 / 110 · Excellent | Gemini 3.1 Pro Preview |
| Customer Support | #7 / 113 · Strong | #1 / 113 · Strong | Gemini 3.1 Pro Preview |
| Executive Assistant | #8 / 112 · Strong | #13 / 112 · Strong | Claude Opus 4.8 |
| RAG, Safety & Grounding | #11 / 113 · Excellent | #8 / 113 · Excellent | Gemini 3.1 Pro Preview |
| Chef / Home Cooking | #8 / 126 · Strong | #6 / 126 · Strong | Gemini 3.1 Pro Preview |
| Training & Education | #2 / 110 · Excellent | #4 / 110 · Excellent | Claude Opus 4.8 |
Rank = position among every model config we tested in that task area (lower is better). Sorted by biggest gap first.
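To make the rank rule concrete, here is a minimal Python sketch of how a winner and a gap-sorted order could be derived from two rank columns. It is an illustration only, not the site's actual scoring code: the function names `winner` and `summarize` are hypothetical, and `ROWS` holds just four rows copied from the table.

```python
from collections import Counter

# (task area, Claude Opus 4.8 rank, Gemini 3.1 Pro Preview rank).
# Four sample rows copied from the table; the full table has 22.
ROWS = [
    ("Training & Education", 2, 4),
    ("Customer Support", 7, 1),
    ("Coding", 3, 55),
    ("Structured Output", 59, 15),
]


def winner(claude_rank: int, gemini_rank: int) -> str:
    """Return the model with the lower (better) rank. Hypothetical helper."""
    if claude_rank == gemini_rank:
        return "Tie"  # not seen in the table, handled for completeness
    if claude_rank < gemini_rank:
        return "Claude Opus 4.8"
    return "Gemini 3.1 Pro Preview"


def summarize(rows):
    """Sort rows by absolute rank gap (largest first) and tally wins."""
    ordered = sorted(rows, key=lambda r: abs(r[1] - r[2]), reverse=True)
    tally = Counter(winner(c, g) for _, c, g in rows)
    return ordered, tally


ordered, tally = summarize(ROWS)
for area, claude_rank, gemini_rank in ordered:
    gap = abs(claude_rank - gemini_rank)
    print(f"{area}: gap {gap}, winner {winner(claude_rank, gemini_rank)}")
print(dict(tally))
```

Because only positions are compared, the sketch never needs the underlying 0–100 or Elo scores, which is why ranks from differently scored collections can sit in the same table.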
Same task, both models — judged
Both models answered the same test case; an independent judge graded each. Quotes are the judge's actual rationale.
Sales
Freight broker ops director (Cold Outbound Email)
“The response is expert-level and production-ready, perfectly integrating the specific hiring trigger, maintaining the required pragmatic tone, staying under the word limit, and offering a highly effective, low-friction CTA without inventing facts.”
“The email slightly exceeds the word limit and has a feature-heavy middle paragraph that reduces skimmability.”
Frequently asked
Is Claude Opus 4.8 better than Gemini 3.1 Pro Preview?
They tie across the 22 task areas we benchmarked, at 11 apiece.
Which is cheaper, Claude Opus 4.8 or Gemini 3.1 Pro Preview?
Gemini 3.1 Pro Preview is about 2.1× cheaper per token ($14 vs $30 per 1M tokens).
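The 2.1× figure can be reproduced from the two quoted prices. This short Python sketch assumes $30 and $14 are directly comparable per-1M-token prices, as the page presents them; `price_ratio` and `relative_saving` are hypothetical helper names.

```python
def price_ratio(expensive: float, cheap: float) -> float:
    """How many times more the expensive option costs per token."""
    return expensive / cheap


def relative_saving(expensive: float, cheap: float) -> float:
    """Fraction of spend saved by choosing the cheaper option."""
    return 1 - cheap / expensive


# Prices per 1M tokens as quoted on this page (USD).
CLAUDE_OPUS_PRICE = 30.0
GEMINI_PRO_PRICE = 14.0

ratio = price_ratio(CLAUDE_OPUS_PRICE, GEMINI_PRO_PRICE)
saving = relative_saving(CLAUDE_OPUS_PRICE, GEMINI_PRO_PRICE)
print(f"{ratio:.1f}x cheaper")      # 2.1x cheaper
print(f"{saving:.0%} lower spend")  # 53% lower spend
```

Put another way, at equal token volume the cheaper model would cost a little under half as much.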
What is Claude Opus 4.8 better at?
Claude Opus 4.8 out-ranks Gemini 3.1 Pro Preview by the widest margins at Coding, Research & Competitive Analysis, and Data & Analytics, and wins 11 task areas in total.
What is Gemini 3.1 Pro Preview better at?
Gemini 3.1 Pro Preview out-ranks Claude Opus 4.8 by the widest margins at Structured Output, Summarization & Meeting Notes, and Landing Pages, and wins 11 task areas in total.
This page is Spring Prompt, running
We just did this for every model. Do it for your prompt.
The rankings above come from running real tasks through real models and scoring every output. Spring Prompt is that same engine — pointed at your prompt, your test cases, and your definition of good.
- Generate test cases from your prompt — no eval set required to start.
- Compare models side by side with quality, cost and latency in one matrix.
- Optimise the winner until the scores say it's ready to ship.