Claude Opus 4.8 vs Gemini 3.1 Pro Preview: which wins at real work?

22 task areas · same graded test runs · rank comparison only, so raw scores from 0–100 collections and Elo collections are never mixed.

Claude Opus 4.8 and Gemini 3.1 Pro Preview split the 22 task areas we tested, 11 each. Gemini 3.1 Pro Preview is the cheaper of the two: $14 per 1M tokens against $30, so Claude Opus 4.8 costs about 2.1× as much per token.
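For anyone who wants to check the price arithmetic, here is a minimal sketch. It uses the two per-1M-token prices quoted on this page and assumes a single flat rate per token, which may not match how either provider actually bills input and output tokens. The function names and the 250,000-token workload are made up for illustration.

```python
# Prices in USD per 1M tokens, as quoted on this page.
PRICE_PER_1M = {
    "Claude Opus 4.8": 30.0,
    "Gemini 3.1 Pro Preview": 14.0,
}


def price_ratio(pricier: str, cheaper: str) -> float:
    """How many times more the pricier model costs per token."""
    return PRICE_PER_1M[pricier] / PRICE_PER_1M[cheaper]


def cost_usd(model: str, tokens: int) -> float:
    """Cost of `tokens` tokens at a flat per-token rate (an assumption)."""
    return PRICE_PER_1M[model] * tokens / 1_000_000


ratio = price_ratio("Claude Opus 4.8", "Gemini 3.1 Pro Preview")
print(f"price ratio: {ratio:.1f}x")  # price ratio: 2.1x

# Hypothetical workload of 250,000 tokens.
for model in PRICE_PER_1M:
    print(f"{model}: ${cost_usd(model, 250_000):.2f}")
```

At that hypothetical volume the gap is $7.50 against $3.50, the same 2.1× ratio.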

Metric	Claude Opus 4.8	Gemini 3.1 Pro Preview
Task areas won	11	11
Price / 1M tokens	$30.0	$14.0
Provider	Anthropic	Google

Task by task

Task area	Claude Opus 4.8	Gemini 3.1 Pro Preview	Winner
Coding	#3 / 115 Excellent	#55 / 115 Strong	Claude Opus 4.8
Structured Output	#59 / 113 Excellent	#15 / 113 Excellent	Gemini 3.1 Pro Preview
Research & Competitive Analysis	#5 / 110 Excellent	#42 / 110 Strong	Claude Opus 4.8
Data & Analytics	#2 / 110 Excellent	#37 / 110 Excellent	Claude Opus 4.8
Sales	#1 / 110 Strong	#32 / 110 Strong	Claude Opus 4.8
Summarization & Meeting Notes	#35 / 110 Excellent	#4 / 110 Excellent	Gemini 3.1 Pro Preview
AI Strategy	#3 / 126 Strong	#30 / 126 Strong	Claude Opus 4.8
Landing Pages	#28 / 72 Strong	#2 / 72 Strong	Gemini 3.1 Pro Preview
Product & Project Management	#2 / 110 Excellent	#25 / 110 Excellent	Claude Opus 4.8
Translation & Localization	#35 / 110 Excellent	#14 / 110 Excellent	Gemini 3.1 Pro Preview
Knowledge & Docs	#2 / 110 Excellent	#19 / 110 Strong	Claude Opus 4.8
Creative & Comedy	#17 / 110	#4 / 110	Gemini 3.1 Pro Preview
Frontend & Landing Pages	#19 / 109 Needs editing	#32 / 109 Needs editing	Claude Opus 4.8
Investor & Pitch	#25 / 66 Strong	#12 / 66 Strong	Gemini 3.1 Pro Preview
Legal & HR	#5 / 110 Excellent	#17 / 110 Excellent	Claude Opus 4.8
Content & Brand	#15 / 124 Strong	#5 / 124 Strong	Gemini 3.1 Pro Preview
Presentations & Decks	#32 / 110 Excellent	#24 / 110 Excellent	Gemini 3.1 Pro Preview
Customer Support	#7 / 113 Strong	#1 / 113 Strong	Gemini 3.1 Pro Preview
Executive Assistant	#8 / 112 Strong	#13 / 112 Strong	Claude Opus 4.8
RAG, Safety & Grounding	#11 / 113 Excellent	#8 / 113 Excellent	Gemini 3.1 Pro Preview
Chef / Home Cooking	#8 / 126 Strong	#6 / 126 Strong	Gemini 3.1 Pro Preview
Training & Education	#2 / 110 Excellent	#4 / 110 Excellent	Claude Opus 4.8

Rank = position among every model config we tested in that task area (lower is better). Sorted by biggest gap first.
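The winner column and the sort order follow mechanically from the ranks. The sketch below is one way to reproduce them, not the code behind this page: it copies the two rank columns from the table, treats the lower rank as the winner, and sorts by the absolute rank gap. The `winner` and `gap` helpers are names invented here, and the tie branch is an assumption, since no task area in the table is tied.

```python
from collections import Counter

CLAUDE = "Claude Opus 4.8"
GEMINI = "Gemini 3.1 Pro Preview"

# (task area, Claude rank, Gemini rank), copied from the table above.
ROWS = [
    ("Coding", 3, 55),
    ("Structured Output", 59, 15),
    ("Research & Competitive Analysis", 5, 42),
    ("Data & Analytics", 2, 37),
    ("Sales", 1, 32),
    ("Summarization & Meeting Notes", 35, 4),
    ("AI Strategy", 3, 30),
    ("Landing Pages", 28, 2),
    ("Product & Project Management", 2, 25),
    ("Translation & Localization", 35, 14),
    ("Knowledge & Docs", 2, 19),
    ("Creative & Comedy", 17, 4),
    ("Frontend & Landing Pages", 19, 32),
    ("Investor & Pitch", 25, 12),
    ("Legal & HR", 5, 17),
    ("Content & Brand", 15, 5),
    ("Presentations & Decks", 32, 24),
    ("Customer Support", 7, 1),
    ("Executive Assistant", 8, 13),
    ("RAG, Safety & Grounding", 11, 8),
    ("Chef / Home Cooking", 8, 6),
    ("Training & Education", 2, 4),
]


def winner(row):
    """Lower rank wins; a tie is reported as 'Tie' (assumed behaviour)."""
    _, claude_rank, gemini_rank = row
    if claude_rank == gemini_rank:
        return "Tie"
    return CLAUDE if claude_rank < gemini_rank else GEMINI


def gap(row):
    """Absolute distance between the two ranks."""
    return abs(row[1] - row[2])


tally = Counter(winner(row) for row in ROWS)
by_gap = sorted(ROWS, key=gap, reverse=True)  # stable: ties keep table order

print(tally[CLAUDE], tally[GEMINI])  # 11 11
print(by_gap[0][0], gap(by_gap[0]))  # Coding 52
```

Comparing ranks rather than scores is what lets a 0–100 collection and an Elo collection sit in the same table: only the ordering within each task area is used.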

Same task, both models — judged

Both models answered the same test case; an independent judge graded each. Quotes are the judge's actual rationale.

Sales

Freight broker ops director (Cold Outbound Email)

Claude Opus 4.8 95/100

“The response is expert-level and production-ready, perfectly integrating the specific hiring trigger, maintaining the required pragmatic tone, staying under the word limit, and offering a highly effective, low-friction CTA without inventing facts.”

Gemini 3.1 Pro Preview 47/100

“The email slightly exceeds the word limit and has a feature-heavy middle paragraph that reduces skimmability.”

Frequently asked

Is Claude Opus 4.8 better than Gemini 3.1 Pro Preview?

They tie across the 22 task areas we benchmarked, at 11 apiece.

Which is cheaper, Claude Opus 4.8 or Gemini 3.1 Pro Preview?

Gemini 3.1 Pro Preview. It costs $14 per 1M tokens against $30 for Claude Opus 4.8, so Claude Opus 4.8 costs about 2.1× as much per token.

What is Claude Opus 4.8 better at?

Claude Opus 4.8 outranks Gemini 3.1 Pro Preview in 11 task areas, most clearly in Coding, Research & Competitive Analysis, and Data & Analytics, its three widest rank gaps.

What is Gemini 3.1 Pro Preview better at?

Gemini 3.1 Pro Preview outranks Claude Opus 4.8 in 11 task areas, most clearly in Structured Output, Summarization & Meeting Notes, and Landing Pages, its three widest rank gaps.

This page is Spring Prompt, running

We just did this for every model. Do it for your prompt.

The rankings above come from running real tasks through real models and scoring every output. Spring Prompt is that same engine — pointed at your prompt, your test cases, and your definition of good.

Generate test cases from your prompt — no eval set required to start.
Compare models side by side with quality, cost and latency in one matrix.
Optimise the winner until the scores say it's ready to ship.

Experiment · Cold outreach email

Prompt × model results (12 test cases · 3 evals)

Prompt	Claude Opus	GPT-5	Gemini
v1	7.1	6.8	7.4
v2	8.3	7.9	8.0
v3	9.2 ★	8.6	8.4

Best combo: v3 × Claude Opus (9.2 quality · $0.004/run · 1.8s)
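Picking the "best combo" out of a prompt × model matrix is a simple maximum over every cell. The sketch below assumes the nine scores in this demo form a 3 × 3 grid with prompt versions as rows and models as columns. Only the v3 label appears on the page; v1 and v2 are assumed row labels, and the code is an illustration rather than how Spring Prompt works internally.

```python
# Quality scores from the demo grid. Row labels "v1" and "v2" are assumed;
# only "v3" is named on the page.
SCORES = {
    "v1": {"Claude Opus": 7.1, "GPT-5": 6.8, "Gemini": 7.4},
    "v2": {"Claude Opus": 8.3, "GPT-5": 7.9, "Gemini": 8.0},
    "v3": {"Claude Opus": 9.2, "GPT-5": 8.6, "Gemini": 8.4},
}


def best_combo(scores):
    """Return the (prompt, model, score) triple with the highest score."""
    cells = (
        (prompt, model, score)
        for prompt, row in scores.items()
        for model, score in row.items()
    )
    return max(cells, key=lambda cell: cell[2])


prompt, model, score = best_combo(SCORES)
print(f"Best combo: {prompt} × {model} ({score})")
```

A fuller comparison would also weigh cost and latency, which the demo reports for the winning cell only.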