Google · 7 models tested

Which Google model should you use?

Q: What is the best Google model?

Gemini 3.1 Pro Preview — the highest average percentile (84) across the task areas we benchmarked, with 3 top-3 finishes.

Q: What is the best cheap Google model?

Gemini 3.5 Flash: within 9 percentile points of Gemini 3.1 Pro Preview at $10.5 per 1M tokens.

Q: Where does Google lead every other lab?

A Google model holds the outright #1 spot in 2 task areas: Customer Support, Frontend & Landing Pages.

Lineup: Gemini 3.1 Pro Preview · Gemini 3.5 Flash · Gemini 3.1 Flash Lite Preview · Gemini 3 Flash Preview

Google's strongest model is Gemini 3.1 Pro Preview (avg percentile 84, top-3 in 3 of 22 task areas). On a budget, Gemini 3.5 Flash stays close at $10.5/1M.

The Google lineup, ranked

Model	Badge	Task areas tested	Overall	Code & data	Writing & comms	Business & strategy	Creative & visual	AA Intel†	Top-3s	Price / 1M
Gemini 3.1 Pro Preview	Best overall	22	84	76	90★	82★	91★	46	3	$14.0
Gemini 3.5 Flash	Best value	22	75	78★	80	65	87	50	0	$10.5
Gemini 3.1 Flash Lite Preview	—	4	67	61	86	—	—	25	0	$1.75
Gemini 3 Flash Preview	—	22	64	75	64	51	80	—	2	$3.5
Gemini 3.1 Flash Lite	—	22	58	61	70	45	64	—	0	$1.75
Gemini 2.5 Pro	—	1	37	—	—	—	37	26	0	$11.25
Gemini 2.5 Flash	—	1	10	—	—	—	10	—	0	$0.75

Area figures are mean percentile ranks, grouped into five bands: 85+, 70–84, 55–69, 40–54, and <40. ★ = the lineup's best in that area. †AA Intel = Artificial Analysis Intelligence Index, a third-party reference only.
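To make the table easier to read programmatically, here is a minimal Python sketch that assigns each Overall score to one of the legend's bands and compares every model with the lineup leader on score gap and price. The `Model` record and the helpers `band` and `compare_to_leader` are hypothetical names introduced for illustration. How the page actually selects its "Best value" badge is not stated, so this only reproduces the arithmetic behind the "within 9 percentile points" claim.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    overall: int         # "Overall" column: mean percentile rank
    price_per_1m: float  # "Price / 1M" column, in USD


# Rows copied from the lineup table (Overall and Price columns only).
LINEUP = [
    Model("Gemini 3.1 Pro Preview", 84, 14.0),
    Model("Gemini 3.5 Flash", 75, 10.5),
    Model("Gemini 3.1 Flash Lite Preview", 67, 1.75),
    Model("Gemini 3 Flash Preview", 64, 3.5),
    Model("Gemini 3.1 Flash Lite", 58, 1.75),
    Model("Gemini 2.5 Pro", 37, 11.25),
    Model("Gemini 2.5 Flash", 10, 0.75),
]


def band(score: int) -> str:
    """Return the legend band that a percentile score falls into."""
    if score >= 85:
        return "85+"
    if score >= 70:
        return "70–84"
    if score >= 55:
        return "55–69"
    if score >= 40:
        return "40–54"
    return "<40"


def compare_to_leader(lineup):
    """Hypothetical helper: band, score gap, and price ratio vs. the top Overall score."""
    leader = max(lineup, key=lambda m: m.overall)
    return [
        (
            m.name,
            band(m.overall),
            leader.overall - m.overall,
            round(m.price_per_1m / leader.price_per_1m, 2),
        )
        for m in lineup
    ]


for name, score_band, gap, price_ratio in compare_to_leader(LINEUP):
    print(f"{name:32} band {score_band:6} gap {gap:2} price x{price_ratio}")
```

For Gemini 3.5 Flash this gives a gap of 9 points at 0.75 times the leader's price, which matches the figures quoted on this page.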

Best Google model, task by task

The lineup's strongest model in each task area — and where it lands against every model we test, not just Google's. #1 means it beats every rival lab too.

Task area	Google's best model	Rank vs. all models	Rating
Customer Support	Gemini 3.1 Pro Preview	#1 / 113	Strong
Frontend & Landing Pages	Gemini 3 Flash Preview	#1 / 106	Usable
Landing Pages	Gemini 3.1 Pro Preview	#2 / 69	Strong
Content & Brand	Gemini 3 Flash Preview	#3 / 124	Strong
Summarization & Meeting Notes	Gemini 3.1 Pro Preview	#3 / 107	Excellent
Creative & Comedy	Gemini 3.1 Pro Preview	#4 / 107	Elo
Training & Education	Gemini 3.1 Pro Preview	#4 / 107	Excellent
RAG, Safety & Grounding	Gemini 3.1 Flash Lite Preview	#5 / 110	Excellent
Chef / Home Cooking	Gemini 3.1 Pro Preview	#6 / 126	Strong
Investor & Pitch	Gemini 3.1 Pro Preview	#10 / 63	Strong
Executive Assistant	Gemini 3.1 Pro Preview	#12 / 109	Strong
Structured Output	Gemini 3.1 Pro Preview	#14 / 110	Excellent
Translation & Localization	Gemini 3.1 Pro Preview	#14 / 107	Excellent
Presentations & Decks	Gemini 3.5 Flash	#15 / 107	Excellent
Legal & HR	Gemini 3.1 Pro Preview	#17 / 107	Excellent
Coding	Gemini 3 Flash Preview	#19 / 115	Strong
Knowledge & Docs	Gemini 3.1 Pro Preview	#19 / 107	Strong
AI Strategy	Gemini 3.5 Flash	#20 / 126	Strong
Product & Project Management	Gemini 3.1 Pro Preview	#25 / 107	Excellent
Data & Analytics	Gemini 3.5 Flash	#31 / 107	Excellent
Sales	Gemini 3.1 Pro Preview	#31 / 107	Strong
Research & Competitive Analysis	Gemini 3.1 Pro Preview	#40 / 107	Strong
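Because the field size differs from one task area to the next (63 models in one, 126 in another), raw ranks are not directly comparable. The sketch below converts a rank into a percentile under an assumed convention where #1 maps to 100 and last place maps to 0. The page does not publish its exact percentile formula, so `percentile_from_rank` is a hypothetical helper and its results are not expected to reproduce the lineup table's figures.

```python
def percentile_from_rank(rank: int, field_size: int) -> float:
    """Assumed convention: rank 1 -> 100.0, last place -> 0.0."""
    if not 1 <= rank <= field_size:
        raise ValueError("rank must be between 1 and field_size")
    if field_size == 1:
        return 100.0
    return 100.0 * (field_size - rank) / (field_size - 1)


# A few (task area, rank, field size) rows copied from the table above.
ROWS = [
    ("Customer Support", 1, 113),
    ("Landing Pages", 2, 69),
    ("Investor & Pitch", 10, 63),
    ("Executive Assistant", 12, 109),
    ("Research & Competitive Analysis", 40, 107),
]

for area, rank, field_size in ROWS:
    pct = percentile_from_rank(rank, field_size)
    print(f"{area}: #{rank} / {field_size} -> {pct:.1f}")

# Averaging percentiles across areas mirrors the idea of a "mean percentile rank".
mean_pct = sum(percentile_from_rank(r, n) for _, r, n in ROWS) / len(ROWS)
print(f"mean percentile over {len(ROWS)} areas: {mean_pct:.1f}")
```

Under this convention, #10 of 63 in Investor & Pitch is a lower percentile than #12 of 109 in Executive Assistant, which is why a percentile view can order task areas differently from raw rank.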

Where another lab clearly wins

Honesty corner: the task areas where even Google's best model ranks furthest from the top. See who actually leads there.

Research & Competitive Analysis (their best: #40)
Data & Analytics (their best: #31)
Sales (their best: #31)
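The three areas listed here can be recovered from the task table by sorting on Google's best rank and taking the largest values. This is a small sketch under the assumption that "furthest from the top" means highest raw rank number; the page does not say whether it uses raw rank or percentile. `furthest_from_top` is a hypothetical helper.

```python
# (task area, rank of Google's best model) copied from the task table.
BEST_RANKS = [
    ("Customer Support", 1),
    ("Frontend & Landing Pages", 1),
    ("Landing Pages", 2),
    ("Content & Brand", 3),
    ("Summarization & Meeting Notes", 3),
    ("Creative & Comedy", 4),
    ("Training & Education", 4),
    ("RAG, Safety & Grounding", 5),
    ("Chef / Home Cooking", 6),
    ("Investor & Pitch", 10),
    ("Executive Assistant", 12),
    ("Structured Output", 14),
    ("Translation & Localization", 14),
    ("Presentations & Decks", 15),
    ("Legal & HR", 17),
    ("Coding", 19),
    ("Knowledge & Docs", 19),
    ("AI Strategy", 20),
    ("Product & Project Management", 25),
    ("Data & Analytics", 31),
    ("Sales", 31),
    ("Research & Competitive Analysis", 40),
]


def furthest_from_top(rows, n=3):
    """Return the n areas with the largest rank number (weakest placement).

    sorted() is stable, so tied ranks keep their original table order.
    """
    return sorted(rows, key=lambda row: row[1], reverse=True)[:n]


for area, rank in furthest_from_top(BEST_RANKS):
    print(f"{area}: #{rank}")
```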


This page is Spring Prompt, running

We just did this for every model. Do it for your prompt.

The rankings above come from running real tasks through real models and scoring every output. Spring Prompt is that same engine — pointed at your prompt, your test cases, and your definition of good.

Generate test cases from your prompt — no eval set required to start.
Compare models side by side with quality, cost and latency in one matrix.
Optimise the winner until the scores say it's ready to ship.

Experiment · Cold outreach email

Prompt × model results

12 test cases · 3 evals

Prompt	Claude Opus	GPT-5	Gemini
v1	7.1	6.8	7.4
v2	8.3	7.9	8.0
v3	9.2 ★	8.6	8.4

Best combo: v3 × Claude Opus

9.2 quality · $0.004/run · 1.8s
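The panel above reads as a grid of prompt versions against models, with the starred cell marking the winner. As a rough sketch of how a "best combo" could be picked from such a grid, the Python below takes the highest quality score. The row labels `v1` and `v2` are assumed (only `v3` is named in the panel), and `best_combo` is a hypothetical helper rather than Spring Prompt's actual selection logic, which may also weigh cost and latency.

```python
MODELS = ["Claude Opus", "GPT-5", "Gemini"]

# Quality scores per prompt version, in the same column order as MODELS.
# Labels "v1" and "v2" are assumptions; the panel only names "v3".
SCORES = {
    "v1": [7.1, 6.8, 7.4],
    "v2": [8.3, 7.9, 8.0],
    "v3": [9.2, 8.6, 8.4],
}


def best_combo(scores, models):
    """Return (prompt version, model, score) for the highest quality score."""
    return max(
        (
            (version, model, score)
            for version, row in scores.items()
            for model, score in zip(models, row)
        ),
        key=lambda combo: combo[2],
    )


version, model, score = best_combo(SCORES, MODELS)
print(f"Best combo: {version} × {model} ({score} quality)")
```

On the panel's numbers this selects v3 with Claude Opus at 9.2, in line with the "Best combo" line above.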