Is gemini-3.1-pro-preview-high good at Chef / Home Cooking?

gemini-3.1-pro-preview-high ranks #1 of 50 models for Chef / Home Cooking, a strong result.

Rank for this task: #1 / 50
Score: 83.8
Cost / run: $0.0443

gemini-3.1-pro-preview-high on each Chef / Home Cooking sub-task

Practical Recipe Test 87.3/100 #1
Dinner Rescue Test 87.0/100 #3
Substitution Test 86.7/100 #14
Meal Timing Test 74.0/100 #4
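
The overall score of 83.8 is consistent with an unweighted mean of the four sub-task scores, although the page does not say how the overall figure is computed. A minimal Python sketch of that check, assuming equal weights (the name `mean_score` is illustrative, not part of any tool described here):

```python
from decimal import Decimal, ROUND_HALF_UP

# Sub-task scores as listed above (out of 100), kept as strings so that
# Decimal parses them exactly instead of via binary floats.
SUBTASK_SCORES = {
    "Practical Recipe Test": "87.3",
    "Dinner Rescue Test": "87.0",
    "Substitution Test": "86.7",
    "Meal Timing Test": "74.0",
}

def mean_score(scores):
    """Unweighted mean rounded to one decimal place.

    Equal weighting is an assumption; the page does not state the formula.
    """
    values = [Decimal(v) for v in scores.values()]
    mean = sum(values) / len(values)
    return mean.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)

print(mean_score(SUBTASK_SCORES))
```

Under that assumption the Meal Timing Test (74.0) is the sub-task pulling the overall score down the most; the other three sit within a point of each other.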

Real examples, graded

Win · Serrano ham scrambled eggs · 91/100

“The model perfectly executes the task, employing sound culinary techniques to meet the constraints (crisping ham and adding yogurt off-heat for creamy eggs) while providing a highly appealing, well-sequenced, and practical recipe.”

Win · Low-carb/high-carb shared dinner · 89/100

“The response is exceptionally strong, providing a realistic, tasty, and well-timed recipe that perfectly meets all constraints. The shared cooking base is handled elegantly, and the timing is highly practical. The only minor culinary nitpick is the risk of burning the garlic/zest marinade over medium-high heat.”

Win · Dry chicken breast · 89/100

“The model provides highly practical, creative, and well-structured solutions using only the provided ingredients. The culinary techniques (shredding, adding fat/acid, using a runny yolk) are expert-level fixes for dry meat. Instructions are exceptionally clear and the 'what not to do' section is highly accurate.”


Frequently asked

Is gemini-3.1-pro-preview-high good at Chef / Home Cooking?

gemini-3.1-pro-preview-high ranks #1 of 50 models we tested for Chef / Home Cooking with a score of 83.8, a strong result.

What is gemini-3.1-pro-preview-high's strongest Chef / Home Cooking skill?

Its best sub-task here is the Practical Recipe Test, where it scores 87.3/100 and ranks #1.

This page is Spring Prompt, running

We just did this for every model. Do it for your prompt.

The rankings above come from running real tasks through real models and scoring every output. Spring Prompt is that same engine — pointed at your prompt, your test cases, and your definition of good.

  • Generate test cases from your prompt — no eval set required to start.
  • Compare models side by side with quality, cost and latency in one matrix.
  • Optimise the winner until the scores say it's ready to ship.
Experiment · Cold outreach email

Prompt × model results

12 test cases · 3 evals
        Claude Opus   GPT-5   Gemini
v1      7.1           6.8     7.4
v2      8.3           7.9     8.0
v3      9.2           8.6     8.4
Best combo: v3 × Claude Opus
9.2 quality · $0.004/run · 1.8s
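
Picking the best combination from a matrix like the one above amounts to taking the maximum over every prompt-version and model pair. A minimal Python sketch, assuming the three scores in each row belong to Claude Opus, GPT-5 and Gemini in that order (the `RESULTS` layout and the `best_combo` name are illustrative, not Spring Prompt's API):

```python
# Quality scores from the example matrix above: prompt version -> model -> score.
# The column order is an assumption based on the order the model names appear in.
RESULTS = {
    "v1": {"Claude Opus": 7.1, "GPT-5": 6.8, "Gemini": 7.4},
    "v2": {"Claude Opus": 8.3, "GPT-5": 7.9, "Gemini": 8.0},
    "v3": {"Claude Opus": 9.2, "GPT-5": 8.6, "Gemini": 8.4},
}

def best_combo(results):
    """Return the (version, model, score) triple with the highest quality score."""
    return max(
        (
            (version, model, score)
            for version, row in results.items()
            for model, score in row.items()
        ),
        key=lambda item: item[2],  # compare on the score only
    )

version, model, score = best_combo(RESULTS)
print(f"Best combo: {version} × {model} ({score})")
```

This sketch ranks on quality alone. Cost and latency, which the matrix also reports for the winning combination, would need their own weighting before being folded into the comparison.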