Is qwen3.7-max-low good at Chef / Home Cooking?

qwen3.7-max-low ranks #20 of 50 for Chef / Home Cooking, which we rate as usable. The top pick for this task is gemini-3.1-pro-preview-high.

Rank for this task: #20 / 50
Score: 75.3
Cost / run: $0.0370

qwen3.7-max-low on each Chef / Home Cooking sub-task

Sub-task                Score       Rank
Dinner Rescue Test      89.0/100    #1
Substitution Test       88.3/100    #5
Practical Recipe Test   81.3/100    #22
Meal Timing Test        42.7/100    #39
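
The four sub-task scores above happen to average out to the headline score of 75.3. The short sketch below checks that arithmetic and picks out the strongest and weakest sub-task. It assumes the overall score is a plain, unweighted mean of the sub-tasks; the page does not state how the headline number is computed, so treat that as an illustration rather than the actual scoring method. The variable names are made up for this example.

```python
# Sub-task scores exactly as listed on this page (out of 100).
subtask_scores = {
    "Dinner Rescue Test": 89.0,
    "Substitution Test": 88.3,
    "Practical Recipe Test": 81.3,
    "Meal Timing Test": 42.7,
}

# ASSUMPTION: the headline score is an unweighted mean of the sub-tasks.
# The page does not document its aggregation method.
mean_score = sum(subtask_scores.values()) / len(subtask_scores)

# Strongest and weakest sub-task by raw score.
strongest = max(subtask_scores, key=subtask_scores.get)
weakest = min(subtask_scores, key=subtask_scores.get)

print(f"Mean of sub-task scores: {mean_score:.1f}")
print(f"Strongest: {strongest} ({subtask_scores[strongest]})")
print(f"Weakest: {weakest} ({subtask_scores[weakest]})")
```

Under that assumption, the low Meal Timing Test score is what pulls an otherwise high-80s profile down to the mid-70s.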

Real examples, graded

Win: Watery tomato salsa 95/100

“The response is expert-level, highly practical, and demonstrates excellent culinary judgment. It perfectly adheres to the constraints, provides creative and realistic solutions, and correctly identifies why common panic-fixes (like boiling or adding breadcrumbs) would ruin the dish.”

Win: Over-salted soup 88/100

“The response is expert-level, perfectly utilizing the constrained ingredients with sound culinary science (acid/fat masking, dilution, starch absorption). It is highly practical and well-formatted, with only a very minor gap in acknowledging the existing cream's interaction with the new acid and dairy.”

Win: Curry without coconut milk 92/100

“The response is expert-level and production-ready. It demonstrates excellent culinary knowledge by addressing the technical risk of curdling yogurt, using onions to replace lost sweetness, and accurately describing the resulting flavor profile shift. All constraints are perfectly met.”

Frequently asked

Is qwen3.7-max-low good at Chef / Home Cooking?

qwen3.7-max-low ranks #20 of the 50 models we tested for Chef / Home Cooking, a result we rate as usable.

What is qwen3.7-max-low's strongest Chef / Home Cooking skill?

Its best sub-task here is Dinner Rescue Test.

This page is Spring Prompt, running

We just did this for every model. Do it for your prompt.

The rankings above come from running real tasks through real models and scoring every output. Spring Prompt is that same engine — pointed at your prompt, your test cases, and your definition of good.

  • Generate test cases from your prompt — no eval set required to start.
  • Compare models side by side with quality, cost and latency in one matrix.
  • Optimise the winner until the scores say it's ready to ship.
Experiment · Cold outreach email

Prompt × model results

12 test cases · 3 evals
        Claude Opus   GPT-5   Gemini
v1      7.1           6.8     7.4
v2      8.3           7.9     8.0
v3      9.2           8.6     8.4
Best combo: v3 × Claude Opus
9.2 quality · $0.004/run · 1.8s
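
As a rough illustration of how a "best combo" can be read off a prompt × model matrix like the one above, the sketch below scans every cell and keeps the highest quality score. The mapping of numbers to models assumes the column order Claude Opus, GPT-5, Gemini, and the selection rule (highest quality only, ignoring cost and latency) is an assumption for this example, not a description of how Spring Prompt chooses.

```python
# Quality scores per prompt version and model, transcribed from the matrix.
# ASSUMPTION: values in each row follow the column order
# Claude Opus, GPT-5, Gemini.
scores = {
    "v1": {"Claude Opus": 7.1, "GPT-5": 6.8, "Gemini": 7.4},
    "v2": {"Claude Opus": 8.3, "GPT-5": 7.9, "Gemini": 8.0},
    "v3": {"Claude Opus": 9.2, "GPT-5": 8.6, "Gemini": 8.4},
}

# Flatten the matrix into (version, model, score) cells and take the
# cell with the highest quality score. Cost and latency are ignored here.
best_version, best_model, best_score = max(
    (
        (version, model, score)
        for version, row in scores.items()
        for model, score in row.items()
    ),
    key=lambda cell: cell[2],
)

print(f"Best combo: {best_version} x {best_model} ({best_score} quality)")
```

A real selection would likely weigh cost per run and latency as well, since the matrix reports all three.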