Is qwen3.7-max-low good at Chef / Home Cooking?
qwen3.7-max-low ranks #20 of 50 models tested for Chef / Home Cooking, earning a "usable" rating. The top pick for this task is gemini-3.1-pro-preview-high.
qwen3.7-max-low on each Chef / Home Cooking sub-task
| Sub-task | Score | Rank |
|---|---|---|
| Dinner Rescue Test | 89.0/100 | #1 |
| Substitution Test | 88.3/100 | #5 |
| Practical Recipe Test | 81.3/100 | #22 |
| Meal Timing Test | 42.7/100 | #39 |
Real examples, graded
Win: Watery tomato salsa (95/100)
“The response is expert-level, highly practical, and demonstrates excellent culinary judgment. It perfectly adheres to the constraints, provides creative and realistic solutions, and correctly identifies why common panic-fixes (like boiling or adding breadcrumbs) would ruin the dish.”
Win: Over-salted soup (88/100)
“The response is expert-level, perfectly utilizing the constrained ingredients with sound culinary science (acid/fat masking, dilution, starch absorption). It is highly practical and well-formatted, with only a very minor gap in acknowledging the existing cream's interaction with the new acid and dairy.”
Win: Curry without coconut milk (92/100)
“The response is expert-level and production-ready. It demonstrates excellent culinary knowledge by addressing the technical risk of curdling yogurt, using onions to replace lost sweetness, and accurately describing the resulting flavor profile shift. All constraints are perfectly met.”
Frequently asked
Is qwen3.7-max-low good at Chef / Home Cooking?
qwen3.7-max-low ranks #20 of the 50 models we tested for Chef / Home Cooking, earning a "usable" rating.
What is qwen3.7-max-low's strongest Chef / Home Cooking skill?
Its best sub-task here is the Dinner Rescue Test, where it scored 89.0/100 and ranked #1.
This page is Spring Prompt in action
We just did this for every model. Do it for your prompt.
The rankings above come from running real tasks through real models and scoring every output. Spring Prompt is that same engine — pointed at your prompt, your test cases, and your definition of good.
- Generate test cases from your prompt — no eval set required to start.
- Compare models side by side with quality, cost and latency in one matrix.
- Optimise the winner until the scores say it's ready to ship.