Personal · 12 tasks · 50 models
Smartest AI models for Chef / Home Cooking
Which models can give practical cooking help that works in a real kitchen?
The highest-scoring model for Chef / Home Cooking is gemini-3.1-pro-preview-high, rated Strong at 83.8/100.
Top score: gemini-3.1-pro-preview-high, Strong
Best value: deepseek-v3.2 clears the quality bar at $0.016/run
Fastest: gemini-3.1-flash-lite, ~19s per run, still Usable
Quality vs. cost
Every model placed by what it delivers and what it costs. The best value sits high and to the left.
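One way to read "best value sits high and to the left" is as a Pareto frontier: a model sits on it if no other model is both at least as cheap and at least as good. The sketch below assumes cost on the horizontal axis and score on the vertical, and uses six rows copied from the Full ranking table. The `Run` type and `pareto_frontier` helper are hypothetical illustrations, not part of Spring Prompt.

```python
from typing import NamedTuple


class Run(NamedTuple):
    model: str
    score: float     # 0-100 judge score from the ranking table
    cost_usd: float  # average cost per run in USD


def pareto_frontier(runs: list[Run]) -> list[Run]:
    """Keep models that no other model beats on both cost and score."""
    frontier = []
    for r in runs:
        # r is dominated if another model is no more expensive, no worse,
        # and strictly better on at least one of the two axes.
        dominated = any(
            o.cost_usd <= r.cost_usd and o.score >= r.score
            and (o.cost_usd < r.cost_usd or o.score > r.score)
            for o in runs
        )
        if not dominated:
            frontier.append(r)
    # Cheapest first, i.e. left to right on the chart.
    return sorted(frontier, key=lambda r: r.cost_usd)


# A subset of rows from the Full ranking table.
runs = [
    Run("gemini-3.1-pro-preview-high", 83.8, 0.0443),
    Run("claude-opus-4.7", 82.7, 0.0456),
    Run("gpt-5.5", 82.3, 0.0697),
    Run("grok-4.20-beta", 76.5, 0.0170),
    Run("deepseek-v3.2", 72.5, 0.0160),
    Run("minimax-m2.7", 49.4, 0.0161),
]

for r in pareto_frontier(runs):
    print(f"{r.model}: {r.score} at ${r.cost_usd:.4f}")
```

On this subset the frontier is deepseek-v3.2, grok-4.20-beta and gemini-3.1-pro-preview-high; minimax-m2.7 drops out because deepseek-v3.2 is cheaper and scores higher.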
Full ranking
| # | Model | Score (0-100) | Cost/run (USD) | Time/run | Best for |
|---|---|---|---|---|---|
| 1 | gemini-3.1-pro-preview-high | 83.8 Strong | $0.0443 | 40.2s | Strong drafts |
| 2 | claude-opus-4.7 | 82.7 Strong | $0.0456 | 31.6s | Strong drafts |
| 3 | gpt-5.5 | 82.3 Strong | $0.0697 | 52.0s | Strong drafts |
| 4 | claude-opus-4.8-high | 81.8 Strong | $0.0450 | 28.6s | Strong drafts |
| 5 | gpt-5.5-low | 81.2 Strong | $0.0473 | 31.8s | Strong drafts |
| 6 | gpt-5.4-high | 80.3 Strong | $0.0785 | 59.9s | Strong drafts |
| 7 | claude-sonnet-4.6-low | 79.9 Usable | $0.0537 | 54.6s | Strong drafts |
| 8 | gpt-5.4 | 79.5 Usable | $0.0308 | 29.2s | Strong drafts |
| 9 | claude-opus-4.6-high | 79.5 Usable | $0.0874 | 71.9s | Strong drafts |
| 10 | claude-opus-4.8 | 79.2 Usable | $0.0438 | 27.7s | Strong drafts |
| 11 | claude-opus-4.8-low | 79.2 Usable | $0.0452 | 28.6s | Strong drafts |
| 12 | gemini-3.1-pro-preview-low | 78.8 Usable | $0.0452 | 31.8s | Strong drafts |
| 13 | gemini-3.5-flash-low | 78.6 Usable | $0.0378 | 26.3s | Strong drafts |
| 14 | claude-opus-4.6 | 78.2 Usable | $0.0546 | 46.3s | Strong drafts |
| 15 | claude-sonnet-4.6-high | 77.5 Usable | $0.0688 | 72.2s | Strong drafts |
| 16 | gpt-5.5-high | 76.8 Usable | $0.0861 | 64.6s | Strong drafts |
| 17 | grok-4.20-beta | 76.5 Usable | $0.0170 | 19.4s | Strong drafts |
| 18 | gemini-3-flash-preview | 76.2 Usable | $0.0251 | 24.3s | Strong drafts |
| 19 | qwen3.5-plus-02-15 | 75.7 Usable | $0.0184 | 66.4s | Strong drafts |
| 20 | qwen3.7-max-low | 75.3 Usable | $0.0370 | 74.4s | Strong drafts |
| 21 | qwen3.7-max | 75.2 Usable | $0.0293 | 72.9s | Strong drafts |
| 22 | claude-opus-4.5-high | 75.1 Usable | $0.0655 | 50.4s | Strong drafts |
| 23 | qwen3.7-max-high | 75.0 Usable | $0.0342 | 72.7s | Strong drafts |
| 24 | glm-5.1 | 74.8 Usable | $0.0186 | 60.4s | Needs review |
| 25 | claude-opus-4.6-low | 74.8 Usable | $0.0849 | 68.5s | Needs review |
| 26 | gpt-5.4-low | 74.2 Usable | $0.0426 | 29.0s | Needs review |
| 27 | gpt-5.5-pro | 73.8 Usable | $0.7482 | 134.1s | Needs review |
| 28 | claude-sonnet-4.6 | 73.5 Usable | $0.0366 | 39.6s | Needs review |
| 29 | claude-opus-4.5-low | 72.8 Usable | $0.0660 | 45.1s | Needs review |
| 30 | deepseek-v3.2 | 72.5 Usable | $0.0160 | 40.8s | Bulk baseline |
| 31 | claude-sonnet-4.5-low | 72.5 Usable | $0.0388 | 36.7s | Needs review |
| 32 | gemini-3.1-flash-lite | 72.4 Usable | $0.0268 | 19.2s | Needs review |
| 33 | claude-sonnet-4.5-high | 72.2 Usable | $0.0411 | 39.6s | Needs review |
| 34 | gpt-5-mini | 72.1 Usable | $0.0250 | 51.4s | Needs review |
| 35 | gemini-3.1-pro-preview | 72.1 Usable | $0.0420 | 31.9s | Needs review |
| 36 | gpt-5.4-mini | 72.0 Usable | $0.0203 | 19.5s | Bulk baseline |
| 37 | kimi-k2.7-code | 71.7 Usable | $0.0332 | 71.1s | Needs review |
| 38 | claude-sonnet-4.5 | 70.8 Usable | $0.0386 | 37.2s | Needs review |
| 39 | deepseek-v3.2-high | 70.7 Usable | $0.0260 | 43.4s | Needs review |
| 40 | deepseek-v3.1-terminus | 69.6 Needs editing | $0.0294 | 53.0s | Needs review |
| 41 | grok-4.20 | 69.5 Needs editing | $0.0294 | 24.2s | Needs review |
| 42 | claude-opus-4.5 | 69.3 Needs editing | $0.0452 | 33.5s | Needs review |
| 43 | kimi-k2.5 | 67.2 Needs editing | $0.0186 | 51.8s | Needs review |
| 44 | gpt-5.4-nano | 63.6 Needs editing | $0.0204 | 21.5s | Bulk baseline |
| 45 | claude-haiku-4.5 | 62.9 Needs editing | $0.0283 | 22.8s | Needs review |
| 46 | glm-5 | 60.9 Needs editing | $0.0225 | 77.9s | Needs review |
| 47 | mistral-medium-3.1 | 60.8 Needs editing | $0.0246 | 26.6s | Needs review |
| 48 | gemini-3.5-flash-high | 60.5 Needs editing | $0.0334 | 23.9s | Needs review |
| 49 | deepseek-v3.2-low | 58.9 Weak | $0.0235 | 36.4s | Needs review |
| 50 | minimax-m2.7 | 49.4 Weak | $0.0161 | 56.9s | Needs review |
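Each score in the table carries a rating label. The page does not state the cutoffs, but neighbouring rows bracket them: 80.3 is Strong while 79.9 is Usable, 70.7 is Usable while 69.6 is Needs editing, and 60.5 is Needs editing while 58.9 is Weak. A minimal sketch of the mapping, assuming round-number cutoffs at 80, 70 and 60 (the table only shows that each boundary falls somewhere between the two adjacent scores):

```python
# Assumed cutoffs, inferred from adjacent rows in the ranking table.
RATING_BANDS = [
    (80.0, "Strong"),
    (70.0, "Usable"),
    (60.0, "Needs editing"),
]


def rating(score: float) -> str:
    """Map a 0-100 score to the label shown in the Score column."""
    for cutoff, label in RATING_BANDS:
        if score >= cutoff:
            return label
    return "Weak"


for s in (83.8, 79.9, 70.7, 69.6, 60.5, 58.9):
    print(s, rating(s))
```

These cutoffs reproduce every label in the table above, but any cutoff inside each bracketing gap would too.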
What separates the top models
Practical Recipe Test
Difficulty: easy. Tests whether the model can create a recipe that is realistic, tasty, and follows its constraints.
Leader: gemini-3.1-pro-preview-high
Dinner Rescue Test
Difficulty: medium. Tests whether the model can rescue a cooking mistake.
Leader: qwen3.7-max-low
Meal Timing Test
Difficulty: hard. Tests whether the model can coordinate the timing of multiple dishes.
Leader: claude-opus-4.7
Substitution Test
Difficulty: medium. Tests whether the model can make substitutions without ruining the dish.
Leader: claude-opus-4.6-low
See the difference
Real model outputs and how our judge scored them.
Top score
qwen3.7-max
Watery tomato salsa
95/100
“The response is expert-level, highly practical, and demonstrates excellent culinary judgment. It perfectly adheres to the constraints, provides creative and realistic solutions, and correctly identifies why common panic-fixes (like boiling or adding breadcrumbs) would ruin the dish.”
### The Fastest Fix (If you still want salsa)
**Strain it.** You cannot magically evaporate the water without cooking it, which ruins fresh salsa. Straining is the fastest, most practical way to restore the texture.
**Step-by-Step:**
1. Place a fine-mesh strainer or colander over a large bowl.
2. Pour the watery salsa into the strainer and let it sit for 3 to 5 minutes.
3. **Do not press or mash** the tomatoes through the mesh, or you will end up with tomato mush. Just let gravity do the work.
4. Transfer the drained solids back to a clean bowl.
5. Taste it. The draining process removes some salt and lime, so add a pinch more of each if needed.
6. *Save the drained tomato-habanero liquid in the bowl—it is packed with flavor and will be used in the repurpose below!*
***
### The Best Repurpose Idea (If you want to pivot)
**Salsa-Poached Eggs (Huevos a la Mexicana) with Crispy Tortillas.** Since your salsa is essentially a highly flavorful, spicy tomato broth, use that exact texture to your advantage by poaching eggs in it. The Greek yogurt will act as a cooling crema to balance the habanero.
**Step-by-Step:**
1. **Prep the tortillas:** Cut your tortillas into wedges or strips. Heat a generous glug of olive oil in a skillet over medium-high heat. Fry the tortilla pieces until golden and crispy (about 2-3 mins). Remove and set aside on a paper towel.
2. **Simmer the broth:** Pour the reserved, strained watery tomato liquid (or the whole unstrained salsa if you skipped the fix) into the skillet. Bring it to a gentle simmer over medium heat.
3. **Poach the eggs:** Crack your eggs directly into the simmering salsa. Cover the skillet with a lid and let them cook for 3 to 4 minutes, or until the egg whites are set but the yolks are still runny.
4. **Assemble:** Place
Mid-pack
glm-5
Quick chicken thighs with limited ingredients
81/100
“The recipe is highly practical, flavorful, and strictly follows all constraints with excellent timing notes. However, the optional upgrade title is disconnected from its description, slightly impacting instruction clarity.”
Here is a quick, flavorful recipe for **Spiced Paprika Chicken with Tomato Rice**.
**Serves:** 2
**Total Time:** 25–30 minutes
### Ingredients
* **Chicken Thighs:** 2 large (approx. 300g), boneless and skinless, cut into bite-sized chunks
* **Rice:** 150g (approx. 3/4 cup) long-grain or basmati rice
* **Onion:** 1 medium, diced
* **Tomatoes:** 2 large (fresh), roughly chopped (or 1 x 400g can chopped tomatoes)
* **Garlic:** 2 cloves, minced
* **Paprika:** 1 tbsp (smoked paprika works best)
* **Yogurt:** 3 tbsp (Greek or natural)
* **Pantry Staples:** 1 tbsp cooking oil, salt, pepper, and 300ml water (or stock) for the rice.
### Method
1. **Start the Rice (00:00 - 02:00):**
Rinse the rice until the water runs clear. Place in the **pot** with 300ml water (or stock) and a pinch of salt. Bring to a boil, then reduce heat to low, cover with a lid, and simmer for 12–15 minutes until water is absorbed. Remove from heat and leave lid on.
2. **Sear the Chicken (02:00 - 07:00):**
While the rice heats up, heat the oil in the **pan** over medium-high heat. Season the chicken chunks with salt, pepper, and half the paprika. Add to the pan and sear for 4–5 minutes until browned on all sides. Remove chicken from the pan and set aside.
3. **Build the Base (07:00 - 12:00):**
In the same pan (don't wash it), add the diced onion. Sauté for 3 minutes until softening. Add the garlic and remaining paprika; cook for 1 minute until fragrant. Add the chopped tomatoes. If using fresh tomatoes, add a splash of water to help them break down.
4. **Simmer & Combine (12:00 - 20:00):**
Return the seared chicken (and any juices) to the pan. Stir well. Reduce heat to medium-low and let the sauce simmer for 8–10 minutes, or until the chicken is cooked through and
Lowest score
kimi-k2.7-code
Steak, wings, potatoes, broccoli dinner
0/100
“The model output is entirely a chain-of-thought scratchpad that cuts off abruptly before delivering any of the requested deliverables. It completely fails the task.”
We need answer cooking timeline for these dishes. Need think carefully. User asks: Create a cooking timeline. Dishes: 1.6kg bone-in ribeye, air fryer chicken wings, potato and caramelised onion tart, broccoli, tomato and feta salad. Constraints: steak should rest before serving; potatoes already sliced; onions already caramelised; salad already prepped; need everything ready together. Equipment: oven, air fryer, hob, meat thermometer. Serve time: 90 minutes from now. Return timeline by minute, target steak temperatures, when to rest steak, when to cook wings, when to finish broccoli, final 10-minute plating plan. We need produce a detailed minute-by-minute timeline in reverse from T-90 to serve. Need estimate cooking times and temps. Need consider bone-in ribeye 1.6kg, likely thick. Cooking method maybe reverse sear: oven then sear on hob. Rest. Air fryer wings maybe 20-25 min at 200C. Tart with sliced potatoes and caramelised onions, likely pre-bake pastry blind? It says potatoes already sliced; onions already caramelised. Need assemble and bake, maybe 40-50 min. Broccoli finish maybe steam/sauté 5-7 min. Salad prepped. Need everything ready together at T=0. Need decide timeline. Serve time 90 minutes from now. We work backwards. We need include target steak temperatures (internal temp before rest and after? Probably pull temp and final target). For medium-rare: pull from oven at 48-50°C, sear to final 54-55°C, rest. For medium: pull 55-57°C, final 60-63°C. We can give options. Need roasting times for 1.6kg bone-in ribeye. Bone-in ribeye (tomahawk/cote de boeuf) thickness maybe 2.5-3 inches. Reverse sear at 110-120°C oven: 45-60 minutes to reach 48°C internal. Then sear in pan 2-3 min each side to 54°C. Rest 10-15 min. Since total 90 min, we can start steak in oven
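The deliverable this scratchpad never reached, a timeline worked backwards from a fixed serve time, can be sketched as a small backward scheduler: each step either ends at serve time or ends exactly when its successor starts. The durations below are the upper ends of the scratchpad's own estimates and are hypothetical, not a tested plan; `Step` and `schedule_backwards` are illustrative names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Step:
    name: str
    minutes: int
    # Step that must start as soon as this one ends; None means it ends at serve time.
    then: Optional[str] = None


def schedule_backwards(steps: list[Step], serve_at: int) -> dict[str, tuple[int, int]]:
    """Return {step name: (start, end)} in minutes from now, working back from serve_at."""
    by_name = {s.name: s for s in steps}
    times: dict[str, tuple[int, int]] = {}

    def place(step: Step) -> tuple[int, int]:
        if step.name not in times:
            # A step ends when its successor starts, or at serve time if it has none.
            end = serve_at if step.then is None else place(by_name[step.then])[0]
            times[step.name] = (end - step.minutes, end)
        return times[step.name]

    for s in steps:
        place(s)
    return times


# Hypothetical durations: upper ends of the scratchpad's estimates.
steps = [
    Step("steak: oven", 60, then="steak: sear"),
    Step("steak: sear", 6, then="steak: rest"),
    Step("steak: rest", 15),
    Step("wings", 25),
    Step("tart", 50),
    Step("broccoli", 7),
]

plan = schedule_backwards(steps, serve_at=90)
for name, (start, end) in sorted(plan.items(), key=lambda kv: kv[1]):
    print(f"T+{start:>2} to T+{end:>2}: {name}")
```

With these numbers the steak goes in at T+9, is seared from T+69 and rests from T+75, while the tart starts at T+40 and the wings at T+65. The sketch ignores equipment: steak and tart both need the oven between T+40 and T+69, which is exactly the kind of clash the Meal Timing Test expects a model to resolve.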
Where models still fail
The most common problems we flagged across all models.
Frequently asked
What is the best AI model for chef / home cooking?
In our benchmarks, gemini-3.1-pro-preview-high ranks first for chef / home cooking, with a Strong score of 83.8 across 12 test cases.
What is the cheapest good model for chef / home cooking?
deepseek-v3.2 is the best value: it clears our quality bar for chef / home cooking at $0.016 per run.
Which model is fastest for chef / home cooking?
gemini-3.1-flash-lite is the fastest model that still performs well for chef / home cooking, at 19.2s per run with a Usable score of 72.4.
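Both answers above follow the same pattern: drop every model below a quality bar, then take the minimum of the metric you care about. A minimal sketch, assuming the bar is the Usable cutoff of 70 (the page does not state the exact bar) and using rows copied from the ranking table; `Result` and `best_above_bar` are hypothetical names.

```python
from typing import Callable, NamedTuple


class Result(NamedTuple):
    model: str
    score: float     # 0-100 judge score
    cost_usd: float  # cost per run in USD
    seconds: float   # time per run


def best_above_bar(results: list[Result], key: Callable[[Result], float],
                   bar: float = 70.0) -> Result:
    """Among models scoring at least `bar`, return the one with the smallest `key`."""
    eligible = [r for r in results if r.score >= bar]
    if not eligible:
        raise ValueError(f"no model scores at least {bar}")
    return min(eligible, key=key)


results = [
    Result("gemini-3.1-pro-preview-high", 83.8, 0.0443, 40.2),
    Result("claude-opus-4.7", 82.7, 0.0456, 31.6),
    Result("grok-4.20-beta", 76.5, 0.0170, 19.4),
    Result("deepseek-v3.2", 72.5, 0.0160, 40.8),
    Result("gemini-3.1-flash-lite", 72.4, 0.0268, 19.2),
    Result("minimax-m2.7", 49.4, 0.0161, 56.9),
]

cheapest = best_above_bar(results, key=lambda r: r.cost_usd)
fastest = best_above_bar(results, key=lambda r: r.seconds)
print(cheapest.model, fastest.model)
```

Raising the bar to 80 on the same rows changes both answers: the cheapest becomes gemini-3.1-pro-preview-high and the fastest becomes claude-opus-4.7.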
How we test
Each model output is scored by a strict LLM judge that returns its verdict as JSON, supported by deterministic heuristics; the result is then normalized to a 0-100 score.
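The page does not publish the judge's schema or how the judge and the heuristics are combined. A minimal sketch of one plausible pipeline follows; every name, the 1-10 criterion scale, the three checks and the 0.8/0.2 weighting are assumptions for illustration, not Spring Prompt's actual method.

```python
import json
from typing import Callable

# Hypothetical judge reply: per-criterion scores on an assumed 1-10 scale.
JUDGE_REPLY = '{"criteria": {"practicality": 9, "constraints": 10, "clarity": 8}}'

# Hypothetical deterministic checks on the raw model output.
CHECKS: list[Callable[[str], bool]] = [
    lambda o: bool(o.strip()),                     # output is not empty
    lambda o: "1." in o,                           # contains numbered steps
    lambda o: o.rstrip().endswith((".", "!", ")")),  # did not stop mid-sentence
]


def judge_percent(reply: str) -> float:
    """Parse the judge's JSON and rescale the mean criterion score to 0-100."""
    criteria = json.loads(reply)["criteria"]
    mean = sum(criteria.values()) / len(criteria)
    return (mean - 1) / 9 * 100  # 1 -> 0, 10 -> 100


def heuristic_percent(output: str, checks: list[Callable[[str], bool]]) -> float:
    """Share of checks the output passes, as 0-100."""
    return 100 * sum(check(output) for check in checks) / len(checks)


def final_score(reply: str, output: str, checks: list[Callable[[str], bool]],
                judge_weight: float = 0.8) -> float:
    """Blend judge and heuristic percentages, rounded to one decimal."""
    blended = (judge_weight * judge_percent(reply)
               + (1 - judge_weight) * heuristic_percent(output, checks))
    return round(blended, 1)


good = "1. Strain the salsa.\n2. Season to taste."
cut_off = "We need answer cooking timeline"
print(final_score(JUDGE_REPLY, good, CHECKS))
print(final_score(JUDGE_REPLY, cut_off, CHECKS))
```

With the same judge reply, the complete answer scores 91.1 and the cut-off scratchpad 77.8. A pure blend is lenient toward outputs like the 0/100 sample above, so a real pipeline might instead treat some checks as hard gates.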
Judge: gemini-3.1-pro-preview · 648 model runs across 4 benchmarks · last tested 2026-06-30
This page is Spring Prompt in action.
We just did this for every model. Do it for your prompt.
The rankings above come from running real tasks through real models and scoring every output. Spring Prompt is that same engine — pointed at your prompt, your test cases, and your definition of good.
- Generate test cases from your prompt — no eval set required to start.
- Compare models side by side with quality, cost and latency in one matrix.
- Optimise the winner until the scores say it's ready to ship.
Prompt × model results
12 test cases · 3 evals