Personal · 12 tasks · 50 models

Smartest AI models for Chef / Home Cooking

Name: Chef / Home Cooking AI model benchmark
Creator: Spring Prompt

Which models can give practical cooking help that works in a real kitchen?

Top models

Google: gemini-3.1-pro-preview-high

Anthropic: claude-opus-4.7

OpenAI: gpt-5.5

The highest-quality model for Chef / Home Cooking is gemini-3.1-pro-preview-high, with a score of 83.8 (Strong).

Best overall ★ Strong

gemini-3.1-pro-preview-high

Top score — strong

83.8 score $0.0443/run 40.2s

Best value Usable

deepseek-v3.2

Clears the quality bar at $0.016/run

72.5 score $0.0160/run 40.8s

Fastest usable Usable

gemini-3.1-flash-lite

~19s per run, still usable

72.4 score $0.0268/run 19.2s
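
The three headline picks can be reproduced from the ranking data with a few lines of Python. This is a minimal sketch, not the site's actual selection logic: the page does not publish its quality bar, so `QUALITY_BAR = 70.0` is an assumption, and the `Row` type and `headline_picks` function are hypothetical names. The rows are a small sample copied from the full ranking on this page.

```python
from typing import NamedTuple


class Row(NamedTuple):
    model: str
    score: float    # benchmark score, 0-100
    cost: float     # USD per run
    seconds: float  # latency per run


# A sample of rows copied from the full ranking on this page.
ROWS = [
    Row("gemini-3.1-pro-preview-high", 83.8, 0.0443, 40.2),
    Row("claude-opus-4.7", 82.7, 0.0456, 31.6),
    Row("grok-4.20-beta", 76.5, 0.0170, 19.4),
    Row("deepseek-v3.2", 72.5, 0.0160, 40.8),
    Row("gemini-3.1-flash-lite", 72.4, 0.0268, 19.2),
    Row("gpt-5.4-mini", 72.0, 0.0203, 19.5),
    Row("minimax-m2.7", 49.4, 0.0161, 56.9),
]

# Assumption: the quality bar is not stated on the page; 70 is a guess.
QUALITY_BAR = 70.0


def headline_picks(rows, bar=QUALITY_BAR):
    """Pick the top scorer, plus the cheapest and fastest rows at or above the bar."""
    usable = [r for r in rows if r.score >= bar]
    return {
        "best_overall": max(rows, key=lambda r: r.score).model,
        "best_value": min(usable, key=lambda r: r.cost).model,
        "fastest_usable": min(usable, key=lambda r: r.seconds).model,
    }


picks = headline_picks(ROWS)
print(picks)
```

With this sample and the assumed bar, the sketch returns the same three models named in the cards above.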

Quality vs. cost

Every model placed by what it delivers and what it costs. The best value sits high and to the left.
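
One way to read "high and to the left" is as a Pareto frontier: a model is on the frontier if no other model is both cheaper and higher scoring. The sketch below illustrates that idea under stated assumptions; `pareto_frontier` is a hypothetical helper, not something the page describes, and `SAMPLE` is a handful of (model, score, cost) rows copied from the full ranking.

```python
def pareto_frontier(points):
    """Return the models that are not dominated on (lower cost, higher score).

    `points` is an iterable of (model, score, cost_per_run) tuples.
    """
    frontier = []
    best_score = float("-inf")
    # Walk from cheapest to most expensive; on a cost tie, take the higher score first.
    for model, score, cost in sorted(points, key=lambda p: (p[2], -p[1])):
        # A model is kept only if it beats every cheaper model on score.
        if score > best_score:
            frontier.append(model)
            best_score = score
    return frontier


# Sample rows copied from the ranking table: (model, score, cost per run in USD).
SAMPLE = [
    ("gemini-3.1-pro-preview-high", 83.8, 0.0443),
    ("gpt-5.5", 82.3, 0.0697),
    ("gpt-5.4", 79.5, 0.0308),
    ("grok-4.20-beta", 76.5, 0.0170),
    ("gpt-5.5-pro", 73.8, 0.7482),
    ("deepseek-v3.2", 72.5, 0.0160),
    ("minimax-m2.7", 49.4, 0.0161),
]

frontier = pareto_frontier(SAMPLE)
print(frontier)
# ['deepseek-v3.2', 'grok-4.20-beta', 'gpt-5.4', 'gemini-3.1-pro-preview-high']
```

In this sample, gpt-5.5 and gpt-5.5-pro fall off the frontier because a cheaper model scores higher, which is the kind of trade-off the chart is meant to surface.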

Full ranking


#	Model	Score (tier)	Cost/run	Latency	Best for
1	gemini-3.1-pro-preview-high	83.8 Strong	$0.0443	40.2s	Strong drafts
2	claude-opus-4.7	82.7 Strong	$0.0456	31.6s	Strong drafts
3	gpt-5.5	82.3 Strong	$0.0697	52.0s	Strong drafts
4	claude-opus-4.8-high	81.8 Strong	$0.0450	28.6s	Strong drafts
5	gpt-5.5-low	81.2 Strong	$0.0473	31.8s	Strong drafts
6	gpt-5.4-high	80.3 Strong	$0.0785	59.9s	Strong drafts
7	claude-sonnet-4.6-low	79.9 Usable	$0.0537	54.6s	Strong drafts
8	gpt-5.4	79.5 Usable	$0.0308	29.2s	Strong drafts
9	claude-opus-4.6-high	79.5 Usable	$0.0874	71.9s	Strong drafts
10	claude-opus-4.8	79.2 Usable	$0.0438	27.7s	Strong drafts
11	claude-opus-4.8-low	79.2 Usable	$0.0452	28.6s	Strong drafts
12	gemini-3.1-pro-preview-low	78.8 Usable	$0.0452	31.8s	Strong drafts
13	gemini-3.5-flash-low	78.6 Usable	$0.0378	26.3s	Strong drafts
14	claude-opus-4.6	78.2 Usable	$0.0546	46.3s	Strong drafts
15	claude-sonnet-4.6-high	77.5 Usable	$0.0688	72.2s	Strong drafts
16	gpt-5.5-high	76.8 Usable	$0.0861	64.6s	Strong drafts
17	grok-4.20-beta	76.5 Usable	$0.0170	19.4s	Strong drafts
18	gemini-3-flash-preview	76.2 Usable	$0.0251	24.3s	Strong drafts
19	qwen3.5-plus-02-15	75.7 Usable	$0.0184	66.4s	Strong drafts
20	qwen3.7-max-low	75.3 Usable	$0.0370	74.4s	Strong drafts
21	qwen3.7-max	75.2 Usable	$0.0293	72.9s	Strong drafts
22	claude-opus-4.5-high	75.1 Usable	$0.0655	50.4s	Strong drafts
23	qwen3.7-max-high	75.0 Usable	$0.0342	72.7s	Strong drafts
24	glm-5.1	74.8 Usable	$0.0186	60.4s	Needs review
25	claude-opus-4.6-low	74.8 Usable	$0.0849	68.5s	Needs review
26	gpt-5.4-low	74.2 Usable	$0.0426	29.0s	Needs review
27	gpt-5.5-pro	73.8 Usable	$0.7482	134.1s	Needs review
28	claude-sonnet-4.6	73.5 Usable	$0.0366	39.6s	Needs review
29	claude-opus-4.5-low	72.8 Usable	$0.0660	45.1s	Needs review
30	deepseek-v3.2	72.5 Usable	$0.0160	40.8s	Bulk baseline
31	claude-sonnet-4.5-low	72.5 Usable	$0.0388	36.7s	Needs review
32	gemini-3.1-flash-lite	72.4 Usable	$0.0268	19.2s	Needs review
33	claude-sonnet-4.5-high	72.2 Usable	$0.0411	39.6s	Needs review
34	gpt-5-mini	72.1 Usable	$0.0250	51.4s	Needs review
35	gemini-3.1-pro-preview	72.1 Usable	$0.0420	31.9s	Needs review
36	gpt-5.4-mini	72.0 Usable	$0.0203	19.5s	Bulk baseline
37	kimi-k2.7-code	71.7 Usable	$0.0332	71.1s	Needs review
38	claude-sonnet-4.5	70.8 Usable	$0.0386	37.2s	Needs review
39	deepseek-v3.2-high	70.7 Usable	$0.0260	43.4s	Needs review
40	deepseek-v3.1-terminus	69.6 Needs editing	$0.0294	53.0s	Needs review
41	grok-4.20	69.5 Needs editing	$0.0294	24.2s	Needs review
42	claude-opus-4.5	69.3 Needs editing	$0.0452	33.5s	Needs review
43	kimi-k2.5	67.2 Needs editing	$0.0186	51.8s	Needs review
44	gpt-5.4-nano	63.6 Needs editing	$0.0204	21.5s	Bulk baseline
45	claude-haiku-4.5	62.9 Needs editing	$0.0283	22.8s	Needs review
46	glm-5	60.9 Needs editing	$0.0225	77.9s	Needs review
47	mistral-medium-3.1	60.8 Needs editing	$0.0246	26.6s	Needs review
48	gemini-3.5-flash-high	60.5 Needs editing	$0.0334	23.9s	Needs review
49	deepseek-v3.2-low	58.9 Weak	$0.0235	36.4s	Needs review
50	minimax-m2.7	49.4 Weak	$0.0161	56.9s	Needs review
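
The tier labels in the table appear to follow fixed score cut-offs. The page does not state them, so the thresholds below (80, 70, 60) are an assumption inferred from the rows above, and `tier` is a hypothetical helper. A minimal sketch:

```python
# Assumed cut-offs, inferred from the table (for example 80.3 is Strong, 79.9 is Usable).
# The page does not publish these thresholds.
TIERS = [
    (80.0, "Strong"),
    (70.0, "Usable"),
    (60.0, "Needs editing"),
]


def tier(score: float) -> str:
    """Map a 0-100 score to a tier label under the assumed cut-offs."""
    for floor, label in TIERS:
        if score >= floor:
            return label
    return "Weak"


for s in (83.8, 79.9, 69.6, 58.9):
    print(s, tier(s))
```

These cut-offs are consistent with every row in the ranking, but they remain a guess about how the labels are assigned.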

What separates the top models

Practical Recipe Test

easy

Tests whether the model can create a recipe that is realistic, tasty, and follows constraints.

Leader: gemini-3.1-pro-preview-high

Dinner Rescue Test

medium

Tests whether the model can rescue a cooking mistake.

Leader: qwen3.7-max-low

Meal Timing Test

hard

Tests whether the model can coordinate multiple dishes.

Leader: claude-opus-4.7

Substitution Test

medium

Tests whether the model can make substitutions without ruining the dish.

Leader: claude-opus-4.6-low

See the difference

Real model outputs and how our judge scored them.

Top score · qwen3.7-max · Watery tomato salsa

95/100

“The response is expert-level, highly practical, and demonstrates excellent culinary judgment. It perfectly adheres to the constraints, provides creative and realistic solutions, and correctly identifies why common panic-fixes (like boiling or adding breadcrumbs) would ruin the dish.”

### The Fastest Fix (If you still want salsa)
**Strain it.** You cannot magically evaporate the water without cooking it, which ruins fresh salsa. Straining is the fastest, most practical way to restore the texture. 

**Step-by-Step:**
1. Place a fine-mesh strainer or colander over a large bowl. 
2. Pour the watery salsa into the strainer and let it sit for 3 to 5 minutes. 
3. **Do not press or mash** the tomatoes through the mesh, or you will end up with tomato mush. Just let gravity do the work.
4. Transfer the drained solids back to a clean bowl. 
5. Taste it. The draining process removes some salt and lime, so add a pinch more of each if needed. 
6. *Save the drained tomato-habanero liquid in the bowl—it is packed with flavor and will be used in the repurpose below!*

***

### The Best Repurpose Idea (If you want to pivot)
**Salsa-Poached Eggs (Huevos a la Mexicana) with Crispy Tortillas.** 
Since your salsa is essentially a highly flavorful, spicy tomato broth, use that exact texture to your advantage by poaching eggs in it. The Greek yogurt will act as a cooling crema to balance the habanero.

**Step-by-Step:**
1. **Prep the tortillas:** Cut your tortillas into wedges or strips. Heat a generous glug of olive oil in a skillet over medium-high heat. Fry the tortilla pieces until golden and crispy (about 2-3 mins). Remove and set aside on a paper towel. 
2. **Simmer the broth:** Pour the reserved, strained watery tomato liquid (or the whole unstrained salsa if you skipped the fix) into the skillet. Bring it to a gentle simmer over medium heat. 
3. **Poach the eggs:** Crack your eggs directly into the simmering salsa. Cover the skillet with a lid and let them cook for 3 to 4 minutes, or until the egg whites are set but the yolks are still runny.
4. **Assemble:** Place

Mid-pack · glm-5 · Quick chicken thighs with limited ingredients

81/100

“The recipe is highly practical, flavorful, and strictly follows all constraints with excellent timing notes. However, the optional upgrade title is disconnected from its description, slightly impacting instruction clarity.”

Here is a quick, flavorful recipe for **Spiced Paprika Chicken with Tomato Rice**.

**Serves:** 2
**Total Time:** 25–30 minutes

### Ingredients
*   **Chicken Thighs:** 2 large (approx. 300g), boneless and skinless, cut into bite-sized chunks
*   **Rice:** 150g (approx. 3/4 cup) long-grain or basmati rice
*   **Onion:** 1 medium, diced
*   **Tomatoes:** 2 large (fresh), roughly chopped (or 1 x 400g can chopped tomatoes)
*   **Garlic:** 2 cloves, minced
*   **Paprika:** 1 tbsp (smoked paprika works best)
*   **Yogurt:** 3 tbsp (Greek or natural)
*   **Pantry Staples:** 1 tbsp cooking oil, salt, pepper, and 300ml water (or stock) for the rice.

### Method

1.  **Start the Rice (00:00 - 02:00):**
    Rinse the rice until the water runs clear. Place in the **pot** with 300ml water (or stock) and a pinch of salt. Bring to a boil, then reduce heat to low, cover with a lid, and simmer for 12–15 minutes until water is absorbed. Remove from heat and leave lid on.

2.  **Sear the Chicken (02:00 - 07:00):**
    While the rice heats up, heat the oil in the **pan** over medium-high heat. Season the chicken chunks with salt, pepper, and half the paprika. Add to the pan and sear for 4–5 minutes until browned on all sides. Remove chicken from the pan and set aside.

3.  **Build the Base (07:00 - 12:00):**
    In the same pan (don't wash it), add the diced onion. Sauté for 3 minutes until softening. Add the garlic and remaining paprika; cook for 1 minute until fragrant. Add the chopped tomatoes. If using fresh tomatoes, add a splash of water to help them break down.

4.  **Simmer & Combine (12:00 - 20:00):**
    Return the seared chicken (and any juices) to the pan. Stir well. Reduce heat to medium-low and let the sauce simmer for 8–10 minutes, or until the chicken is cooked through and

Lowest score · kimi-k2.7-code · Steak, wings, potatoes, broccoli dinner

0/100

“The model output is entirely a chain-of-thought scratchpad that cuts off abruptly before delivering any of the requested deliverables. It completely fails the task.”

We need answer cooking timeline for these dishes. Need think carefully. User asks: Create a cooking timeline. Dishes: 1.6kg bone-in ribeye, air fryer chicken wings, potato and caramelised onion tart, broccoli, tomato and feta salad. Constraints: steak should rest before serving; potatoes already sliced; onions already caramelised; salad already prepped; need everything ready together. Equipment: oven, air fryer, hob, meat thermometer. Serve time: 90 minutes from now. Return timeline by minute, target steak temperatures, when to rest steak, when to cook wings, when to finish broccoli, final 10-minute plating plan.

We need produce a detailed minute-by-minute timeline in reverse from T-90 to serve. Need estimate cooking times and temps. Need consider bone-in ribeye 1.6kg, likely thick. Cooking method maybe reverse sear: oven then sear on hob. Rest. Air fryer wings maybe 20-25 min at 200C. Tart with sliced potatoes and caramelised onions, likely pre-bake pastry blind? It says potatoes already sliced; onions already caramelised. Need assemble and bake, maybe 40-50 min. Broccoli finish maybe steam/sauté 5-7 min. Salad prepped. Need everything ready together at T=0.

Need decide timeline. Serve time 90 minutes from now. We work backwards. We need include target steak temperatures (internal temp before rest and after? Probably pull temp and final target). For medium-rare: pull from oven at 48-50°C, sear to final 54-55°C, rest. For medium: pull 55-57°C, final 60-63°C. We can give options. Need roasting times for 1.6kg bone-in ribeye. Bone-in ribeye (tomahawk/cote de boeuf) thickness maybe 2.5-3 inches. Reverse sear at 110-120°C oven: 45-60 minutes to reach 48°C internal. Then sear in pan 2-3 min each side to 54°C. Rest 10-15 min. Since total 90 min, we can start steak in oven

Where models still fail

The most common problems we flagged across all models.

40 incomplete output · 19 missing required element · 13 major task miss · 9 unsafe advice · 3 major task miss · 3 unsupported invention · 2 text glitch · 2 contradictory output

Frequently asked

What is the best AI model for chef / home cooking?

In our benchmarks, gemini-3.1-pro-preview-high ranks first for chef / home cooking, with a Strong score of 83.8 across 12 test cases.

What is the cheapest good model for chef / home cooking?

deepseek-v3.2 is the best value: it clears our quality bar for chef / home cooking at $0.016 per run.

Which model is fastest for chef / home cooking?

gemini-3.1-flash-lite is the fastest model that still performs well for chef / home cooking.

How we test

Each model output is scored by an LLM judge that must reply in strict JSON, supported by deterministic heuristics, and the result is then normalized to a 0-100 score.
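
The scoring pipeline described above could look roughly like the following. This is only an illustrative sketch: the page does not publish the judge's JSON schema or its heuristics, so the `score` and `max_score` fields, the empty-output rule, and the function names `parse_verdict` and `normalize` are all assumptions.

```python
import json


def parse_verdict(raw: str) -> dict:
    """Strictly parse a judge reply; anything but a well-formed JSON object is rejected.

    Assumed schema (hypothetical): {"score": number, "max_score": number, ...}
    """
    verdict = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on non-JSON
    if not isinstance(verdict, dict):
        raise ValueError("verdict must be a JSON object")
    for key in ("score", "max_score"):
        value = verdict.get(key)
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise ValueError(f"{key} must be a number")
    if verdict["max_score"] <= 0:
        raise ValueError("max_score must be positive")
    return verdict


def normalize(verdict: dict, model_output: str) -> float:
    """Combine the judge verdict with one deterministic check, scaled to 0-100."""
    # Assumed heuristic: an empty model output scores zero regardless of the judge.
    if not model_output.strip():
        return 0.0
    pct = 100.0 * verdict["score"] / verdict["max_score"]
    return round(min(100.0, max(0.0, pct)), 1)


verdict = parse_verdict('{"score": 19, "max_score": 20, "rationale": "practical"}')
print(normalize(verdict, "Strain it."))  # 95.0
```

Rejecting malformed replies up front, rather than trying to repair them, keeps a judge glitch from silently turning into a score.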

Judge: gemini-3.1-pro-preview · 648 model runs across 4 benchmarks · last tested 2026-06-30

This page is Spring Prompt, running

We just did this for every model. Do it for your prompt.

The rankings above come from running real tasks through real models and scoring every output. Spring Prompt is that same engine — pointed at your prompt, your test cases, and your definition of good.

Generate test cases from your prompt — no eval set required to start.
Compare models side by side with quality, cost and latency in one matrix.
Optimise the winner until the scores say it's ready to ship.


Experiment · Cold outreach email

Prompt × model results

12 test cases · 3 evals

Claude Opus	GPT-5	Gemini
7.1	6.8	7.4
8.3	7.9	8.0
9.2 ★	8.6	8.4

Best combo: v3 × Claude Opus

9.2 quality · $0.004/run · 1.8s