Kimi K2.7 Code vs Qwen3.7 Max: which wins at real work?

22 task areas · same graded test runs · rank comparison only, so 0–100 and Elo collections never mix raw scores.
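As a rough illustration of why ranks are compared instead of raw scores, the sketch below ranks models within each collection separately before anything is compared. The model names, the scores, and the `rank_within` helper are hypothetical and are not the site's data or pipeline.

```python
def rank_within(scores):
    """Return {model: rank} for one collection, where 1 is the highest score."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {model: position for position, model in enumerate(ordered, start=1)}

# Hypothetical collections on incompatible scales (made-up numbers).
graded_0_100 = {"model_a": 91.0, "model_b": 84.5, "model_c": 77.0}
elo = {"model_a": 1180, "model_b": 1245, "model_c": 1102}

# Ranks are computed per collection, so 91.0 and 1180 are never compared directly.
ranks_graded = rank_within(graded_0_100)
ranks_elo = rank_within(elo)

print(ranks_graded)
print(ranks_elo)
```

Only the resulting positions are carried forward, which is what lets a 0–100 collection and an Elo collection sit in the same comparison.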

Qwen3.7 Max wins 15 of the 22 task areas we tested; Kimi K2.7 Code takes the other 7. Kimi K2.7 Code is about 15% cheaper per token ($4.24 vs $5.00 per 1M tokens, a ratio of roughly 1.2×).

Metric	Kimi K2.7 Code	Qwen3.7 Max
Task areas won	7	15
Price / 1M tokens	$4.24	$5.00
Provider	MoonshotAI	Qwen

Task by task

Task area	Kimi K2.7 Code	Qwen3.7 Max	Winner
Content & Brand	#54 / 124 Strong	#1 / 124 Strong	Qwen3.7 Max
Translation & Localization	#46 / 107 Excellent	#1 / 107 Excellent	Qwen3.7 Max
Frontend & Landing Pages	#51 / 106 Needs editing	#11 / 106 Needs editing	Qwen3.7 Max
Structured Output	#42 / 110 Excellent	#2 / 110 Excellent	Qwen3.7 Max
Research & Competitive Analysis	#19 / 107 Excellent	#55 / 107 Usable	Kimi K2.7 Code
Creative & Comedy	#42 / 107	#8 / 107	Qwen3.7 Max
Summarization & Meeting Notes	#44 / 107 Excellent	#10 / 107 Excellent	Qwen3.7 Max
Legal & HR	#14 / 107 Excellent	#44 / 107 Excellent	Kimi K2.7 Code
Executive Assistant	#6 / 109 Strong	#31 / 109 Strong	Kimi K2.7 Code
Training & Education	#38 / 107 Excellent	#13 / 107 Excellent	Qwen3.7 Max
Sales	#46 / 107 Usable	#23 / 107 Strong	Qwen3.7 Max
Coding	#37 / 115 Strong	#16 / 115 Excellent	Qwen3.7 Max
Knowledge & Docs	#16 / 107 Excellent	#33 / 107 Strong	Kimi K2.7 Code
RAG, Safety & Grounding	#18 / 110 Excellent	#1 / 110 Excellent	Qwen3.7 Max
Data & Analytics	#29 / 110 Excellent	#14 / 110 Excellent	Qwen3.7 Max
AI Strategy	#41 / 126 Strong	#33 / 126 Strong	Qwen3.7 Max
Chef / Home Cooking	#54 / 126 Usable	#46 / 126 Usable	Qwen3.7 Max
Investor & Pitch	#11 / 63 Strong	#15 / 63 Strong	Kimi K2.7 Code
Presentations & Decks	#4 / 107 Excellent	#8 / 107 Excellent	Kimi K2.7 Code
Product & Project Management	#10 / 107 Excellent	#7 / 107 Excellent	Qwen3.7 Max
Landing Pages	#3 / 69 Strong	#1 / 69 Strong	Qwen3.7 Max
Customer Support	#8 / 113 Strong	#9 / 113 Strong	Kimi K2.7 Code

Rank = position among every model config we tested in that task area (lower is better). Rows are sorted by the largest rank gap between the two models first.
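The winner column and the row order can be reproduced from the ranks alone. The sketch below does this for four rows copied from the table; the `compare` helper and the tie handling are assumptions, since the page does not describe its own code or say how ties are treated.

```python
from collections import Counter

# (task area, Kimi K2.7 Code rank, Qwen3.7 Max rank), copied from four table rows
rows = [
    ("Customer Support", 8, 9),
    ("Content & Brand", 54, 1),
    ("Legal & HR", 14, 44),
    ("Coding", 37, 16),
]

def compare(rows):
    """Return (area, gap, winner) tuples, largest rank gap first."""
    results = []
    for area, kimi, qwen in rows:
        if kimi == qwen:
            winner = "Tie"  # assumption: the page does not say how ties are handled
        else:
            # Lower rank is better, so the smaller number wins.
            winner = "Kimi K2.7 Code" if kimi < qwen else "Qwen3.7 Max"
        results.append((area, abs(kimi - qwen), winner))
    return sorted(results, key=lambda r: r[1], reverse=True)

results = compare(rows)
wins = Counter(winner for _, _, winner in results)

for area, gap, winner in results:
    print(f"{area}: gap {gap}, {winner}")
print(dict(wins))
```

Run over all 22 rows, the same tally gives the 15 to 7 split reported above.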

Frequently asked

Is Kimi K2.7 Code better than Qwen3.7 Max?

On rank, mostly no: across the 22 task areas we benchmarked, Qwen3.7 Max ranks higher in 15 and Kimi K2.7 Code in 7.

Which is cheaper, Kimi K2.7 Code or Qwen3.7 Max?

Kimi K2.7 Code. It costs $4.24 per 1M tokens against $5.00 for Qwen3.7 Max, which is about 15% less; put the other way, Qwen3.7 Max costs roughly 1.2× as much.
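The two ways of stating the price difference come from the same pair of numbers. A minimal sketch of the arithmetic, using only the prices quoted on this page (the variable names are illustrative):

```python
kimi_price = 4.24  # USD per 1M tokens, Kimi K2.7 Code
qwen_price = 5.00  # USD per 1M tokens, Qwen3.7 Max

# How many times more Qwen3.7 Max costs than Kimi K2.7 Code.
ratio = qwen_price / kimi_price

# Fraction saved by choosing Kimi K2.7 Code, relative to the Qwen3.7 Max price.
savings = (qwen_price - kimi_price) / qwen_price

print(f"{ratio:.2f}x")   # 1.18x, which rounds to the 1.2x quoted above
print(f"{savings:.1%}")  # 15.2%
```

Note that "1.2× the price" and "about 15% cheaper" describe the same gap; neither means a 20% discount.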

What is Kimi K2.7 Code better at?

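Kimi K2.7 Code out-ranks Qwen3.7 Max by the widest margins in Research & Competitive Analysis (#19 vs #55), Legal & HR (#14 vs #44), and Executive Assistant (#6 vs #31). It also leads in Knowledge & Docs, Investor & Pitch, Presentations & Decks, and Customer Support.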

What is Qwen3.7 Max better at?

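Qwen3.7 Max out-ranks Kimi K2.7 Code by the widest margins in Content & Brand (#1 vs #54), Translation & Localization (#1 vs #46), and Frontend & Landing Pages (#11 vs #51). It leads in 15 task areas in total, including Structured Output, Coding, and RAG, Safety & Grounding.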

This page is Spring Prompt, running

We just did this for every model. Do it for your prompt.

The rankings above come from running real tasks through real models and scoring every output. Spring Prompt is that same engine — pointed at your prompt, your test cases, and your definition of good.

Generate test cases from your prompt — no eval set required to start.
Compare models side by side with quality, cost and latency in one matrix.
Optimise the winner until the scores say it's ready to ship.

Experiment · Cold outreach email

Prompt × model results (12 test cases · 3 evals)

Prompt version	Claude Opus	GPT-5	Gemini
v1	7.1	6.8	7.4
v2	8.3	7.9	8.0
v3	9.2 ★	8.6	8.4

Best combo: v3 × Claude Opus (9.2 quality · $0.004/run · 1.8s)
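Picking the best combination from a grid like the one above is a simple maximum over all cells. The sketch below assumes the nine scores read row by row with one row per prompt version; the labels `v1` and `v2` are inferred, since only `v3` is named on the page, and `best_combo` is a hypothetical helper.

```python
models = ["Claude Opus", "GPT-5", "Gemini"]

# Quality scores per prompt version, in the column order of `models`.
# Assumption: rows are v1..v3 in reading order; only "v3" appears on the page.
scores = {
    "v1": [7.1, 6.8, 7.4],
    "v2": [8.3, 7.9, 8.0],
    "v3": [9.2, 8.6, 8.4],
}

def best_combo(scores, models):
    """Return (score, prompt_version, model) for the highest-scoring cell."""
    cells = (
        (score, version, model)
        for version, row in scores.items()
        for score, model in zip(row, models)
    )
    return max(cells, key=lambda cell: cell[0])

best = best_combo(scores, models)
print(f"Best combo: {best[1]} x {best[2]} ({best[0]} quality)")
```

This ranks on quality only; a real selection would presumably weigh the cost and latency figures shown alongside it.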