Kimi K2.7 Code vs Qwen3.7 Max: which wins at real work?
22 task areas · same graded test runs · rank comparison only, so 0–100 and Elo collections never mix raw scores.
Qwen3.7 Max wins 15 of the 22 task areas we tested; Kimi K2.7 Code takes 7. Kimi K2.7 Code is about 15% cheaper per token ($4.24 vs $5 per 1M tokens).
Task by task
| Task area | Kimi K2.7 Code | Qwen3.7 Max | Winner |
|---|---|---|---|
| Content & Brand | #54 / 124 · Strong | #1 / 124 · Strong | Qwen3.7 Max |
| Translation & Localization | #46 / 107 · Excellent | #1 / 107 · Excellent | Qwen3.7 Max |
| Frontend & Landing Pages | #51 / 106 · Needs editing | #11 / 106 · Needs editing | Qwen3.7 Max |
| Structured Output | #42 / 110 · Excellent | #2 / 110 · Excellent | Qwen3.7 Max |
| Research & Competitive Analysis | #19 / 107 · Excellent | #55 / 107 · Usable | Kimi K2.7 Code |
| Creative & Comedy | #42 / 107 | #8 / 107 | Qwen3.7 Max |
| Summarization & Meeting Notes | #44 / 107 · Excellent | #10 / 107 · Excellent | Qwen3.7 Max |
| Legal & HR | #14 / 107 · Excellent | #44 / 107 · Excellent | Kimi K2.7 Code |
| Executive Assistant | #6 / 109 · Strong | #31 / 109 · Strong | Kimi K2.7 Code |
| Training & Education | #38 / 107 · Excellent | #13 / 107 · Excellent | Qwen3.7 Max |
| Sales | #46 / 107 · Usable | #23 / 107 · Strong | Qwen3.7 Max |
| Coding | #37 / 115 · Strong | #16 / 115 · Excellent | Qwen3.7 Max |
| Knowledge & Docs | #16 / 107 · Excellent | #33 / 107 · Strong | Kimi K2.7 Code |
| RAG, Safety & Grounding | #18 / 110 · Excellent | #1 / 110 · Excellent | Qwen3.7 Max |
| Data & Analytics | #29 / 110 · Excellent | #14 / 110 · Excellent | Qwen3.7 Max |
| AI Strategy | #41 / 126 · Strong | #33 / 126 · Strong | Qwen3.7 Max |
| Chef / Home Cooking | #54 / 126 · Usable | #46 / 126 · Usable | Qwen3.7 Max |
| Investor & Pitch | #11 / 63 · Strong | #15 / 63 · Strong | Kimi K2.7 Code |
| Presentations & Decks | #4 / 107 · Excellent | #8 / 107 · Excellent | Kimi K2.7 Code |
| Product & Project Management | #10 / 107 · Excellent | #7 / 107 · Excellent | Qwen3.7 Max |
| Landing Pages | #3 / 69 · Strong | #1 / 69 · Strong | Qwen3.7 Max |
| Customer Support | #8 / 113 · Strong | #9 / 113 · Strong | Kimi K2.7 Code |
Rank = position among every model config we tested in that task area (lower is better). Sorted by biggest gap first.
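To make the rank definition concrete, here is a minimal Python sketch of how positions could be derived inside one task area. The model names and raw scores are hypothetical, invented for illustration (this page publishes ranks, not raw scores), and the sketch assumes that a higher raw score is better on both scales. `rank_models` is an illustrative helper, not part of any published tooling.

```python
def rank_models(scores: dict[str, float]) -> dict[str, int]:
    """Map each model config to its 1-based rank (1 = highest raw score)."""
    # Sort names by raw score, best first; ties keep their input order.
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {name: position for position, name in enumerate(ordered, start=1)}


# Hypothetical raw scores from two collections that use different scales.
quality_0_100 = {"model-a": 91.0, "model-b": 78.5, "model-c": 84.0}
elo = {"model-a": 1210.0, "model-b": 1305.0, "model-c": 1180.0}

# Ranks are computed separately per collection, so the two scales never mix.
quality_ranks = rank_models(quality_0_100)
elo_ranks = rank_models(elo)
```

Because only positions are compared, a 0–100 score and an Elo rating never have to be put on a common scale.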
Frequently asked
Is Kimi K2.7 Code better than Qwen3.7 Max?
Not overall. Across the 22 task areas we benchmarked, Qwen3.7 Max ranks higher in 15 and Kimi K2.7 Code in 7.
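The 15-to-7 tally can be checked from the ranks in the table. The sketch below copies each rank pair from the table and assumes the winner is simply the model with the lower rank; the names `RANKS`, `winner`, `wins`, and `by_gap` are illustrative.

```python
from collections import Counter

# (task area, Kimi K2.7 Code rank, Qwen3.7 Max rank), copied from the table.
RANKS = [
    ("Content & Brand", 54, 1),
    ("Translation & Localization", 46, 1),
    ("Frontend & Landing Pages", 51, 11),
    ("Structured Output", 42, 2),
    ("Research & Competitive Analysis", 19, 55),
    ("Creative & Comedy", 42, 8),
    ("Summarization & Meeting Notes", 44, 10),
    ("Legal & HR", 14, 44),
    ("Executive Assistant", 6, 31),
    ("Training & Education", 38, 13),
    ("Sales", 46, 23),
    ("Coding", 37, 16),
    ("Knowledge & Docs", 16, 33),
    ("RAG, Safety & Grounding", 18, 1),
    ("Data & Analytics", 29, 14),
    ("AI Strategy", 41, 33),
    ("Chef / Home Cooking", 54, 46),
    ("Investor & Pitch", 11, 15),
    ("Presentations & Decks", 4, 8),
    ("Product & Project Management", 10, 7),
    ("Landing Pages", 3, 1),
    ("Customer Support", 8, 9),
]


def winner(kimi_rank: int, qwen_rank: int) -> str:
    """Lower rank wins; the table contains no ties."""
    return "Kimi K2.7 Code" if kimi_rank < qwen_rank else "Qwen3.7 Max"


# Count wins per model across all task areas.
wins = Counter(winner(kimi, qwen) for _, kimi, qwen in RANKS)

# Order task areas by absolute rank gap, largest first (stable for equal gaps).
by_gap = sorted(RANKS, key=lambda row: abs(row[1] - row[2]), reverse=True)
```

Sorting by absolute rank gap reproduces the table's row order, from Content & Brand (53 places apart) down to Customer Support (1 place apart).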
Which is cheaper, Kimi K2.7 Code or Qwen3.7 Max?
Kimi K2.7 Code is about 15% cheaper per token ($4.24 vs $5 per 1M tokens).
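The price gap can be worked through in a few lines of Python. The two per-1M-token prices come from this page; the 10M-token workload is a hypothetical figure for illustration. The page does not say how input and output tokens are weighted, so the sketch assumes one flat price per token. `cost_usd` is an illustrative helper.

```python
KIMI_PRICE_PER_1M = 4.24  # USD per 1M tokens, from this page
QWEN_PRICE_PER_1M = 5.00  # USD per 1M tokens, from this page


def cost_usd(tokens: int, price_per_1m: float) -> float:
    """Cost of a workload at a flat per-token price (an assumption)."""
    return tokens / 1_000_000 * price_per_1m


# How many times more Qwen3.7 Max costs per token.
price_ratio = QWEN_PRICE_PER_1M / KIMI_PRICE_PER_1M

# Fraction saved by choosing Kimi K2.7 Code instead.
saving = (QWEN_PRICE_PER_1M - KIMI_PRICE_PER_1M) / QWEN_PRICE_PER_1M

# Hypothetical workload of 10M tokens.
workload_tokens = 10_000_000
kimi_cost = cost_usd(workload_tokens, KIMI_PRICE_PER_1M)
qwen_cost = cost_usd(workload_tokens, QWEN_PRICE_PER_1M)
```

The ratio rounds to 1.2 and the saving to roughly 15%, so both describe the same gap. On the hypothetical 10M-token workload, that is about $42.40 against $50.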
What is Kimi K2.7 Code better at?
Kimi K2.7 Code out-ranks Qwen3.7 Max in 7 task areas, led by Research & Competitive Analysis, Legal & HR, and Executive Assistant.
What is Qwen3.7 Max better at?
Qwen3.7 Max out-ranks Kimi K2.7 Code in 15 task areas, led by Content & Brand, Translation & Localization, and Frontend & Landing Pages.
This page is Spring Prompt, running
We just did this for every model. Do it for your prompt.
The rankings above come from running real tasks through real models and scoring every output. Spring Prompt is that same engine — pointed at your prompt, your test cases, and your definition of good.
- Generate test cases from your prompt — no eval set required to start.
- Compare models side by side with quality, cost and latency in one matrix.
- Optimise the winner until the scores say it's ready to ship.