Is qwen3.7-max-high good at Customer Support?
qwen3.7-max-high ranks #3 of 43 models for Customer Support, a strong result. The top pick for this task is gemini-3.1-pro-preview-low.
qwen3.7-max-high on each Customer Support sub-task
| Sub-task | Score | Rank |
| --- | --- | --- |
| Resolution | 98.0/100 | #19 |
| Policy Boundaries | 94.7/100 | #14 |
| Help Content Test | 90.8/100 | #2 |
| Escalation and Incident Test | 90.4/100 | #4 |
| Policy Boundary Test | 86.5/100 | #11 |
| Basic Support Reply Test | 83.2/100 | #3 |
| De-escalation | 77.0/100 | #12 |
| Escalation & Handoff | 71.5/100 | #18 |
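Score and rank in the table above measure different things: the score is the model's own result on a sub-task, while the rank is its position relative to the other models tested on it. The short sketch below uses only the figures from the table and shows that the highest-scoring sub-task and the best-ranked sub-task are not the same. The function names are illustrative, not part of any published tool.

```python
# (sub-task, score out of 100, rank), copied from the table above.
SUBTASKS = [
    ("Resolution", 98.0, 19),
    ("Policy Boundaries", 94.7, 14),
    ("Help Content Test", 90.8, 2),
    ("Escalation and Incident Test", 90.4, 4),
    ("Policy Boundary Test", 86.5, 11),
    ("Basic Support Reply Test", 83.2, 3),
    ("De-escalation", 77.0, 12),
    ("Escalation & Handoff", 71.5, 18),
]


def highest_scoring(rows):
    """Return the row with the largest absolute score."""
    return max(rows, key=lambda row: row[1])


def best_ranked(rows):
    """Return the row with the smallest rank number (closest to #1)."""
    return min(rows, key=lambda row: row[2])


print(highest_scoring(SUBTASKS)[0])  # Resolution
print(best_ranked(SUBTASKS)[0])      # Help Content Test
```

A high score with a middling rank, as on Resolution, suggests that most models do well on that sub-task, so the score alone does not separate them.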
Real examples, graded
Win: Password reset help (88/100)
“The response perfectly executes the task, providing clear, concise, and actionable password reset steps alongside a robust and specific security caution. It is highly professional and production-ready.”
Win: Internal support macro review (90/100)
“The model perfectly followed the strict constraint to include the exact policy string, improved the tone significantly, and provided a safe, policy-compliant alternative (troubleshooting). The alternative is slightly generic, but appropriate for a macro template.”
Weak: Frustrated no-show dispute (Tradewinds) (44/100)
“The model provides a highly empathetic response that validates the customer's frustration and correctly applies the credited fee. However, it makes a major unauthorized promise by guaranteeing that the venue will be matched with 'most reliable, top-rated workers well in advance' for future shifts. The policy only allows prioritizing the next request, not guaranteeing specific worker quality or successful matches.”
Weak: Over-authority refund → clean handoff (Cedar & Sage) (34/100)
“The model makes an unauthorized promise by guaranteeing the cash reversal will be processed, despite it requiring Billing approval. It also invents a 1-2 business day SLA for the Billing department. Additionally, the internal handoff note fails to include the customer's sentiment.”
Frequently asked
Is qwen3.7-max-high good at Customer Support?
qwen3.7-max-high ranks #3 of 43 models we tested for Customer Support, which we rate as strong.
What is qwen3.7-max-high's strongest Customer Support skill?
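Its highest-scoring sub-task here is Resolution, at 98.0/100 (rank #19 on that sub-task). Relative to other models, its best placement is Help Content Test, at #2.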
This page is Spring Prompt, running
We just did this for every model. Do it for your prompt.
The rankings above come from running real tasks through real models and scoring every output. Spring Prompt is that same engine — pointed at your prompt, your test cases, and your definition of good.
- Generate test cases from your prompt — no eval set required to start.
- Compare models side by side with quality, cost and latency in one matrix.
- Optimise the winner until the scores say it's ready to ship.