Is Glm 5.2 good at Customer Support?

Glm 5.2 ranks #12 of 104 models for Customer Support, a strong result. The top pick for this task is gemini-3.1-pro-preview-low.

Rank for this task: #12 / 104
Score: 86.9
Cost / run: $0.0170

Glm 5.2 on each Customer Support sub-task

Resolution 100.0/100 #2
Escalation & Handoff 96.5/100 #1
Policy Boundary Test 90.5/100 #16
Help Content Test 90.2/100 #8
Escalation and Incident Test 86.8/100 #43
Policy Boundaries 83.3/100 #81
Basic Support Reply Test 81.4/100 #15
De-escalation 71.5/100 #36

Real examples, graded

Win: Password reset help 89/100

“The response perfectly executes the task, providing clear, actionable steps and a comprehensive security caution. It is concise, professional, and ready to use. The security caution is particularly strong, addressing phishing, password sharing, and password strength.”

Win: Duplicate charge (Cedar & Sage) 100/100

“The model perfectly utilizes the provided account facts and policy to resolve the customer's issue. It takes ownership, provides a concrete timeline, and maintains an empathetic tone without inventing any information.”

Win: Login lockout (Lumen) 100/100

“The model perfectly follows the provided policy, acknowledging the urgency of the situation and offering both the 15-minute auto-clear timeline and the immediate manual unlock option. It takes ownership, provides a concrete next step for identity verification, and maintains an empathetic and professional tone without fabricating any account facts or unauthorized promises.”

Weak: Churn rescue 0/100

“The model completely failed the task. Instead of writing a reply to the customer, it began analyzing the prompt and then cut off mid-sentence, producing no usable output.”


Frequently asked

Is Glm 5.2 good at Customer Support?

Glm 5.2 ranks #12 of 104 models we tested for Customer Support, a result we rate as strong.

What is Glm 5.2's strongest Customer Support skill?

Its best sub-task here is Resolution, where it scores 100.0/100 and ranks #2.

This page is Spring Prompt, running

We just did this for every model. Do it for your prompt.

The rankings above come from running real tasks through real models and scoring every output. Spring Prompt is that same engine — pointed at your prompt, your test cases, and your definition of good.

  • Generate test cases from your prompt — no eval set required to start.
  • Compare models side by side with quality, cost and latency in one matrix.
  • Optimise the winner until the scores say it's ready to ship.

Experiment: Cold outreach email

Prompt × model results (12 test cases, 3 evals)

Prompt   Claude Opus   GPT-5   Gemini
v1       7.1           6.8     7.4
v2       8.3           7.9     8.0
v3       9.2           8.6     8.4

Best combo: v3 × Claude Opus (9.2 quality, $0.004/run, 1.8s)
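
As a rough illustration of how a "best combo" can be read off a prompt × model matrix like the one above, here is a minimal Python sketch. The names `scores` and `best_combo` are hypothetical and are not part of Spring Prompt. The sketch assumes the quality scores map to prompt versions and models in the order shown, and it ranks on quality alone, ignoring cost and latency.

```python
# Quality scores from the example matrix: prompt version -> model -> score.
# The layout is an assumption based on the order the values appear in.
scores = {
    "v1": {"Claude Opus": 7.1, "GPT-5": 6.8, "Gemini": 7.4},
    "v2": {"Claude Opus": 8.3, "GPT-5": 7.9, "Gemini": 8.0},
    "v3": {"Claude Opus": 9.2, "GPT-5": 8.6, "Gemini": 8.4},
}


def best_combo(matrix):
    """Return (prompt_version, model, score) for the highest-scoring cell."""
    cells = (
        (version, model, score)
        for version, row in matrix.items()
        for model, score in row.items()
    )
    # Rank purely on the quality score (third element of each tuple).
    return max(cells, key=lambda cell: cell[2])


version, model, score = best_combo(scores)
print(f"Best combo: {version} x {model} ({score} quality)")
```

A real comparison would likely weigh cost and latency as well; this sketch only shows the quality-only selection.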