Business · 10 tasks · 44 models

Best AI models for Translation & Localization

Which models translate and localize accurately — right register, intact placeholders/brands, correct locale formats — without false friends or translationese?

Top models: qwen3.7-max (Qwen) · claude-opus-4.6-low (Anthropic) · gpt-5.5-high (OpenAI)

qwen3.7-max leads Translation & Localization (excellent). For tighter budgets, gpt-5-mini is competitive at about 83% of the cost.

Best overall · Excellent
qwen3.7-max

Top score: excellent

100.0 score · $0.0142/run · 26.2s

Best value · Excellent
gpt-5-mini

Clears the quality bar at $0.012/run

90.9 score · $0.0118/run · 16.7s

Fastest usable · Strong
grok-4.20

~11s per run, still strong

87.1 score · $0.0136/run · 10.6s
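
The "about 83% of the cost" figure follows directly from the per-run costs listed above. A minimal sketch of that arithmetic; the dictionary and the `cost_ratio` helper are illustrative names, not part of our tooling:

```python
# Per-run costs in USD, copied from the cards above.
COST_PER_RUN = {
    "qwen3.7-max": 0.0142,
    "gpt-5-mini": 0.0118,
}


def cost_ratio(model: str, baseline: str) -> float:
    """Return the model's per-run cost as a fraction of the baseline's."""
    return COST_PER_RUN[model] / COST_PER_RUN[baseline]


ratio = cost_ratio("gpt-5-mini", "qwen3.7-max")
print(f"{ratio:.0%}")  # prints "83%"
```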

Quality vs. cost

Every model placed by what it delivers and what it costs. The best value sits high and to the left.

| # | Model | Score | Rating |
|---|-------|-------|--------|
| 1 | qwen3.7-max | 100.0 | Excellent |
| 2 | claude-opus-4.6-low | 100.0 | Excellent |
| 3 | gpt-5.5-high | 99.6 | Excellent |
| 4 | claude-opus-4.5-low | 99.6 | Excellent |
| 5 | claude-opus-4.6-high | 99.6 | Excellent |
| 6 | gpt-5.5 | 99.4 | Excellent |
| 7 | gpt-5.5-low | 98.9 | Excellent |
| 8 | claude-sonnet-4.6-low | 98.9 | Excellent |

What separates the top models

Business Translation

Difficulty: medium

Tests translating marketing/UI copy while preserving meaning, brand names, and placeholders, in natural (non-literal) target language.

Leader: gemini-3.1-flash-lite
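
Placeholder preservation is one of the few requirements in this task that can be checked mechanically. The sketch below is one possible check, not the heuristic our judge uses: it assumes placeholders are written in `{curly_braces}`, and `placeholder_diff` is a hypothetical helper name. The sample pair is the greeting from the Spanish example further down this page.

```python
import re
from collections import Counter

# Assumption: placeholders look like {name}; other syntaxes (%s, {{x}}) are not handled.
PLACEHOLDER = re.compile(r"\{[^{}]*\}")


def placeholder_diff(source: str, translation: str) -> tuple[list[str], list[str]]:
    """Return (missing, unexpected) placeholders, compared as multisets."""
    src = Counter(PLACEHOLDER.findall(source))
    dst = Counter(PLACEHOLDER.findall(translation))
    missing = sorted((src - dst).elements())      # in source, not in translation
    unexpected = sorted((dst - src).elements())   # in translation, not in source
    return missing, unexpected


missing, unexpected = placeholder_diff("Welcome, {name}!", "¡Bienvenido, {nombre}!")
print(missing, unexpected)
```

A translated placeholder name shows up as one missing and one unexpected entry, which is how a "broken placeholder" would surface in a check like this.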

Localization

Difficulty: hard

Tests adapting dates, numbers, currency, and units to the target locale (converted, not relabeled).

Leader: claude-sonnet-4.6-low
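
"Converted, not relabeled" means the value itself has to be re-expressed in the target convention, not copied over unchanged. A minimal sketch for dates and number separators, assuming an en-US source and a de-DE target (both locales are assumptions chosen for illustration; the function names are hypothetical). Currency conversion needs an exchange rate and is deliberately left out.

```python
from datetime import datetime


def us_date_to_de(date_text: str) -> str:
    """Read a US MM/DD/YYYY date and render it as German DD.MM.YYYY."""
    parsed = datetime.strptime(date_text, "%m/%d/%Y")
    return parsed.strftime("%d.%m.%Y")


def us_number_to_de(number_text: str) -> str:
    """Swap US grouping and decimal separators for German ones in one pass."""
    return number_text.translate(str.maketrans({",": ".", ".": ","}))


print(us_date_to_de("03/04/2025"))   # March 4 in the US reading
print(us_number_to_de("1,500.50"))
```

Copying `03/04/2025` unchanged into a day-first locale silently turns March 4 into April 3, which is why an unconverted date counts as an error in this task.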

Register & Formality

Difficulty: hard

Tests using the correct, consistent formality (T-V / keigo direction) for a business context.

Leader: gpt-5-mini
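
For German, a crude first pass at the T-V check is to look for informal second-person forms in text that is meant to be formal. This is only a sketch under that assumption: the pattern covers a handful of forms, ignores verb endings, and `informal_pronouns` is a hypothetical helper. The informal sentence is made up for illustration; the formal one is the corrected version quoted further down this page.

```python
import re

# Assumption: flagging du/dich/dir/dein* is enough for a rough signal.
INFORMAL_DE = re.compile(r"\b(du|dich|dir|dein\w*)\b", re.IGNORECASE)


def informal_pronouns(text: str) -> list[str]:
    """Return every informal second-person form found in the text, in order."""
    return [match.group(0) for match in INFORMAL_DE.finditer(text)]


print(informal_pronouns("Könntest du bitte deinen Termin bestätigen?"))
print(informal_pronouns("Könnten Sie bitte Ihren Termin bestätigen?"))
```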

Catch the Translation Error

Difficulty: hard

Tests finding, classifying, and fixing errors in a flawed translation — false friends, register, broken placeholders, locale formats.

Leader: qwen3.7-max-high

See the difference

Real model outputs and how our judge scored them.

Top score · claude-haiku-4.5 · Find the register error (EN→German translation)
100/100

“The model successfully caught the planted register error, classified it correctly, and provided an accurate and natural fix without hallucinating other errors.”

# Analysis

**Issue Identified:** ❌ Not appropriate for formal B2B

The translation uses **"du" (informal you)**, which is incorrect for formal business communication in German.

## Corrected Version

**"Könnten Sie bitte Ihren Termin bestätigen?"**

Or alternatively:

**"Können Sie bitte Ihren Termin bestätigen?"**

## Key Differences

| Aspect | Inappropriate | Appropriate |
|--------|---------------|-------------|
| Pronoun | du/dein | Sie/Ihr |
| Tone | Casual, friendly | Professional, respectful |
| Context | Colleagues/friends | Business clients, partners |

## Why This Matters

In German business communication, **"Sie"** is the standard formal address. Using "du" in B2B contexts can appear unprofessional or disrespectful unless there's an established informal relationship. The corrected version maintains appropriate professional distance and respect.

Mid-pack · kimi-k2.5 · Support reply with a false-friend trap (EN→German)
100/100

“The translation is perfectly accurate, fluent, and correctly captures the tone and meaning of the source text without using false friends. The additional notes provided by the model are accurate and add value.”

 **Die Verzögerung ist uns peinlich, und wir werden Ihre Rückerstattung heute noch überweisen.**

**Important note on "embarrassed":**
Avoid translating this as **"verlegen"** — while "verlegen" can mean "embarrassed," it also means "delayed" or "misplaced" in German, which would create unfortunate confusion in this context!

**Alternative options depending on tone:**
- More formal: *"Die Verzögerung ist uns höchst unangenehm..."* (The delay is extremely unpleasant to us...)
- More apologetic: *"Es ist uns peinlich, dass es zu dieser Verzögerung gekommen ist..."* (It is embarrassing to us that this delay occurred...)

**Note:** German business communication often uses "bedauern" (regret) or "entschuldigen" (apologize) instead, but "peinlich" accurately conveys the specific sentiment of embarrassment/awkwardness about the situation.

Lowest score · grok-4.20 · Find the errors (EN→Spanish translation)
30/100

“While the model successfully caught the critical false friend 'embarazada', it failed the task by missing the other two major planted errors: the broken placeholder and the unlocalized date/currency formats. Additionally, it over-edited the text by flagging grammatically correct and natural phrasing as errors.”

**Error Analysis**

### 1. **Critical Lexical Error (False Friend)**
- **Source:** "I was embarrassed."
- **Translation:** "Estaba embarazada."
- **Classification:** False friend / Lexical error (Mistranslation)
- **Explanation:** "Embarrassed" means *avergonzado/a* (feeling shame). "Embarazada" means *pregnant*. This is a classic and very serious false friend in Spanish.
- **Fix:** "Me sentí avergonzado/a." or "Estaba avergonzado/a."

### 2. **Verb Tense / Aspect Error**
- **Source:** "I was embarrassed."
- **Translation:** "Estaba embarazada."
- **Classification:** Tense/Aspect error
- **Explanation:** The English uses a stative past ("I was embarrassed") describing a feeling at a specific moment. The Spanish "Estaba" (imperfect) is not wrong grammatically, but the most natural equivalent in this context is the preterite or a reflexive construction.
- **Fix:** "Me sentí avergonzado/a." (most natural) or "Estuve avergonzado/a."

### 3. **Agreement / Gender Error**
- **Source:** "Welcome, {name}!"
- **Translation:** "¡Bienvenido, {nombre}!"
- **Classification:** Gender agreement error
- **Explanation:** "Bienvenido" is masculine. The placeholder `{name}` can refer to any gender. In Spanish customer communications, the inclusive/neutral form is preferred.
- **Fix:** "¡Bienvenido/a, {nombre}!" or better: "**Te damos la bienvenida**, {nombre}!" (avoids gendering the adjective entirely).

### 4. **Minor Stylistic / Naturalness Issue**
- **Source:** "Your order of $1,500 ships on 03/04/2025."
- **Translation:** "Su pedido de $1,500 se envía el 03/04/2025."
- **Classification:** Register/Naturalness issue
- **Explanation:** While grammatically correct, "se envía" sounds slightly impersonal. In customer service Spanish, it's more common and natural to use the active voice or a

Where models still fail

The most common problems we flagged across all models.

55 source locale format · 38 omission or addition · 15 translationese · 9 broken placeholder · 8 mistranslation · 6 wrong register

Frequently asked

What is the best AI model for translation & localization?

In our benchmarks, qwen3.7-max ranks first for translation & localization, with a score of 100.0 (Excellent) across 10 test cases.

What is the cheapest good model for translation & localization?

gpt-5-mini is the best value: it clears our quality bar for translation & localization at $0.012 per run.

Which model is fastest for translation & localization?

grok-4.20 is the fastest model that still performs well for translation & localization: about 10.6s per run with a score of 87.1 (Strong).

How we test

Each model output is scored by a strict JSON LLM judge, supported by deterministic heuristics, then normalized to a 0-100 score.
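
To make that pipeline concrete, here is a rough sketch of the three steps: strict JSON parsing, a deterministic adjustment, and normalization to 0-100. Everything specific in it is an assumption for illustration and not a description of our judge: the `{"score", "reason"}` schema, the 0-1 judge scale, the subtractive `heuristic_penalty`, and both function names.

```python
import json


def parse_judge(raw: str) -> dict:
    """Strictly parse a judge reply: valid JSON, exact keys, score in [0, 1]."""
    data = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on non-JSON
    if not isinstance(data, dict) or set(data) != {"score", "reason"}:
        raise ValueError("judge reply must be an object with exactly 'score' and 'reason'")
    score = data["score"]
    if isinstance(score, bool) or not isinstance(score, (int, float)) or not 0 <= score <= 1:
        raise ValueError("'score' must be a number in [0, 1]")
    return data


def final_score(raw_judge_reply: str, heuristic_penalty: float = 0.0) -> float:
    """Subtract the heuristic penalty from the judge score, clamp, scale to 0-100."""
    judge = parse_judge(raw_judge_reply)
    combined = judge["score"] - heuristic_penalty
    clamped = min(1.0, max(0.0, combined))
    return round(clamped * 100, 1)


print(final_score('{"score": 0.3, "reason": "missed two planted errors"}'))
```

Rejecting anything that is not exactly the expected JSON shape keeps a malformed judge reply from being silently scored.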

Judge: gemini-3.1-pro-preview · 460 model runs across 4 benchmarks · last tested 2026-06-29

This page is Spring Prompt, running

We just did this for every model. Do it for your prompt.

The rankings above come from running real tasks through real models and scoring every output. Spring Prompt is that same engine — pointed at your prompt, your test cases, and your definition of good.

  • Generate test cases from your prompt — no eval set required to start.
  • Compare models side by side with quality, cost and latency in one matrix.
  • Optimise the winner until the scores say it's ready to ship.
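
Experiment · Cold outreach email

Prompt × model results (12 test cases · 3 evals)

| Prompt version | Claude Opus | GPT-5 | Gemini |
|----------------|-------------|-------|--------|
| v1 | 7.1 | 6.8 | 7.4 |
| v2 | 8.3 | 7.9 | 8.0 |
| v3 | 9.2 | 8.6 | 8.4 |

Best combo: v3 × Claude Opus
9.2 quality · $0.004/run · 1.8s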