Business · 6 tasks · 93 models

Best AI models for Frontend & Landing Pages

Name: Frontend & Landing Pages AI model benchmark
Creator: Spring Prompt

Which models build landing pages that actually look designed, convert, and ship — not just valid HTML?

Top models

Google: gemini-3-flash-preview-max

Anthropic: claude-opus-4.5-medium

Qwen: qwen3.5-plus-02-15

gemini-3-flash-preview-max leads Frontend & Landing Pages with a score of 73.0 (usable). For tighter budgets, gemini-3.1-flash-lite-medium is competitive at 70.0 and about 29% of the cost per run.
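The 29% figure can be checked from the per-run costs in the ranking table. The sketch below is only arithmetic on those two published numbers; the variable and function names are illustrative, and because the per-run costs are rounded to four decimals the per-1,000 figure comes out at about $3.20 rather than the $3.19 quoted on the page.

```python
# Per-run costs copied from the ranking table (rounded to four decimals there).
BEST_OVERALL_COST = 0.0109   # gemini-3-flash-preview-max
BEST_VALUE_COST = 0.0032     # gemini-3.1-flash-lite-medium


def cost_ratio_percent(cheaper: float, pricier: float) -> int:
    """Return the cheaper cost as a whole-number percentage of the pricier one."""
    return round(100 * cheaper / pricier)


def cost_per_thousand_runs(cost_per_run: float) -> float:
    """Scale a per-run cost to the cost of 1,000 runs, in dollars."""
    return round(cost_per_run * 1000, 2)


ratio = cost_ratio_percent(BEST_VALUE_COST, BEST_OVERALL_COST)
per_thousand = cost_per_thousand_runs(BEST_VALUE_COST)
print(f"{ratio}% of the cost, about ${per_thousand:.2f} per 1,000 runs")
```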

Best overall Usable

gemini-3-flash-preview-max

Top score — usable

73.0 score · $0.0109/run · 25.2s

Best value Usable

gemini-3.1-flash-lite-medium

Clears the quality bar at $3.19 per 1,000 runs

70.0 score · $0.0032/run · 12.2s

Quality vs. cost

Every model placed by what it delivers and what it costs. The best value sits high and to the left.
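One way to read "high and to the left" is as a Pareto frontier: a model is on it if no other model is both cheaper and higher-scoring. The sketch below applies that idea to a hand-picked subset of rows from the ranking table; the `Run` type and `pareto_frontier` function are illustrative names, and this is not necessarily how the chart on this page is produced.

```python
from typing import NamedTuple


class Run(NamedTuple):
    model: str
    score: float   # 0-100, higher is better
    cost: float    # dollars per run, lower is better


# A subset of rows from the ranking table, not the full 93 models.
SAMPLE = [
    Run("gemini-3-flash-preview-max", 73.0, 0.0109),
    Run("claude-opus-4.5-medium", 72.3, 0.2514),
    Run("qwen3.5-plus-02-15", 71.5, 0.0083),
    Run("gemini-3.1-flash-lite-medium", 70.0, 0.0032),
    Run("gpt-5.5-medium", 67.7, 0.2021),
    Run("gemini-3.1-flash-lite", 67.2, 0.0020),
    Run("deepseek-v3.2-medium", 65.8, 0.0012),
]


def pareto_frontier(runs: list[Run]) -> list[Run]:
    """Keep runs that no cheaper (or equally cheap) run beats on score."""
    frontier: list[Run] = []
    best_score = float("-inf")
    # Walk from cheapest to priciest; on cost ties the higher score comes first.
    for run in sorted(runs, key=lambda r: (r.cost, -r.score)):
        if run.score > best_score:
            frontier.append(run)
            best_score = run.score
    return frontier


for run in pareto_frontier(SAMPLE):
    print(f"{run.model}: {run.score} at ${run.cost:.4f}/run")
```

In this subset the two most expensive models drop out: each costs more than gemini-3-flash-preview-max while scoring lower.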

Full ranking

#	Model	Score	Cost/run	Speed	Best for	Arena · UI
1	gemini-3-flash-preview-max	73.0 Usable	$0.0109	25.2s	Needs review	—
2	claude-opus-4.5-medium	72.3 Usable	$0.2514	103.7s	Needs review	1296
3	qwen3.5-plus-02-15	71.5 Usable	$0.0083	97.6s	Needs review	—
4	gemini-3.5-flash-max	71.3 Usable	$0.0815	47.9s	Needs review	1315
5	claude-haiku-4.5-max	70.8 Usable	$0.0682	78.0s	Needs review	1155
6	gemini-3.1-flash-lite-medium	70.0 Usable	$0.0032	12.2s	Needs review	—
7	gemini-3-flash-preview	69.8 Needs editing	$0.0103	19.3s	Needs review	—
8	gpt-5-mini-medium	68.2 Needs editing	$0.0128	78.2s	Needs review	—
9	gpt-5.5-medium	67.7 Needs editing	$0.2021	68.4s	Needs review	1300
10	qwen3.7-max-medium	67.5 Needs editing	$0.0307	150.0s	Needs review	1327
11	grok-4.20	67.3 Needs editing	$0.0119	20.4s	Needs review	—
12	gemini-3.1-flash-lite	67.2 Needs editing	$0.0020	5.0s	Needs review	—
13	claude-opus-4.5	67.2 Needs editing	$0.1651	60.3s	Needs review	1296
14	claude-opus-4.8-medium	66.5 Needs editing	$0.1504	61.6s	Needs review	1285
15	gpt-5.4	66.3 Needs editing	$0.0749	31.5s	Needs review	1258
16	deepseek-v3.2-medium	65.8 Needs editing	$0.0012	94.5s	Needs review	1203
17	kimi-k2.5-max	65.8 Needs editing	$0.0134	135.3s	Needs review	—
18	gemini-3-flash-preview-medium	65.5 Needs editing	$0.0101	20.8s	Needs review	—
19	claude-sonnet-4.5	65.5 Needs editing	$0.0503	39.3s	Needs review	1215
20	gpt-5-mini-max	65.3 Needs editing	$0.0225	130.8s	Needs review	—
21	grok-4.20-beta	65.2 Needs editing	$0.0266	27.6s	Needs review	1245
22	deepseek-v3.1-terminus	65.0 Needs editing	$0.0032	122.6s	Needs review	1237
23	gemini-3.1-pro-preview-medium	64.8 Needs editing	$0.0648	48.9s	Needs review	1284
24	mistral-medium-3.1	64.7 Needs editing	$0.0064	24.2s	Needs review	1154
25	mistral-medium-3.1-max	64.7 Needs editing	$0.0074	31.7s	Needs review	1154
26	gpt-5.4-mini	64.7 Needs editing	$0.0183	23.4s	Needs review	—
27	claude-sonnet-4.5-medium	64.7 Needs editing	$0.0558	51.1s	Needs review	1215
28	mistral-medium-3.1-medium	64.3 Needs editing	$0.0055	28.4s	Needs review	1154
29	deepseek-v3.2-max	64.2 Needs editing	$0.0012	92.7s	Needs review	1203
30	claude-haiku-4.5-medium	64.0 Needs editing	$0.0512	57.0s	Needs review	1155
31	claude-sonnet-4.5-max	64.0 Needs editing	$0.0628	53.1s	Needs review	1215
32	deepseek-v3.2	63.5 Needs editing	$0.0013	67.7s	Needs review	1203
33	claude-opus-4.8-high	62.9 Needs editing	$0.1677	64.8s	Needs review	1285
34	gemini-3.1-pro-preview-low	62.5 Needs editing	$0.0687	51.6s	Needs review	1284
35	deepseek-v3.1-terminus-medium	62.3 Needs editing	$0.0036	126.1s	Needs review	1237
36	gpt-5.5-high	61.2 Needs editing	$0.2108	82.0s	Needs review	1300
37	gpt-5.4-medium	61.0 Needs editing	$0.1267	67.4s	Needs review	1258
38	gpt-5.5-max	61.0 Needs editing	$0.2000	74.2s	Needs review	1300
39	claude-opus-4.6	60.7 Needs editing	$0.1570	73.6s	Needs review	1332
40	gpt-5.5	60.3 Needs editing	$0.1834	75.8s	Needs review	1300
41	kimi-k2.7-code-max	60.2 Needs editing	$0.0317	127.6s	Needs review	1300
42	claude-opus-4.8-low	60.2 Needs editing	$0.1244	49.0s	Needs review	1285
43	gpt-5.4-mini-medium	60.0 Needs editing	$0.0377	56.2s	Needs review	—
44	qwen3.7-max	59.8 Weak	$0.0270	128.6s	Needs review	1327
45	gemini-3.1-pro-preview-max	59.7 Weak	$0.0659	52.1s	Needs review	1284
46	claude-opus-4.5-max	59.7 Weak	$0.2252	90.0s	Needs review	1296
47	deepseek-v3.1-terminus-max	59.5 Weak	$0.0033	111.7s	Needs review	1237
48	gemini-3.1-pro-preview	59.5 Weak	$0.0878	55.4s	Needs review	1284
49	claude-opus-4.6-medium	59.3 Weak	$0.1994	103.1s	Needs review	1332
50	claude-opus-4.6-high	59.3 Weak	$0.2269	112.3s	Needs review	1332
51	kimi-k2.5-medium	59.2 Weak	$0.0175	174.7s	Needs review	—
52	gemini-3.1-pro-preview-high	59.2 Weak	$0.0678	46.1s	Needs review	1284
53	qwen3.5-plus-02-15-max	59.0 Weak	$0.0089	114.9s	Needs review	—
54	claude-opus-4.6-max	59.0 Weak	$0.2038	99.2s	Needs review	1332
55	gemini-3.5-flash-low	58.6 Weak	$0.0563	31.3s	Needs review	1315
56	grok-4.20-medium	58.2 Weak	$0.0120	25.9s	Needs review	—
57	deepseek-v3.2-high	57.8 Weak	$0.0012	67.0s	Needs review	1203
58	kimi-k2.5	57.8 Weak	$0.0148	181.0s	Needs review	—
59	grok-4.20-beta-max	56.8 Weak	$0.0321	35.1s	Needs review	1245
60	glm-5	54.2 Weak	$0.0119	93.0s	Needs review	1287
61	qwen3.7-max-max	53.8 Weak	$0.0244	122.8s	Needs review	1327
62	qwen3.5-plus-02-15-medium	53.7 Weak	$0.0079	93.1s	Needs review	—
63	grok-4.20-max	53.3 Weak	$0.0109	20.5s	Needs review	—
64	gemini-3.5-flash-medium	53.2 Weak	$0.0570	34.4s	Needs review	1315
65	claude-opus-4.5-high	52.7 Weak	$0.2281	87.1s	Needs review	1296
66	claude-sonnet-4.6-medium	52.7 Weak	$0.2403	208.6s	Needs review	1324
67	grok-4.20-beta-medium	51.7 Weak	$0.0263	26.2s	Needs review	1245
68	gemini-3.5-flash-high	51.3 Weak	$0.0792	43.8s	Needs review	1315
69	kimi-k2.7-code	50.0 Weak	$0.0281	140.6s	Needs review	1300
70	minimax-m2.7-max	49.7 Weak	$0.0083	107.4s	Needs review	1261
71	glm-5-medium	47.5 Weak	$0.0178	223.4s	Needs review	1287
72	kimi-k2.7-code-medium	46.8 Weak	$0.0246	105.1s	Needs review	1300
73	minimax-m2.7	46.7 Weak	$0.0074	54.5s	Needs review	1261
74	gemini-3.1-flash-lite-max	45.5 Weak	$0.0031	15.0s	Needs review	—
75	minimax-m2.7-medium	36.8 Failed	$0.0072	86.9s	Needs review	1261
76	deepseek-v3.2-low	35.0 Failed	$0.0013	134.7s	Needs review	1203
77	gpt-5.4-low	35.0 Failed	$0.0809	32.6s	Needs review	1258
78	gpt-5.5-low	35.0 Failed	$0.1476	51.3s	Needs review	1300
79	gpt-5-mini	34.7 Failed	$0.0131	63.8s	Needs review	—
80	claude-haiku-4.5	34.7 Failed	$0.0238	25.5s	Needs review	1155
81	qwen3.7-max-high	34.7 Failed	$0.0532	254.9s	Needs review	1327
82	claude-sonnet-4.5-low	34.7 Failed	$0.0539	42.7s	Needs review	1215
83	claude-sonnet-4.5-high	34.7 Failed	$0.0600	49.3s	Needs review	1215
84	claude-opus-4.6-low	34.7 Failed	$0.1819	84.9s	Needs review	1332
85	claude-sonnet-4.6-high	34.7 Failed	$0.3623	299.1s	Needs review	1324
86	claude-opus-4.5-low	34.3 Failed	$0.1777	70.9s	Needs review	1296
87	claude-sonnet-4.6-low	33.7 Failed	$0.1214	92.3s	Needs review	1324
88	qwen3.7-max-low	33.0 Failed	$0.0217	103.7s	Needs review	1327
89	gpt-5.4-max	71.0 Usable	$0.1599	104.6s	Needs review	1258
90	claude-sonnet-4.6-max	64.8 Needs editing	$0.3609	299.3s	Needs review	1324
91	glm-5-max	49.8 Weak	$0.0202	343.1s	Needs review	1287
92	gpt-5.4-high	42.6 Weak	$0.1711	88.9s	Needs review	1258
93	gpt-5.4-mini-max	33.0 Failed	$0.0528	90.0s	Needs review	—

“Arena · UI” is a third-party benchmark shown for context, independent of our tests. Sources: Artificial Analysis (artificialanalysis.ai) and Design Arena (www.designarena.ai), both via OpenRouter (openrouter.ai/rankings).
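The quality labels in the ranking table appear to follow fixed score bands. The sketch below infers the cutoffs from the rows themselves: 70.0 is the lowest "Usable" score, 60.0 the lowest "Needs editing", and the Weak/Failed boundary falls somewhere between 36.8 and 42.6. The exact value of 40.0 used here is an assumption, as is the function name; the page does not publish its thresholds.

```python
def quality_band(score: float) -> str:
    """Map a 0-100 score to the label used in the ranking table.

    Cutoffs are inferred from the published rows, not documented by the source.
    """
    if score >= 70.0:
        return "Usable"
    if score >= 60.0:
        return "Needs editing"
    if score >= 40.0:  # assumed: the table only shows the cutoff lies in (36.8, 42.6]
        return "Weak"
    return "Failed"


for score in (73.0, 69.8, 59.8, 36.8):
    print(score, quality_band(score))
```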

What separates the top models

Landing Page Build

Difficulty: medium

Build a complete, self-contained, responsive landing page from a product brief — hero, value props, social proof, and one primary CTA.

Leader: claude-opus-4.8-medium

Pricing Page Build

Difficulty: hard

Build a self-contained, responsive pricing section with tiers and a highlighted recommended plan — clear hierarchy, honest copy, no invented prices.

Leader: gpt-5.5-medium

See the difference

Real model outputs and how our judge scored them.

Top score gpt-5.5-medium Ferrovia — enterprise pricing tiers

80/100

“Clean, professional design with a distinct recommended tier and disciplined typography. Copy is precise, benefit-led, and adheres strictly to constraints, using 'Custom' pricing appropriately. The mobile layout exhibits horizontal overflow, cutting off text.”

```html
<!doctype html>
<html lang="en">
<head>
  <meta charset="utf-8" />
  <meta name="viewport" content="width=device-width, initial-scale=1" />
  <title>Ferrovia Analytics Pricing</title>
  <style>
    :root {
      --bg: #f6f8fb;
      --surface: #ffffff;
      --surface-strong: #f9fbff;
      --text: #142033;
      --muted: #5d6b82;
      --muted-2: #78869a;
      --line: #d9e0ea;
      --accent: #1f5eff;
      --accent-dark: #1748c7;
      --accent-soft: #eaf0ff;
      --success: #147a5c;
      --shadow: 0 18px 50px rgba(20, 32, 51, 0.10);
      --radius: 22px;
      --max: 1180px;
    }

    * {
      box-sizing: border-box;
    }

    body {
      margin: 0;
      font-family: Inter, ui-sans-serif, system-ui, -apple-system, BlinkMacSystemFont, "Segoe UI", Roboto, Arial, sans-serif;
      color: var(--text);
      background:
        radial-gradient(circle at top left, rgba(31, 94, 255, 0.10), transparent 32rem),
        linear-gradient(180deg, #ffffff 0%, var(--bg) 44%, #eef2f7 100%);
      line-height: 1.5;
    }

    .pricing-section {
      padding: clamp(3.5rem, 6vw, 6rem) 1.25rem;
    }

    .pricing-wrap {
      max-width: var(--max);
      margin: 0 auto;
    }

    .eyebrow {
      display: inline-flex;
      align-items: center;
      gap: 0.5rem;
      margin: 0 0 1rem;
      padding: 0.45rem 0.75rem;
      border: 1px solid var(--line);
      border-radius: 999px;
      background: rgba(255, 255, 255, 0.72);
      color: var(--muted);
      font-size: 0.875rem;
      font-weight: 700;
      letter-spacing: 0.01em;
    }

    .eyebrow-dot {
      width: 0.55rem;
      height: 0.55rem;
      border-radius: 50%;
      background: var(--success);
      box-shadow: 0 0 0 4px rgba(20, 122, 92, 0.12);
    }

    .section-header {
      display: grid;
```

Mid-pack minimax-m2.7 Cedar & Sage — DTC product page

70/100

“Visual design fails due to a severely broken, non-responsive mobile layout with horizontal overflow. Hero copy is clear and benefit-led, but the 'customer' section awkwardly repeats product features instead of providing testimonials. Grounding is excellent, strictly adhering to provided facts without fabrication.”

```html
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Cedar & Sage – Refillable Home Cleaning</title>
<meta name="description" content="Cedar & Sage delivers refillable aluminium bottles and concentrated cleaning tablets. Add water at home, cut shipping weight, and skip single‑use plastic.">
<style>
:root {
  --color-bg: #f9f5f0;
  --color-bg-alt: #f0ebe3;
  --color-text: #2b2b2b;
  --color-primary: #2c5e4f;
  --color-accent: #d68c45;
  --color-light: #ffffff;
  --color-muted: #5a5a5a;
  --font-sans: -apple-system, BlinkMacSystemFont, "Segoe UI", Roboto, "Helvetica Neue", Arial, sans-serif;
  --font-serif: Georgia, "Times New Roman", serif;
  --spacing: 1rem;
  --max-width: 1200px;
  --radius: 6px;
}
*, *::before, *::after {
  box-sizing: border-box;
  margin: 0;
  padding: 0;
}
html {
  scroll-behavior: smooth;
}
body {
  font-family: var(--font-sans);
  background-color: var(--color-bg);
  color: var(--color-text);
  line-height: 1.6;
  font-size: 1rem;
}
.skip-link {
  position: absolute;
  top: -40px;
  left: 0;
  background: var(--color-primary);
  color: var(--color-light);
  padding: 0.5rem 1rem;
  text-decoration: none;
  z-index: 200;
  transition: top 0.2s;
}
.skip-link:focus {
  top: 0;
}
.container {
  width: 100%;
  max-width: var(--max-width);
  margin: 0 auto;
  padding: 0 var(--spacing);
}
img, svg {
  max-width: 100%;
  display: block;
}

/* Header */
header {
  background: var(--color-bg);
  padding: 1rem 0;
  border-bottom: 1px solid #e0dbd3;
}
.logo {
  font-family: var(--font-serif);
  font-size: 1.5rem;
  color: var(--color-primary);
  font-weight: bold;
  letter-spacing: 0.02em;
}

/* Hero */
.hero {
  background-color: var(--color-primary);
  color: var(--color
```

Lowest score glm-5 Glow & Grain — local bakery site (non-tech)

0/100

The

Where models still fail

The most common problems we flagged across all models.

85 · mobile horizontal overflow
32 · Mobile layout broken (horizontal overflow)
29 · mobile layout broken
25 · broken mobile layout
17 · horizontal overflow
14 · mobile text overflow
13 · Mobile horizontal overflow
12 · mobile-horizontal-overflow
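Several of the tags above differ only in capitalization or punctuation, so the counts understate how dominant the top issue is. The sketch below merges tags that become identical after lowercasing and stripping punctuation. The counts are copied from the list above; the normalization rule and the function names are assumptions, not the benchmark's own tagging pipeline.

```python
import re
from collections import Counter

# Flagged-issue counts as listed on this page.
RAW_TAGS = {
    "mobile horizontal overflow": 85,
    "Mobile layout broken (horizontal overflow)": 32,
    "mobile layout broken": 29,
    "broken mobile layout": 25,
    "horizontal overflow": 17,
    "mobile text overflow": 14,
    "Mobile horizontal overflow": 13,
    "mobile-horizontal-overflow": 12,
}


def normalize_tag(tag: str) -> str:
    """Lowercase a tag and keep only its words, joined by single spaces."""
    return " ".join(re.findall(r"[a-z]+", tag.lower()))


def merge_tags(raw: dict[str, int]) -> Counter:
    """Sum the counts of tags that normalize to the same string."""
    merged: Counter = Counter()
    for tag, count in raw.items():
        merged[normalize_tag(tag)] += count
    return merged


for tag, count in merge_tags(RAW_TAGS).most_common():
    print(f"{count:>4}  {tag}")
```

This deliberately conservative rule only folds the three spellings of "mobile horizontal overflow" together (85 + 13 + 12 = 110). Tags with different wording, such as "broken mobile layout", stay separate even though they likely describe the same defect.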

Frequently asked

What is the best AI model for frontend & landing pages?

In our benchmarks, gemini-3-flash-preview-max ranks first for frontend & landing pages, with a score of 73.0 (usable) across 6 test cases.

What is the cheapest good model for frontend & landing pages?

gemini-3.1-flash-lite-medium is the best value: it clears our quality bar for frontend & landing pages at $3.19 per 1,000 runs.

Which model is fastest for frontend & landing pages?

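gemini-3.1-flash-lite-medium is the fastest model that still clears our quality bar for frontend & landing pages, at 12.2s per run. gemini-3.1-flash-lite is faster at 5.0s but scores 67.2 (needs editing).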

How we test

Each model output is scored by a strict JSON LLM judge, supported by deterministic heuristics, then normalized to a 0-100 score.

Judge: gemini-3.1-pro-preview · 582 model runs across 2 benchmarks · last tested 2026-06-30
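To make the scoring description above concrete, here is a minimal sketch of a strict JSON judge parser followed by normalization to 0-100. Everything specific in it is an assumption for illustration: the rubric keys (`visual`, `copy`, `grounding`, loosely echoing the judge comments quoted earlier), the 0-10 sub-score scale, the equal weighting, and the overflow cap standing in for a "deterministic heuristic". The source does not publish its actual rubric or formula.

```python
import json

# Hypothetical rubric: three sub-scores, each on an assumed 0-10 scale.
RUBRIC = ("visual", "copy", "grounding")
MAX_POINTS = 10
OVERFLOW_CAP = 70.0  # hypothetical cap applied when a layout check fails


def parse_strict(raw: str) -> dict[str, float]:
    """Parse judge output, rejecting anything that is not exactly the rubric."""
    data = json.loads(raw)
    if not isinstance(data, dict) or set(data) != set(RUBRIC):
        raise ValueError("judge output must contain exactly the rubric keys")
    for key, value in data.items():
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise ValueError(f"{key} must be a number")
        if not 0 <= value <= MAX_POINTS:
            raise ValueError(f"{key} must be between 0 and {MAX_POINTS}")
    return data


def normalize(scores: dict[str, float], overflow_detected: bool = False) -> float:
    """Average the sub-scores, scale to 0-100, then apply the heuristic cap."""
    mean = sum(scores.values()) / len(scores)
    score = 100 * mean / MAX_POINTS
    if overflow_detected:
        score = min(score, OVERFLOW_CAP)
    return round(score, 1)


judged = parse_strict('{"visual": 6, "copy": 9, "grounding": 9}')
print(normalize(judged))
print(normalize(judged, overflow_detected=True))
```

Rejecting malformed output outright, rather than guessing at missing fields, keeps a flaky judge response from silently skewing a model's average.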

This page is Spring Prompt, running

We just did this for every model. Do it for your prompt.

The rankings above come from running real tasks through real models and scoring every output. Spring Prompt is that same engine — pointed at your prompt, your test cases, and your definition of good.

Generate test cases from your prompt — no eval set required to start.
Compare models side by side with quality, cost and latency in one matrix.
Optimise the winner until the scores say it's ready to ship.

Experiment · Cold outreach email

Prompt × model results

12 test cases · 3 evals

Prompt	Claude Opus	GPT-5	Gemini
v1	7.1	6.8	7.4
v2	8.3	7.9	8.0
v3	9.2 ★	8.6	8.4

Best combo: v3 × Claude Opus

9.2 quality · $0.004/run · 1.8s