Spring Prompt · Live AI Benchmark

BulletBench

How smart is an AI per second? We make AI models play speed chess against a chess computer, on a real clock: every second a model spends producing its move drains its time, and when the clock hits zero it loses - even from a winning position. Fast and good wins. Slow genius flags.

The field - fastest thinkers first

Gemini 3.5 Flash
Gemini 3.1 Flash Lite
Gemini 3 Flash
Qwen 3.5 Flash
Gemma 4 26B (A4B)
Claude Opus 4.8
Mistral Small 4
Ministral 14B

Bullet (60s) standings

2166 games
🥇 Gemini 3.5 Flash 1.3s/mv 900
🥈 Gemini 3.1 Flash Lite 0.9s/mv 863
🥉 Gemini 3 Flash 1.4s/mv 800
4 Qwen 3.5 Flash 0.5s/mv 589
5 Gemma 4 26B (A4B) 1.0s/mv 426
6 Claude Opus 4.8 1.7s/mv 369

Ladder Elo at the 60-second bullet control. The time shown is the median seconds per move. Full board below.

The fastest LLMs, ranked: who can actually think fast?

This is the live leaderboard for low-latency AI. The two fastest formats mirror jobs where a model must be smart right now, such as routing requests, classifying, or making quick decisions in real-time agents. Bullet gives a model 60 seconds of thinking for an entire game. Lightning gives it 10 seconds plus 1 extra second per move: it can play indefinitely, but only if it answers in about a second. Ratings work like human chess ratings, so higher is stronger (a club player is ~1500, a random-move player ~400).

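Each model entry below lists, in order: the model; provider and reasoning setting; the Bullet line (60s for the whole game) with ladder Elo, median s/move, average $/game, and ⏱ games lost on time; the Lightning line (10s + 1s per move) with the same fields; then a summary line with median response time (s/move), effective output speed (tokens/s), time losses (% and lost/played across both fast controls), and average cost ($/game).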

Qwen 3.5 Flash

Alibaba via OpenRouter · off reasoning

589 0.5s/move $0.00/game
558 0.4s/move $0.00/game
0.4s 5 t/s 0% 0/64 $0.00

Mercury 2 (Inception)

Inception via OpenRouter · off reasoning

159 0.5s/move $0.01/game ⏱ 18/32
342 0.5s/move $0.01/game ⏱ 3/32
0.5s 53 t/s 33% 21/64 $0.01

Ministral 3 3B

Mistral via OpenRouter · off reasoning

0 0.7s/move $0.00/game ⏱ 26/32
0 0.8s/move $0.01/game ⏱ 17/30
0.7s 4 t/s 69% 43/62 $0.01

Ministral 14B

Mistral via OpenRouter · off reasoning

220 0.8s/move $0.01/game ⏱ 16/31
444 0.8s/move $0.01/game ⏱ 5/32
0.8s 4 t/s 33% 21/63 $0.01

Mistral Small 4

Mistral via OpenRouter · off reasoning

348 0.8s/move $0.00/game ⏱ 11/32
555 0.8s/move $0.00/game ⏱ 1/30
0.8s 4 t/s 19% 12/62 $0.00

Gemini 3.1 Flash Lite

Google · off reasoning

863 0.9s/move $0.01/game ⏱ 8/32
876 0.8s/move $0.01/game
0.9s 3 t/s 12% 8/64 $0.01

Nova Micro

Amazon via OpenRouter · off reasoning

0 0.8s/move $0.00/game ⏱ 29/32
348 0.9s/move $0.00/game ⏱ 1/30
0.8s 4 t/s 48% 30/62 $0.00

Nemotron 3 Nano 30B-A3B

NVIDIA via OpenRouter · off reasoning

0 0.9s/move $0.00/game ⏱ 26/32
356 0.9s/move $0.00/game ⏱ 8/30
0.9s 6 t/s 55% 34/62 $0.00

LFM-2 24B-A2B (Liquid)

Liquid AI via OpenRouter · off reasoning

0 0.8s/move $0.00/game ⏱ 27/32
223 0.9s/move $0.00/game ⏱ 11/30
0.9s 6 t/s 61% 38/62 $0.00

Gemma 4 26B (A4B)

Google via OpenRouter · off reasoning

426 1.0s/move $0.00/game ⏱ 12/32
193 0.8s/move $0.00/game ⏱ 23/32
0.9s 3 t/s 55% 35/64 $0.00

Nova 2.0 Lite

Amazon via OpenRouter · off reasoning

219 1.0s/move $0.00/game ⏱ 18/32
16 1.1s/move $0.00/game ⏱ 23/29
1.0s 4 t/s 67% 41/61 $0.00

Claude Haiku 4.5

Anthropic · off reasoning

219 1.2s/move $0.02/game ⏱ 21/32
153 1.2s/move $0.01/game ⏱ 25/32
1.2s 9 t/s 72% 46/64 $0.01

Gemini 3.5 Flash

Google · off reasoning

900 1.3s/move $0.03/game ⏱ 9/32
536 1.3s/move $0.01/game ⏱ 18/32
1.3s 2 t/s 42% 27/64 $0.02

Qwen 3.5 9B

Alibaba via OpenRouter · off reasoning

145 1.5s/move $0.00/game ⏱ 24/32
≤0 1.4s/move $0.00/game ⏱ 32/32
1.5s 2 t/s 88% 56/64 $0.00

Gemini 3 Flash

Google · off reasoning

800 1.4s/move $0.01/game ⏱ 13/32
318 1.4s/move $0.00/game ⏱ 23/32
1.4s 2 t/s 56% 36/64 $0.01

Claude Opus 4.8

Anthropic · low reasoning heavy

369 1.7s/move $0.09/game ⏱ 16/32
≤0 1.7s/move $0.03/game ⏱ 16/16
1.7s 3 t/s 67% 32/48 $0.06

GPT-5.4 mini

OpenAI · off reasoning

100 2.7s/move $0.02/game ⏱ 14/16
≤0 1.7s/move $0.01/game ⏱ 16/16
2.2s 67 t/s 94% 30/32 $0.02

GPT-5.4 nano

OpenAI · off reasoning

0 5.2s/move $0.01/game ⏱ 15/16
≤0 2.2s/move $0.00/game ⏱ 16/16
3.7s 92 t/s 97% 31/32 $0.01

Gemini 3.1 Pro

Google · low reasoning heavy

114 5.1s/move $0.03/game ⏱ 14/16
≤0 3.6s/move $0.01/game ⏱ 16/16
4.4s 31 t/s 94% 30/32 $0.02

Claude Fable 5

Anthropic · low reasoning heavy

≤0 4.7s/move $0.11/game ⏱ 16/16
≤0 4.5s/move $0.02/game ⏱ 16/16
4.6s 8 t/s 100% 32/32 $0.06

GPT-5.5

OpenAI · medium reasoning heavy

114 4.6s/move $0.09/game ⏱ 14/16
0 2.3s/move $0.02/game ⏱ 15/16
3.4s 34 t/s 91% 29/32 $0.05

Qwen 3.7 Max

Alibaba via OpenRouter · default reasoning heavy

0 7.4s/move $0.01/game ⏱ 15/16
≤0 3.6s/move $0.00/game ⏱ 16/16
5.5s 42 t/s 97% 31/32 $0.01

Kimi K2.5

Moonshot via OpenRouter · default reasoning heavy

≤0 18.6s/move $0.01/game ⏱ 16/16
≤0 5.9s/move $0.00/game ⏱ 16/16
12.2s 48 t/s 100% 32/32 $0.00
springprompt.com/evals/bullet-chess

Flag rate (the time-loss figure on each summary line) counts games lost on time across both fast controls. ⏱ n/m marks games lost on time out of games played at that control.
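The flag rate is plain arithmetic over the two fast controls. Here is a minimal sketch. It assumes the ⏱ n/m figures give games lost on time out of games played at each control, which matches the totals on the board. The `FastRecord` type and `flag_rate` helper are illustrative names and not part of BulletBench:

```python
from dataclasses import dataclass


@dataclass
class FastRecord:
    """Per-model counts for the two fast controls (hypothetical container)."""
    bullet_flags: int      # games lost on time at Bullet (60s)
    bullet_games: int      # games played at Bullet
    lightning_flags: int   # games lost on time at Lightning (10s + 1s/move)
    lightning_games: int   # games played at Lightning


def flag_rate(r: FastRecord) -> float:
    """Share of all fast games lost on time, as a percentage."""
    flags = r.bullet_flags + r.lightning_flags
    games = r.bullet_games + r.lightning_games
    return 100.0 * flags / games


# Figures copied from the board: Mercury 2 (18/32 + 3/32), Claude Opus 4.8 (16/32 + 16/16).
mercury = FastRecord(18, 32, 3, 32)
opus = FastRecord(16, 32, 16, 16)
print(f"{flag_rate(mercury):.0f}%")  # 33%
print(f"{flag_rate(opus):.0f}%")     # 67%
```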

Gemini 3.1 Flash Lite sustains ~1-second chess better than any other model tested: ladder Elo 876 at the 10+1 lightning control, at 0.8s per move, with no lightning games lost on time.

All 6 heavy-reasoning configurations lost essentially every lightning game on time (95 of 96), including games they were winning on the board.

Gemini 3.1 Pro gains 481 ladder Elo (683 to 1164) and Qwen 3.7 Max gains 408 (164 to 572) when the clock moves from blitz (3 min) to rapid (10 min). This is intelligence that only exists when you can afford to wait for it.

The full clock spectrum

Context for the fast board: these are the same models with room to think. Blitz gives 3 minutes per game and Rapid gives 10. This is where the big reasoning models climb the table, and where you can see exactly what each extra second of time budget buys.

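Each model entry below lists its median response time per move and reasoning setting, then one line per control in this order: Lightning (10s + 1s per move), Bullet (60s), Blitz (180s), Rapid (600s). Each line gives ladder Elo, median s/move, average $/game, and ⏱ games lost on time where any occurred.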

Qwen 3.5 Flash

0.4s/move · off reasoning

558 0.4s/move $0.00/game
589 0.5s/move $0.00/game
558 0.4s/move $0.00/game
529 0.4s/move $0.00/game

Mercury 2 (Inception)

0.5s/move · off reasoning

342 0.5s/move $0.01/game ⏱ 3/32
159 0.5s/move $0.01/game ⏱ 18/32
393 0.5s/move $0.02/game
445 0.5s/move $0.01/game

Ministral 3 3B

0.8s/move · off reasoning

0 0.8s/move $0.01/game ⏱ 17/30
0 0.7s/move $0.00/game ⏱ 26/32
23 0.8s/move $0.01/game ⏱ 1/32
16 0.9s/move $0.01/game

Ministral 14B

0.8s/move · off reasoning

444 0.8s/move $0.01/game ⏱ 5/32
220 0.8s/move $0.01/game ⏱ 16/31
499 0.7s/move $0.01/game
529 0.9s/move $0.01/game

Mistral Small 4

0.8s/move · off reasoning

555 0.8s/move $0.00/game ⏱ 1/30
348 0.8s/move $0.00/game ⏱ 11/32
543 0.8s/move $0.01/game
488 0.8s/move $0.01/game

Gemini 3.1 Flash Lite

0.9s/move · off reasoning

876 0.8s/move $0.01/game
863 0.9s/move $0.01/game ⏱ 8/32
827 0.9s/move $0.01/game
762 0.8s/move $0.01/game

Nova Micro

0.9s/move · off reasoning

348 0.9s/move $0.00/game ⏱ 1/30
0 0.8s/move $0.00/game ⏱ 29/32
373 0.8s/move $0.00/game
529 0.9s/move $0.00/game

Nemotron 3 Nano 30B-A3B

0.9s/move · off reasoning

356 0.9s/move $0.00/game ⏱ 8/30
0 0.9s/move $0.00/game ⏱ 26/32
372 0.9s/move $0.00/game ⏱ 2/32
572 0.9s/move $0.00/game

LFM-2 24B-A2B (Liquid)

0.9s/move · off reasoning

223 0.9s/move $0.00/game ⏱ 11/30
0 0.8s/move $0.00/game ⏱ 27/32
293 0.9s/move $0.00/game ⏱ 1/32
377 0.9s/move $0.00/game

Gemma 4 26B (A4B)

0.9s/move · off reasoning

193 0.8s/move $0.00/game ⏱ 23/32
426 1.0s/move $0.00/game ⏱ 12/32
529 1.0s/move $0.00/game ⏱ 2/32
571 0.9s/move $0.00/game

Nova 2.0 Lite

1.0s/move · off reasoning

16 1.1s/move $0.00/game ⏱ 23/29
219 1.0s/move $0.00/game ⏱ 18/32
486 1.0s/move $0.00/game
453 1.0s/move $0.00/game

Claude Haiku 4.5

1.2s/move · off reasoning

153 1.2s/move $0.01/game ⏱ 25/32
219 1.2s/move $0.02/game ⏱ 21/32
544 1.2s/move $0.02/game ⏱ 1/32
488 1.1s/move $0.03/game

Gemini 3.5 Flash

1.3s/move · off reasoning

536 1.3s/move $0.01/game ⏱ 18/32
900 1.3s/move $0.03/game ⏱ 9/32
915 1.3s/move $0.04/game
1086 1.3s/move $0.04/game

Qwen 3.5 9B

1.4s/move · off reasoning

≤0 1.4s/move $0.00/game ⏱ 32/32
145 1.5s/move $0.00/game ⏱ 24/32
499 1.4s/move $0.00/game ⏱ 3/32
529 1.2s/move $0.00/game

Gemini 3 Flash

1.4s/move · off reasoning

318 1.4s/move $0.00/game ⏱ 23/32
800 1.4s/move $0.01/game ⏱ 13/32
949 1.4s/move $0.01/game
745 1.4s/move $0.01/game

Claude Opus 4.8

1.7s/move · low reasoning heavy

≤0 1.7s/move $0.03/game ⏱ 16/16
369 1.7s/move $0.09/game ⏱ 16/32
559 1.7s/move $0.13/game ⏱ 3/32
571 1.7s/move $0.13/game

GPT-5.4 mini

3.2s/move · off reasoning

≤0 1.7s/move $0.01/game ⏱ 16/16
100 2.7s/move $0.02/game ⏱ 14/16
352 3.9s/move $0.06/game ⏱ 12/24
571 4.7s/move $0.10/game

GPT-5.4 nano

5.4s/move · off reasoning

≤0 2.2s/move $0.00/game ⏱ 16/16
0 5.2s/move $0.01/game ⏱ 15/16
225 7.1s/move $0.02/game ⏱ 18/24
572 7.0s/move $0.03/game

Gemini 3.1 Pro

5.8s/move · low reasoning heavy

≤0 3.6s/move $0.01/game ⏱ 16/16
114 5.1s/move $0.03/game ⏱ 14/16
683 6.8s/move $0.11/game ⏱ 13/24
1164 7.7s/move $0.38/game ⏱ 2/12

Claude Fable 5

6.2s/move · low reasoning heavy

≤0 4.5s/move $0.02/game ⏱ 16/16
≤0 4.7s/move $0.11/game ⏱ 16/16
505 6.1s/move $0.36/game ⏱ 13/24
572 9.4s/move $1.07/game ⏱ 5/12

GPT-5.5

7.2s/move · medium reasoning heavy

0 2.3s/move $0.02/game ⏱ 15/16
114 4.6s/move $0.09/game ⏱ 14/16
174 9.6s/move $0.24/game ⏱ 20/24
409 12.3s/move $0.69/game ⏱ 8/12

Qwen 3.7 Max

9.6s/move · default reasoning heavy

≤0 3.6s/move $0.00/game ⏱ 16/16
0 7.4s/move $0.01/game ⏱ 15/16
164 12.0s/move $0.04/game ⏱ 20/24
572 15.3s/move $0.10/game ⏱ 4/12

Kimi K2.5

26.2s/move · default reasoning heavy

≤0 5.9s/move $0.00/game ⏱ 16/16
≤0 18.6s/move $0.01/game ⏱ 16/16
≤0 23.9s/move $0.03/game ⏱ 16/16
33 56.6s/move $0.07/game ⏱ 11/12

Ladder Elo: anchored to a Stockfish skill ladder (random mover = 400). Internally consistent ordering, not FIDE-calibrated. Sub-400 values at fast controls mean the model died on the clock, not that it plays worse than random chess. ⏱ n/m = games lost on time.
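To give a feel for rating gaps, here is the expected score under the standard logistic Elo model. We assume the ladder fit uses this model, since the page says only that ratings work like human chess ratings. The example takes Gemini 3.5 Flash's bullet rating of 900 against the random-mover anchor of 400:

```latex
E_A = \frac{1}{1 + 10^{(R_B - R_A)/400}}
    = \frac{1}{1 + 10^{(400 - 900)/400}}
    = \frac{1}{1 + 10^{-1.25}}
    \approx \frac{1}{1.0562}
    \approx 0.947
```

That works out to roughly 95 points per 100 games against a random mover.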

Watch the games

Full games with per-move API latency, showing the clock at work.

gemini-3-flash (black) vs Stockfish L3 (~1200)

60s clock · 54 plies · 41.6s thinking · $0.01

win · checkmate

gemini-3.1-flash-lite (white) vs Stockfish L2 (~1000)

60s clock · 35 plies · 16.1s thinking · $0.00

win · checkmate

gemini-3.1-pro (white) vs Stockfish L2 (~1000)

180s clock · 37 plies · 159.0s thinking · $0.12

win · checkmate

lfm-2-24b (black) vs Stockfish L0 (~400)

180s clock · 199 plies · 180.4s thinking · $0.00

loss · time

Choosing a fast LLM: quick answers

Which LLM is best for quick decisions?

Right now: Gemini 3.5 Flash makes the best decisions under a 60-second clock, and Gemini 3.1 Flash Lite leads when every answer must come back in about a second. The big reasoning models lose on time long before their intelligence becomes usable - see the live fast board for the full ranking, updated with every run.

What is the fastest LLM right now?

Qwen 3.5 Flash is the fastest model we measure, answering in about 0.4 seconds per decision. But raw speed isn't the whole story - several sub-second models play barely better than random. BulletBench exists to measure whether fast answers are also good answers.

Are reasoning models suitable for low-latency use cases?

No. Every heavy-reasoning configuration here lost essentially all of its lightning games on time. All but Claude Opus 4.8, which flagged in half of its 32 bullet games, also lost at least 14 of 16 bullet games the same way, including games they were winning on the board. Their intelligence only pays off from a several-minute time budget upward, as the full clock spectrum shows. For latency-sensitive routing, a fast model with strong fundamentals beats a genius on a deadline.

How should I pick a model for routing or classification?

Weigh three numbers together: response time, quality under time pressure (the ratings above), and cost per decision. A model that's 0.5s slower but markedly smarter often wins; a model that's cheap and fast but near-random loses you more than it saves. Cross-reference with our overall model leaderboard and best-models-by-task rankings to check a candidate's general ability.
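Here is one way to weigh those numbers, as a hedged sketch rather than a recommended policy. The candidates are filtered on a latency budget and a maximum flag rate, then ranked by rating, with cost used only as a tie-breaker. The `Candidate` type, the `pick` function, the 1-second budget, and the 25% flag-rate ceiling are all hypothetical choices. The figures are the Lightning ratings and fast-board summaries from the table above:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    name: str
    rating: float         # ladder Elo at the control matching your latency budget
    median_s: float       # median response time per decision, seconds
    flag_rate: float      # share of fast games lost on time, 0..1
    cost_per_game: float  # average $ per game


def pick(cands: List[Candidate], latency_budget_s: float,
         max_flag_rate: float = 0.25) -> Optional[Candidate]:
    """Hypothetical rule: drop models that are too slow or flag too often,
    then take the highest rating; lower cost only breaks ties."""
    ok = [c for c in cands
          if c.median_s <= latency_budget_s and c.flag_rate <= max_flag_rate]
    if not ok:
        return None
    return max(ok, key=lambda c: (c.rating, -c.cost_per_game))


# Lightning ratings plus the summary-line figures from the fast board.
board = [
    Candidate("Qwen 3.5 Flash", 558, 0.4, 0.00, 0.00),
    Candidate("Mistral Small 4", 555, 0.8, 0.19, 0.00),
    Candidate("Gemini 3.1 Flash Lite", 876, 0.9, 0.12, 0.01),
    Candidate("Gemini 3.5 Flash", 536, 1.3, 0.42, 0.02),
]
print(pick(board, latency_budget_s=1.0).name)  # Gemini 3.1 Flash Lite
print(pick(board, latency_budget_s=0.5).name)  # Qwen 3.5 Flash
```

Hard filters keep the rule easy to reason about. A weighted score would trade latency against quality more smoothly, but its weights would be just as arbitrary.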

How it works

The clock is real

Each model gets a total game clock (10s+1s/move, 60s, 180s or 600s). The wall-clock latency of every API response - including hidden reasoning - is deducted. Hit zero and it's a loss on time. The remaining clock is stated in every prompt, so models that pace themselves are rewarded.
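The clock rule can be modelled in a few lines. This is a minimal sketch and not BulletBench's code. It assumes each move's full wall-clock latency is deducted, the clock flags at zero or below, and Lightning's +1s is credited only after a move completes in time, as with a Fischer increment. `GameClock`, `charge`, and the prompt wording in `prompt_note` are hypothetical:

```python
class GameClock:
    """Hypothetical model of a per-game clock with an optional increment."""

    def __init__(self, base_s: float, increment_s: float = 0.0) -> None:
        self.remaining = base_s
        self.increment = increment_s
        self.flagged = False

    def prompt_note(self) -> str:
        # The page says the remaining clock is stated in every prompt;
        # this exact wording is illustrative only.
        return f"You have {self.remaining:.1f}s left on your clock."

    def charge(self, latency_s: float) -> bool:
        """Deduct one move's wall-clock latency; return False on a time loss."""
        self.remaining -= latency_s
        if self.remaining <= 0:
            self.flagged = True
            return False
        self.remaining += self.increment  # increment only after a timely move
        return True


if __name__ == "__main__":
    # Lightning (10s + 1s/move): answering in 1.5s loses 0.5s net per move,
    # so the clock reaches zero on the 18th move.
    clock = GameClock(10.0, 1.0)
    moves = 0
    while clock.charge(1.5):
        moves += 1
    print(moves)  # 17
```

At a steady 0.9s per move the same Lightning clock grows slowly and never flags. This is why "about a second" is the dividing line.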

The opponent ladder

A 9-level Stockfish ladder from random mover (anchor 400) to full strength (anchor 2800). Levels step adaptively - win and face a stronger engine, lose and drop down - and a maximum-likelihood performance rating is fitted from all games. "Ladder Elo" is internally consistent, not FIDE-calibrated.
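The page does not publish its fitting code. What follows is a hedged sketch of a maximum-likelihood performance rating, assuming the standard logistic Elo model. Each game is treated as an (opponent anchor, score) pair with draws counted as 0.5. Under that model the log-likelihood is concave, and its derivative is proportional to the sum of (score minus expected score), so bisection finds the optimum. Only the end anchors (400 and 2800) are published, along with the ~1000 and ~1200 labels on the sample games. The example therefore uses those values. `next_level`, `ml_rating`, the draw rule, and the clamping bounds are hypothetical:

```python
from typing import List, Tuple

N_LEVELS = 9  # ladder size from the page; per-level anchors are not published


def expected_score(r: float, opp: float) -> float:
    """Standard logistic Elo expectation for a player rated r vs. opp."""
    return 1.0 / (1.0 + 10 ** ((opp - r) / 400.0))


def next_level(level: int, score: float) -> int:
    """Adaptive stepping: win -> stronger engine, loss -> weaker.
    Staying on the same level after a draw is an assumption."""
    if score == 1.0:
        return min(level + 1, N_LEVELS - 1)
    if score == 0.0:
        return max(level - 1, 0)
    return level


def ml_rating(games: List[Tuple[float, float]], lo: float = 0.0,
              hi: float = 4000.0, iters: int = 100) -> float:
    """Maximum-likelihood rating from (opponent_anchor, score) pairs.

    The gradient sum(score - expected) decreases as r rises, so we bisect
    for its root on [lo, hi]. All-win or all-loss records have no finite
    optimum; this sketch clamps them to the bounds.
    """
    def grad(r: float) -> float:
        return sum(s - expected_score(r, opp) for opp, s in games)

    if grad(lo) <= 0:
        return lo
    if grad(hi) >= 0:
        return hi
    for _ in range(iters):
        mid = (lo + hi) / 2
        if grad(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    # One win vs. a ~1000 engine and one loss vs. a ~1200 engine.
    print(round(ml_rating([(1000, 1.0), (1200, 0.0)])))  # 1100
```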

Fair-play rules

Legal moves are listed in the prompt (we measure decision quality, not notation trivia). Illegal replies get two corrective retries - on the clock - then a random legal move is played and counted. Provider transport errors pause the clock: they measure infrastructure flakiness, not model speed.
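The fair-play rules map onto a small retry loop. Here is a minimal sketch under stated assumptions. The model is called through a hypothetical `ask(prompt)` callable that raises `ConnectionError` on a transport failure. Only successful responses are charged to the clock, an illegal reply earns up to two corrective retries, and then a random legal move is played and flagged as such. `choose_move` and the `max_transport_retries` cap are illustrative; the page does not say how many transport retries are allowed:

```python
import random
import time
from typing import Callable, List, Tuple

MAX_CORRECTIVE_RETRIES = 2  # per the rules: two corrective retries, on the clock


def choose_move(ask: Callable[[str], str], legal: List[str], rng: random.Random,
                max_transport_retries: int = 3) -> Tuple[str, bool, float]:
    """Return (move, was_random_fallback, seconds_charged_to_clock)."""
    prompt = "Legal moves: " + " ".join(legal)
    charged = 0.0
    illegal_replies = 0
    transport_failures = 0
    while illegal_replies <= MAX_CORRECTIVE_RETRIES:
        start = time.monotonic()
        try:
            reply = ask(prompt).strip()
        except ConnectionError:
            # Transport error: the clock is paused, so nothing is charged.
            transport_failures += 1
            if transport_failures > max_transport_retries:
                raise
            continue
        charged += time.monotonic() - start  # includes any hidden reasoning
        if reply in legal:
            return reply, False, charged
        illegal_replies += 1
        prompt = f"'{reply}' is not legal. Choose one of: " + " ".join(legal)
    # Initial try plus two retries all illegal: play a random legal move.
    return rng.choice(legal), True, charged
```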

Honest caveats

Latency includes provider serving infrastructure (that's the point for routing decisions, but infra changes can move results). Chess knowledge is part of what's measured - this is fast applied intelligence, one domain among several we test. Preview-endpoint models may be slower than their GA versions.

Routing latency-sensitive AI workloads?

This is one of 22 live benchmark collections on Spring Prompt. See how every model performs on your kind of task - at every reasoning tier.