How do you measure AI model speed?

By real wall-clock API latency, the number an application actually experiences. Models play chess on a real game clock, and every second spent producing a move - including hidden reasoning - is deducted. Run out of time and the game is lost, so speed and quality are measured in a single score.

Spring Prompt · Live AI Benchmark

BulletBench

Name: BulletBench - the fastest-LLM benchmark (AI speed chess leaderboard)
Creator: Spring Prompt

How smart is an AI per second? We make AI models play speed chess against a chess computer, on a real clock: every second a model spends producing its move drains its time, and when the clock hits zero it loses - even from a winning position. Fast and good wins. Slow genius flags.

The field - fastest thinkers first

Gemini 3.5 Flash

Gemini 3.1 Flash Lite

Gemini 3 Flash

Qwen 3.5 Flash

Gemma 4 26B (A4B)

Claude Opus 4.8

Mistral Small 4

Ministral 14B

and 15 more →

Bullet (60s) standings

2166 games

🥇 Gemini 3.5 Flash 1.3s/mv 900

🥈 Gemini 3.1 Flash Lite 0.9s/mv 863

🥉 Gemini 3 Flash 1.4s/mv 800

Qwen 3.5 Flash 0.5s/mv 589

Gemma 4 26B (A4B) 1.0s/mv 426

Claude Opus 4.8 1.7s/mv 369

Chess rating at the 60-second bullet format. Full board below.

The fastest LLMs, ranked: who can actually think fast?

This is the live leaderboard for low-latency AI. The two fastest formats mirror the jobs where a model must be smart right now: routing requests, classifying, and making quick decisions in real-time agents. Bullet gives a model 60 seconds of thinking for an entire game. Lightning gives it 10 seconds plus 1 extra second per move - it can play forever, but only if it answers in about a second. Ratings work like human chess ratings: higher is stronger (a club player is ~1500, a random-move player ~400).

Model (provider · reasoning setting)	Bullet rating (60s whole game)	Lightning rating (10s + 1s per move)	Response time (median secs per move)	Output speed (effective tokens/sec)	Time losses (% of games lost on time)	Cost (avg $ per game)
Qwen 3.5 Flash Alibaba via OpenRouter · off reasoning	589 0.5s/move $0.00/game	558 0.4s/move $0.00/game	0.4s	5 t/s	0% 0/64	$0.00
Mercury 2 (Inception) Inception via OpenRouter · off reasoning	159 0.5s/move $0.01/game ⏱ 18/32	342 0.5s/move $0.01/game ⏱ 3/32	0.5s	53 t/s	33% 21/64	$0.01
Ministral 3 3B Mistral via OpenRouter · off reasoning	0 0.7s/move $0.00/game ⏱ 26/32	0 0.8s/move $0.01/game ⏱ 17/30	0.7s	4 t/s	69% 43/62	$0.01
Ministral 14B Mistral via OpenRouter · off reasoning	220 0.8s/move $0.01/game ⏱ 16/31	444 0.8s/move $0.01/game ⏱ 5/32	0.8s	4 t/s	33% 21/63	$0.01
Mistral Small 4 Mistral via OpenRouter · off reasoning	348 0.8s/move $0.00/game ⏱ 11/32	555 0.8s/move $0.00/game ⏱ 1/30	0.8s	4 t/s	19% 12/62	$0.00
Gemini 3.1 Flash Lite Google · off reasoning	863 0.9s/move $0.01/game ⏱ 8/32	876 0.8s/move $0.01/game	0.9s	3 t/s	12% 8/64	$0.01
Nova Micro Amazon via OpenRouter · off reasoning	0 0.8s/move $0.00/game ⏱ 29/32	348 0.9s/move $0.00/game ⏱ 1/30	0.8s	4 t/s	48% 30/62	$0.00
Nemotron 3 Nano 30B-A3B NVIDIA via OpenRouter · off reasoning	0 0.9s/move $0.00/game ⏱ 26/32	356 0.9s/move $0.00/game ⏱ 8/30	0.9s	6 t/s	55% 34/62	$0.00
LFM-2 24B-A2B (Liquid) Liquid AI via OpenRouter · off reasoning	0 0.8s/move $0.00/game ⏱ 27/32	223 0.9s/move $0.00/game ⏱ 11/30	0.9s	6 t/s	61% 38/62	$0.00
Gemma 4 26B (A4B) Google via OpenRouter · off reasoning	426 1.0s/move $0.00/game ⏱ 12/32	193 0.8s/move $0.00/game ⏱ 23/32	0.9s	3 t/s	55% 35/64	$0.00
Nova 2.0 Lite Amazon via OpenRouter · off reasoning	219 1.0s/move $0.00/game ⏱ 18/32	16 1.1s/move $0.00/game ⏱ 23/29	1.0s	4 t/s	67% 41/61	$0.00
Claude Haiku 4.5 Anthropic · off reasoning	219 1.2s/move $0.02/game ⏱ 21/32	153 1.2s/move $0.01/game ⏱ 25/32	1.2s	9 t/s	72% 46/64	$0.01
Gemini 3.5 Flash Google · off reasoning	900 1.3s/move $0.03/game ⏱ 9/32	536 1.3s/move $0.01/game ⏱ 18/32	1.3s	2 t/s	42% 27/64	$0.02
Qwen 3.5 9B Alibaba via OpenRouter · off reasoning	145 1.5s/move $0.00/game ⏱ 24/32	≤0 1.4s/move $0.00/game ⏱ 32/32	1.5s	2 t/s	88% 56/64	$0.00
Gemini 3 Flash Google · off reasoning	800 1.4s/move $0.01/game ⏱ 13/32	318 1.4s/move $0.00/game ⏱ 23/32	1.4s	2 t/s	56% 36/64	$0.01
Claude Opus 4.8 Anthropic · low reasoning heavy	369 1.7s/move $0.09/game ⏱ 16/32	≤0 1.7s/move $0.03/game ⏱ 16/16	1.7s	3 t/s	67% 32/48	$0.06
GPT-5.4 mini OpenAI · off reasoning	100 2.7s/move $0.02/game ⏱ 14/16	≤0 1.7s/move $0.01/game ⏱ 16/16	2.2s	67 t/s	94% 30/32	$0.02
GPT-5.4 nano OpenAI · off reasoning	0 5.2s/move $0.01/game ⏱ 15/16	≤0 2.2s/move $0.00/game ⏱ 16/16	3.7s	92 t/s	97% 31/32	$0.01
Gemini 3.1 Pro Google · low reasoning heavy	114 5.1s/move $0.03/game ⏱ 14/16	≤0 3.6s/move $0.01/game ⏱ 16/16	4.4s	31 t/s	94% 30/32	$0.02
Claude Fable 5 Anthropic · low reasoning heavy	≤0 4.7s/move $0.11/game ⏱ 16/16	≤0 4.5s/move $0.02/game ⏱ 16/16	4.6s	8 t/s	100% 32/32	$0.06
GPT-5.5 OpenAI · medium reasoning heavy	114 4.6s/move $0.09/game ⏱ 14/16	0 2.3s/move $0.02/game ⏱ 15/16	3.4s	34 t/s	91% 29/32	$0.05
Qwen 3.7 Max Alibaba via OpenRouter · default reasoning heavy	0 7.4s/move $0.01/game ⏱ 15/16	≤0 3.6s/move $0.00/game ⏱ 16/16	5.5s	42 t/s	97% 31/32	$0.01
Kimi K2.5 Moonshot via OpenRouter · default reasoning heavy	≤0 18.6s/move $0.01/game ⏱ 16/16	≤0 5.9s/move $0.00/game ⏱ 16/16	12.2s	48 t/s	100% 32/32	$0.00

springprompt.com/evals/bullet-chess

Elo cells: greener = stronger play, redder = clock death. Flag rate counts games lost on time across both fast controls.
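The time-loss column can be reproduced from the ⏱ counts in the two rating columns. A minimal Python sketch, assuming the percentage is simply flags over games pooled across Bullet and Lightning and rounded to a whole number (the function name and the rounding rule are assumptions for illustration, not the site's code):

```python
def flag_rate(*controls):
    """Pool (games_lost_on_time, games_played) pairs from several time controls.

    Returns (lost, played, percent), with percent rounded to a whole number.
    """
    lost = sum(l for l, _ in controls)
    played = sum(p for _, p in controls)
    return lost, played, round(100 * lost / played)


# Mercury 2 row: ⏱ 18/32 at Bullet and ⏱ 3/32 at Lightning.
mercury = flag_rate((18, 32), (3, 32))

# Qwen 3.5 Flash row: no time losses in either fast control.
qwen_flash = flag_rate((0, 32), (0, 32))
```

For the Mercury 2 row this gives 21 of 64 games, or 33%, which matches the table.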

Gemini 3.1 Flash Lite leads the class of models that can sustain ~1-second chess: ladder Elo 876 at the 10+1 lightning control, 0.8s per move.

All 6 frontier heavyweight configurations lost essentially every lightning game on time - including positions they were winning on the board.

Gemini 3.1 Pro gains 481 ladder Elo (683 to 1164) and Qwen 3.7 Max gains 408 (164 to 572) when the clock moves from blitz (3 min) to rapid (10 min) - intelligence that only exists when you can afford to wait for it.

The full clock spectrum

Context for the fast board: the same models with room to think - Blitz gives 3 minutes per game, Rapid gives 10. This is where the big reasoning models climb the table, and where you can see exactly what each extra second of time budget buys.

Model (time per move · reasoning setting)	Lightning (10s + 1s/move)	Bullet (60s)	Blitz (180s)	Rapid (600s)
Qwen 3.5 Flash 0.4s/move · off reasoning	558 0.4s/move $0.00/game	589 0.5s/move $0.00/game	558 0.4s/move $0.00/game	529 0.4s/move $0.00/game
Mercury 2 (Inception) 0.5s/move · off reasoning	342 0.5s/move $0.01/game ⏱ 3/32	159 0.5s/move $0.01/game ⏱ 18/32	393 0.5s/move $0.02/game	445 0.5s/move $0.01/game
Ministral 3 3B 0.8s/move · off reasoning	0 0.8s/move $0.01/game ⏱ 17/30	0 0.7s/move $0.00/game ⏱ 26/32	23 0.8s/move $0.01/game ⏱ 1/32	16 0.9s/move $0.01/game
Ministral 14B 0.8s/move · off reasoning	444 0.8s/move $0.01/game ⏱ 5/32	220 0.8s/move $0.01/game ⏱ 16/31	499 0.7s/move $0.01/game	529 0.9s/move $0.01/game
Mistral Small 4 0.8s/move · off reasoning	555 0.8s/move $0.00/game ⏱ 1/30	348 0.8s/move $0.00/game ⏱ 11/32	543 0.8s/move $0.01/game	488 0.8s/move $0.01/game
Gemini 3.1 Flash Lite 0.9s/move · off reasoning	876 0.8s/move $0.01/game	863 0.9s/move $0.01/game ⏱ 8/32	827 0.9s/move $0.01/game	762 0.8s/move $0.01/game
Nova Micro 0.9s/move · off reasoning	348 0.9s/move $0.00/game ⏱ 1/30	0 0.8s/move $0.00/game ⏱ 29/32	373 0.8s/move $0.00/game	529 0.9s/move $0.00/game
Nemotron 3 Nano 30B-A3B 0.9s/move · off reasoning	356 0.9s/move $0.00/game ⏱ 8/30	0 0.9s/move $0.00/game ⏱ 26/32	372 0.9s/move $0.00/game ⏱ 2/32	572 0.9s/move $0.00/game
LFM-2 24B-A2B (Liquid) 0.9s/move · off reasoning	223 0.9s/move $0.00/game ⏱ 11/30	0 0.8s/move $0.00/game ⏱ 27/32	293 0.9s/move $0.00/game ⏱ 1/32	377 0.9s/move $0.00/game
Gemma 4 26B (A4B) 0.9s/move · off reasoning	193 0.8s/move $0.00/game ⏱ 23/32	426 1.0s/move $0.00/game ⏱ 12/32	529 1.0s/move $0.00/game ⏱ 2/32	571 0.9s/move $0.00/game
Nova 2.0 Lite 1.0s/move · off reasoning	16 1.1s/move $0.00/game ⏱ 23/29	219 1.0s/move $0.00/game ⏱ 18/32	486 1.0s/move $0.00/game	453 1.0s/move $0.00/game
Claude Haiku 4.5 1.2s/move · off reasoning	153 1.2s/move $0.01/game ⏱ 25/32	219 1.2s/move $0.02/game ⏱ 21/32	544 1.2s/move $0.02/game ⏱ 1/32	488 1.1s/move $0.03/game
Gemini 3.5 Flash 1.3s/move · off reasoning	536 1.3s/move $0.01/game ⏱ 18/32	900 1.3s/move $0.03/game ⏱ 9/32	915 1.3s/move $0.04/game	1086 1.3s/move $0.04/game
Qwen 3.5 9B 1.4s/move · off reasoning	≤0 1.4s/move $0.00/game ⏱ 32/32	145 1.5s/move $0.00/game ⏱ 24/32	499 1.4s/move $0.00/game ⏱ 3/32	529 1.2s/move $0.00/game
Gemini 3 Flash 1.4s/move · off reasoning	318 1.4s/move $0.00/game ⏱ 23/32	800 1.4s/move $0.01/game ⏱ 13/32	949 1.4s/move $0.01/game	745 1.4s/move $0.01/game
Claude Opus 4.8 1.7s/move · low reasoning heavy	≤0 1.7s/move $0.03/game ⏱ 16/16	369 1.7s/move $0.09/game ⏱ 16/32	559 1.7s/move $0.13/game ⏱ 3/32	571 1.7s/move $0.13/game
GPT-5.4 mini 3.2s/move · off reasoning	≤0 1.7s/move $0.01/game ⏱ 16/16	100 2.7s/move $0.02/game ⏱ 14/16	352 3.9s/move $0.06/game ⏱ 12/24	571 4.7s/move $0.10/game
GPT-5.4 nano 5.4s/move · off reasoning	≤0 2.2s/move $0.00/game ⏱ 16/16	0 5.2s/move $0.01/game ⏱ 15/16	225 7.1s/move $0.02/game ⏱ 18/24	572 7.0s/move $0.03/game
Gemini 3.1 Pro 5.8s/move · low reasoning heavy	≤0 3.6s/move $0.01/game ⏱ 16/16	114 5.1s/move $0.03/game ⏱ 14/16	683 6.8s/move $0.11/game ⏱ 13/24	1164 7.7s/move $0.38/game ⏱ 2/12
Claude Fable 5 6.2s/move · low reasoning heavy	≤0 4.5s/move $0.02/game ⏱ 16/16	≤0 4.7s/move $0.11/game ⏱ 16/16	505 6.1s/move $0.36/game ⏱ 13/24	572 9.4s/move $1.07/game ⏱ 5/12
GPT-5.5 7.2s/move · medium reasoning heavy	0 2.3s/move $0.02/game ⏱ 15/16	114 4.6s/move $0.09/game ⏱ 14/16	174 9.6s/move $0.24/game ⏱ 20/24	409 12.3s/move $0.69/game ⏱ 8/12
Qwen 3.7 Max 9.6s/move · default reasoning heavy	≤0 3.6s/move $0.00/game ⏱ 16/16	0 7.4s/move $0.01/game ⏱ 15/16	164 12.0s/move $0.04/game ⏱ 20/24	572 15.3s/move $0.10/game ⏱ 4/12
Kimi K2.5 26.2s/move · default reasoning heavy	≤0 5.9s/move $0.00/game ⏱ 16/16	≤0 18.6s/move $0.01/game ⏱ 16/16	≤0 23.9s/move $0.03/game ⏱ 16/16	33 56.6s/move $0.07/game ⏱ 11/12


Ladder Elo: anchored to a Stockfish skill ladder (random mover = 400). Internally consistent ordering, not FIDE-calibrated. Sub-400 values at fast controls mean the model died on the clock, not that it plays worse than random chess. ⏱ n/m = games lost on time.

Watch the games

Full games with per-move API latency. Step through and watch the clock do its work.

gemini-3-flash (black) vs Stockfish L3 (~1200)

60s clock · 54 plies · 41.6s thinking · $0.01

win · checkmate

gemini-3.1-flash-lite (white) vs Stockfish L2 (~1000)

60s clock · 35 plies · 16.1s thinking · $0.00

win · checkmate

gemini-3.1-pro (white) vs Stockfish L2 (~1000)

180s clock · 37 plies · 159.0s thinking · $0.12

win · checkmate

lfm-2-24b (black) vs Stockfish L0 (~400)

180s clock · 199 plies · 180.4s thinking · $0.00

loss · time

Choosing a fast LLM: quick answers

Which LLM is best for quick decisions?

Right now: Gemini 3.5 Flash makes the best decisions under a 60-second clock, and Gemini 3.1 Flash Lite leads when every answer must come back in about a second. The big reasoning models lose on time long before their intelligence becomes usable - see the live fast board for the full ranking, updated with every run.

What is the fastest LLM right now?

Qwen 3.5 Flash is the fastest model we measure, answering in about 0.4 seconds per decision. But raw speed isn't the whole story - several sub-second models play barely better than random. BulletBench exists to measure whether fast answers are also good answers.

Are reasoning models suitable for low-latency use cases?

No. Every large reasoning model here lost essentially all of its Lightning games on time - including games it was winning on the board - and at least half of its Bullet games. Their intelligence only pays off from a several-minute time budget upward, as the full clock spectrum shows. For latency-sensitive routing, a fast model with strong fundamentals beats a genius on a deadline.

How should I pick a model for routing or classification?

Weigh three numbers together: response time, quality under time pressure (the ratings above), and cost per decision. A model that's 0.5s slower but markedly smarter often wins; a model that's cheap and fast but near-random loses you more than it saves. Cross-reference with our overall model leaderboard and best-models-by-task rankings to check a candidate's general ability.
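One way to weigh those three numbers is to treat latency and cost as hard budgets and then take the strongest model that fits. The Python sketch below is illustrative only: the `Candidate` type and `pick` function are hypothetical names, the figures are copied from four rows of the fast board (Lightning rating, median response time, average cost per game), and cost per game stands in for cost per decision.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    rating: int              # Lightning rating from the fast board
    seconds_per_move: float  # median response time
    cost_per_game: float     # average $ per game (proxy for cost per decision)


CANDIDATES = [
    Candidate("Qwen 3.5 Flash", 558, 0.4, 0.00),
    Candidate("Mistral Small 4", 555, 0.8, 0.00),
    Candidate("Gemini 3.1 Flash Lite", 876, 0.9, 0.01),
    Candidate("Gemini 3.5 Flash", 536, 1.3, 0.02),
]


def pick(candidates, max_seconds, max_cost):
    """Drop candidates over either budget, then prefer rating, then speed."""
    eligible = [
        c for c in candidates
        if c.seconds_per_move <= max_seconds and c.cost_per_game <= max_cost
    ]
    # Returns None when no candidate fits the budgets.
    return max(eligible, key=lambda c: (c.rating, -c.seconds_per_move), default=None)


one_second_budget = pick(CANDIDATES, max_seconds=1.0, max_cost=0.01)
half_second_budget = pick(CANDIDATES, max_seconds=0.5, max_cost=0.01)
```

With a 1-second budget the sketch picks Gemini 3.1 Flash Lite. Tightening the budget to half a second leaves only Qwen 3.5 Flash.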

How it works

The clock is real

Each model gets a total game clock (10s+1s/move, 60s, 180s or 600s). The wall-clock latency of every API response - including hidden reasoning - is deducted. Hit zero and it's a loss on time. The remaining clock is stated in every prompt, so models that pace themselves are rewarded.
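The clock rule can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not the benchmark's harness: the `GameClock` class and `moves_survived` helper are hypothetical names, latency is held constant for the simulation, the increment is credited after each completed move, and reaching exactly zero counts as a loss.

```python
from dataclasses import dataclass


@dataclass
class GameClock:
    """Chess clock charged with the wall-clock latency of each API reply."""
    remaining: float        # seconds left on the model's clock
    increment: float = 0.0  # seconds added after each completed move

    def charge(self, latency: float) -> bool:
        """Deduct one reply's latency; return False if the flag fell."""
        self.remaining -= latency
        if self.remaining <= 0:
            return False
        self.remaining += self.increment  # assumption: increment credited after the move
        return True


def moves_survived(clock: GameClock, latency: float, max_moves: int = 500) -> int:
    """Count moves completed at a constant latency before losing on time."""
    moves = 0
    while moves < max_moves and clock.charge(latency):
        moves += 1
    return moves


# Bullet: 60s for the whole game, no increment.
bullet_moves = moves_survived(GameClock(remaining=60.0), latency=1.5)

# Lightning: 10s plus 1s per move.
lightning_fast = moves_survived(GameClock(10.0, increment=1.0), latency=0.9)
lightning_slow = moves_survived(GameClock(10.0, increment=1.0), latency=1.5)
```

At a constant 1.5s per move the 60s clock lasts 39 moves. Under Lightning, a 0.9s reply never flags (the simulation stops at the 500-move cap), while a 1.5s reply flags after 17 moves.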

The opponent ladder

A 9-level Stockfish ladder from random mover (anchor 400) to full strength (anchor 2800). Levels step adaptively - win and face a stronger engine, lose and drop down - and a maximum-likelihood performance rating is fitted from all games. "Ladder Elo" is internally consistent, not FIDE-calibrated.
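A maximum-likelihood performance rating can be fitted as in the Python sketch below. It assumes the standard Elo logistic curve on a 400-point scale, where the maximum-likelihood rating is the one whose expected total score equals the actual total score; the benchmark's exact model, search bounds and draw handling are not stated, so treat those as assumptions. The anchors 400, 1000 and 1200 are the Stockfish L0, L2 and L3 values shown in the game list on this page, and the game records are made up.

```python
def expected_score(rating: float, opponent: float) -> float:
    """Standard Elo logistic expectation on a 400-point scale."""
    return 1.0 / (1.0 + 10.0 ** ((opponent - rating) / 400.0))


def performance_rating(games, lo=0.0, hi=3200.0, iters=60):
    """Bisect for the rating whose expected total score matches the actual one.

    `games` is a list of (opponent_anchor, score) with score in {0, 0.5, 1}.
    The bounds `lo` and `hi` are assumptions; an all-loss or all-win record
    has no finite maximum and converges to the nearest bound.
    """
    actual = sum(score for _, score in games)
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        expected = sum(expected_score(mid, opp) for opp, _ in games)
        if expected < actual:
            lo = mid  # rating too low to explain the results
        else:
            hi = mid
    return (lo + hi) / 2.0


# Hypothetical record: beat L0 (~400) and L2 (~1000), lost to L3 (~1200).
mixed = performance_rating([(400, 1), (1000, 1), (1200, 0)])

# Hypothetical record: every game lost, for example on time.
all_lost = performance_rating([(400, 0), (400, 0), (400, 0)])
```

The mixed record lands between the last anchor beaten and the first anchor lost to. The all-loss record collapses to the lower bound, since no finite rating is low enough to explain it.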

Fair-play rules

Legal moves are listed in the prompt (we measure decision quality, not notation trivia). Illegal replies get two corrective retries - on the clock - then a random legal move is played and counted. Provider transport errors pause the clock: they measure infrastructure flakiness, not model speed.
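The retry rule might look like the Python sketch below. It is an illustration under assumptions rather than the benchmark's code: `choose_move` and the `ask_model` callable (one API request that returns the model's reply as text) are hypothetical, and the clock pause for provider transport errors is left out for brevity.

```python
import random
import time


def choose_move(ask_model, legal_moves, clock_remaining, max_retries=2, rng=random):
    """Return (move, clock_remaining, was_random).

    ask_model(legal_moves, clock_remaining, feedback) -> str is a hypothetical
    callable wrapping one API request. A move of None means a loss on time.
    """
    feedback = None
    for _ in range(1 + max_retries):  # first attempt plus two corrective retries
        start = time.perf_counter()
        reply = ask_model(legal_moves, clock_remaining, feedback)
        clock_remaining -= time.perf_counter() - start  # retries stay on the clock
        if clock_remaining <= 0:
            return None, clock_remaining, False
        move = reply.strip()
        if move in legal_moves:
            return move, clock_remaining, False
        feedback = f"'{move}' is not legal; choose one of the listed moves."
    # All attempts were illegal: play a random legal move, and it counts.
    return rng.choice(legal_moves), clock_remaining, True
```

A model that replies illegally three times in a row pays the latency of all three requests and still ends up with a random move, so sloppy output costs both time and quality.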

Honest caveats

Latency includes provider serving infrastructure (that's the point for routing decisions, but infra changes can move results). Chess knowledge is part of what's measured - this is fast applied intelligence, one domain among several we test. Preview-endpoint models may be slower than their GA versions.

Routing latency-sensitive AI workloads?

This is one of 22 live benchmark collections on Spring Prompt. See how every model performs on your kind of task - at every reasoning tier.
