BulletBench
How smart is an AI per second? We make AI models play speed chess against a chess computer, on a real clock: every second a model spends producing its move drains its time, and when the clock hits zero it loses - even from a winning position. Fast and good wins. Slow genius flags.
The field - fastest thinkers first
Bullet (60s) standings
2166 games. Chess rating at the 60-second bullet format. Full board below.
The fastest LLMs, ranked: who can actually think fast?
This is the live leaderboard for low-latency AI. The two fastest formats mirror the jobs where a model must be smart right now: routing requests, classifying, and making quick decisions in real-time agents. Bullet gives a model 60 seconds of thinking for an entire game. Lightning gives it 10 seconds plus 1 extra second per move, so it can play indefinitely, but only if it answers in about a second. Ratings work like human chess ratings: higher is stronger (a club player is ~1500, a random-move player ~400).
| Model | Bullet rating (60s whole game) | Lightning rating (10s + 1s per move) | Response time (median secs per move) | Output speed (effective tokens/sec) | Time losses (% of games lost on time) | Cost (avg $ per game) |
|---|---|---|---|---|---|---|
| Qwen 3.5 Flash · Alibaba via OpenRouter · off reasoning | 589 · 0.5s/move · $0.00/game | 558 · 0.4s/move · $0.00/game | 0.4s | 5 t/s | 0% (0/64) | $0.00 |
| Mercury 2 (Inception) · Inception via OpenRouter · off reasoning | 159 · 0.5s/move · $0.01/game · ⏱ 18/32 | 342 · 0.5s/move · $0.01/game · ⏱ 3/32 | 0.5s | 53 t/s | 33% (21/64) | $0.01 |
| Ministral 3 3B · Mistral via OpenRouter · off reasoning | 0 · 0.7s/move · $0.00/game · ⏱ 26/32 | 0 · 0.8s/move · $0.01/game · ⏱ 17/30 | 0.7s | 4 t/s | 69% (43/62) | $0.01 |
| Ministral 14B · Mistral via OpenRouter · off reasoning | 220 · 0.8s/move · $0.01/game · ⏱ 16/31 | 444 · 0.8s/move · $0.01/game · ⏱ 5/32 | 0.8s | 4 t/s | 33% (21/63) | $0.01 |
| Mistral Small 4 · Mistral via OpenRouter · off reasoning | 348 · 0.8s/move · $0.00/game · ⏱ 11/32 | 555 · 0.8s/move · $0.00/game · ⏱ 1/30 | 0.8s | 4 t/s | 19% (12/62) | $0.00 |
| Gemini 3.1 Flash Lite · Google · off reasoning | 863 · 0.9s/move · $0.01/game · ⏱ 8/32 | 876 · 0.8s/move · $0.01/game | 0.9s | 3 t/s | 12% (8/64) | $0.01 |
| Nova Micro · Amazon via OpenRouter · off reasoning | 0 · 0.8s/move · $0.00/game · ⏱ 29/32 | 348 · 0.9s/move · $0.00/game · ⏱ 1/30 | 0.8s | 4 t/s | 48% (30/62) | $0.00 |
| Nemotron 3 Nano 30B-A3B · NVIDIA via OpenRouter · off reasoning | 0 · 0.9s/move · $0.00/game · ⏱ 26/32 | 356 · 0.9s/move · $0.00/game · ⏱ 8/30 | 0.9s | 6 t/s | 55% (34/62) | $0.00 |
| LFM-2 24B-A2B (Liquid) · Liquid AI via OpenRouter · off reasoning | 0 · 0.8s/move · $0.00/game · ⏱ 27/32 | 223 · 0.9s/move · $0.00/game · ⏱ 11/30 | 0.9s | 6 t/s | 61% (38/62) | $0.00 |
| Gemma 4 26B (A4B) · Google via OpenRouter · off reasoning | 426 · 1.0s/move · $0.00/game · ⏱ 12/32 | 193 · 0.8s/move · $0.00/game · ⏱ 23/32 | 0.9s | 3 t/s | 55% (35/64) | $0.00 |
| Nova 2.0 Lite · Amazon via OpenRouter · off reasoning | 219 · 1.0s/move · $0.00/game · ⏱ 18/32 | 16 · 1.1s/move · $0.00/game · ⏱ 23/29 | 1.0s | 4 t/s | 67% (41/61) | $0.00 |
| Claude Haiku 4.5 · Anthropic · off reasoning | 219 · 1.2s/move · $0.02/game · ⏱ 21/32 | 153 · 1.2s/move · $0.01/game · ⏱ 25/32 | 1.2s | 9 t/s | 72% (46/64) | $0.01 |
| Gemini 3.5 Flash · Google · off reasoning | 900 · 1.3s/move · $0.03/game · ⏱ 9/32 | 536 · 1.3s/move · $0.01/game · ⏱ 18/32 | 1.3s | 2 t/s | 42% (27/64) | $0.02 |
| Qwen 3.5 9B · Alibaba via OpenRouter · off reasoning | 145 · 1.5s/move · $0.00/game · ⏱ 24/32 | ≤0 · 1.4s/move · $0.00/game · ⏱ 32/32 | 1.5s | 2 t/s | 88% (56/64) | $0.00 |
| Gemini 3 Flash · Google · off reasoning | 800 · 1.4s/move · $0.01/game · ⏱ 13/32 | 318 · 1.4s/move · $0.00/game · ⏱ 23/32 | 1.4s | 2 t/s | 56% (36/64) | $0.01 |
| Claude Opus 4.8 · Anthropic · low reasoning · heavy | 369 · 1.7s/move · $0.09/game · ⏱ 16/32 | ≤0 · 1.7s/move · $0.03/game · ⏱ 16/16 | 1.7s | 3 t/s | 67% (32/48) | $0.06 |
| GPT-5.4 mini · OpenAI · off reasoning | 100 · 2.7s/move · $0.02/game · ⏱ 14/16 | ≤0 · 1.7s/move · $0.01/game · ⏱ 16/16 | 2.2s | 67 t/s | 94% (30/32) | $0.02 |
| GPT-5.4 nano · OpenAI · off reasoning | 0 · 5.2s/move · $0.01/game · ⏱ 15/16 | ≤0 · 2.2s/move · $0.00/game · ⏱ 16/16 | 3.7s | 92 t/s | 97% (31/32) | $0.01 |
| Gemini 3.1 Pro · Google · low reasoning · heavy | 114 · 5.1s/move · $0.03/game · ⏱ 14/16 | ≤0 · 3.6s/move · $0.01/game · ⏱ 16/16 | 4.4s | 31 t/s | 94% (30/32) | $0.02 |
| Claude Fable 5 · Anthropic · low reasoning · heavy | ≤0 · 4.7s/move · $0.11/game · ⏱ 16/16 | ≤0 · 4.5s/move · $0.02/game · ⏱ 16/16 | 4.6s | 8 t/s | 100% (32/32) | $0.06 |
| GPT-5.5 · OpenAI · medium reasoning · heavy | 114 · 4.6s/move · $0.09/game · ⏱ 14/16 | 0 · 2.3s/move · $0.02/game · ⏱ 15/16 | 3.4s | 34 t/s | 91% (29/32) | $0.05 |
| Qwen 3.7 Max · Alibaba via OpenRouter · default reasoning · heavy | 0 · 7.4s/move · $0.01/game · ⏱ 15/16 | ≤0 · 3.6s/move · $0.00/game · ⏱ 16/16 | 5.5s | 42 t/s | 97% (31/32) | $0.01 |
| Kimi K2.5 · Moonshot via OpenRouter · default reasoning · heavy | ≤0 · 18.6s/move · $0.01/game · ⏱ 16/16 | ≤0 · 5.9s/move · $0.00/game · ⏱ 16/16 | 12.2s | 48 t/s | 100% (32/32) | $0.00 |
Elo cells: greener = stronger play, redder = clock death. Flag rate counts games lost on time across both fast controls.
Gemini 3.1 Flash Lite tops the 10+1 lightning control: ladder Elo 876 at 0.8s per move, with no games lost on time.
All 6 frontier heavyweight configurations lost essentially every lightning game on time - including positions they were winning on the board.
Gemini 3.1 Pro gains 481 ladder Elo (683 to 1164) and Qwen 3.7 Max gains 408 (164 to 572) when the clock moves from blitz (3 min) to rapid (10 min) - intelligence that only exists when you can afford to wait for it.
The full clock spectrum
Context for the fast board: the same models with room to think. Blitz gives 3 minutes per game, Rapid gives 10. This is where the big reasoning models climb the table, and where you can see what each extra second of time budget buys.
| Model | Lightning 10+1s | Bullet 60s | Blitz 180s | Rapid 600s |
|---|---|---|---|---|
| Qwen 3.5 Flash · 0.4s/move · off reasoning | 558 · 0.4s/move · $0.00/game | 589 · 0.5s/move · $0.00/game | 558 · 0.4s/move · $0.00/game | 529 · 0.4s/move · $0.00/game |
| Mercury 2 (Inception) · 0.5s/move · off reasoning | 342 · 0.5s/move · $0.01/game · ⏱ 3/32 | 159 · 0.5s/move · $0.01/game · ⏱ 18/32 | 393 · 0.5s/move · $0.02/game | 445 · 0.5s/move · $0.01/game |
| Ministral 3 3B · 0.8s/move · off reasoning | 0 · 0.8s/move · $0.01/game · ⏱ 17/30 | 0 · 0.7s/move · $0.00/game · ⏱ 26/32 | 23 · 0.8s/move · $0.01/game · ⏱ 1/32 | 16 · 0.9s/move · $0.01/game |
| Ministral 14B · 0.8s/move · off reasoning | 444 · 0.8s/move · $0.01/game · ⏱ 5/32 | 220 · 0.8s/move · $0.01/game · ⏱ 16/31 | 499 · 0.7s/move · $0.01/game | 529 · 0.9s/move · $0.01/game |
| Mistral Small 4 · 0.8s/move · off reasoning | 555 · 0.8s/move · $0.00/game · ⏱ 1/30 | 348 · 0.8s/move · $0.00/game · ⏱ 11/32 | 543 · 0.8s/move · $0.01/game | 488 · 0.8s/move · $0.01/game |
| Gemini 3.1 Flash Lite · 0.9s/move · off reasoning | 876 · 0.8s/move · $0.01/game | 863 · 0.9s/move · $0.01/game · ⏱ 8/32 | 827 · 0.9s/move · $0.01/game | 762 · 0.8s/move · $0.01/game |
| Nova Micro · 0.9s/move · off reasoning | 348 · 0.9s/move · $0.00/game · ⏱ 1/30 | 0 · 0.8s/move · $0.00/game · ⏱ 29/32 | 373 · 0.8s/move · $0.00/game | 529 · 0.9s/move · $0.00/game |
| Nemotron 3 Nano 30B-A3B · 0.9s/move · off reasoning | 356 · 0.9s/move · $0.00/game · ⏱ 8/30 | 0 · 0.9s/move · $0.00/game · ⏱ 26/32 | 372 · 0.9s/move · $0.00/game · ⏱ 2/32 | 572 · 0.9s/move · $0.00/game |
| LFM-2 24B-A2B (Liquid) · 0.9s/move · off reasoning | 223 · 0.9s/move · $0.00/game · ⏱ 11/30 | 0 · 0.8s/move · $0.00/game · ⏱ 27/32 | 293 · 0.9s/move · $0.00/game · ⏱ 1/32 | 377 · 0.9s/move · $0.00/game |
| Gemma 4 26B (A4B) · 0.9s/move · off reasoning | 193 · 0.8s/move · $0.00/game · ⏱ 23/32 | 426 · 1.0s/move · $0.00/game · ⏱ 12/32 | 529 · 1.0s/move · $0.00/game · ⏱ 2/32 | 571 · 0.9s/move · $0.00/game |
| Nova 2.0 Lite · 1.0s/move · off reasoning | 16 · 1.1s/move · $0.00/game · ⏱ 23/29 | 219 · 1.0s/move · $0.00/game · ⏱ 18/32 | 486 · 1.0s/move · $0.00/game | 453 · 1.0s/move · $0.00/game |
| Claude Haiku 4.5 · 1.2s/move · off reasoning | 153 · 1.2s/move · $0.01/game · ⏱ 25/32 | 219 · 1.2s/move · $0.02/game · ⏱ 21/32 | 544 · 1.2s/move · $0.02/game · ⏱ 1/32 | 488 · 1.1s/move · $0.03/game |
| Gemini 3.5 Flash · 1.3s/move · off reasoning | 536 · 1.3s/move · $0.01/game · ⏱ 18/32 | 900 · 1.3s/move · $0.03/game · ⏱ 9/32 | 915 · 1.3s/move · $0.04/game | 1086 · 1.3s/move · $0.04/game |
| Qwen 3.5 9B · 1.4s/move · off reasoning | ≤0 · 1.4s/move · $0.00/game · ⏱ 32/32 | 145 · 1.5s/move · $0.00/game · ⏱ 24/32 | 499 · 1.4s/move · $0.00/game · ⏱ 3/32 | 529 · 1.2s/move · $0.00/game |
| Gemini 3 Flash · 1.4s/move · off reasoning | 318 · 1.4s/move · $0.00/game · ⏱ 23/32 | 800 · 1.4s/move · $0.01/game · ⏱ 13/32 | 949 · 1.4s/move · $0.01/game | 745 · 1.4s/move · $0.01/game |
| Claude Opus 4.8 · 1.7s/move · low reasoning · heavy | ≤0 · 1.7s/move · $0.03/game · ⏱ 16/16 | 369 · 1.7s/move · $0.09/game · ⏱ 16/32 | 559 · 1.7s/move · $0.13/game · ⏱ 3/32 | 571 · 1.7s/move · $0.13/game |
| GPT-5.4 mini · 3.2s/move · off reasoning | ≤0 · 1.7s/move · $0.01/game · ⏱ 16/16 | 100 · 2.7s/move · $0.02/game · ⏱ 14/16 | 352 · 3.9s/move · $0.06/game · ⏱ 12/24 | 571 · 4.7s/move · $0.10/game |
| GPT-5.4 nano · 5.4s/move · off reasoning | ≤0 · 2.2s/move · $0.00/game · ⏱ 16/16 | 0 · 5.2s/move · $0.01/game · ⏱ 15/16 | 225 · 7.1s/move · $0.02/game · ⏱ 18/24 | 572 · 7.0s/move · $0.03/game |
| Gemini 3.1 Pro · 5.8s/move · low reasoning · heavy | ≤0 · 3.6s/move · $0.01/game · ⏱ 16/16 | 114 · 5.1s/move · $0.03/game · ⏱ 14/16 | 683 · 6.8s/move · $0.11/game · ⏱ 13/24 | 1164 · 7.7s/move · $0.38/game · ⏱ 2/12 |
| Claude Fable 5 · 6.2s/move · low reasoning · heavy | ≤0 · 4.5s/move · $0.02/game · ⏱ 16/16 | ≤0 · 4.7s/move · $0.11/game · ⏱ 16/16 | 505 · 6.1s/move · $0.36/game · ⏱ 13/24 | 572 · 9.4s/move · $1.07/game · ⏱ 5/12 |
| GPT-5.5 · 7.2s/move · medium reasoning · heavy | 0 · 2.3s/move · $0.02/game · ⏱ 15/16 | 114 · 4.6s/move · $0.09/game · ⏱ 14/16 | 174 · 9.6s/move · $0.24/game · ⏱ 20/24 | 409 · 12.3s/move · $0.69/game · ⏱ 8/12 |
| Qwen 3.7 Max · 9.6s/move · default reasoning · heavy | ≤0 · 3.6s/move · $0.00/game · ⏱ 16/16 | 0 · 7.4s/move · $0.01/game · ⏱ 15/16 | 164 · 12.0s/move · $0.04/game · ⏱ 20/24 | 572 · 15.3s/move · $0.10/game · ⏱ 4/12 |
| Kimi K2.5 · 26.2s/move · default reasoning · heavy | ≤0 · 5.9s/move · $0.00/game · ⏱ 16/16 | ≤0 · 18.6s/move · $0.01/game · ⏱ 16/16 | ≤0 · 23.9s/move · $0.03/game · ⏱ 16/16 | 33 · 56.6s/move · $0.07/game · ⏱ 11/12 |
Ladder Elo: anchored to a Stockfish skill ladder (random mover = 400). Internally consistent ordering, not FIDE-calibrated. Sub-400 values at fast controls mean the model died on the clock, not that it plays worse than random chess. ⏱ n/m = games lost on time.
Watch the games
Full games with per-move API latency. Step through and watch the clock do its work.
gemini-3-flash (black) vs Stockfish L3 (~1200)
60s clock · 54 plies · 41.6s thinking · $0.01
gemini-3.1-flash-lite (white) vs Stockfish L2 (~1000)
60s clock · 35 plies · 16.1s thinking · $0.00
gemini-3.1-pro (white) vs Stockfish L2 (~1000)
180s clock · 37 plies · 159.0s thinking · $0.12
lfm-2-24b (black) vs Stockfish L0 (~400)
180s clock · 199 plies · 180.4s thinking · $0.00
Choosing a fast LLM: quick answers
Which LLM is best for quick decisions?
Right now: Gemini 3.5 Flash makes the best decisions under a 60-second clock, and Gemini 3.1 Flash Lite leads when every answer must come back in about a second. The big reasoning models lose on time long before their intelligence becomes usable - see the live fast board for the full ranking, updated with every run.
What is the fastest LLM right now?
Qwen 3.5 Flash is the fastest model we measure, answering in about 0.4 seconds per decision. But raw speed isn't the whole story - several sub-second models play barely better than random. BulletBench exists to measure whether fast answers are also good answers.
Are reasoning models suitable for low-latency use cases?
No. Every large reasoning model here lost essentially all of its fast games on time - including games it was winning on the board. Their intelligence only pays off from a several-minute time budget upward, as the full clock spectrum shows. For latency-sensitive routing, a fast model with strong fundamentals beats a genius on a deadline.
How should I pick a model for routing or classification?
Weigh three numbers together: response time, quality under time pressure (the ratings above), and cost per decision. A model that's 0.5s slower but markedly smarter often wins; a model that's cheap and fast but near-random loses you more than it saves. Cross-reference with our overall model leaderboard and best-models-by-task rankings to check a candidate's general ability.
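One way to weigh the three numbers without inventing weights is a Pareto filter: discard any model that another model matches or beats on rating, response time, and cost at once. The sketch below is an illustration, not BulletBench's own selection method. The `Candidate` type and function names are hypothetical, and the rows are a handful of bullet-format figures from the fast board (whose costs are rounded to the cent).

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    name: str
    rating: float    # bullet rating, higher is better
    latency: float   # median seconds per move, lower is better
    cost: float      # average $ per game, lower is better


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is at least as good as b on all three numbers and better on one."""
    no_worse = a.rating >= b.rating and a.latency <= b.latency and a.cost <= b.cost
    better = a.rating > b.rating or a.latency < b.latency or a.cost < b.cost
    return no_worse and better


def pareto_front(candidates):
    """Keep only the candidates that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]


# A few rows from the fast board: bullet rating, median response time, cost.
board = [
    Candidate("Qwen 3.5 Flash", 589, 0.4, 0.00),
    Candidate("Mercury 2 (Inception)", 159, 0.5, 0.01),
    Candidate("Mistral Small 4", 348, 0.8, 0.00),
    Candidate("Gemini 3.1 Flash Lite", 863, 0.9, 0.01),
    Candidate("Claude Haiku 4.5", 219, 1.2, 0.01),
    Candidate("Gemini 3.5 Flash", 900, 1.3, 0.02),
]
shortlist = [c.name for c in pareto_front(board)]
print(shortlist)
```

The survivors are the models where a real trade-off remains (faster versus stronger versus cheaper). Choosing among them still depends on your own latency budget.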
How it works
The clock is real
Each model gets a total game clock (10s+1s/move, 60s, 180s or 600s). The wall-clock latency of every API response - including hidden reasoning - is deducted. Hit zero and it's a loss on time. The remaining clock is stated in every prompt, so models that pace themselves are rewarded.
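The clock arithmetic can be sketched in a few lines. This is a minimal illustration, not the benchmark's actual harness: the `GameClock` class and `moves_survived` helper are hypothetical names, per-move latency is held constant for simplicity, and the increment is assumed to be credited after each completed move.

```python
from dataclasses import dataclass


@dataclass
class GameClock:
    """Chess clock for one side; all times in seconds."""
    remaining: float         # time left on the clock
    increment: float = 0.0   # bonus added after each completed move (assumed)

    def charge(self, latency: float) -> bool:
        """Deduct one response's wall-clock latency.

        Returns True if the move was made in time, False if the clock
        hit zero (a loss on time, whatever the board position).
        """
        self.remaining -= latency
        if self.remaining <= 0:
            self.remaining = 0.0
            return False
        self.remaining += self.increment
        return True


def moves_survived(clock: GameClock, latency: float, max_moves: int) -> int:
    """Count moves completed at a constant per-move latency."""
    made = 0
    while made < max_moves and clock.charge(latency):
        made += 1
    return made


# Lightning (10s + 1s per move): a 0.8s responder gains 0.2s every move.
lightning_fast = moves_survived(GameClock(10.0, 1.0), latency=0.8, max_moves=200)
# Lightning at 1.7s per move: the clock shrinks by 0.7s each move and runs out.
lightning_slow = moves_survived(GameClock(10.0, 1.0), latency=1.7, max_moves=200)
# Bullet (60s, no increment) at 1.7s per move.
bullet_slow = moves_survived(GameClock(60.0), latency=1.7, max_moves=200)
print(lightning_fast, lightning_slow, bullet_slow)
```

Under these assumptions a 1.7-second responder gets about a dozen moves in lightning but a few dozen in bullet, which is why the slower models fare better on the 60-second clock than on the 10+1 clock.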
The opponent ladder
A 9-level Stockfish ladder from random mover (anchor 400) to full strength (anchor 2800). Levels step adaptively - win and face a stronger engine, lose and drop down - and a maximum-likelihood performance rating is fitted from all games. "Ladder Elo" is internally consistent, not FIDE-calibrated.
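A maximum-likelihood performance rating can be illustrated as follows. The source does not specify the fitting procedure, so this sketch assumes the standard logistic Elo curve and solves for the rating by bisection. The function names, search bounds, and the example games are hypothetical.

```python
def expected_score(rating: float, opponent: float) -> float:
    """Standard logistic Elo expectation (400-point scale)."""
    return 1.0 / (1.0 + 10.0 ** ((opponent - rating) / 400.0))


def performance_rating(games, lo=-1000.0, hi=4000.0, tol=0.01):
    """Maximum-likelihood rating from (opponent_rating, score) pairs.

    score is 1 for a win, 0.5 for a draw, 0 for a loss (including a loss
    on time). Under the logistic model the likelihood peaks where the total
    expected score equals the total actual score, so bisection on that gap
    is enough.
    """
    actual = sum(score for _, score in games)
    if actual <= 0:
        return lo          # all losses: the estimate is unbounded below, so clamp
    if actual >= len(games):
        return hi          # all wins: unbounded above, so clamp
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        expected = sum(expected_score(mid, opp) for opp, _ in games)
        if expected < actual:
            lo = mid       # rating too low to explain the results
        else:
            hi = mid
    return (lo + hi) / 2.0


# Hypothetical results: one win against the 400 anchor, one loss to a 1000 engine.
fitted = performance_rating([(400, 1), (1000, 0)])
print(round(fitted))
```

A model that loses every game has no finite maximum-likelihood rating, so any such fit has to clamp or floor the result somewhere. The clamp value used here is arbitrary.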
Fair-play rules
Legal moves are listed in the prompt (we measure decision quality, not notation trivia). Illegal replies get two corrective retries - on the clock - then a random legal move is played and counted. Provider transport errors pause the clock: they measure infrastructure flakiness, not model speed.
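The retry rule can be sketched like this. It is an illustration under stated assumptions, not the benchmark's code: `ask_model` is a hypothetical callable that takes a prompt and returns the raw reply, the prompt wording is invented, and the pause for provider transport errors is left out.

```python
import random
import time
from typing import Callable, Optional, Sequence, Tuple


def choose_move(
    ask_model: Callable[[str], str],   # hypothetical: prompt in, raw reply out
    legal_moves: Sequence[str],
    clock_left: float,
    max_retries: int = 2,
    rng: Optional[random.Random] = None,
) -> Tuple[Optional[str], float, bool]:
    """Return (move, clock_left, was_random); move is None on a time loss."""
    rng = rng or random.Random()
    moves = " ".join(legal_moves)
    prompt = f"Clock: {clock_left:.1f}s. Legal moves: {moves}. Reply with one move."
    for _ in range(1 + max_retries):
        start = time.monotonic()
        reply = ask_model(prompt).strip()
        clock_left -= time.monotonic() - start   # retries stay on the clock
        if clock_left <= 0:
            return None, 0.0, False              # flagged while answering
        if reply in legal_moves:
            return reply, clock_left, False
        prompt = f"'{reply}' is not legal. Clock: {clock_left:.1f}s. Choose one of: {moves}."
    # Every attempt was illegal: play a random legal move and count it.
    return rng.choice(list(legal_moves)), clock_left, True


# Usage with a stub model that never produces a legal move.
bad_replies = iter(["Zz9", "nope", "still bad"])
move, left, was_random = choose_move(
    lambda prompt: next(bad_replies), ["e2e4", "g1f3"], 60.0, rng=random.Random(0)
)
print(move, was_random)
```

Because each corrective retry is timed like any other response, an unreliable model pays twice: once in clock time and again when a random move replaces its choice.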
Honest caveats
Latency includes provider serving infrastructure (that's the point for routing decisions, but infra changes can move results). Chess knowledge is part of what's measured - this is fast applied intelligence, one domain among several we test. Preview-endpoint models may be slower than their GA versions.
Routing latency-sensitive AI workloads?
This is one of 22 live benchmark collections on Spring Prompt. See how every model performs on your kind of task - at every reasoning tier.