NonoBench Results

Benchmark results for LLM performance on Nonogram puzzle solving, comparing accuracy, speed, and cost across 5×5, 10×10, and 15×15 grids.

Puzzles: 30
Models: 45
Total runs: 1349

Last updated: 2/19/2026, 5:23:00 PM


Detailed Model Statistics

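| Model | Overall | 5×5 Accuracy | 5×5 Runs | 5×5 Correct | 5×5 Avg cost | 5×5 Avg time | 10×10 Accuracy | 10×10 Runs | 10×10 Correct | 10×10 Avg cost | 10×10 Avg time | 15×15 Accuracy | 15×15 Runs | 15×15 Correct | 15×15 Avg cost | 15×15 Avg time |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| gemini-3.1-pro-preview-high | 63.3% | 90% | 10 | 9 | $0.04 | 31.88s | 60% | 10 | 6 | $0.26 | 163.72s | 40% | 10 | 4 | $0.25 | 174.99s |
| claude-4.5-opus-high | 56.7% | 100% | 10 | 10 | $0.13 | 88.03s | 40% | 10 | 4 | $1.19 | 844.26s | 30% | 10 | 3 | $0.98 | 778.95s |
| gemini-3-pro-preview-high | 53.3% | 90% | 10 | 9 | $0.04 | 44.69s | 60% | 10 | 6 | $0.22 | 180.79s | 10% | 10 | 1 | $0.26 | 254.67s |
| gpt-5.2-high | 53.3% | 90% | 10 | 9 | $0.04 | 45.42s | 50% | 10 | 5 | $0.40 | 531.64s | 20% | 10 | 2 | $0.64 | 933.66s |
| claude-4.5-opus-low | 50.0% | 100% | 10 | 10 | $0.09 | 51.44s | 40% | 10 | 4 | $0.37 | 233.62s | 10% | 10 | 1 | $0.19 | 115.41s |
| gpt-5.2-xhigh | 50.0% | 90% | 10 | 9 | $0.08 | 90.89s | 60% | 10 | 6 | $0.61 | 749.07s | 0% | 10 | 0 | $0.56 | 994.77s |
| deepseek-v3.2-speciale-high | 46.7% | 90% | 10 | 9 | $0.0055 | 597.11s | 30% | 10 | 3 | $0.02 | 2580.79s | 20% | 10 | 2 | $0.02 | 1511.17s |
| gemini-3-flash-preview-high | 46.7% | 90% | 10 | 9 | $0.02 | 48.55s | 40% | 10 | 4 | $0.07 | 164.13s | 10% | 10 | 1 | $0.08 | 139.85s |
| gpt-5.2-low | 46.7% | 100% | 10 | 10 | $0.03 | 34.78s | 20% | 10 | 2 | $0.21 | 278.82s | 20% | 10 | 2 | $0.16 | 184.03s |
| gemini-3.1-pro-preview-low | 41.4% | 80% | 10 | 8 | $0.02 | 39.45s | 22% | 9 | 2 | $0.04 | 65.49s | 20% | 10 | 2 | $0.03 | 45.48s |
| deepseek-v3.2-high | 36.7% | 90% | 10 | 9 | $0.0040 | 529.73s | 10% | 10 | 1 | $0.03 | 882.49s | 10% | 10 | 1 | $0.03 | 704.64s |
| deepseek-v3.2-speciale | 36.7% | 80% | 10 | 8 | $0.0044 | 542.25s | 20% | 10 | 2 | $0.02 | 1795.09s | 10% | 10 | 1 | $0.01 | 2008.11s |
| gpt-oss-120b-high | 36.7% | 80% | 10 | 8 | $0.0018 | 86.90s | 30% | 10 | 3 | $0.01 | 237.03s | 0% | 10 | 0 | $0.0091 | 113.93s |
| kimi-k2.5-high | 36.7% | 100% | 10 | 10 | $0.03 | 231.56s | 10% | 10 | 1 | $0.14 | 1012.85s | 0% | 10 | 0 | $0.09 | 708.31s |
| grok-4 | 33.3% | 80% | 10 | 8 | $0.13 | 146.35s | 20% | 10 | 2 | $0.70 | 780.83s | 0% | 10 | 0 | $0.68 | 809.63s |
| minimax-m2.5-high | 33.3% | 90% | 10 | 9 | $0.02 | 220.06s | 10% | 10 | 1 | $0.08 | 1473.31s | 0% | 10 | 0 | $0.05 | 845.37s |
| gpt-oss-120b-low | 30.0% | 90% | 10 | 9 | $0.0002 | 35.14s | 0% | 10 | 0 | $0.0018 | 46.15s | 0% | 10 | 0 | $0.0009 | 24.41s |
| qwen3-next-80b-a3b-thinking | 30.0% | 90% | 10 | 9 | $0.01 | 79.71s | 0% | 10 | 0 | $0.04 | 164.82s | 0% | 10 | 0 | $0.04 | 199.00s |
| seed-1.6-high | 30.0% | 90% | 10 | 9 | $0.01 | 117.03s | 0% | 10 | 0 | $0.04 | 359.01s | 0% | 10 | 0 | $0.03 | 293.53s |
| glm-5-reasoning-high | 26.7% | 60% | 10 | 6 | $0.01 | 91.85s | 10% | 10 | 1 | $0.07 | 650.51s | 10% | 10 | 1 | $0.07 | 736.69s |
| kimi-k2-thinking | 26.7% | 80% | 10 | 8 | $0.04 | 260.15s | 0% | 10 | 0 | $0.12 | 938.03s | 0% | 10 | 0 | $0.14 | 428.72s |
| minimax-m2.1-high | 26.7% | 80% | 10 | 8 | $0.0085 | 129.04s | 0% | 10 | 0 | $0.03 | 279.81s | 0% | 10 | 0 | $0.02 | 400.14s |
| claude-4.5-sonnet-reasoning | 23.3% | 70% | 10 | 7 | $0.09 | 91.83s | 0% | 10 | 0 | $0.35 | 365.92s | 0% | 10 | 0 | $0.15 | 160.40s |
| grok-4.1-fast-reasoning | 23.3% | 60% | 10 | 6 | $0.0033 | 47.29s | 10% | 10 | 1 | $0.02 | 295.44s | 0% | 10 | 0 | $0.03 | 390.37s |
| minimax-m2.5 | 23.3% | 70% | 10 | 7 | $0.01 | 181.17s | 0% | 10 | 0 | $0.09 | 1481.82s | 0% | 10 | 0 | $0.06 | 732.98s |
| glm-4.7-reasoning | 20.0% | 60% | 10 | 6 | $0.02 | 250.43s | 0% | 10 | 0 | $0.06 | 653.38s | 0% | 10 | 0 | $0.04 | 423.89s |
| glm-5-reasoning | 20.0% | 60% | 10 | 6 | $0.02 | 209.77s | 0% | 10 | 0 | $0.10 | 770.82s | 0% | 10 | 0 | $0.08 | 736.45s |
| minimax-m2.1 | 20.0% | 60% | 10 | 6 | $0.0093 | 150.83s | 0% | 10 | 0 | $0.03 | 313.90s | 0% | 10 | 0 | $0.02 | 367.21s |
| mimo-v2-flash-high | 16.7% | 50% | 10 | 5 | $0.0000 | 155.63s | 0% | 10 | 0 | $0.0000 | 365.93s | 0% | 10 | 0 | $0.0000 | 572.42s |
| glm-4.7-reasoning-high | 13.3% | 40% | 10 | 4 | $0.01 | 220.38s | 0% | 10 | 0 | $0.06 | 652.39s | 0% | 10 | 0 | $0.04 | 642.59s |
| grok-4.1-fast-reasoning-high | 13.3% | 40% | 10 | 4 | $0.0031 | 43.43s | 0% | 10 | 0 | $0.02 | 349.43s | 0% | 10 | 0 | $0.02 | 262.72s |
| seed-1.6-flash-high | 13.3% | 40% | 10 | 4 | $0.0042 | 89.14s | 0% | 10 | 0 | $0.0076 | 172.38s | 0% | 10 | 0 | $0.0083 | 183.87s |
| olmo-3.1-32b-think | 10.0% | 30% | 10 | 3 | $0.0071 | 153.69s | 0% | 10 | 0 | $0.01 | 290.81s | 0% | 10 | 0 | $0.01 | 301.58s |
| deepseek-v3.2 | 3.3% | 10% | 10 | 1 | $0.0004 | 28.16s | 0% | 10 | 0 | $0.0028 | 157.06s | 0% | 10 | 0 | $0.0008 | 40.96s |
| kimi-k2 | 3.3% | 10% | 10 | 1 | $0.0051 | 105.92s | 0% | 10 | 0 | $0.01 | 101.82s | 0% | 10 | 0 | $0.0073 | 110.70s |
| claude-4.5-sonnet-non-reasoning | 0.0% | 0% | 10 | 0 | $0.01 | 16.12s | 0% | 10 | 0 | $0.0095 | 10.42s | 0% | 10 | 0 | $0.0093 | 10.60s |
| gemini-3-flash-preview-minimal | 0.0% | 0% | 10 | 0 | $0.0002 | 1.10s | 0% | 10 | 0 | $0.0005 | 1.36s | 0% | 10 | 0 | $0.0009 | 1.93s |
| gemini-3-pro-preview-low | 0.0% | 0% | 10 | 0 | $0.0026 | 3.84s | 0% | 10 | 0 | $0.0040 | 4.49s | 0% | 10 | 0 | $0.0055 | 5.32s |
| glm-4.7-non-reasoning | 0.0% | 0% | 10 | 0 | $0.0002 | 5.73s | 0% | 10 | 0 | $0.0003 | 9.51s | 0% | 10 | 0 | $0.0004 | 16.67s |
| glm-5-non-reasoning | 0.0% | 0% | 10 | 0 | $0.0003 | 7.96s | 0% | 10 | 0 | $0.0004 | 11.57s | 0% | 10 | 0 | $0.0005 | 250.52s |
| grok-4.1-fast-non-reasoning | 0.0% | 0% | 10 | 0 | $0.0001 | 2.16s | 0% | 10 | 0 | $0.0066 | 99.06s | 0% | 10 | 0 | $0.0009 | 11.48s |
| kimi-k2.5-non-reasoning | 0.0% | 0% | 10 | 0 | $0.0002 | 1.63s | 0% | 10 | 0 | $0.0005 | 2.42s | 0% | 10 | 0 | $0.0006 | 4.06s |
| mimo-v2-flash | 0.0% | 0% | 10 | 0 | $0.0000 | 10.98s | 0% | 10 | 0 | $0.0000 | 12.71s | 0% | 10 | 0 | $0.0000 | 91.02s |
| ministral-14b-2512 | 0.0% | 0% | 10 | 0 | $0.0001 | 544ms | 0% | 10 | 0 | $0.0001 | 2.30s | 0% | 10 | 0 | $0.0008 | 39.92s |
| mistral-large-2512 | 0.0% | 0% | 10 | 0 | $0.0002 | 1.13s | 0% | 10 | 0 | $0.0004 | 2.36s | 0% | 10 | 0 | $0.0010 | 7.24s |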
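
The Overall column appears to be pooled across grid sizes (total correct divided by total runs) rather than a mean of the three per-size accuracies. The sketch below checks that reading against three rows of the table; the `RESULTS` dictionary and the `overall_accuracy` helper are illustrative names, not part of NonoBench itself.

```python
from typing import Dict, Tuple

# (runs, correct) per grid size, copied from three rows of the table.
RESULTS: Dict[str, Dict[str, Tuple[int, int]]] = {
    "gemini-3.1-pro-preview-high": {"5x5": (10, 9), "10x10": (10, 6), "15x15": (10, 4)},
    "claude-4.5-opus-high": {"5x5": (10, 10), "10x10": (10, 4), "15x15": (10, 3)},
    "gemini-3.1-pro-preview-low": {"5x5": (10, 8), "10x10": (9, 2), "15x15": (10, 2)},
}


def overall_accuracy(per_size: Dict[str, Tuple[int, int]]) -> float:
    """Pooled accuracy in percent: total correct over total runs."""
    total_runs = sum(runs for runs, _ in per_size.values())
    total_correct = sum(correct for _, correct in per_size.values())
    return 100.0 * total_correct / total_runs if total_runs else 0.0


for model, per_size in RESULTS.items():
    print(f"{model}: {overall_accuracy(per_size):.1f}%")
```

The gemini-3.1-pro-preview-low row is the informative one: it has 9 runs at 10×10, so pooling gives 12/29 (41.4%, matching the table), whereas averaging the three per-size accuracies would give roughly 40.7%.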

Statistics by grid size

| Grid | Avg accuracy | Solved | Avg time | Avg cost |
|---|---|---|---|---|
| 5×5 | 56.2% | 253/450 | 118.24s | $0.02 |
| 10×10 | 12.0% | 54/449 | 457.31s | $0.12 |
| 15×15 | 4.7% | 21/450 | 394.90s | $0.11 |
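
As a quick consistency check, the per-grid "Avg accuracy" figures can be reproduced from the "Solved" counts, and the run totals sum to the 1349 total runs reported at the top of the page. This is a minimal sketch; `SOLVED` and `pooled_accuracy` are hypothetical names introduced for illustration.

```python
# (solved, total runs) per grid size, copied from the statistics above.
SOLVED = {"5x5": (253, 450), "10x10": (54, 449), "15x15": (21, 450)}


def pooled_accuracy(solved: int, total: int) -> float:
    """Share of solved runs, in percent, rounded to one decimal place."""
    return round(100.0 * solved / total, 1)


for size, (solved, total) in SOLVED.items():
    print(f"{size}: {pooled_accuracy(solved, total)}% ({solved}/{total})")

total_runs = sum(total for _, total in SOLVED.values())
print(f"total runs: {total_runs}")
```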