Benchmark results for LLM performance on Nonogram puzzle solving, comparing accuracy, speed, and cost across three grid sizes (5×5, 10×10, and 15×15).
Last updated: 2/19/2026, 5:23:00 PM

| Model | Overall accuracy | 5×5 Accuracy | 5×5 Runs | 5×5 Correct | 5×5 Avg cost | 5×5 Avg time | 10×10 Accuracy | 10×10 Runs | 10×10 Correct | 10×10 Avg cost | 10×10 Avg time | 15×15 Accuracy | 15×15 Runs | 15×15 Correct | 15×15 Avg cost | 15×15 Avg time |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| gemini-3.1-pro-preview-high | 63.3% | 90% | 10 | 9 | $0.04 | 31.88s | 60% | 10 | 6 | $0.26 | 163.72s | 40% | 10 | 4 | $0.25 | 174.99s |
| claude-4.5-opus-high | 56.7% | 100% | 10 | 10 | $0.13 | 88.03s | 40% | 10 | 4 | $1.19 | 844.26s | 30% | 10 | 3 | $0.98 | 778.95s |
| gemini-3-pro-preview-high | 53.3% | 90% | 10 | 9 | $0.04 | 44.69s | 60% | 10 | 6 | $0.22 | 180.79s | 10% | 10 | 1 | $0.26 | 254.67s |
| gpt-5.2-high | 53.3% | 90% | 10 | 9 | $0.04 | 45.42s | 50% | 10 | 5 | $0.40 | 531.64s | 20% | 10 | 2 | $0.64 | 933.66s |
| claude-4.5-opus-low | 50.0% | 100% | 10 | 10 | $0.09 | 51.44s | 40% | 10 | 4 | $0.37 | 233.62s | 10% | 10 | 1 | $0.19 | 115.41s |
| gpt-5.2-xhigh | 50.0% | 90% | 10 | 9 | $0.08 | 90.89s | 60% | 10 | 6 | $0.61 | 749.07s | 0% | 10 | 0 | $0.56 | 994.77s |
| deepseek-v3.2-speciale-high | 46.7% | 90% | 10 | 9 | $0.0055 | 597.11s | 30% | 10 | 3 | $0.02 | 2580.79s | 20% | 10 | 2 | $0.02 | 1511.17s |
| gemini-3-flash-preview-high | 46.7% | 90% | 10 | 9 | $0.02 | 48.55s | 40% | 10 | 4 | $0.07 | 164.13s | 10% | 10 | 1 | $0.08 | 139.85s |
| gpt-5.2-low | 46.7% | 100% | 10 | 10 | $0.03 | 34.78s | 20% | 10 | 2 | $0.21 | 278.82s | 20% | 10 | 2 | $0.16 | 184.03s |
| gemini-3.1-pro-preview-low | 41.4% | 80% | 10 | 8 | $0.02 | 39.45s | 22% | 9 | 2 | $0.04 | 65.49s | 20% | 10 | 2 | $0.03 | 45.48s |
| deepseek-v3.2-high | 36.7% | 90% | 10 | 9 | $0.0040 | 529.73s | 10% | 10 | 1 | $0.03 | 882.49s | 10% | 10 | 1 | $0.03 | 704.64s |
| deepseek-v3.2-speciale | 36.7% | 80% | 10 | 8 | $0.0044 | 542.25s | 20% | 10 | 2 | $0.02 | 1795.09s | 10% | 10 | 1 | $0.01 | 2008.11s |
| gpt-oss-120b-high | 36.7% | 80% | 10 | 8 | $0.0018 | 86.90s | 30% | 10 | 3 | $0.01 | 237.03s | 0% | 10 | 0 | $0.0091 | 113.93s |
| kimi-k2.5-high | 36.7% | 100% | 10 | 10 | $0.03 | 231.56s | 10% | 10 | 1 | $0.14 | 1012.85s | 0% | 10 | 0 | $0.09 | 708.31s |
| grok-4 | 33.3% | 80% | 10 | 8 | $0.13 | 146.35s | 20% | 10 | 2 | $0.70 | 780.83s | 0% | 10 | 0 | $0.68 | 809.63s |
| minimax-m2.5-high | 33.3% | 90% | 10 | 9 | $0.02 | 220.06s | 10% | 10 | 1 | $0.08 | 1473.31s | 0% | 10 | 0 | $0.05 | 845.37s |
| gpt-oss-120b-low | 30.0% | 90% | 10 | 9 | $0.0002 | 35.14s | 0% | 10 | 0 | $0.0018 | 46.15s | 0% | 10 | 0 | $0.0009 | 24.41s |
| qwen3-next-80b-a3b-thinking | 30.0% | 90% | 10 | 9 | $0.01 | 79.71s | 0% | 10 | 0 | $0.04 | 164.82s | 0% | 10 | 0 | $0.04 | 199.00s |
| seed-1.6-high | 30.0% | 90% | 10 | 9 | $0.01 | 117.03s | 0% | 10 | 0 | $0.04 | 359.01s | 0% | 10 | 0 | $0.03 | 293.53s |
| glm-5-reasoning-high | 26.7% | 60% | 10 | 6 | $0.01 | 91.85s | 10% | 10 | 1 | $0.07 | 650.51s | 10% | 10 | 1 | $0.07 | 736.69s |
| kimi-k2-thinking | 26.7% | 80% | 10 | 8 | $0.04 | 260.15s | 0% | 10 | 0 | $0.12 | 938.03s | 0% | 10 | 0 | $0.14 | 428.72s |
| minimax-m2.1-high | 26.7% | 80% | 10 | 8 | $0.0085 | 129.04s | 0% | 10 | 0 | $0.03 | 279.81s | 0% | 10 | 0 | $0.02 | 400.14s |
| claude-4.5-sonnet-reasoning | 23.3% | 70% | 10 | 7 | $0.09 | 91.83s | 0% | 10 | 0 | $0.35 | 365.92s | 0% | 10 | 0 | $0.15 | 160.40s |
| grok-4.1-fast-reasoning | 23.3% | 60% | 10 | 6 | $0.0033 | 47.29s | 10% | 10 | 1 | $0.02 | 295.44s | 0% | 10 | 0 | $0.03 | 390.37s |
| minimax-m2.5 | 23.3% | 70% | 10 | 7 | $0.01 | 181.17s | 0% | 10 | 0 | $0.09 | 1481.82s | 0% | 10 | 0 | $0.06 | 732.98s |
| glm-4.7-reasoning | 20.0% | 60% | 10 | 6 | $0.02 | 250.43s | 0% | 10 | 0 | $0.06 | 653.38s | 0% | 10 | 0 | $0.04 | 423.89s |
| glm-5-reasoning | 20.0% | 60% | 10 | 6 | $0.02 | 209.77s | 0% | 10 | 0 | $0.10 | 770.82s | 0% | 10 | 0 | $0.08 | 736.45s |
| minimax-m2.1 | 20.0% | 60% | 10 | 6 | $0.0093 | 150.83s | 0% | 10 | 0 | $0.03 | 313.90s | 0% | 10 | 0 | $0.02 | 367.21s |
| mimo-v2-flash-high | 16.7% | 50% | 10 | 5 | $0.0000 | 155.63s | 0% | 10 | 0 | $0.0000 | 365.93s | 0% | 10 | 0 | $0.0000 | 572.42s |
| glm-4.7-reasoning-high | 13.3% | 40% | 10 | 4 | $0.01 | 220.38s | 0% | 10 | 0 | $0.06 | 652.39s | 0% | 10 | 0 | $0.04 | 642.59s |
| grok-4.1-fast-reasoning-high | 13.3% | 40% | 10 | 4 | $0.0031 | 43.43s | 0% | 10 | 0 | $0.02 | 349.43s | 0% | 10 | 0 | $0.02 | 262.72s |
| seed-1.6-flash-high | 13.3% | 40% | 10 | 4 | $0.0042 | 89.14s | 0% | 10 | 0 | $0.0076 | 172.38s | 0% | 10 | 0 | $0.0083 | 183.87s |
| olmo-3.1-32b-think | 10.0% | 30% | 10 | 3 | $0.0071 | 153.69s | 0% | 10 | 0 | $0.01 | 290.81s | 0% | 10 | 0 | $0.01 | 301.58s |
| deepseek-v3.2 | 3.3% | 10% | 10 | 1 | $0.0004 | 28.16s | 0% | 10 | 0 | $0.0028 | 157.06s | 0% | 10 | 0 | $0.0008 | 40.96s |
| kimi-k2 | 3.3% | 10% | 10 | 1 | $0.0051 | 105.92s | 0% | 10 | 0 | $0.01 | 101.82s | 0% | 10 | 0 | $0.0073 | 110.70s |
| claude-4.5-sonnet-non-reasoning | 0.0% | 0% | 10 | 0 | $0.01 | 16.12s | 0% | 10 | 0 | $0.0095 | 10.42s | 0% | 10 | 0 | $0.0093 | 10.60s |
| gemini-3-flash-preview-minimal | 0.0% | 0% | 10 | 0 | $0.0002 | 1.10s | 0% | 10 | 0 | $0.0005 | 1.36s | 0% | 10 | 0 | $0.0009 | 1.93s |
| gemini-3-pro-preview-low | 0.0% | 0% | 10 | 0 | $0.0026 | 3.84s | 0% | 10 | 0 | $0.0040 | 4.49s | 0% | 10 | 0 | $0.0055 | 5.32s |
| glm-4.7-non-reasoning | 0.0% | 0% | 10 | 0 | $0.0002 | 5.73s | 0% | 10 | 0 | $0.0003 | 9.51s | 0% | 10 | 0 | $0.0004 | 16.67s |
| glm-5-non-reasoning | 0.0% | 0% | 10 | 0 | $0.0003 | 7.96s | 0% | 10 | 0 | $0.0004 | 11.57s | 0% | 10 | 0 | $0.0005 | 250.52s |
| grok-4.1-fast-non-reasoning | 0.0% | 0% | 10 | 0 | $0.0001 | 2.16s | 0% | 10 | 0 | $0.0066 | 99.06s | 0% | 10 | 0 | $0.0009 | 11.48s |
| kimi-k2.5-non-reasoning | 0.0% | 0% | 10 | 0 | $0.0002 | 1.63s | 0% | 10 | 0 | $0.0005 | 2.42s | 0% | 10 | 0 | $0.0006 | 4.06s |
| mimo-v2-flash | 0.0% | 0% | 10 | 0 | $0.0000 | 10.98s | 0% | 10 | 0 | $0.0000 | 12.71s | 0% | 10 | 0 | $0.0000 | 91.02s |
| ministral-14b-2512 | 0.0% | 0% | 10 | 0 | $0.0001 | 0.54s | 0% | 10 | 0 | $0.0001 | 2.30s | 0% | 10 | 0 | $0.0008 | 39.92s |
| mistral-large-2512 | 0.0% | 0% | 10 | 0 | $0.0002 | 1.13s | 0% | 10 | 0 | $0.0004 | 2.36s | 0% | 10 | 0 | $0.0010 | 7.24s |
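
The Overall column appears to be a pooled accuracy: total correct divided by total runs across the three grid sizes, rather than a mean of the three per-size percentages. The sketch below checks that reading against two rows of the table. The function name `overall_accuracy` is illustrative and not part of the benchmark; the pooling rule is an assumption inferred from the published numbers.

```python
def overall_accuracy(per_size):
    """Pooled accuracy in percent from (correct, runs) pairs, one per grid size.

    Assumption: Overall = sum(correct) / sum(runs), inferred from the table.
    """
    correct = sum(c for c, _ in per_size)
    runs = sum(r for _, r in per_size)
    if runs == 0:
        raise ValueError("no runs recorded")
    return round(100 * correct / runs, 1)


# (correct, runs) for 5×5, 10×10, 15×15, copied from the table rows.
print(overall_accuracy([(9, 10), (6, 10), (4, 10)]))  # gemini-3.1-pro-preview-high -> 63.3
print(overall_accuracy([(8, 10), (2, 9), (2, 10)]))   # gemini-3.1-pro-preview-low  -> 41.4
```

The second row is the one place where the distinction shows: with only 9 runs at 10×10, pooling gives 12/29 = 41.4%, which matches the table, whereas averaging the three percentages (80%, 22%, 20%) would give about 40.7%.
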

Summary by grid size, across all 45 model configurations:

| Grid size | Avg accuracy | Solved | Avg time | Avg cost |
|---|---|---|---|---|
| 5×5 | 56.2% | 253/450 | 118.24s | $0.02 |
| 10×10 | 12.0% | 54/449 | 457.31s | $0.12 |
| 15×15 | 4.7% | 21/450 | 394.90s | $0.11 |
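
The three summary figures line up with the 5×5, 10×10, and 15×15 column groups: each Solved value is the sum of that grid size's Correct and Runs columns over all 45 rows. The sketch below shows one way to read a table row and accumulate those totals. The helper names `parse_row` and `solved_totals` are hypothetical, and the cell layout (Model, Overall, then five cells per grid size) is taken from the table header.

```python
GRID_SIZES = ("5×5", "10×10", "15×15")


def parse_row(line):
    """Split one pipe-separated table row into (model, {size: (correct, runs)})."""
    cells = [c.strip() for c in line.strip().strip("|").split("|")]
    model, sizes = cells[0], {}
    # After Model and Overall, each grid size occupies five cells:
    # Accuracy, Runs, Correct, Avg cost, Avg time.
    for i, size in enumerate(GRID_SIZES):
        group = cells[2 + 5 * i : 7 + 5 * i]
        sizes[size] = (int(group[2]), int(group[1]))  # (correct, runs)
    return model, sizes


def solved_totals(lines):
    """Sum correct and runs per grid size over the given rows."""
    totals = {}
    for line in lines:
        _, sizes = parse_row(line)
        for size, (correct, runs) in sizes.items():
            c, r = totals.get(size, (0, 0))
            totals[size] = (c + correct, r + runs)
    return totals


# Two rows copied from the table, used here only as sample input.
rows = [
    "gemini-3.1-pro-preview-high | 63.3% | 90% | 10 | 9 | $0.04 | 31.88s | 60% | 10 | 6 | $0.26 | 163.72s | 40% | 10 | 4 | $0.25 | 174.99s |",
    "gemini-3.1-pro-preview-low | 41.4% | 80% | 10 | 8 | $0.02 | 39.45s | 22% | 9 | 2 | $0.04 | 65.49s | 20% | 10 | 2 | $0.03 | 45.48s |",
]
for size, (correct, runs) in solved_totals(rows).items():
    print(f"{size}: {correct}/{runs} = {100 * correct / runs:.1f}%")
# 5×5: 17/20 = 85.0%
# 10×10: 8/19 = 42.1%
# 15×15: 6/20 = 30.0%
```

Run over all 45 rows instead of the two samples, the same loop should reproduce the published totals of 253/450, 54/449, and 21/450. The 449 comes from the single 10×10 entry with 9 runs instead of 10.
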