Config Leaderboard
| # | Config | Avg Elo | Win rate | W / L | Outputs |
|---|---|---|---|---|---|
| 1 | single Gemini 2.5 Flash | 1441 | 88% | 46 / 6 | 14 |
| 2 | pipeline Gemini 2.5 Flash → Claude Sonnet 4 | 1426 | 72% | 41 / 16 | 14 |
| 3 | single GPT 5 Nano | 1417 | 65% | 36 / 19 | 14 |
| 4 | single Claude Sonnet 4 | 1416 | 63% | 33 / 19 | 14 |
| 5 | pipeline GPT-4o Mini Vision → GPT-4o Copy | 1415 | 67% | 30 / 15 | 14 |
| 6 | single Qwen3.5-9B | 1393 | 45% | 30 / 36 | 14 |
| 7 | single Mistral Small 4 | 1363 | 24% | 16 / 51 | 14 |
| 8 | single GPT-4o | 1328 | 1% | 1 / 71 | 14 |

Ranked by average Elo rating across all products and outputs.
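
The win-rate column is consistent with W / (W + L) rounded to the nearest percent. The sketch below recomputes it from the W / L column under that assumption; the table does not say how draws are handled, so treating every comparison as a win or a loss is a guess. The name `win_rate_percent` is illustrative only.

```python
# (wins, losses) per config, copied from the W / L column of the table.
records = {
    "single Gemini 2.5 Flash": (46, 6),
    "pipeline Gemini 2.5 Flash → Claude Sonnet 4": (41, 16),
    "single GPT 5 Nano": (36, 19),
    "single Claude Sonnet 4": (33, 19),
    "pipeline GPT-4o Mini Vision → GPT-4o Copy": (30, 15),
    "single Qwen3.5-9B": (30, 36),
    "single Mistral Small 4": (16, 51),
    "single GPT-4o": (1, 71),
}


def win_rate_percent(wins: int, losses: int) -> int:
    """Win rate as a whole percent, ASSUMING win rate = W / (W + L)."""
    return round(100 * wins / (wins + losses))


for config, (wins, losses) in records.items():
    print(f"{config}: {win_rate_percent(wins, losses)}%")
```

Because the configs played different numbers of comparisons (45 to 72), win rate and average Elo need not produce the same ordering: row 5 has a higher win rate than rows 3 and 4 but a slightly lower average Elo.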