Config Leaderboard

#  Config                                       Avg ELO  Win rate  W / L    Outputs
1  single Gemini 2.5 Flash                      1441     88%       46 / 6   14
2  pipeline Gemini 2.5 Flash → Claude Sonnet 4  1426     72%       41 / 16  14
3  single GPT 5 Nano                            1417     65%       36 / 19  14
4  single Claude Sonnet 4                       1416     63%       33 / 19  14
5  pipeline GPT-4o Mini Vision → GPT-4o Copy    1415     67%       30 / 15  14
6  single Qwen3.5-9B                            1393     45%       30 / 36  14
7  single Mistral Small 4                       1363     24%       16 / 51  14
8  single GPT-4o                                1328     1%        1 / 71   14

Ranked by average ELO across all products and outputs
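
The Win rate column appears to be wins divided by decided comparisons, W / (W + L), rounded to a whole percent. That formula is an assumption, not something the leaderboard states, but it matches all eight rows. It also shows why rank and win rate can disagree: row 5 has a higher win rate than rows 3 and 4 yet a lower average ELO, because the ranking uses ELO. A minimal Python sketch of the check, where `records` and `win_rate` are illustrative names:

```python
# (wins, losses) per config, copied from the W / L column of the leaderboard.
records = {
    "single Gemini 2.5 Flash": (46, 6),
    "pipeline Gemini 2.5 Flash → Claude Sonnet 4": (41, 16),
    "single GPT 5 Nano": (36, 19),
    "single Claude Sonnet 4": (33, 19),
    "pipeline GPT-4o Mini Vision → GPT-4o Copy": (30, 15),
    "single Qwen3.5-9B": (30, 36),
    "single Mistral Small 4": (16, 51),
    "single GPT-4o": (1, 71),
}


def win_rate(wins: int, losses: int) -> int:
    """Assumed formula: share of decided comparisons won, as a whole percent."""
    return round(100 * wins / (wins + losses))


for name, (wins, losses) in records.items():
    print(f"{name}: {win_rate(wins, losses)}%")
```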