BigCodeArena: Unveiling More Reliable Human Preferences in Code Generation via Execution Paper • 2510.08697 • Published Oct 9 • 35
BigCodeArena: Compare two AI models by sending them code and seeing their responses