tau train hard no sys (user gpt4.1)
retail
Average Reward: 0.5702
Pass^k Metrics:
  k=1: 0.570
  k=2: 0.439
-