tencent/DRIVE-RL
Text Generation • 33B • Updated • 5
DRIVE: Data Curation Best Practices for Reinforcement Learning with Verifiable Reward in Competitive Code Generation
Too Good to be Bad: On the Failure of LLMs to Role-Play Villains