SakanaAI/RLT-7B
Text Generation • 8B
Student models distilled from a 7B Reinforcement-Learned Teacher (RLT), as introduced in the paper "Reinforcement Learning Teachers of Test Time Scaling."