TreePO: Bridging the Gap of Policy Optimization and Efficacy and Inference Efficiency with Heuristic Tree-based Modeling Paper • 2508.17445 • Published Aug 24 • 80
The N+ Implementation Details of RLHF with PPO: A Case Study on TL;DR Summarization Paper • 2403.17031 • Published Mar 24, 2024 • 6
TIGERScore Collection List of model variates of TIGEREScore checkpoints and the associated dataset • 8 items • Updated Sep 26, 2024 • 5
MAP-Neo: Highly Capable and Transparent Bilingual Large Language Model Series Paper • 2405.19327 • Published May 29, 2024 • 48
Mixture-of-Depths: Dynamically allocating compute in transformer-based language models Paper • 2404.02258 • Published Apr 2, 2024 • 107