THU-KEG's Collections
LLaDA-8B-BGPO
Updated Oct 11
Boundary-Guided Policy Optimization for Memory-Efficient RL of Diffusion Large Language Models
THU-KEG/LLaDA-8B-BGPO-math • Reinforcement Learning • 8B • Updated Oct 14 • 9 • 1
THU-KEG/LLaDA-8B-BGPO-code • Reinforcement Learning • 8B • Updated Oct 14 • 9 • 1
THU-KEG/LLaDA-8B-BGPO-countdown • Reinforcement Learning • 8B • Updated Oct 14 • 2 • 1
THU-KEG/LLaDA-8B-BGPO-sudoku • Reinforcement Learning • 8B • Updated Oct 14 • 4 • 1