hkust-nlp/Qwen-2.5-7B-Verifier-general-verifier • Reinforcement Learning • 8B • Updated May 28, 2025 • 3
Xuerui2312/DeepSeek-R1-Distill-Qwen-7B-TRPA-DeepScaleR-verl0326 • Text Generation • 8B • Updated Jun 20, 2025 • 18 • 1
hdong0/deepseek-Llama-8B-Open-R1-GRPO_deepscaler_1000steps_lr1e-6_kl1e-3_acc • Text Generation • 8B • Updated Jun 15, 2025 • 3
tensorblock/Vinnnf_Thinkless-1.5B-RL-DeepScaleR-GGUF • Text Generation • 2B • Updated Jul 9, 2025 • 117
hdong0/deepseek-Qwen2.5-1.5B-baseline-Open-R1-GRPO_deepscaler_mu_8 • Text Generation • 2B • Updated Jul 4, 2025 • 3
hdong0/deepseek-Qwen2.5-1.5B-Open-R1-GRPO_deepscaler_mu_8 • Text Generation • 2B • Updated Jul 4, 2025 • 1
hdong0/Qwen2.5-Math-1.5B-Open-R1-GRPO_deepscaler_mu_8_constant_lr • Text Generation • 2B • Updated Jul 7, 2025 • 3
hdong0/deepseek-Qwen-1.5B-Open-R1-GRPO_deepscaler_mu_8_constant_lr • Text Generation • 2B • Updated Jul 7, 2025 • 2
hdong0/Qwen2.5-Math-1.5B-baseline-Open-R1-GRPO_deepscaler_mu_8_constant_lr • Text Generation • 2B • Updated Jul 8, 2025 • 2