TianHongZXY/Qwen3-4B-Thinking-2507-SFT-10-epochs-synthesized-clear-problems-global_step_280 0.5B • Updated 3 days ago • 2
TruthRL: Incentivizing Truthful LLMs via Reinforcement Learning Paper • 2509.25760 • Published Sep 30 • 54
A.S.E: A Repository-Level Benchmark for Evaluating Security in AI-Generated Code Paper • 2508.18106 • Published Aug 25 • 342
TianHongZXY/Top_5_similar_question-NVIDIA-OpenScienceReasoning-2 Viewer • Updated Aug 28 • 2.16k • 4.83k
RLVR-Decomposed Collection The collection for the Paper "The Surprising Effectiveness of Negative Reinforcement in LLM Reasoning" • 9 items • Updated Jun 1 • 2
TianHongZXY/OpenR1-Math-46k-8192-Qwen2.5-Math-7B-RoPE-40K-GRPO-use_guide-clip_ratio_upper_0.28 Updated Jul 12
TianHongZXY/OpenR1-Math-46k-8192-Qwen2.5-7B-Instruct-GRPO-gpt-4o-summary_wo_think-clip_0.28 Updated Jul 8