Tree-OPO: Off-policy Monte Carlo Tree-Guided Advantage Optimization for Multistep Reasoning Paper • 2509.09284 • Published Sep 11
Rethinking Large Language Model Distillation: A Constrained Markov Decision Process Perspective Paper • 2509.22921 • Published Sep 26
Noise Contrastive Estimation-based Matching Framework for Low-resource Security Attack Pattern Recognition Paper • 2401.10337 • Published Jan 18, 2024