neutrino12's Collections
Snowflake/Arctic-Text2SQL-R1-7B • 8B • Updated • 6.27k • 52
Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning
Paper • 2505.24726 • Published • 274
Reinforcement Pre-Training
Paper • 2506.08007 • Published • 262
Drag-and-Drop LLMs: Zero-Shot Prompt-to-Weights
Paper • 2506.16406 • Published • 126
Confidence Is All You Need: Few-Shot RL Fine-Tuning of Language Models
Paper • 2506.06395 • Published • 131
Paper • 2505.09388 • Published • 308
QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning
Paper • 2505.17667 • Published • 88
Beyond Context Limits: Subconscious Threads for Long-Horizon Reasoning
Paper • 2507.16784 • Published • 120
Group Sequence Policy Optimization
Paper • 2507.18071 • Published • 307
∇NABLA: Neighborhood Adaptive Block-Level Attention
Paper • 2507.13546 • Published • 123
Paper • 2507.15493 • Published • 47
MUR: Momentum Uncertainty guided Reasoning for Large Language Models
Paper • 2507.14958 • Published • 46
LAPO: Internalizing Reasoning Efficiency via Length-Adaptive Policy Optimization
Paper • 2507.15758 • Published • 35
Complex Logical Instruction Generation
Paper • 2508.09125 • Published • 39
SONAR-LLM: Autoregressive Transformer that Thinks in Sentence Embeddings and Speaks in Tokens
Paper • 2508.05305 • Published • 46
PRELUDE: A Benchmark Designed to Require Global Comprehension and Reasoning over Long Contexts
Paper • 2508.09848 • Published • 67
Train Long, Think Short: Curriculum Learning for Efficient Reasoning
Paper • 2508.08940 • Published • 26
Learning to Align, Aligning to Learn: A Unified Approach for Self-Optimized Alignment
Paper • 2508.07750 • Published • 19
Pruning the Unsurprising: Efficient Code Reasoning via First-Token Surprisal
Paper • 2508.05988 • Published • 19
Less Is More: Training-Free Sparse Attention with Global Locality for Efficient Reasoning
Paper • 2508.07101 • Published • 13
Compressing Chain-of-Thought in LLMs via Step Entropy
Paper • 2508.03346 • Published • 7
Sample More to Think Less: Group Filtered Policy Optimization for Concise Reasoning
Paper • 2508.09726 • Published • 14
Is Chain-of-Thought Reasoning of LLMs a Mirage? A Data Distribution Lens
Paper • 2508.01191 • Published • 236
On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification
Paper • 2508.05629 • Published • 178
R-Zero: Self-Evolving Reasoning LLM from Zero Data
Paper • 2508.05004 • Published • 126
Story2Board: A Training-Free Approach for Expressive Storyboard Generation
Paper • 2508.09983 • Published • 68
Don't Overthink It: A Survey of Efficient R1-style Large Reasoning Models
Paper • 2508.02120 • Published • 19
Beyond the Trade-off: Self-Supervised Reinforcement Learning for Reasoning Models' Instruction Following
Paper • 2508.02150 • Published • 36
Trainable Dynamic Mask Sparse Attention
Paper • 2508.02124 • Published • 17
Can Large Multimodal Models Actively Recognize Faulty Inputs? A Systematic Evaluation Framework of Their Input Scrutiny Ability
Paper • 2508.04017 • Published • 11
Deep Think with Confidence
Paper • 2508.15260 • Published • 87
PaperRegister: Boosting Flexible-grained Paper Search via Hierarchical Register Indexing
Paper • 2508.11116 • Published • 22
Efficient Code Embeddings from Code Generation Models
Paper • 2508.21290 • Published • 19
Model-Task Alignment Drives Distinct RL Outcomes
Paper • 2508.21188 • Published • 8
Pref-GRPO: Pairwise Preference Reward-based GRPO for Stable Text-to-Image Reinforcement Learning
Paper • 2508.20751 • Published • 89
UltraMemV2: Memory Networks Scaling to 120B Parameters with Superior Long-Context Learning
Paper • 2508.18756 • Published • 36
Hermes 4 Technical Report
Paper • 2508.18255 • Published • 39
StepWiser: Stepwise Generative Judges for Wiser Reasoning
Paper • 2508.19229 • Published • 20
ThinkDial: An Open Recipe for Controlling Reasoning Effort in Large Language Models
Paper • 2508.18773 • Published • 15
Neither Valid nor Reliable? Investigating the Use of LLMs as Judges
Paper • 2508.18076 • Published • 6
InMind: Evaluating LLMs in Capturing and Applying Individual Human Reasoning Styles
Paper • 2508.16072 • Published • 4
Servant, Stalker, Predator: How An Honest, Helpful, And Harmless (3H) Agent Unlocks Adversarial Skills
Paper • 2508.19500 • Published • 2
CARFT: Boosting LLM Reasoning via Contrastive Learning with Annotated Chain-of-Thought-based Reinforced Fine-Tuning
Paper • 2508.15868 • Published • 3
Jailbreaking Commercial Black-Box LLMs with Explicitly Harmful Prompts
Paper • 2508.10390 • Published • 1
LLaVA-Critic-R1: Your Critic Model is Secretly a Strong Policy Model
Paper • 2509.00676 • Published • 83
DCPO: Dynamic Clipping Policy Optimization
Paper • 2509.02333 • Published • 21
When Does Reasoning Matter? A Controlled Study of Reasoning's Contribution to Model Performance
Paper • 2509.22193 • Published • 37
What Characterizes Effective Reasoning? Revisiting Length, Review, and Structure of CoT
Paper • 2509.19284 • Published • 22