- Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning
  Paper • 2505.24726 • Published • 274
- Reinforcement Pre-Training
  Paper • 2506.08007 • Published • 262
- GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning
  Paper • 2507.01006 • Published • 237
- A Survey of Context Engineering for Large Language Models
  Paper • 2507.13334 • Published • 258
Collections including paper arxiv:2507.16784

- Beyond Context Limits: Subconscious Threads for Long-Horizon Reasoning
  Paper • 2507.16784 • Published • 120
- Extending Context Window of Large Language Models via Positional Interpolation
  Paper • 2306.15595 • Published • 53
- StruEdit: Structured Outputs Enable the Fast and Accurate Knowledge Editing for Large Language Models
  Paper • 2409.10132 • Published

- Gaussian Splatting with Discretized SDF for Relightable Assets
  Paper • 2507.15629 • Published • 23
- Beyond Context Limits: Subconscious Threads for Long-Horizon Reasoning
  Paper • 2507.16784 • Published • 120
- Step-Audio 2 Technical Report
  Paper • 2507.16632 • Published • 72
- HOComp: Interaction-Aware Human-Object Composition
  Paper • 2507.16813 • Published • 12

- Snowflake/Arctic-Text2SQL-R1-7B
  8B • Updated • 12.3k • 53
- Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning
  Paper • 2505.24726 • Published • 274
- Reinforcement Pre-Training
  Paper • 2506.08007 • Published • 262
- Drag-and-Drop LLMs: Zero-Shot Prompt-to-Weights
  Paper • 2506.16406 • Published • 126

- LAPO: Internalizing Reasoning Efficiency via Length-Adaptive Policy Optimization
  Paper • 2507.15758 • Published • 35
- Hierarchical Budget Policy Optimization for Adaptive Reasoning
  Paper • 2507.15844 • Published • 16
- DriftMoE: A Mixture of Experts Approach to Handle Concept Drifts
  Paper • 2507.18464 • Published • 11
- Finding Dori: Memorization in Text-to-Image Diffusion Models Is Less Local Than Assumed
  Paper • 2507.16880 • Published • 6

- Training Language Models to Generate Quality Code with Program Analysis Feedback
  Paper • 2505.22704 • Published • 14
- Beyond Context Limits: Subconscious Threads for Long-Horizon Reasoning
  Paper • 2507.16784 • Published • 120
- MegaScience: Pushing the Frontiers of Post-Training Datasets for Science Reasoning
  Paper • 2507.16812 • Published • 63
- Upsample What Matters: Region-Adaptive Latent Sampling for Accelerated Diffusion Transformers
  Paper • 2507.08422 • Published • 36