Models
Datasets
Spaces
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2512.20605

about 2 hours ago

Emergent temporal abstractions in autoregressive models enable hierarchical reinforcement learning

Paper • 2512.20605 • Published 5 days ago • 44

Emergent temporal abstractions in autoregressive models enable hierarchical reinforcement learning

Paper • 2512.20605 • Published 5 days ago • 44

about 20 hours ago

lusxvr/nanoVLM-222M

Image-Text-to-Text • 0.2B • Updated May 8 • 456 • 98
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning

Paper • 2503.09516 • Published Mar 12 • 36
AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time

Paper • 2505.24863 • Published May 30 • 97
QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning

Paper • 2505.17667 • Published May 23 • 88

Reinforcement learning

Diffusion Augmented Agents: A Framework for Efficient Exploration and Transfer Learning

Paper • 2407.20798 • Published Jul 30, 2024 • 24
Offline Reinforcement Learning for LLM Multi-Step Reasoning

Paper • 2412.16145 • Published Dec 20, 2024 • 38
REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models

Paper • 2501.03262 • Published Jan 4 • 103
SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution

Paper • 2502.18449 • Published Feb 25 • 75

about 13 hours ago

Emergent temporal abstractions in autoregressive models enable hierarchical reinforcement learning

Paper • 2512.20605 • Published 5 days ago • 44

reasoning_model

OpenMMReasoner: Pushing the Frontiers for Multimodal Reasoning with an Open and General Recipe

Paper • 2511.16334 • Published Nov 20 • 91
Parallel-R1: Towards Parallel Thinking via Reinforcement Learning

Paper • 2509.07980 • Published Sep 9 • 101
ParaThinker: Native Parallel Thinking as a New Paradigm to Scale LLM Test-time Compute

Paper • 2509.04475 • Published Aug 30 • 3
Stabilizing Reinforcement Learning with LLMs: Formulation and Practices

Paper • 2512.01374 • Published 27 days ago • 93

Representation & Optimization

Understanding about representation sheds light on optimization

about 1 hour ago

Nuclear Norm Regularization for Deep Learning

Paper • 2405.14544 • Published May 23, 2024 • 1
Token embeddings violate the manifold hypothesis

Paper • 2504.01002 • Published Apr 1 • 1
Approximate Nullspace Augmented Finetuning for Robust Vision Transformers

Paper • 2403.10476 • Published Mar 15, 2024 • 1
ElaLoRA: Elastic & Learnable Low-Rank Adaptation for Efficient Model Fine-Tuning

Paper • 2504.00254 • Published Mar 31 • 1

nvidia/Nemotron-4-340B-Reward

Updated Jun 19, 2024 • 40 • 125
deepseek-ai/DeepSeek-R1-Distill-Llama-8B

Text Generation • 8B • Updated Feb 24 • 961k • • 834
Emergent temporal abstractions in autoregressive models enable hierarchical reinforcement learning

Paper • 2512.20605 • Published 5 days ago • 44

about 2 hours ago

Emergent temporal abstractions in autoregressive models enable hierarchical reinforcement learning

Paper • 2512.20605 • Published 5 days ago • 44

about 13 hours ago

Emergent temporal abstractions in autoregressive models enable hierarchical reinforcement learning

Paper • 2512.20605 • Published 5 days ago • 44

Emergent temporal abstractions in autoregressive models enable hierarchical reinforcement learning

Paper • 2512.20605 • Published 5 days ago • 44

reasoning_model

OpenMMReasoner: Pushing the Frontiers for Multimodal Reasoning with an Open and General Recipe

Paper • 2511.16334 • Published Nov 20 • 91
Parallel-R1: Towards Parallel Thinking via Reinforcement Learning

Paper • 2509.07980 • Published Sep 9 • 101
ParaThinker: Native Parallel Thinking as a New Paradigm to Scale LLM Test-time Compute

Paper • 2509.04475 • Published Aug 30 • 3
Stabilizing Reinforcement Learning with LLMs: Formulation and Practices

Paper • 2512.01374 • Published 27 days ago • 93

about 20 hours ago

lusxvr/nanoVLM-222M

Image-Text-to-Text • 0.2B • Updated May 8 • 456 • 98
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning

Paper • 2503.09516 • Published Mar 12 • 36
AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time

Paper • 2505.24863 • Published May 30 • 97
QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning

Paper • 2505.17667 • Published May 23 • 88

Representation & Optimization

Understanding about representation sheds light on optimization

about 1 hour ago

Nuclear Norm Regularization for Deep Learning

Paper • 2405.14544 • Published May 23, 2024 • 1
Token embeddings violate the manifold hypothesis

Paper • 2504.01002 • Published Apr 1 • 1
Approximate Nullspace Augmented Finetuning for Robust Vision Transformers

Paper • 2403.10476 • Published Mar 15, 2024 • 1
ElaLoRA: Elastic & Learnable Low-Rank Adaptation for Efficient Model Fine-Tuning

Paper • 2504.00254 • Published Mar 31 • 1

Reinforcement learning

Diffusion Augmented Agents: A Framework for Efficient Exploration and Transfer Learning

Paper • 2407.20798 • Published Jul 30, 2024 • 24
Offline Reinforcement Learning for LLM Multi-Step Reasoning

Paper • 2412.16145 • Published Dec 20, 2024 • 38
REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models

Paper • 2501.03262 • Published Jan 4 • 103
SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution

Paper • 2502.18449 • Published Feb 25 • 75

nvidia/Nemotron-4-340B-Reward

Updated Jun 19, 2024 • 40 • 125
deepseek-ai/DeepSeek-R1-Distill-Llama-8B

Text Generation • 8B • Updated Feb 24 • 961k • • 834
Emergent temporal abstractions in autoregressive models enable hierarchical reinforcement learning

Paper • 2512.20605 • Published 5 days ago • 44

Company

TOS Privacy About Careers

Website

Models Datasets Spaces Pricing Docs