Ai-general - a mkimitch Collection

updated about 15 hours ago

Guided Self-Evolving LLMs with Minimal Human Supervision

Paper • 2512.02472 • Published 8 days ago • 48
DeepSearch: Overcome the Bottleneck of Reinforcement Learning with Verifiable Rewards via Monte Carlo Tree Search

Paper • 2509.25454 • Published Sep 29 • 140
Video Reasoning without Training

Paper • 2510.17045 • Published Oct 19 • 7
Agent Learning via Early Experience

Paper • 2510.08558 • Published Oct 9 • 266
RLP: Reinforcement as a Pretraining Objective

Paper • 2510.01265 • Published Sep 26 • 40
Large Reasoning Models Learn Better Alignment from Flawed Thinking

Paper • 2510.00938 • Published Oct 1 • 58
LiveTradeBench: Seeking Real-World Alpha with Large Language Models

Paper • 2511.03628 • Published Nov 5 • 11
PromptBridge: Cross-Model Prompt Transfer for Large Language Models

Paper • 2512.01420 • Published 9 days ago • 8
Dyna-Mind: Learning to Simulate from Experience for Better AI Agents

Paper • 2510.09577 • Published Oct 10 • 7
Diversity Has Always Been There in Your Visual Autoregressive Models

Paper • 2511.17074 • Published 19 days ago • 7
Souper-Model: How Simple Arithmetic Unlocks State-of-the-Art LLM Performance

Paper • 2511.13254 • Published 23 days ago • 134
Search Self-play: Pushing the Frontier of Agent Capability without Supervision

Paper • 2510.18821 • Published Oct 21 • 17
Meta-Awareness Enhances Reasoning Models: Self-Alignment Reinforcement Learning

Paper • 2510.03259 • Published Sep 26 • 57
Every Attention Matters: An Efficient Hybrid Architecture for Long-Context Reasoning

Paper • 2510.19338 • Published Oct 22 • 114
Agent0: Unleashing Self-Evolving Agents from Zero Data via Tool-Integrated Reasoning

Paper • 2511.16043 • Published 20 days ago • 105
Reactive Transformer (RxT) -- Stateful Real-Time Processing for Event-Driven Reactive Language Models

Paper • 2510.03561 • Published Oct 3 • 24
AgentGym-RL: Training LLM Agents for Long-Horizon Decision Making through Multi-Turn Reinforcement Learning

Paper • 2509.08755 • Published Sep 10 • 56
gpt-oss-120b & gpt-oss-20b Model Card

Paper • 2508.10925 • Published Aug 8 • 12
OpenAI o1 System Card

Paper • 2412.16720 • Published Dec 21, 2024 • 36
Self-Improving VLM Judges Without Human Annotations

Paper • 2512.05145 • Published 8 days ago • 15
MM-CRITIC: A Holistic Evaluation of Large Multimodal Models as Multimodal Critique

Paper • 2511.09067 • Published 28 days ago • 2
Incentivizing Agentic Reasoning in LLM Judges via Tool-Integrated Reinforcement Learning

Paper • 2510.23038 • Published Oct 27
MathSE: Improving Multimodal Mathematical Reasoning via Self-Evolving Iterative Reflection and Reward-Guided Fine-Tuning

Paper • 2511.06805 • Published about 1 month ago • 12
JudgeBoard: Benchmarking and Enhancing Small Language Models for Reasoning Evaluation

Paper • 2511.15958 • Published 21 days ago • 1
VeriSciQA: An Auto-Verified Dataset for Scientific Visual Question Answering

Paper • 2511.19899 • Published 15 days ago
TwinFlow: Realizing One-step Generation on Large Models with Self-adversarial Flows

Paper • 2512.05150 • Published 7 days ago • 59