RandomHakkaDude's Collections: LLMs&Agents
InfiniteHiP: Extending Language Model Context Up to 3 Million Tokens on
a Single GPU
Paper
• 2502.08910
• Published
• 148
From Hours to Minutes: Lossless Acceleration of Ultra Long Sequence
Generation up to 100K Tokens
Paper
• 2502.18890
• Published
• 30
MPO: Boosting LLM Agents with Meta Plan Optimization
Paper
• 2503.02682
• Published
• 29
SWE-rebench: An Automated Pipeline for Task Collection and
Decontaminated Evaluation of Software Engineering Agents
Paper
• 2505.20411
• Published
• 93
DeepResearchGym: A Free, Transparent, and Reproducible Evaluation
Sandbox for Deep Research
Paper
• 2505.19253
• Published
• 34
Universal Reasoner: A Single, Composable Plug-and-Play Reasoner for
Frozen LLMs
Paper
• 2505.19075
• Published
• 21
Text2Grad: Reinforcement Learning from Natural Language Feedback
Paper
• 2505.22338
• Published
• 8
ScienceBoard: Evaluating Multimodal Autonomous Agents in Realistic
Scientific Workflows
Paper
• 2505.19897
• Published
• 104
Paper2Poster: Towards Multimodal Poster Automation from Scientific
Papers
Paper
• 2505.21497
• Published
• 109
Beyond Prompt Engineering: Robust Behavior Control in LLMs via Steering
Target Atoms
Paper
• 2505.20322
• Published
• 14
VideoGameBench: Can Vision-Language Models complete popular video games?
Paper
• 2505.18134
• Published
• 6
Alita: Generalist Agent Enabling Scalable Agentic Reasoning with Minimal
Predefinition and Maximal Self-Evolution
Paper
• 2505.20286
• Published
• 8
ARM: Adaptive Reasoning Model
Paper
• 2505.20258
• Published
• 45
Flex-Judge: Think Once, Judge Anywhere
Paper
• 2505.18601
• Published
• 27
Lifelong Safety Alignment for Language Models
Paper
• 2505.20259
• Published
• 24
Vibe Coding vs. Agentic Coding: Fundamentals and Practical Implications
of Agentic AI
Paper
• 2505.19443
• Published
• 15
InfantAgent-Next: A Multimodal Generalist Agent for Automated Computer
Interaction
Paper
• 2505.10887
• Published
• 10