Don't Blind Your VLA: Aligning Visual Representations for OOD Generalization Paper • 2510.25616 • Published 13 days ago • 89
When Models Lie, We Learn: Multilingual Span-Level Hallucination Detection with PsiloQA Paper • 2510.04849 • Published Oct 6 • 112
Beyond Memorization: Extending Reasoning Depth with Recurrence, Memory and Test-Time Compute Scaling Paper • 2508.16745 • Published Aug 22 • 28
Confidence Is All You Need: Few-Shot RL Fine-Tuning of Language Models Paper • 2506.06395 • Published Jun 5 • 131
Will It Still Be True Tomorrow? Multilingual Evergreen Question Classification to Improve Trustworthy QA Paper • 2505.21115 • Published May 27 • 139
Diagonal Batching Unlocks Parallelism in Recurrent Memory Transformers for Long Contexts Paper • 2506.05229 • Published Jun 5 • 38
How Much Knowledge Can You Pack into a LoRA Adapter without Harming LLM? Paper • 2502.14502 • Published Feb 20 • 91
LLM-Microscope: Uncovering the Hidden Role of Punctuation in Context Memory of Transformers Paper • 2502.15007 • Published Feb 20 • 174
Cramming 1568 Tokens into a Single Vector and Back Again: Exploring the Limits of Embedding Space Capacity Paper • 2502.13063 • Published Feb 18 • 72
Unraveling the Complexity of Memory in RL Agents: an Approach for Classification and Evaluation Paper • 2412.06531 • Published Dec 9, 2024 • 72
Complexity of Symbolic Representation in Working Memory of Transformer Correlates with the Complexity of a Task Paper • 2406.14213 • Published Jun 20, 2024 • 21
BABILong: Testing the Limits of LLMs with Long Context Reasoning-in-a-Haystack Paper • 2406.10149 • Published Jun 14, 2024 • 52
In Search of Needles in a 10M Haystack: Recurrent Memory Finds What LLMs Miss Paper • 2402.10790 • Published Feb 16, 2024 • 42