Detecting and Preventing Hallucinations in Large Vision Language Models Paper • 2308.06394 • Published Aug 11, 2023
Rubrics as Rewards: Reinforcement Learning Beyond Verifiable Domains Paper • 2507.17746 • Published Jul 23, 2025
Agentic Rubrics as Contextual Verifiers for SWE Agents Paper • 2601.04171 • Published 8 days ago
SWE-Bench Pro: Can AI Agents Solve Long-Horizon Software Engineering Tasks? Paper • 2509.16941 • Published Sep 21, 2025
CWCL: Cross-Modal Transfer with Continuously Weighted Contrastive Loss Paper • 2309.14580 • Published Sep 26, 2023
FORTRESS: Frontier Risk Evaluation for National Security and Public Safety Paper • 2506.14922 • Published Jun 17, 2025
Representation Engineering: A Top-Down Approach to AI Transparency Paper • 2310.01405 • Published Oct 2, 2023
Score-CAM: Score-Weighted Visual Explanations for Convolutional Neural Networks Paper • 1910.01279 • Published Oct 3, 2019
HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal Paper • 2402.04249 • Published Feb 6, 2024