Can Large Multimodal Models Actively Recognize Faulty Inputs? A Systematic Evaluation Framework of Their Input Scrutiny Ability • Paper 2508.04017 • Published Aug 6, 2025
Don't Take the Premise for Granted: Evaluating the Premise Critique Ability of Large Language Models • Paper 2505.23715 • Published May 29, 2025
StructFlowBench: A Structured Flow Benchmark for Multi-turn Instruction Following • Paper 2502.14494 • Published Feb 20, 2025
Large Language Model Evaluation via Matrix Nuclear-Norm • Paper 2410.10672 • Published Oct 14, 2024