Embrace Divergence for Richer Insights: A Multi-document Summarization Benchmark and a Case Study on Summarizing Diverse Information from News Articles Paper • 2309.09369 • Published Sep 17, 2023 • 1
Do LVLMs Understand Charts? Analyzing and Correcting Factual Errors in Chart Captioning Paper • 2312.10160 • Published Dec 15, 2023 • 2
CRMArena: Understanding the Capacity of LLM Agents to Perform Professional CRM Tasks in Realistic Environments Paper • 2411.02305 • Published Nov 4, 2024 • 1
Evaluating Cultural and Social Awareness of LLM Web Agents Paper • 2410.23252 • Published Oct 30, 2024 • 1
NewsEdits 2.0: Learning the Intentions Behind Updating News Paper • 2411.18811 • Published Nov 27, 2024
Why Vision Language Models Struggle with Visual Arithmetic? Towards Enhanced Chart and Geometry Understanding Paper • 2502.11492 • Published Feb 17, 2025 • 2
CRMArena-Pro: Holistic Assessment of LLM Agents Across Diverse Business Scenarios and Interactions Paper • 2505.18878 • Published May 24, 2025 • 2
From Pixels to Insights: A Survey on Automatic Chart Understanding in the Era of Large Foundation Models Paper • 2403.12027 • Published Mar 18, 2024
Benchmarking Deep Search over Heterogeneous Enterprise Data Paper • 2506.23139 • Published Jun 29, 2025
GUI-KV: Efficient GUI Agents via KV Cache with Spatio-Temporal Awareness Paper • 2510.00536 • Published Oct 1, 2025 • 6
DeepTRACE: Auditing Deep Research AI Systems for Tracking Reliability Across Citations and Evidence Paper • 2509.04499 • Published Sep 2, 2025
MMPersuade: A Dataset and Evaluation Framework for Multimodal Persuasion Paper • 2510.22768 • Published 14 days ago • 7