

Sherlock

eyuansu71


https://scholar.google.com/citations?user=75pkx3YAAAAJ&hl=en

AI & ML interests

None yet

Recent Activity

upvoted a paper 6 days ago

Do Vision-Language Models Measure Up? Benchmarking Visual Measurement Reading with MeasureBench

upvoted a paper about 2 months ago

SWE-Bench Pro: Can AI Agents Solve Long-Horizon Software Engineering Tasks?

upvoted a paper about 2 months ago

FlagEval Findings Report: A Preliminary Evaluation of Large Reasoning Models on Automatically Verifiable Textual and Visual Questions



upvoted a paper 6 days ago

Do Vision-Language Models Measure Up? Benchmarking Visual Measurement Reading with MeasureBench

Paper • 2510.26865 • Published 10 days ago • 11

upvoted 2 papers about 2 months ago

SWE-Bench Pro: Can AI Agents Solve Long-Horizon Software Engineering Tasks?

Paper • 2509.16941 • Published Sep 21 • 20

FlagEval Findings Report: A Preliminary Evaluation of Large Reasoning Models on Automatically Verifiable Textual and Visual Questions

Paper • 2509.17177 • Published Sep 21 • 13

upvoted a paper 3 months ago

Beyond Solving Math Quiz: Evaluating the Ability of Large Reasoning Models to Ask for Information

Paper • 2508.11252 • Published Aug 15 • 3

upvoted a paper 4 months ago

SWE-SQL: Illuminating LLM Pathways to Solve User SQL Issues in Real-World Applications

Paper • 2506.18951 • Published Jun 23 • 21