Great Models Think Alike: Improving Model Reliability via Inter-Model Latent Agreement Paper • 2305.01481 • Published May 2, 2023
GraphCleaner: Detecting Mislabelled Samples in Popular Graph Learning Benchmarks Paper • 2306.00015 • Published May 30, 2023
Can LLMs Express Their Uncertainty? An Empirical Evaluation of Confidence Elicitation in LLMs Paper • 2306.13063 • Published Jun 22, 2023
Revisiting Uncertainty Quantification Evaluation in Language Models: Spurious Interactions with Response Length Bias Results Paper • 2504.13677 • Published Apr 18
Bring Reason to Vision: Understanding Perception and Reasoning through Model Merging Paper • 2505.05464 • Published May 8
MLR-Bench: Evaluating AI Agents on Open-Ended Machine Learning Research Paper • 2505.19955 • Published May 26