Paul S PRO
SuperPauly
AI & ML interests
None yet
Recent Activity
updated
a collection
1 day ago
Demixing Models & Datasets liked
a Space 2 days ago
qtzmusic/UVR5_UI updated
a collection
2 days ago
Demixing Models & Datasets Organizations
None yet
Agent Loops, Character, Work Ethics & Behavior
-
Close the Loop: Synthesizing Infinite Tool-Use Data via Multi-Agent Role-Playing
Paper • 2512.23611 • Published • 5 -
Context as a Tool: Context Management for Long-Horizon SWE-Agents
Paper • 2512.22087 • Published • 3 -
AgentScope 1.0: A Developer-Centric Framework for Building Agentic Applications
Paper • 2508.16279 • Published • 55 -
Very Large-Scale Multi-Agent Simulation in AgentScope
Paper • 2407.17789 • Published • 37
Evaluation Methods & Metrics
-
RubricBench: Aligning Model-Generated Rubrics with Human Standards
Paper • 2603.01562 • Published • 53 -
T2S-Bench & Structure-of-Thought: Benchmarking and Prompting Comprehensive Text-to-Structure Reasoning
Paper • 2603.03790 • Published • 109 -
SWE-rebench: An Automated Pipeline for Task Collection and Decontaminated Evaluation of Software Engineering Agents
Paper • 2505.20411 • Published • 93 -
SWE-rebench V2: Language-Agnostic SWE Task Collection at Scale
Paper • 2602.23866 • Published • 76
Py
Demixing Models & Datasets
Evaluation Methods & Metrics
-
RubricBench: Aligning Model-Generated Rubrics with Human Standards
Paper • 2603.01562 • Published • 53 -
T2S-Bench & Structure-of-Thought: Benchmarking and Prompting Comprehensive Text-to-Structure Reasoning
Paper • 2603.03790 • Published • 109 -
SWE-rebench: An Automated Pipeline for Task Collection and Decontaminated Evaluation of Software Engineering Agents
Paper • 2505.20411 • Published • 93 -
SWE-rebench V2: Language-Agnostic SWE Task Collection at Scale
Paper • 2602.23866 • Published • 76
Agent Loops, Character, Work Ethics & Behavior
-
Close the Loop: Synthesizing Infinite Tool-Use Data via Multi-Agent Role-Playing
Paper • 2512.23611 • Published • 5 -
Context as a Tool: Context Management for Long-Horizon SWE-Agents
Paper • 2512.22087 • Published • 3 -
AgentScope 1.0: A Developer-Centric Framework for Building Agentic Applications
Paper • 2508.16279 • Published • 55 -
Very Large-Scale Multi-Agent Simulation in AgentScope
Paper • 2407.17789 • Published • 37
Py