Tong Zhu

Spico

https://Spico197.github.io

AI & ML interests

Information Extraction, Mixture-of-Experts, LLM

Recent Activity

liked a dataset 8 days ago

librarian-bots/paper-recommendations-v2

Viewer • Updated about 24 hours ago • 9.92k • 751 • 16

upvoted a paper 8 days ago

AdaReasoner: Dynamic Tool Orchestration for Iterative Visual Reasoning

Paper • 2601.18631 • Published 10 days ago • 47

New activity in nvidia/Nemotron-Competitive-Programming-v1 9 days ago

User's content is empty in "competitive_coding_python"

#1 opened 13 days ago by uwesis

upvoted 3 papers 16 days ago

MemoryRewardBench: Benchmarking Reward Models for Long-Term Memory Management in Large Language Models

Paper • 2601.11969 • Published 20 days ago • 26

Advances and Frontiers of LLM-based Issue Resolution in Software Engineering: A Comprehensive Survey

Paper • 2601.11655 • Published 21 days ago • 60

Toward Efficient Agents: Memory, Tool learning, and Planning

Paper • 2601.14192 • Published 16 days ago • 53

upvoted an article 22 days ago

Cosmopedia: how to create large-scale synthetic data for pre-training Large Language Models

Article • Mar 20, 2024 • 109

authored 7 papers about 1 month ago

LLaMA-MoE v2: Exploring Sparsity of LLaMA from Perspective of Mixture-of-Experts with Post-Training

Paper • 2411.15708 • Published Nov 24, 2024

Iterative Value Function Optimization for Guided Decoding

Paper • 2503.02368 • Published Mar 4, 2025 • 15

Linear-MoE: Linear Sequence Modeling Meets Mixture-of-Experts

Paper • 2503.05447 • Published Mar 7, 2025 • 8

Chain-of-Tools: Utilizing Massive Unseen Tools in the CoT Reasoning of Frozen Language Models

Paper • 2503.16779 • Published Mar 21, 2025 • 1

Dynamic Data Mixing Maximizes Instruction Tuning for Mixture-of-Experts

Paper • 2406.11256 • Published Jun 17, 2024

Speed Always Wins: A Survey on Efficient Architectures for Large Language Models

Paper • 2508.09834 • Published Aug 13, 2025 • 53

DiffThinker: Towards Generative Multimodal Reasoning with Diffusion Models

Paper • 2512.24165 • Published Dec 30, 2025 • 51

upvoted a paper about 1 month ago

DiffThinker: Towards Generative Multimodal Reasoning with Diffusion Models

Paper • 2512.24165 • Published Dec 30, 2025 • 51

submitted a paper to Daily Papers about 1 month ago

DiffThinker: Towards Generative Multimodal Reasoning with Diffusion Models

Paper • 2512.24165 • Published Dec 30, 2025 • 51

upvoted 2 papers 2 months ago

ToolOrchestra: Elevating Intelligence via Efficient Model and Tool Orchestration

Paper • 2511.21689 • Published Nov 26, 2025 • 122

Z-Image: An Efficient Image Generation Foundation Model with Single-Stream Diffusion Transformer

Paper • 2511.22699 • Published Nov 27, 2025 • 236

liked a Space 3 months ago

屎山文本墓园 (Shit-Mountain Text Graveyard)

📊

An abstract installation that builds a "tombstone" for text projects: enter some text and it generates your tombstone.

upvoted a paper 3 months ago

P1: Mastering Physics Olympiads with Reinforcement Learning

Paper • 2511.13612 • Published Nov 17, 2025 • 134
