Zongmin Yu

zongmin-yu

upvoted 20 papers about 20 hours ago

Qwen3-TTS Technical Report

Paper • 2601.15621 • Published 6 days ago • 47

DeepPlanning: Benchmarking Long-Horizon Agentic Planning with Verifiable Constraints

Paper • 2601.18137 • Published 2 days ago • 16

Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models

Paper • 2601.07372 • Published 16 days ago • 38

mHC: Manifold-Constrained Hyper-Connections

Paper • 2512.24880 • Published 28 days ago • 287

DeepSeekMath-V2: Towards Self-Verifiable Mathematical Reasoning

Paper • 2511.22570 • Published Nov 27, 2025 • 90

DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models

Paper • 2512.02556 • Published Dec 2, 2025 • 254

Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures

Paper • 2505.09343 • Published May 14, 2025 • 75

DeepSeek-OCR: Contexts Optical Compression

Paper • 2510.18234 • Published Oct 21, 2025 • 90

DeepSeek-Prover-V2: Advancing Formal Mathematical Reasoning via Reinforcement Learning for Subgoal Decomposition

Paper • 2504.21801 • Published Apr 30, 2025 • 4

Inference-Time Scaling for Generalist Reward Modeling

Paper • 2504.02495 • Published Apr 3, 2025 • 58

Janus-Pro: Unified Multimodal Understanding and Generation with Data and Model Scaling

Paper • 2501.17811 • Published Jan 29, 2025 • 8

Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention

Paper • 2502.11089 • Published Feb 16, 2025 • 167

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

Paper • 2501.12948 • Published Jan 22, 2025 • 437

DeepSeek-V3 Technical Report

Paper • 2412.19437 • Published Dec 27, 2024 • 75

Janus: Decoupling Visual Encoding for Unified Multimodal Understanding and Generation

Paper • 2410.13848 • Published Oct 17, 2024 • 35

DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding

Paper • 2412.10302 • Published Dec 13, 2024 • 22

DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search

Paper • 2408.08152 • Published Aug 15, 2024 • 61

Let the Expert Stick to His Last: Expert-Specialized Fine-Tuning for Sparse Architectural Large Language Models

Paper • 2407.01906 • Published Jul 2, 2024 • 46

DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data

Paper • 2405.14333 • Published May 23, 2024 • 43

DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence

Paper • 2406.11931 • Published Jun 17, 2024 • 68