Zhenxing Mi

Mifucius

AI & ML interests

None yet

Recent Activity

liked a dataset 20 days ago

nvidia/PhysicalAI-Autonomous-Vehicles

liked a dataset 2 months ago

nvidia/vipe-wild-sdg-1m

Organizations

None yet

upvoted a paper 29 days ago

OmniVinci: Enhancing Architecture and Data for Omni-Modal Understanding LLM

Paper • 2510.15870 • Published Oct 17 • 87

upvoted a paper 4 months ago

Audio-visual Controlled Video Diffusion with Masked Selective State Spaces Modeling for Natural Talking Head Generation

Paper • 2504.02542 • Published Apr 3 • 51

upvoted a paper 5 months ago

Autoregressive Adversarial Post-Training for Real-Time Interactive Video Generation

Paper • 2506.09350 • Published Jun 11 • 48

upvoted a paper 6 months ago

ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models

Paper • 2505.24864 • Published May 30 • 141

upvoted a paper 7 months ago

CLIMB: CLustering-based Iterative Data Mixture Bootstrapping for Language Model Pre-training

Paper • 2504.13161 • Published Apr 17 • 94

upvoted 4 papers 8 months ago

Qwen2.5-1M Technical Report

Paper • 2501.15383 • Published Jan 26 • 72

Personalize Anything for Free with Diffusion Transformer

Paper • 2503.12590 • Published Mar 16 • 44

DiffMoE: Dynamic Token Selection for Scalable Diffusion Transformers

Paper • 2503.14487 • Published Mar 18 • 27

BlobCtrl: A Unified and Flexible Framework for Element-level Image Generation and Editing

Paper • 2503.13434 • Published Mar 17 • 27

upvoted 5 papers 9 months ago

SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features

Paper • 2502.14786 • Published Feb 20 • 154

Dynamic Concepts Personalization from Single Videos

Paper • 2502.14844 • Published Feb 20 • 16

RAD: Training an End-to-End Driving Policy via Large-Scale 3DGS-based Reinforcement Learning

Paper • 2502.13144 • Published Feb 18 • 38

Qwen2.5-VL Technical Report

Paper • 2502.13923 • Published Feb 19 • 210

I Think, Therefore I Diffuse: Enabling Multimodal In-Context Reasoning in Diffusion Models

Paper • 2502.10458 • Published Feb 12 • 38

upvoted a paper 11 months ago

BrushEdit: All-In-One Image Inpainting and Editing

Paper • 2412.10316 • Published Dec 13, 2024 • 35

upvoted 3 papers about 1 year ago

MM-Ego: Towards Building Egocentric Multimodal LLMs

Paper • 2410.07177 • Published Oct 9, 2024 • 22

Personalized Visual Instruction Tuning

Paper • 2410.07113 • Published Oct 9, 2024 • 70

3DGS-DET: Empower 3D Gaussian Splatting with Boundary Guidance and Box-Focused Sampling for 3D Object Detection

Paper • 2410.01647 • Published Oct 2, 2024 • 31