Nahid Alam

nahidalam

AI & ML interests

multi-modal systems

Recent Activity

liked a model 5 days ago

nvidia/Alpamayo-R1-10B

liked a dataset 3 months ago

Grigorij/shooting_watermelon_v3.0

liked a Space 4 months ago

HuggingFaceTB/smol-training-playbook

upvoted 2 papers 6 months ago

Genie Envisioner: A Unified World Foundation Platform for Robotic Manipulation

Paper • 2508.05635 • Published Aug 7, 2025 • 73

When Tokens Talk Too Much: A Survey of Multimodal Long-Context Token Compression across Images, Videos, and Audios

Paper • 2507.20198 • Published Jul 27, 2025 • 28

upvoted an article 10 months ago

DeepSeek-R1 Dissection: Understanding PPO & GRPO Without Any Prior Reinforcement Learning Knowledge

Article • Feb 7, 2025 • 276

upvoted a paper 10 months ago

SmolVLM: Redefining small and efficient multimodal models

Paper • 2504.05299 • Published Apr 7, 2025 • 205

upvoted a paper about 1 year ago

VideoJAM: Joint Appearance-Motion Representations for Enhanced Motion Generation in Video Models

Paper • 2502.02492 • Published Feb 4, 2025 • 66

upvoted a collection almost 2 years ago

Vision Models (GGUF)

How to use: Download a "mmproj" model file + one or more of the primary model files. • 5 items • Updated Dec 22, 2023 • 45

upvoted 2 papers almost 2 years ago

Groma: Localized Visual Tokenization for Grounding Multimodal Large Language Models

Paper • 2404.13013 • Published Apr 19, 2024 • 31

Stealing Part of a Production Language Model

Paper • 2403.06634 • Published Mar 11, 2024 • 91

upvoted a paper over 2 years ago

Multimodal Foundation Models: From Specialists to General-Purpose Assistants

Paper • 2309.10020 • Published Sep 18, 2023 • 40