Hugging Face

Sourashis Bhowmik

SourBNulink

AI & ML interests

AI Dev and Deployment

Recent Activity

replied to Hellohal2064's post 8 days ago
🚀 Excited to share: The vLLM container for NVIDIA DGX Spark!
I've been working on getting vLLM to run natively on the new DGX Spark with its GB10 Blackwell GPU (SM121 architecture). The results? Up to 2.5x faster inference compared to llama.cpp!
📊 Performance Highlights:
  • Qwen3-Coder-30B: 44 tok/s (vs 21 tok/s with llama.cpp)
  • Qwen3-Next-80B: 45 tok/s (vs 18 tok/s with llama.cpp)
🔧 Technical Challenges Solved:
  • Built PyTorch nightly with CUDA 13.1 + SM121 support
  • Patched vLLM for Blackwell architecture
  • Created custom MoE expert configs for GB10
  • Implemented TRITON_ATTN backend workaround
📦 Available now:
  • Docker Hub: docker pull hellohal2064/vllm-dgx-spark-gb10:latest
  • HuggingFace: huggingface.co/Hellohal2064/vllm-dgx-spark-gb10
The DGX Spark's 119GB unified memory opens up possibilities for running massive models locally. Happy to connect with others working on the DGX Spark Blackwell!
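The speedup claim in the post can be checked against its own throughput figures. The sketch below is only arithmetic on the numbers quoted above; the `benchmarks` dictionary and the `speedup` helper are illustrative names, not part of vLLM or llama.cpp.

```python
# Throughput figures quoted in the post, in tokens per second.
benchmarks = {
    "Qwen3-Coder-30B": {"vllm": 44, "llama_cpp": 21},
    "Qwen3-Next-80B": {"vllm": 45, "llama_cpp": 18},
}


def speedup(vllm_tps, llama_cpp_tps):
    """Return how many times faster vLLM is than llama.cpp."""
    return vllm_tps / llama_cpp_tps


# Ratio per model, rounded to two decimal places.
speedups = {
    model: round(speedup(tps["vllm"], tps["llama_cpp"]), 2)
    for model, tps in benchmarks.items()
}

for model, ratio in speedups.items():
    print(f"{model}: {ratio}x")
```

On these figures the ratio is about 2.1x for Qwen3-Coder-30B and 2.5x for Qwen3-Next-80B, so the headline 2.5x number corresponds to the larger of the two results.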
new activity 2 months ago
NexaAI/Qwen3-VL-2B-Thinking-GGUF: Model Failed to Load. (Linux arm64, CUDA 12.6 on NVIDIA GB10 Blackwell GPU)

Organizations

None yet

SourBNulink's models

None public yet