Hugging Face
VORTEX
Abhaykoul
128 followers · 38 following
OEvortex
AI & ML interests
None yet
Recent Activity
updated a model 12 days ago: Abhaykoul/ARC-HAIRM
published a model 12 days ago: Abhaykoul/ARC-HAIRM
reacted to KingNish's post with 🔥 18 days ago:
Muon vs MuonClip vs Muon+AdamW

Muon has gone from an experiment to a mainstream optimizer, but does it hold up for fine-tuning? We ran head-to-head tests on Qwen3-4B (10k+ high-quality instruction rows) to find out.

Short story: Pure Muon converged fastest at the start, but its gradient-norm spikes made training unstable. MuonClip (Kimi K2's clipping) stabilizes long pretraining runs, yet in our small-scale fine-tune it underperformed, with lower token accuracy and slower convergence. The winner was the hybrid: Muon for 2D layers + AdamW for 1D layers. It delivered the best balance of stability and final performance and even beat vanilla AdamW.

Takeaway: for small-scale fine-tuning, the hybrid is practical and reliable.

Next step: scale to larger models/datasets to see if Muon's spikes become catastrophic or if clipping wins out.

Full blog link: https://huggingface.co/blog/KingNish/optimizer-part1
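The hybrid split mentioned in the post (Muon for 2D layers, AdamW for 1D layers) comes down to routing parameters by their number of dimensions. The sketch below shows only that routing step on plain shape tuples, with no optimizer attached. The function name `split_params_for_hybrid` and the parameter names and shapes are hypothetical and chosen for illustration; they are not taken from the linked blog.

```python
from typing import Dict, List, Tuple

Shape = Tuple[int, ...]


def split_params_for_hybrid(shapes: Dict[str, Shape]) -> Dict[str, List[str]]:
    """Route 2D weight matrices to Muon and everything else to AdamW.

    This is an illustrative partition rule, not the blog's actual code.
    """
    groups: Dict[str, List[str]] = {"muon": [], "adamw": []}
    for name, shape in shapes.items():
        # Matrices (exactly two dimensions) go to Muon; biases, norm
        # gains, and other non-2D tensors go to AdamW.
        target = "muon" if len(shape) == 2 else "adamw"
        groups[target].append(name)
    return groups


# Hypothetical parameter names and shapes, for illustration only.
example_shapes: Dict[str, Shape] = {
    "layers.0.attn.q_proj.weight": (2560, 2560),
    "layers.0.attn.q_proj.bias": (2560,),
    "layers.0.input_layernorm.weight": (2560,),
    "layers.0.mlp.up_proj.weight": (9728, 2560),
}

groups = split_params_for_hybrid(example_shapes)
for optimizer_name, param_names in groups.items():
    print(optimizer_name, param_names)
```

In a real training script, each group would be handed to its own optimizer instance. Many Muon setups also keep the embedding and output-head matrices on AdamW even though they are 2D; the post does not say whether that was done here.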
Organizations
Abhaykoul's models (11)
Abhaykoul/ARC-HAIRM • Text Generation • Updated 12 days ago
Abhaykoul/HelpingAI2.5-prototype-v2 • Text Generation • 9B • Updated Oct 20, 2024 • 8 • 2
Abhaykoul/Abhayjr • Text Generation • 6B • Updated Aug 29, 2024 • 1
Abhaykoul/emo-face-rec • 85.8M • Updated Aug 19, 2024 • 10 • 1
Abhaykoul/HelpingAI2-4x6B • Text Generation • 17B • Updated Aug 6, 2024 • 28 • 1
Abhaykoul/Friday-Latest • Text Generation • 3B • Updated Jul 31, 2024 • 12
Abhaykoul/Wise-Qwen • Text Generation • 2B • Updated Jun 11, 2024 • 7
Abhaykoul/EI-gemma • Text Generation • 3B • Updated May 6, 2024 • 10
Abhaykoul/idefics-9b-doodles • Any-to-Any • 9B • Updated Apr 5, 2024 • 7
Abhaykoul/HelpingAI-Lite-4x1b • Text Generation • 3B • Updated Mar 19, 2024 • 58
Abhaykoul/Qwen1.5-0.5B-vortex • Text Generation • 0.5B • Updated Mar 12, 2024 • 65 • 2