VORTEX

Abhaykoul · OEvortex

AI & ML interests

None yet

Recent Activity

updated a model 12 days ago

Abhaykoul/ARC-HAIRM

published a model 12 days ago

Abhaykoul/ARC-HAIRM

reacted to KingNish's post with 🔥 18 days ago

Muon vs MuonClip vs Muon+AdamW

Muon has gone from an experiment to a mainstream optimizer, but does it hold up for fine-tuning? We ran head-to-head tests on Qwen3-4B (10k+ high-quality instruction rows) to find out.

Short story: Pure Muon converged fastest at the start, but its gradient-norm spikes made training unstable. MuonClip (Kimi K2's clipping) stabilizes long pretraining runs, yet in our small-scale fine-tune it underperformed, with lower token accuracy and slower convergence. The winner was the hybrid: Muon for 2D layers + AdamW for 1D layers. It delivered the best balance of stability and final performance and even beat vanilla AdamW.

Takeaway: for small-scale fine-tuning, the hybrid is practical and reliable.

Next step: scale to larger models/datasets to see if Muon's spikes become catastrophic or if clipping wins out.

Full blog link: https://huggingface.co/blog/KingNish/optimizer-part1
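The hybrid split mentioned in the post (Muon for 2D layers, AdamW for 1D layers) comes down to routing each parameter by its number of dimensions. The snippet below is a minimal sketch of that routing only, not the post author's code: the helper name `split_by_ndim`, the parameter names, and the shapes are all hypothetical, and it works on plain shape tuples so it runs without any ML library.

```python
from typing import Dict, List, Tuple


def split_by_ndim(shapes: Dict[str, Tuple[int, ...]]) -> Tuple[List[str], List[str]]:
    """Partition parameter names by dimensionality.

    2D weight matrices go to the Muon group; everything else
    (1D norms, biases, scalars) goes to the AdamW group.
    """
    muon_names: List[str] = []
    adamw_names: List[str] = []
    for name, shape in shapes.items():
        if len(shape) == 2:
            muon_names.append(name)
        else:
            adamw_names.append(name)
    return muon_names, adamw_names


# Hypothetical parameter names and shapes, for illustration only.
example_shapes = {
    "block0.attn.q_proj.weight": (64, 64),
    "block0.attn.norm.weight": (64,),
    "block0.mlp.up_proj.weight": (256, 64),
    "block0.mlp.up_proj.bias": (256,),
}

muon_group, adamw_group = split_by_ndim(example_shapes)
print("Muon: ", muon_group)
print("AdamW:", adamw_group)
```

In a real training script the two name lists would select the tensors handed to each optimizer. How embeddings or output heads (which are also 2D) should be treated is not stated in the post, so this sketch leaves them in the 2D group.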

Abhaykoul's models (11)

Abhaykoul/ARC-HAIRM

Text Generation • Updated 12 days ago

Abhaykoul/HelpingAI2.5-prototype-v2

Text Generation • 9B • Updated Oct 20, 2024 • 8 • 2

Abhaykoul/Abhayjr

Text Generation • 6B • Updated Aug 29, 2024 • 1

Abhaykoul/emo-face-rec

85.8M • Updated Aug 19, 2024 • 10 • 1

Abhaykoul/HelpingAI2-4x6B

Text Generation • 17B • Updated Aug 6, 2024 • 28 • 1

Abhaykoul/Friday-Latest

Text Generation • 3B • Updated Jul 31, 2024 • 12

Abhaykoul/Wise-Qwen

Text Generation • 2B • Updated Jun 11, 2024 • 7

Abhaykoul/EI-gemma

Text Generation • 3B • Updated May 6, 2024 • 10

Abhaykoul/idefics-9b-doodles

Any-to-Any • 9B • Updated Apr 5, 2024 • 7

Abhaykoul/HelpingAI-Lite-4x1b

Text Generation • 3B • Updated Mar 19, 2024 • 58

Abhaykoul/Qwen1.5-0.5B-vortex

Text Generation • 0.5B • Updated Mar 12, 2024 • 65 • 2