raincandy-u posted an update 4 days ago
🤗 Just released Rain-100M, an experimental ~97M-parameter Qwen3-style language model trained from random initialization.

Repo: raincandy-u/Rain-100M

Data: HuggingFaceFW/fineweb-edu, ~3B tokens, English only

Tokenizer: custom 16k BPE, context length 4096

Architecture: 12 Transformer layers, hidden size 768, 12 heads, MLP 2048, SiLU, bf16
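These specs line up with the stated size. A quick back-of-envelope sketch, assuming tied embeddings, full multi-head attention, and a SwiGLU-style gated MLP (typical for Qwen-family models, but not confirmed in this post), with the 16k vocab taken as 16,384:

```python
# Rough parameter count for the stated config (a sketch; tied embeddings,
# full MHA, and a gated SwiGLU MLP are assumptions, not confirmed specs).
vocab, d, layers, d_mlp = 16_384, 768, 12, 2048

embed = vocab * d                      # ~12.6M, tied with the LM head
attn  = 4 * d * d                      # q, k, v, o projections per layer
mlp   = 3 * d * d_mlp                  # gate, up, down projections per layer
total = embed + layers * (attn + mlp)
print(f"{total/1e6:.1f}M parameters")  # ~97.5M, matching the stated ~97M
```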


Rain-100M is a raw base model (not instruction-tuned or safety-aligned), aimed at small-scale research, debugging training pipelines, and CPU/edge experiments. If you run evaluations, finetunes, or visualizations with it, I would be very interested in your results!
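For anyone who wants to poke at it, here is a minimal loading sketch, assuming the repo ships a standard transformers-compatible config and tokenizer (the prompt and sampling settings are just illustrative):

```python
# Minimal sketch: load Rain-100M as a plain transformers causal LM.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "raincandy-u/Rain-100M"
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo, torch_dtype=torch.bfloat16)
model.eval()

# Raw base model: plain next-token continuation, no chat template.
inputs = tokenizer("The water cycle begins when", return_tensors="pt")
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=50, do_sample=True, top_p=0.9)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```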

It’s great to see new tiny models coming up!
Added it to the Foundation Text-Generation Models Below 360M Parameters collection!

Interesting.

If I spent enough time, I should be able to make a bot from scratch that's 20 times smaller, using the same code and structure. Why am I saying this? Am I going to do it? No, but everyone should know that most of what people do is throw a ton of information at bots and let them work it out algorithmically. We need more experimental bots, particularly ones that skip a few steps toward getting the same answer. So I'm always glad to see work of this sort, whether it's trying different datasets with different LLMs, or whatever.