# NanoSOTA-Qwen-0.5B-GSM8K-v1

## Model Summary
This is a highly specialized version of Qwen/Qwen2.5-0.5B-Instruct, fine-tuned to excel at multi-step mathematical and logical reasoning. It was trained to first generate an internal monologue (opened with `<|begin_of_thought|>`) before providing a final, boxed answer (`\boxed{...}`).
This model is the result of a multi-stage training process designed to produce a "NanoSOTA" (nano-scale state-of-the-art) model for reasoning tasks in its size class.
## Intended Use
This model is intended for tasks requiring step-by-step reasoning, particularly grade-school math word problems (the GSM8K benchmark). It can be used for:
- Tutoring applications for math problems.
- Extracting structured answers from text.
- Studying the "Chain of Thought" capabilities of small language models.
## How to Use
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "oscarz511/NanoSOTA-Qwen-0.5B-GSM8K-v1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

prompt = "A farmer has 15 sheep. All but 8 die. How many are left?"
messages = [{"role": "user", "content": prompt}]
input_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=256)
# `generate` returns a batch of token ID sequences; decode the first (and only) one.
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
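Because the final answer is emitted inside `\boxed{...}`, the structured answer can be pulled out of the generated text with a small regex helper. This is an illustrative sketch, not part of the model's own tooling:

```python
import re

def extract_boxed_answer(text: str):
    """Return the content of the last \\boxed{...} in the output, or None if absent."""
    matches = re.findall(r"\\boxed\{([^}]*)\}", text)
    return matches[-1] if matches else None
```

For example, `extract_boxed_answer(r"... so the answer is \boxed{8}.")` returns `"8"`; taking the last match skips any boxed expressions that appear mid-reasoning.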
## Training Details
The model underwent a three-stage training process on an NVIDIA L40S GPU:

- Base Model: Qwen/Qwen2.5-0.5B-Instruct.
- Chain-of-Thought SFT: fine-tuned on a subset of `HuggingFaceH4/Bespoke-Stratos-17k` to learn the `<|begin_of_thought|>` reasoning structure.
- Reinforcement Learning (GRPO): briefly trained with a custom GRPO loop on synthetic arithmetic problems to improve mathematical accuracy.
- GSM8K Specialization: heavily fine-tuned for 4 epochs on ~7,500 examples from the `gsm8k` (main) dataset to master the format and logic of word problems.
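The custom GRPO loop itself is not published here; the sketch below only illustrates the kind of synthetic arithmetic problem generator and binary correctness reward such a loop might use. All names and details are hypothetical assumptions, not the actual training code:

```python
import random
import re

def make_arithmetic_problem(rng: random.Random):
    """Generate a synthetic arithmetic question and its ground-truth answer (hypothetical)."""
    a, b = rng.randint(1, 99), rng.randint(1, 99)
    op = rng.choice(["+", "-", "*"])
    answer = eval(f"{a} {op} {b}")  # safe here: operands and operator are fully controlled
    return f"What is {a} {op} {b}?", answer

def correctness_reward(completion: str, answer) -> float:
    """Binary reward: 1.0 if the final \\boxed{...} matches the ground truth, else 0.0."""
    boxed = re.findall(r"\\boxed\{([^}]*)\}", completion)
    return 1.0 if boxed and boxed[-1].strip() == str(answer) else 0.0
```

A binary, verifiable reward like this is a common choice for GRPO-style RL on math tasks, since correctness can be checked programmatically without a learned reward model.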