# NanoSOTA-Qwen-0.5B-GSM8K-v1

## Model Summary
This is a highly specialized version of Qwen/Qwen2.5-0.5B-Instruct, fine-tuned to excel at multi-step mathematical and logical reasoning. It was trained to first generate an internal monologue (opened with `<|begin_of_thought|>`) before providing a final, boxed answer (`\boxed{...}`).
This model is the result of a multi-stage training process designed to produce a "NanoSOTA" (nano-scale state-of-the-art) model for reasoning tasks in its size class.
## Intended Use
This model is intended for tasks requiring step-by-step reasoning, particularly grade-school math word problems (the GSM8K benchmark). It can be used for:
- Tutoring applications for math problems.
- Extracting structured answers from text.
- Studying the "Chain of Thought" capabilities of small language models.
## How to Use
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "oscarz511/NanoSOTA-Qwen-0.5B-GSM8K-v1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

prompt = "A farmer has 15 sheep. All but 8 die. How many are left?"
messages = [{"role": "user", "content": prompt}]
input_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=256)
# `generate` returns a batch of token ID sequences; decode the first (and only) one.
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
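Because the final answer is emitted inside `\boxed{...}`, the structured answer can be pulled out of the generated text with a small regex helper. This is an illustrative sketch, not part of the model's own tooling:

```python
import re

def extract_boxed_answer(text: str):
    """Return the content of the last \\boxed{...} in the output, or None if absent."""
    matches = re.findall(r"\\boxed\{([^}]*)\}", text)
    return matches[-1] if matches else None
```

For example, `extract_boxed_answer(r"... so the answer is \boxed{8}.")` returns `"8"`; taking the last match skips any boxed expressions that appear mid-reasoning.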
## Training Details
The model underwent a three-stage training process on an NVIDIA L40S GPU:

- Base Model: Qwen/Qwen2.5-0.5B-Instruct.
- Chain-of-Thought SFT: fine-tuned on a subset of `HuggingFaceH4/Bespoke-Stratos-17k` to learn the `<|begin_of_thought|>` reasoning structure.
- Reinforcement Learning (GRPO): briefly trained with a custom GRPO loop on synthetic arithmetic problems to improve mathematical accuracy.
- GSM8K Specialization: heavily fine-tuned for 4 epochs on ~7,500 examples from the `gsm8k` (main) dataset to master the format and logic of word problems.
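The custom GRPO loop itself is not published here; the sketch below only illustrates the kind of synthetic arithmetic problem generator and binary correctness reward such a loop might use. All names and details are hypothetical assumptions, not the actual training code:

```python
import random
import re

def make_arithmetic_problem(rng: random.Random):
    """Generate a synthetic arithmetic question and its ground-truth answer (hypothetical)."""
    a, b = rng.randint(1, 99), rng.randint(1, 99)
    op = rng.choice(["+", "-", "*"])
    answer = eval(f"{a} {op} {b}")  # safe here: operands and operator are fully controlled
    return f"What is {a} {op} {b}?", answer

def correctness_reward(completion: str, answer) -> float:
    """Binary reward: 1.0 if the final \\boxed{...} matches the ground truth, else 0.0."""
    boxed = re.findall(r"\\boxed\{([^}]*)\}", completion)
    return 1.0 if boxed and boxed[-1].strip() == str(answer) else 0.0
```

A binary, verifiable reward like this is a common choice for GRPO-style RL on math tasks, since correctness can be checked programmatically without a learned reward model.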