Uploaded finetuned model
- Developed by: Jackrong
- License: apache-2.0
- Finetuned from model: unsloth/Llama-3.1-8B-Instruct
- Note: Llama-3.1-8B-Think-Zero-GRPO is a variant trained solely with Group Relative Policy Optimization (GRPO) on mathematics data, after only a small amount of cold-start data. It is an intermediate checkpoint of Llama3.1-8B-Thinking-R1; a usage sketch follows below.
Model tree for Jackrong/Llama-3.1-8B-Think-Zero-GRPO
- Base model: meta-llama/Llama-3.1-8B
- Finetuned: meta-llama/Llama-3.1-8B-Instruct
- Finetuned: unsloth/Llama-3.1-8B-Instruct (direct parent of this model)