Uploaded finetuned model

  • Developed by: Jackrong
  • License: apache-2.0
  • Finetuned from model: unsloth/Llama-3.1-8B-Instruct
  • Note: Llama‑3.1‑8B‑Think‑Zero‑GRPO is a variant trained solely with Group Relative Policy Optimization (GRPO) on mathematics data, starting from only a small amount of cold‑start data. It is an intermediate checkpoint of Llama3.1‑8B‑Thinking‑R1. A rough sketch of the group‑relative advantage used by GRPO follows below.
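
As a hedged illustration of the technique named above (not the author's training code): GRPO samples a group of completions per prompt and normalizes each completion's reward against the group's mean and standard deviation, so no learned value model is needed. The reward values and group size below are illustrative assumptions.

```python
import statistics

def group_relative_advantages(rewards: list[float]) -> list[float]:
    """Group-relative advantage: A_i = (r_i - mean(r)) / std(r)."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # guard against zero variance
    return [(r - mean) / std for r in rewards]

# Example: 4 completions sampled for one math prompt, scored 1.0 if the
# final answer is correct and 0.0 otherwise (a hypothetical reward scheme).
print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
# [1.0, -1.0, -1.0, 1.0]
```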
  • Format: Safetensors
  • Model size: 8B params
  • Tensor type: BF16
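
A minimal inference sketch, assuming the transformers and torch libraries are installed and that the checkpoint loads under the repo id shown above; the prompt and generation settings are illustrative.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Jackrong/Llama-3.1-8B-Think-Zero-GRPO"

# Load in BF16 to match the checkpoint's tensor type.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "What is 17 * 24? Think step by step."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```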
