Uploaded finetuned model
- Developed by: Jackrong
- License: apache-2.0
- Finetuned from model: unsloth/Llama-3.1-8B-Instruct
- Note: Llama-3.1-8B-Think-Zero-GRPO is a variant trained solely with Group Relative Policy Optimization (GRPO) on mathematics data, after only a small amount of cold-start data. It is an intermediate checkpoint of Llama3.1-8B-Thinking-R1; a usage sketch follows below.
Model tree for Jackrong/Llama-3.1-8B-Think-Zero-GRPO
- Base model: meta-llama/Llama-3.1-8B
- Finetuned: meta-llama/Llama-3.1-8B-Instruct
- Finetuned: unsloth/Llama-3.1-8B-Instruct (direct parent of this model)