Xuandong, nielsr (HF Staff) committed
Commit e53a31f · verified · 1 Parent(s): 904e401

Improve model card: Add library, links, and usage example (#1)


- Improve model card: Add library, links, and usage example (13e38d0c0e0720bc75e6f2b9681a760df4cb7919)


Co-authored-by: Niels Rogge <nielsr@users.noreply.huggingface.co>

Files changed (1)
  1. README.md +33 -5
README.md CHANGED
@@ -1,24 +1,52 @@
 ---
 base_model:
 - allenai/OLMo-2-1124-7B-SFT
-license: apache-2.0
 datasets:
 - math
+language:
+- en
+license: apache-2.0
 metrics:
 - accuracy
 pipeline_tag: text-generation
-language:
-- en
+library_name: transformers
 ---
 
 # OLMo-2-7B-SFT-GRPO-MATH-1EPOCH
 
-**Description:**
+This model is a GRPO-fine-tuned version of `allenai/OLMo-2-1124-7B-SFT` trained on the MATH dataset.
 
-A GRPO-fine-tuned version of allenai/OLMo-2-1124-7B-SFT trained on the MATH dataset.
+This model is associated with the paper [Learning to Reason without External Rewards](https://huggingface.co/papers/2505.19590), which introduces **Intuitor**, a reinforcement learning method that fine-tunes large language models (LLMs) using *self-certainty*—the model’s own internal confidence—as the sole reward. This approach is built on a novel paradigm called **Reinforcement Learning from Internal Feedback (RLIF)**, enabling models to learn without external rewards, gold labels, or verifiers by optimizing intrinsic signals.
 
 ---
 
+## Project Page & Code
+
+* **Project Page**: [https://sunblaze-ucb.github.io/Intuitor/](https://sunblaze-ucb.github.io/Intuitor/)
+* **GitHub Repository**: [https://github.com/sunblaze-ucb/Intuitor](https://github.com/sunblaze-ucb/Intuitor)
+
+## Usage
+
+You can load and use this model with the `transformers` library:
+
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+import torch
+
+model_name = "sunblaze-ucb/OLMo-2-7B-SFT-GRPO-MATH-1EPOCH"
+
+# It's recommended to load in bfloat16 for OLMo-2 models if supported by your hardware
+tokenizer = AutoTokenizer.from_pretrained(model_name)
+model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)
+
+# Example usage:
+prompt = """Question: What is 2 + 2?
+Answer:"""
+input_ids = tokenizer(prompt, return_tensors="pt").input_ids
+output = model.generate(input_ids, max_new_tokens=50, do_sample=False)
+print(tokenizer.decode(output[0], skip_special_tokens=True))
+```
+
 ## Citation
 
 ```bibtex
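
The Intuitor recipe referenced in the new card text is concrete enough to sketch. Below is a minimal, purely illustrative Python sketch (not part of this commit or the Intuitor codebase), assuming self-certainty is the average KL divergence from a uniform distribution to the model's next-token distributions, paired with the group-relative advantage normalization that GRPO applies to per-response rewards. The function names, vocabulary size, and shapes are hypothetical placeholders.

```python
import math
import torch
import torch.nn.functional as F

def self_certainty(logits: torch.Tensor) -> torch.Tensor:
    """Assumed reward: mean KL(U || p_t) over a response's tokens.

    logits: (seq_len, vocab_size) next-token logits for one generated response.
    KL(U || p) = -log(V) - (1/V) * sum_j log p_j, so the score grows as the
    model's distributions move away from uniform, i.e. as it grows confident.
    """
    vocab_size = logits.size(-1)
    log_probs = F.log_softmax(logits, dim=-1)
    kl_per_token = -math.log(vocab_size) - log_probs.mean(dim=-1)
    return kl_per_token.mean()

def grpo_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Group-relative advantages in the style of GRPO: each sampled response's
    reward is normalized against the group sampled for the same prompt."""
    return (rewards - rewards.mean()) / (rewards.std() + eps)

# Toy example: 4 sampled responses of 12 tokens each (random logits and an
# arbitrary vocabulary size; both are placeholders for illustration).
group_logits = [torch.randn(12, 32000) for _ in range(4)]
rewards = torch.stack([self_certainty(l) for l in group_logits])
print(grpo_advantages(rewards))
```

In an actual RLIF training loop these advantages would weight the policy-gradient update for each sampled response; see the GitHub repository linked above for the authors' implementation.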