Xuandong, nielsr (HF Staff) committed
Commit e53a31f · verified · 1 Parent(s): 904e401

Improve model card: Add library, links, and usage example (#1)


- Improve model card: Add library, links, and usage example (13e38d0c0e0720bc75e6f2b9681a760df4cb7919)


Co-authored-by: Niels Rogge <nielsr@users.noreply.huggingface.co>

Files changed (1)
  1. README.md +33 -5
README.md CHANGED
@@ -1,24 +1,52 @@
 ---
 base_model:
 - allenai/OLMo-2-1124-7B-SFT
-license: apache-2.0
 datasets:
 - math
+language:
+- en
+license: apache-2.0
 metrics:
 - accuracy
 pipeline_tag: text-generation
-language:
-- en
+library_name: transformers
 ---
 
 # OLMo-2-7B-SFT-GRPO-MATH-1EPOCH
 
-**Description:**
+This model is a GRPO-fine-tuned version of `allenai/OLMo-2-1124-7B-SFT` trained on the MATH dataset.
 
-A GRPO-fine-tuned version of allenai/OLMo-2-1124-7B-SFT trained on the MATH dataset.
+This model is associated with the paper [Learning to Reason without External Rewards](https://huggingface.co/papers/2505.19590), which introduces **Intuitor**, a reinforcement learning method that fine-tunes large language models (LLMs) using *self-certainty*—the model’s own internal confidence—as the sole reward. This approach is built on a novel paradigm called **Reinforcement Learning from Internal Feedback (RLIF)**, enabling models to learn without external rewards, gold labels, or verifiers by optimizing intrinsic signals.
 
 ---
 
+## Project Page & Code
+
+* **Project Page**: [https://sunblaze-ucb.github.io/Intuitor/](https://sunblaze-ucb.github.io/Intuitor/)
+* **GitHub Repository**: [https://github.com/sunblaze-ucb/Intuitor](https://github.com/sunblaze-ucb/Intuitor)
+
+## Usage
+
+You can load and use this model with the `transformers` library:
+
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+import torch
+
+model_name = "sunblaze-ucb/OLMo-2-7B-SFT-GRPO-MATH-1EPOCH"
+
+# It's recommended to load in bfloat16 for OLMo-2 models if supported by your hardware
+tokenizer = AutoTokenizer.from_pretrained(model_name)
+model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)
+
+# Example usage:
+prompt = """Question: What is 2 + 2?
+Answer:"""
+input_ids = tokenizer(prompt, return_tensors="pt").input_ids
+output = model.generate(input_ids, max_new_tokens=50, do_sample=False)
+print(tokenizer.decode(output[0], skip_special_tokens=True))
+```
+
 ## Citation
 
 ```bibtex
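
The Intuitor recipe referenced in the new card text is concrete enough to sketch. Below is a minimal, purely illustrative Python sketch (not part of this commit or the Intuitor codebase), assuming self-certainty is the average KL divergence from a uniform distribution to the model's next-token distributions, paired with the group-relative advantage normalization that GRPO applies to per-response rewards. The function names, vocabulary size, and shapes are hypothetical placeholders.

```python
import math
import torch
import torch.nn.functional as F

def self_certainty(logits: torch.Tensor) -> torch.Tensor:
    """Assumed reward: mean KL(U || p_t) over a response's tokens.

    logits: (seq_len, vocab_size) next-token logits for one generated response.
    KL(U || p) = -log(V) - (1/V) * sum_j log p_j, so the score grows as the
    model's distributions move away from uniform, i.e. as it grows confident.
    """
    vocab_size = logits.size(-1)
    log_probs = F.log_softmax(logits, dim=-1)
    kl_per_token = -math.log(vocab_size) - log_probs.mean(dim=-1)
    return kl_per_token.mean()

def grpo_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Group-relative advantages in the style of GRPO: each sampled response's
    reward is normalized against the group sampled for the same prompt."""
    return (rewards - rewards.mean()) / (rewards.std() + eps)

# Toy example: 4 sampled responses of 12 tokens each (random logits and an
# arbitrary vocabulary size; both are placeholders for illustration).
group_logits = [torch.randn(12, 32000) for _ in range(4)]
rewards = torch.stack([self_certainty(l) for l in group_logits])
print(grpo_advantages(rewards))
```

In an actual RLIF training loop these advantages would weight the policy-gradient update for each sampled response; see the GitHub repository linked above for the authors' implementation.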