Update README.md
README.md
CHANGED
@@ -1,6 +1,6 @@
 ---
 model-index:
-- name: tulu-v2.5-
+- name: llama-3-tulu-v2.5-8b-uf-mean-8b-uf-rm
   results: []
 datasets:
 - allenai/tulu-2.5-preference-data
@@ -14,7 +14,7 @@ license: apache-2.0
 <img src="https://huggingface.co/datasets/allenai/blog-images/resolve/main/tulu-2.5/tulu_25_banner.png" alt="Tulu 2.5 banner image" width="800px"/>
 </center>

-# Model Card for Tulu V2.5 PPO 13B - UltraFeedback Mean w. 8B UltraFeedback RM
+# Model Card for Llama 3 Tulu V2.5 PPO 8B - UltraFeedback Mean w. 8B UltraFeedback RM

 Tulu is a series of language models that are trained to act as helpful assistants.
 Tulu V2.5 is a series of models trained using DPO and PPO starting from the [Tulu 2 suite](https://huggingface.co/collections/allenai/tulu-v2-suite-6551b56e743e6349aab45101).
@@ -22,13 +22,14 @@ This model is trained on the UltraFeedback dataset (using the per-aspect/fine-gr
 We used an 8B RM trained on the UltraFeedback dataset, and then used the UltraFeedback prompts during PPO training.

 This is part of a small update to the original V2.5 suite, adding some Llama 3-based models. We add three models:
-- [allenai/tulu-v2.5-
-- [allenai/tulu-v2.5-
-- [allenai/tulu-v2.5-
+- [allenai/llama-3-tulu-v2.5-8b-uf-mean-8b-uf-rm](https://huggingface.co/allenai/tulu-v2.5-llama3-8b-uf-mean-8b-uf-rm) (this model)
+- [allenai/llama-3-tulu-v2.5-8b-uf-mean-70b-uf-rm-mixed-prompts](https://huggingface.co/allenai/tulu-v2.5-llama3-8b-uf-mean-70b-uf-rm-mixed-prompts)
+- [allenai/llama-3-tulu-v2.5-8b-uf-mean-70b-uf-rm](https://huggingface.co/allenai/tulu-v2.5-llama3-8b-uf-mean-70b-uf-rm) (best overall model)

 For more details, read the paper:
 [Unpacking DPO and PPO: Disentangling Best Practices for Learning from Preference Feedback](https://arxiv.org/abs/2406.09279).

+Built with Meta Llama 3! Note that Llama 3 is released under the Meta Llama 3 community license, included here under llama_3_license.txt.

 ## Model description

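For readers of the updated card, a minimal usage sketch may help (it is not part of the commit): it assumes the Hugging Face `transformers` and `torch` libraries and the `<|user|>`/`<|assistant|>` chat format used across the Tulu suite, and the prompt text is illustrative.

```python
# Minimal, hedged usage sketch for the updated model card.
# Assumes: transformers + torch installed, and the Tulu chat format
# ("<|user|>\n...\n<|assistant|>\n"); adjust if the repo's tokenizer
# ships its own chat template.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "allenai/llama-3-tulu-v2.5-8b-uf-mean-8b-uf-rm"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

prompt = "<|user|>\nWhat does PPO optimize during RLHF?\n<|assistant|>\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)
# Strip the prompt tokens so only the assistant's reply is printed.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```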
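The card's line about PPO training can be made concrete with a sketch of the reward-scoring step: the RM maps a prompt-response string to a scalar that PPO then maximizes, typically under a KL penalty against the starting policy. The RM repo id below is a hypothetical placeholder, since the diff does not name the 8B UltraFeedback RM checkpoint, and the single-logit sequence-classification head is an assumption.

```python
# Hedged sketch of the PPO reward-scoring step. The repo id is a
# PLACEHOLDER, not a confirmed checkpoint name; a single-logit
# sequence-classification head is assumed.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

rm_id = "allenai/llama-3-tulu-v2.5-8b-uf-rm"  # hypothetical placeholder
tokenizer = AutoTokenizer.from_pretrained(rm_id)
reward_model = AutoModelForSequenceClassification.from_pretrained(rm_id, num_labels=1)

# Score one sampled response to a training prompt.
text = "<|user|>\nName one planet.\n<|assistant|>\nMars."
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    reward = reward_model(**inputs).logits[0, 0].item()  # scalar reward
print(f"reward: {reward:.3f}")  # PPO updates the policy to raise this value
```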