Update README.md
README.md
CHANGED
@@ -1,6 +1,6 @@
 ---
 model-index:
-- name: tulu-v2.5-
+- name: llama-3-tulu-v2.5-8b-uf-mean-8b-uf-rm
   results: []
 datasets:
 - allenai/tulu-2.5-preference-data
@@ -14,7 +14,7 @@ license: apache-2.0
 <img src="https://huggingface.co/datasets/allenai/blog-images/resolve/main/tulu-2.5/tulu_25_banner.png" alt="Tulu 2.5 banner image" width="800px"/>
 </center>

-# Model Card for Tulu V2.5 PPO 13B - UltraFeedback Mean w. 8B UltraFeedback RM
+# Model Card for Llama 3 Tulu V2.5 PPO 8B - UltraFeedback Mean w. 8B UltraFeedback RM

 Tulu is a series of language models that are trained to act as helpful assistants.
 Tulu V2.5 is a series of models trained using DPO and PPO starting from the [Tulu 2 suite](https://huggingface.co/collections/allenai/tulu-v2-suite-6551b56e743e6349aab45101).
@@ -22,13 +22,14 @@ This model is trained on the UltraFeedback dataset (using the per-aspect/fine-gr
 We used an 8B RM trained on the UltraFeedback dataset, and then used the UltraFeedback prompts during PPO training.

 This is part of a small update to the original V2.5 suite, adding some Llama 3-based models. We add three models:
-- [allenai/tulu-v2.5-
-- [allenai/tulu-v2.5-
-- [allenai/tulu-v2.5-
+- [allenai/llama-3-tulu-v2.5-8b-uf-mean-8b-uf-rm](https://huggingface.co/allenai/tulu-v2.5-llama3-8b-uf-mean-8b-uf-rm) (this model)
+- [allenai/llama-3-tulu-v2.5-8b-uf-mean-70b-uf-rm-mixed-prompts](https://huggingface.co/allenai/tulu-v2.5-llama3-8b-uf-mean-70b-uf-rm-mixed-prompts)
+- [allenai/llama-3-tulu-v2.5-8b-uf-mean-70b-uf-rm](https://huggingface.co/allenai/tulu-v2.5-llama3-8b-uf-mean-70b-uf-rm) (best overall model)

 For more details, read the paper:
 [Unpacking DPO and PPO: Disentangling Best Practices for Learning from Preference Feedback](https://arxiv.org/abs/2406.09279).

+Built with Meta Llama 3! Note that Llama 3 is released under the Meta Llama 3 community license, included here under llama_3_license.txt.

 ## Model description

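For readers of the updated card, a minimal usage sketch may help (it is not part of the commit): it assumes the Hugging Face `transformers` and `torch` libraries and the `<|user|>`/`<|assistant|>` chat format used across the Tulu suite, and the prompt text is illustrative.

```python
# Minimal, hedged usage sketch for the updated model card.
# Assumes: transformers + torch installed, and the Tulu chat format
# ("<|user|>\n...\n<|assistant|>\n"); adjust if the repo's tokenizer
# ships its own chat template.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "allenai/llama-3-tulu-v2.5-8b-uf-mean-8b-uf-rm"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

prompt = "<|user|>\nWhat does PPO optimize during RLHF?\n<|assistant|>\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)
# Strip the prompt tokens so only the assistant's reply is printed.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```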
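The card's line about PPO training can be made concrete with a sketch of the reward-scoring step: the RM maps a prompt-response string to a scalar that PPO then maximizes, typically under a KL penalty against the starting policy. The RM repo id below is a hypothetical placeholder, since the diff does not name the 8B UltraFeedback RM checkpoint, and the single-logit sequence-classification head is an assumption.

```python
# Hedged sketch of the PPO reward-scoring step. The repo id is a
# PLACEHOLDER, not a confirmed checkpoint name; a single-logit
# sequence-classification head is assumed.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

rm_id = "allenai/llama-3-tulu-v2.5-8b-uf-rm"  # hypothetical placeholder
tokenizer = AutoTokenizer.from_pretrained(rm_id)
reward_model = AutoModelForSequenceClassification.from_pretrained(rm_id, num_labels=1)

# Score one sampled response to a training prompt.
text = "<|user|>\nName one planet.\n<|assistant|>\nMars."
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    reward = reward_model(**inputs).logits[0, 0].item()  # scalar reward
print(f"reward: {reward:.3f}")  # PPO updates the policy to raise this value
```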