Update README.md
README.md
CHANGED
@@ -1,18 +1,23 @@
----
-license: apache-2.0
----
-
-Generative Reward Model trained with [FAPO-Critic](https://huggingface.co/datasets/dyyyyyyyy/FAPO-Critic)
-
----
-
-Project Homepage: https://fapo-rl.github.io/
-
-Code Implementation: https://github.com/volcengine/verl/tree/main/recipe/fapo
-
-Welcome to follow and cite our works!
-
-BibTeX citation:
-```bibtex
-
-
+---
+license: apache-2.0
+---
+
+Generative Reward Model trained with [FAPO-Critic](https://huggingface.co/datasets/dyyyyyyyy/FAPO-Critic)
+
+---
+
+Project Homepage: https://fapo-rl.github.io/
+
+Code Implementation: https://github.com/volcengine/verl/tree/main/recipe/fapo
+
+Welcome to follow and cite our works!
+
+BibTeX citation:
+```bibtex
+@article{ding2025fapo,
+title={FAPO: Flawed-Aware Policy Optimization for Efficient and Reliable Reasoning},
+author={Ding, Yuyang and Zhang, Chi and Li, Juntao and Lin, Haibin and Liu, Xin and Zhang, Min},
+journal={arXiv preprint arXiv:2510.22543},
+year={2025}
+}
+```