Update README.md
README.md CHANGED
@@ -10,4 +10,6 @@ datasets:
 
 <!-- Provide a quick summary of what the model is/does. -->
 
-Acc Qwen 4B is a state-of-the-art accessibility GRPO RL-trained model with RM_R1-style Chain of Rubric distillation of Claude 4 Opus using Gemini 2.5 Flash to Qwen 3 4B over 18 million tokens.
+Acc Qwen 4B is a state-of-the-art accessibility GRPO RL-trained model with RM_R1-style Chain of Rubric distillation of Claude 4 Opus using Gemini 2.5 Flash to Qwen 3 4B over 18 million tokens.
+
+The code for training the model is at https://github.com/Nottlespike/Accessible_Qwen
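The model description names GRPO as the RL method. The sketch below shows only GRPO's core idea, the group-relative advantage: several completions are sampled for one prompt, each is scored, and each reward is normalized against its own group's mean and standard deviation, so no separate value model is needed. It is the generic formulation from the GRPO literature, not code taken from the Accessible_Qwen repository. The function name `group_relative_advantages`, the `eps` value, the use of population standard deviation, and the reward numbers are all illustrative assumptions. The policy-gradient loss and KL penalty that consume these advantages are omitted.

```python
import statistics
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Hypothetical helper: GRPO-style advantages for one prompt's sampled group.

    Each completion's advantage is (reward - group mean) / (group std + eps).
    Population std is used here (an assumption); some implementations use the
    sample std instead. eps avoids division by zero when all rewards are equal.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical rewards for four completions sampled for a single prompt,
# e.g. scores assigned by a rubric-based judge.
rewards = [0.9, 0.4, 0.7, 0.2]
advantages = group_relative_advantages(rewards)
print([round(a, 3) for a in advantages])
```

Completions scoring above their group's average receive positive advantages and are reinforced; those below are pushed down. If every completion in a group gets the same reward, all advantages are zero and that group contributes no learning signal.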