justinj92 and lbourdois committed on
Commit 3a2afe7 · verified · 1 Parent(s): 749a177

Improve language tag (#1)

- Improve language tag (857958591b57987976a2c9ec2b5eb6dd7b180d1d)


Co-authored-by: Loïck BOURDOIS <lbourdois@users.noreply.huggingface.co>

Files changed (1)
  1. README.md +113 -101
README.md CHANGED
@@ -1,102 +1,114 @@
- ---
- base_model: Qwen/Qwen2.5-1.5B-Instruct
- library_name: transformers
- model_name: Qwen2.5-1.5B-Thinking
- tags:
- - generated_from_trainer
- - trl
- - grpo
- licence: license
- datasets:
- - microsoft/orca-math-word-problems-200k
- model-index:
- - name: Qwen2.5-1.5B-Thinking
-   results:
-   - task:
-       type: text-generation
-     dataset:
-       name: openai/gsm8k
-       type: GradeSchoolMath8K
-     metrics:
-     - name: GSM8k (0-Shot)
-       type: GSM8k (0-Shot)
-       value: 14.4%
-     - name: GSM8k (Few-Shot)
-       type: GSM8k (Few-Shot)
-       value: 63.31%
- co2_eq_emissions:
-   emissions: 7100
-   source: "https://mlco2.github.io/impact#compute"
-   training_type: "GRPO"
-   geographical_location: "East US2"
-   hardware_used: "1 x H100 96GB"
-
- ---
-
- # Model Card for Qwen2.5-1.5B-Thinking
-
- Improved Model at [Qwen2.5-1.5B-Thinking-v1.1](https://huggingface.co/justinj92/Qwen2.5-1.5B-Thinking-v1.1).
- It has been trained using [TRL](https://github.com/huggingface/trl).
-
-
- ## Evals
-
- | Model                 | GSM8k 0-Shot | GSM8k Few-Shot |
- |-----------------------|--------------|----------------|
- | Mistral-7B-v0.1       | 10           | 41             |
- | Qwen2.5-1.5B-Thinking | 14.4         | 63.31          |
-
-
- ## Training procedure
-
- <img src="https://raw.githubusercontent.com/wandb/wandb/fc186783c86c33980e5c73f13363c13b2c5508b1/assets/logo-dark.svg" alt="Weights & Biases Logged" width="150" height="24"/>
-
- <img src="https://huggingface.co/justinj92/Qwen2.5-1.5B-Thinking/resolve/main/w%26b_qwen_r1.png" width="1200" height="900"/>
-
- Trained on 1xH100 96GB via Azure Cloud (East US2).
-
- This model was trained with GRPO, a method introduced in [DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models](https://huggingface.co/papers/2402.03300).
-
- ### Usage Recommendations
-
- **Recommend adhering to the following configurations when utilizing the models, including benchmarking, to achieve the expected performance:**
-
- 1. Set the temperature within the range of 0.5-0.7 (0.6 is recommended) to prevent endless repetitions or incoherent outputs.
- 2. **For mathematical problems, it is advisable to include a directive in your prompt such as: "Please reason step by step, and put your final answer within \boxed{}."**
- 3. When evaluating model performance, it is recommended to conduct multiple tests and average the results.
- 4. This model is not enhanced for other domains apart from Maths.
-
- ### Framework versions
-
- - TRL: 0.15.0.dev0
- - Transformers: 4.49.0.dev0
- - Pytorch: 2.5.1
- - Datasets: 3.2.0
- - Tokenizers: 0.21.0
-
- ## Citations
-
- Cite GRPO as:
-
- ```bibtex
- @article{zhihong2024deepseekmath,
-     title  = {{DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models}},
-     author = {Zhihong Shao and Peiyi Wang and Qihao Zhu and Runxin Xu and Junxiao Song and Mingchuan Zhang and Y. K. Li and Y. Wu and Daya Guo},
-     year   = 2024,
-     eprint = {arXiv:2402.03300},
- }
- ```
-
- Cite TRL as:
-
- ```bibtex
- @misc{vonwerra2022trl,
-     title        = {{TRL: Transformer Reinforcement Learning}},
-     author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallouédec},
-     year         = 2020,
-     journal      = {GitHub repository},
-     publisher    = {GitHub},
-     howpublished = {\url{https://github.com/huggingface/trl}}
- }
  ```
 
+ ---
+ base_model: Qwen/Qwen2.5-1.5B-Instruct
+ library_name: transformers
+ tags:
+ - generated_from_trainer
+ - trl
+ - grpo
+ licence: license
+ datasets:
+ - microsoft/orca-math-word-problems-200k
+ co2_eq_emissions:
+   emissions: 7100
+   source: https://mlco2.github.io/impact#compute
+   training_type: GRPO
+   geographical_location: East US2
+   hardware_used: 1 x H100 96GB
+ language:
+ - zho
+ - eng
+ - fra
+ - spa
+ - por
+ - deu
+ - ita
+ - rus
+ - jpn
+ - kor
+ - vie
+ - tha
+ - ara
+ model-index:
+ - name: Qwen2.5-1.5B-Thinking
+   results:
+   - task:
+       type: text-generation
+     dataset:
+       name: openai/gsm8k
+       type: GradeSchoolMath8K
+     metrics:
+     - type: GSM8k (0-Shot)
+       value: 14.4%
+       name: GSM8k (0-Shot)
+     - type: GSM8k (Few-Shot)
+       value: 63.31%
+       name: GSM8k (Few-Shot)
+ ---
+
+ # Model Card for Qwen2.5-1.5B-Thinking
+
+ An improved version of this model is available at [Qwen2.5-1.5B-Thinking-v1.1](https://huggingface.co/justinj92/Qwen2.5-1.5B-Thinking-v1.1).
+ This model was trained using [TRL](https://github.com/huggingface/trl).
+
+
+ ## Evals
+
+ | Model                 | GSM8k 0-Shot | GSM8k Few-Shot |
+ |-----------------------|--------------|----------------|
+ | Mistral-7B-v0.1       | 10           | 41             |
+ | Qwen2.5-1.5B-Thinking | 14.4         | 63.31          |
+
+
+ ## Training procedure
+
+ <img src="https://raw.githubusercontent.com/wandb/wandb/fc186783c86c33980e5c73f13363c13b2c5508b1/assets/logo-dark.svg" alt="Weights & Biases Logged" width="150" height="24"/>
+
+ <img src="https://huggingface.co/justinj92/Qwen2.5-1.5B-Thinking/resolve/main/w%26b_qwen_r1.png" width="1200" height="900"/>
+
+ Trained on 1 x H100 96GB via Azure Cloud (East US2).
+
+ This model was trained with GRPO, a method introduced in [DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models](https://huggingface.co/papers/2402.03300).
+
+ ### Usage Recommendations
+
+ **We recommend the following settings when using this model, including for benchmarking, to achieve the expected performance:**
+
+ 1. Set the temperature within the range of 0.5-0.7 (0.6 is recommended) to prevent endless repetitions or incoherent outputs.
+ 2. **For mathematical problems, include a directive in your prompt such as: "Please reason step by step, and put your final answer within \boxed{}."**
+ 3. When evaluating model performance, conduct multiple tests and average the results.
+ 4. This model is tuned for maths only and is not enhanced for other domains.
+
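The recommendations above can be sketched in code. This is a minimal illustration, assuming the usual `model.generate` keyword-argument style from `transformers`; the helper names `build_math_prompt` and `average_over_runs` are ours, not part of the model's API.

```python
def build_math_prompt(question: str) -> str:
    """Recommendation 2: ask for step-by-step reasoning and a \\boxed{} answer."""
    return (
        question
        + "\nPlease reason step by step, and put your final answer within \\boxed{}."
    )


# Recommendation 1: sample with temperature in 0.5-0.7 (0.6 recommended),
# e.g. passed as keyword arguments to `model.generate`.
GENERATION_KWARGS = {"do_sample": True, "temperature": 0.6, "max_new_tokens": 1024}


def average_over_runs(scores: list[float]) -> float:
    """Recommendation 3: average accuracy over several evaluation runs."""
    return sum(scores) / len(scores)
```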
81
+ ### Framework versions
82
+
83
+ - TRL: 0.15.0.dev0
84
+ - Transformers: 4.49.0.dev0
85
+ - Pytorch: 2.5.1
86
+ - Datasets: 3.2.0
87
+ - Tokenizers: 0.21.0
88
+
89
+ ## Citations
90
+
91
+ Cite GRPO as:
92
+
93
+ ```bibtex
94
+ @article{zhihong2024deepseekmath,
95
+ title = {{DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models}},
96
+ author = {Zhihong Shao and Peiyi Wang and Qihao Zhu and Runxin Xu and Junxiao Song and Mingchuan Zhang and Y. K. Li and Y. Wu and Daya Guo},
97
+ year = 2024,
98
+ eprint = {arXiv:2402.03300},
99
+ }
100
+
101
+ ```
102
+
103
+ Cite TRL as:
104
+
105
+ ```bibtex
106
+ @misc{vonwerra2022trl,
107
+ title = {{TRL: Transformer Reinforcement Learning}},
108
+ author = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallouédec},
109
+ year = 2020,
110
+ journal = {GitHub repository},
111
+ publisher = {GitHub},
112
+ howpublished = {\url{https://github.com/huggingface/trl}}
113
+ }
114
  ```