dada22231 committed on
Commit bc1f537 · verified · 1 Parent(s): b63bb93

End of training

Files changed (1):
  1. README.md +47 -147

README.md CHANGED
@@ -1,169 +1,69 @@
  ---
- library_name: peft
- license: other
- base_model: Qwen/Qwen1.5-7B
  tags:
- - axolotl
  - generated_from_trainer
  - trl
  - grpo
- model-index:
- - name: 2296f73f-99bd-4e6b-95ca-b2cd4a1e78af
-   results: []
  ---

- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->
-
- [<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
- <details><summary>See axolotl config</summary>
-
- axolotl version: `0.11.0.dev0`
- ```yaml
- adapter: lora
- base_model: Qwen/Qwen1.5-7B
- bf16: true
- chat_template: llama3
- dataloader_num_workers: 0
- dataloader_pin_memory: false
- dataset_info: dcaeb768-0f9e-4c34-920e-d80288595c2a
- dataset_prepared_path: null
- datasets:
- - data_files:
-   - ecb638fa488a6a93_train_data.json
-   ds_type: json
-   format: custom
-   path: /workspace/input_data/
-   type:
-     field_instruction: instruct
-     field_output: output
-     format: '{instruction}'
-     no_input_format: '{instruction}'
-     system_format: '{system}'
-     system_prompt: ''
- ddp_broadcast_buffers: false
- ddp_bucket_cap_mb: 25
- ddp_timeout: 7200
- debug: null
- deepspeed: null
- evaluation_strategy: 'no'
- flash_attention: false
- flash_attn_cross_entropy: false
- flash_attn_rms_norm: false
- fp16: false
- fsdp: null
- fsdp_config: null
- gpu_memory_limit: null
- gradient_accumulation_steps: 4
- gradient_checkpointing: true
- gradient_checkpointing_kwargs:
-   use_reentrant: false
- group_by_length: false
- hub_model_commit_message: Training checkpoint - step {current_step}
- hub_model_id: dada22231/2296f73f-99bd-4e6b-95ca-b2cd4a1e78af
- hub_model_revision: main
- hub_repo: null
- hub_strategy: checkpoint
- hub_token: null
- learning_rate: 0.0002
- load_in_4bit: false
- load_in_8bit: false
- local_rank: null
- logging_steps: 1
- lora_alpha: 256
- lora_dropout: 0.05
- lora_fan_in_fan_out: null
- lora_model_dir: null
- lora_modules_to_save:
- - embed_tokens
- - lm_head
- lora_r: 128
- lora_target_linear: true
- lr_scheduler: constant_with_warmup
- max_memory: null
- max_steps: 1500
- micro_batch_size: 8
- mlflow_experiment_name: /tmp/ecb638fa488a6a93_train_data.json
- model_type: AutoModelForCausalLM
- optimizer: adamw_torch_fused
- output_dir: ./outputs
- pad_to_sequence_len: true
- push_dataset_card: false
- push_to_hub: true
- resume_from_checkpoint: null
- s2_attention: null
- sample_packing: true
- save_lora_adapter: false
- save_merged_lora_model: true
- save_only_model: true
- save_safetensors: true
- save_steps: 75
- save_strategy: steps
- save_total_limit: 5
- sequence_len: 4096
- special_tokens: null
- strict: false
- tf32: true
- tokenizer_type: AutoTokenizer
- torch_compile: false
- torch_compile_backend: inductor
- train_on_inputs: false
- trust_remote_code: true
- val_set_size: 0
- wandb_entity: null
- wandb_mode: online
- wandb_name: dcaeb768-0f9e-4c34-920e-d80288595c2a
- wandb_project: Gradients-On-Demand
- wandb_run: your_name
- wandb_runid: dcaeb768-0f9e-4c34-920e-d80288595c2a
- warmup_steps: 150
- weight_decay: 0.01
- xformers_attention: null
-
- ```
-
- </details><br>
-
- # 2296f73f-99bd-4e6b-95ca-b2cd4a1e78af

- This model is a fine-tuned version of [Qwen/Qwen1.5-7B](https://huggingface.co/Qwen/Qwen1.5-7B) on an unknown dataset.

- ## Model description

- More information needed

- ## Intended uses & limitations

- More information needed

- ## Training and evaluation data

- More information needed

- ## Training procedure

- ### Training hyperparameters

- The following hyperparameters were used during training:
- - learning_rate: 0.0002
- - train_batch_size: 8
- - eval_batch_size: 8
- - seed: 42
- - gradient_accumulation_steps: 4
- - total_train_batch_size: 32
- - optimizer: Use OptimizerNames.ADAMW_TORCH_FUSED with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
- - lr_scheduler_type: constant_with_warmup
- - lr_scheduler_warmup_steps: 150
- - training_steps: 1500

- ### Training results

- ### Framework versions

- - PEFT 0.15.2
- - Transformers 4.52.4
- - Pytorch 2.7.1+cu128
- - Datasets 3.6.0
- - Tokenizers 0.21.1
  ---
+ base_model: unsloth/SmolLM-1.7B-Instruct
+ library_name: transformers
+ model_name: 9673be0a-6c60-4635-b850-f4bc6dd20a2f
  tags:
  - generated_from_trainer
+ - axolotl
  - trl
  - grpo
+ licence: license
  ---

+ # Model Card for 9673be0a-6c60-4635-b850-f4bc6dd20a2f

+ This model is a fine-tuned version of [unsloth/SmolLM-1.7B-Instruct](https://huggingface.co/unsloth/SmolLM-1.7B-Instruct).
+ It has been trained using [TRL](https://github.com/huggingface/trl).

+ ## Quick start

+ ```python
+ from transformers import pipeline
+
+ question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
+ generator = pipeline("text-generation", model="dada22231/9673be0a-6c60-4635-b850-f4bc6dd20a2f", device="cuda")
+ output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
+ print(output["generated_text"])
+ ```

+ ## Training procedure

+ [<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="150" height="24"/>](https://wandb.ai/zamespol1-hugging-face/Gradients-On-Demand/runs/33u5qjqy)

+ This model was trained with GRPO, a method introduced in [DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models](https://huggingface.co/papers/2402.03300).

+ ### Framework versions

+ - TRL: 0.18.2
+ - Transformers: 4.52.4
+ - Pytorch: 2.7.1+cu128
+ - Datasets: 3.6.0
+ - Tokenizers: 0.21.1

+ ## Citations

+ Cite GRPO as:

+ ```bibtex
+ @article{zhihong2024deepseekmath,
+     title  = {{DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models}},
+     author = {Zhihong Shao and Peiyi Wang and Qihao Zhu and Runxin Xu and Junxiao Song and Mingchuan Zhang and Y. K. Li and Y. Wu and Daya Guo},
+     year   = 2024,
+     eprint = {arXiv:2402.03300},
+ }
+ ```

+ Cite TRL as:
+
+ ```bibtex
+ @misc{vonwerra2022trl,
+     title        = {{TRL: Transformer Reinforcement Learning}},
+     author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
+     year         = 2020,
+     journal      = {GitHub repository},
+     publisher    = {GitHub},
+     howpublished = {\url{https://github.com/huggingface/trl}}
+ }
+ ```
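For context on the GRPO method the new card cites: its core idea is to score a group of sampled completions per prompt and normalize rewards within that group to obtain per-completion advantages, avoiding a learned value function. The sketch below is a minimal illustration of that normalization step only (the helper name is hypothetical; it is not TRL's implementation):

```python
# Minimal sketch of GRPO's group-relative advantage (hypothetical helper,
# not TRL's implementation). For one prompt, a group of completions is
# scored by a reward function; each reward is then standardized against
# the group mean and standard deviation.
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-4):
    """Map a group's rewards to zero-mean, roughly unit-std advantages."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Example: four completions for one prompt, scored by some reward model.
advs = group_relative_advantages([1.0, 0.0, 0.5, 0.5])
```

Completions above the group mean receive positive advantages and are reinforced; those below receive negative ones, so the group itself serves as the baseline.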