dada22231/9673be0a-6c60-4635-b850-f4bc6dd20a2f

@@ -1,169 +1,69 @@
 ---
-library_name: peft
-license: other
-base_model: Qwen/Qwen1.5-7B
 tags:
-- axolotl
 - generated_from_trainer
 - trl
 - grpo
-model-index:
-- name: 2296f73f-99bd-4e6b-95ca-b2cd4a1e78af
-  results: []
 ---
-<!-- This model card has been generated automatically according to the information the Trainer had access to. You
-should probably proofread and complete it, then remove this comment. -->
-[<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
-<details><summary>See axolotl config</summary>
-axolotl version: `0.11.0.dev0`
-```yaml
-adapter: lora
-base_model: Qwen/Qwen1.5-7B
-bf16: true
-chat_template: llama3
-dataloader_num_workers: 0
-dataloader_pin_memory: false
-dataset_info: dcaeb768-0f9e-4c34-920e-d80288595c2a
-dataset_prepared_path: null
-datasets:
-- data_files:
-  - ecb638fa488a6a93_train_data.json
-  ds_type: json
-  format: custom
-  path: /workspace/input_data/
-  type:
-    field_instruction: instruct
-    field_output: output
-    format: '{instruction}'
-    no_input_format: '{instruction}'
-    system_format: '{system}'
-    system_prompt: ''
-ddp_broadcast_buffers: false
-ddp_bucket_cap_mb: 25
-ddp_timeout: 7200
-debug: null
-deepspeed: null
-evaluation_strategy: 'no'
-flash_attention: false
-flash_attn_cross_entropy: false
-flash_attn_rms_norm: false
-fp16: false
-fsdp: null
-fsdp_config: null
-gpu_memory_limit: null
-gradient_accumulation_steps: 4
-gradient_checkpointing: true
-gradient_checkpointing_kwargs:
-  use_reentrant: false
-group_by_length: false
-hub_model_commit_message: Training checkpoint - step {current_step}
-hub_model_id: dada22231/2296f73f-99bd-4e6b-95ca-b2cd4a1e78af
-hub_model_revision: main
-hub_repo: null
-hub_strategy: checkpoint
-hub_token: null
-learning_rate: 0.0002
-load_in_4bit: false
-load_in_8bit: false
-local_rank: null
-logging_steps: 1
-lora_alpha: 256
-lora_dropout: 0.05
-lora_fan_in_fan_out: null
-lora_model_dir: null
-lora_modules_to_save:
-- embed_tokens
-- lm_head
-lora_r: 128
-lora_target_linear: true
-lr_scheduler: constant_with_warmup
-max_memory: null
-max_steps: 1500
-micro_batch_size: 8
-mlflow_experiment_name: /tmp/ecb638fa488a6a93_train_data.json
-model_type: AutoModelForCausalLM
-optimizer: adamw_torch_fused
-output_dir: ./outputs
-pad_to_sequence_len: true
-push_dataset_card: false
-push_to_hub: true
-resume_from_checkpoint: null
-s2_attention: null
-sample_packing: true
-save_lora_adapter: false
-save_merged_lora_model: true
-save_only_model: true
-save_safetensors: true
-save_steps: 75
-save_strategy: steps
-save_total_limit: 5
-sequence_len: 4096
-special_tokens: null
-strict: false
-tf32: true
-tokenizer_type: AutoTokenizer
-torch_compile: false
-torch_compile_backend: inductor
-train_on_inputs: false
-trust_remote_code: true
-val_set_size: 0
-wandb_entity: null
-wandb_mode: online
-wandb_name: dcaeb768-0f9e-4c34-920e-d80288595c2a
-wandb_project: Gradients-On-Demand
-wandb_run: your_name
-wandb_runid: dcaeb768-0f9e-4c34-920e-d80288595c2a
-warmup_steps: 150
-weight_decay: 0.01
-xformers_attention: null
-```
-</details><br>
-# 2296f73f-99bd-4e6b-95ca-b2cd4a1e78af
-This model is a fine-tuned version of [Qwen/Qwen1.5-7B](https://huggingface.co/Qwen/Qwen1.5-7B) on an unknown dataset.
-## Model description
-More information needed
-## Intended uses & limitations
-More information needed
-## Training and evaluation data
-More information needed
-## Training procedure
-### Training hyperparameters
-The following hyperparameters were used during training:
-- learning_rate: 0.0002
-- train_batch_size: 8
-- eval_batch_size: 8
-- seed: 42
-- gradient_accumulation_steps: 4
-- total_train_batch_size: 32
-- optimizer: Use OptimizerNames.ADAMW_TORCH_FUSED with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
-- lr_scheduler_type: constant_with_warmup
-- lr_scheduler_warmup_steps: 150
-- training_steps: 1500
-### Training results
-### Framework versions
-- PEFT 0.15.2
-- Transformers 4.52.4
-- Pytorch 2.7.1+cu128
-- Datasets 3.6.0
-- Tokenizers 0.21.1

 ---
+base_model: unsloth/SmolLM-1.7B-Instruct
+library_name: transformers
+model_name: 9673be0a-6c60-4635-b850-f4bc6dd20a2f
 tags:
 - generated_from_trainer
+- axolotl
 - trl
 - grpo
+licence: license
 ---
+# Model Card for 9673be0a-6c60-4635-b850-f4bc6dd20a2f
+This model is a fine-tuned version of [unsloth/SmolLM-1.7B-Instruct](https://huggingface.co/unsloth/SmolLM-1.7B-Instruct).
+It has been trained using [TRL](https://github.com/huggingface/trl).
+## Quick start
+```python
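+from transformers import pipeline
+
+# Example prompt; replace it with your own question.
+question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
+# Load the fine-tuned model as a text-generation pipeline (device="cuda" assumes a GPU is available).
+generator = pipeline("text-generation", model="dada22231/9673be0a-6c60-4635-b850-f4bc6dd20a2f", device="cuda")
+# Pass a chat-style message list; return_full_text=False keeps only the newly generated reply.
+output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
+print(output["generated_text"])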
+```
+## Training procedure
+[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="150" height="24"/>](https://wandb.ai/zamespol1-hugging-face/Gradients-On-Demand/runs/33u5qjqy)
+This model was trained with GRPO, a method introduced in [DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models](https://huggingface.co/papers/2402.03300).
+### Framework versions
+- TRL: 0.18.2
+- Transformers: 4.52.4
+- PyTorch: 2.7.1+cu128
+- Datasets: 3.6.0
+- Tokenizers: 0.21.1
+## Citations
+Cite GRPO as:
+```bibtex
+@article{zhihong2024deepseekmath,
+    title        = {{DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models}},
+    author       = {Zhihong Shao and Peiyi Wang and Qihao Zhu and Runxin Xu and Junxiao Song and Mingchuan Zhang and Y. K. Li and Y. Wu and Daya Guo},
+    year         = 2024,
+    eprint       = {arXiv:2402.03300},
+}
+```
+Cite TRL as:
+```bibtex
+@misc{vonwerra2022trl,
+	title        = {{TRL: Transformer Reinforcement Learning}},
+	author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
+	year         = 2020,
+	journal      = {GitHub repository},
+	publisher    = {GitHub},
+	howpublished = {\url{https://github.com/huggingface/trl}}
+}
+```