---
library_name: transformers
tags:
- unsloth
---

![image](https://cdn-uploads.huggingface.co/production/uploads/64739bc371f07ae738d2d61d/CmtHh8KBVf2U2TfGGl1y5.png)

```
training_args = GRPOConfig(
    vllm_sampling_params = vllm_sampling_params,
    # max_grad_norm = 0.1,
    # beta = 0.001,
    temperature = 1.0,
    learning_rate = 1e-5,
    weight_decay = 0.01,
    warmup_ratio = 0.01,
    lr_scheduler_type = "linear",
    optim = "adamw_8bit",
    logging_steps = 1,
    per_device_train_batch_size = 4,
    gradient_accumulation_steps = 1,
    num_generations = 4,
    # steps_per_generation = 16,
    max_prompt_length = max_prompt_length,
    max_completion_length = max_completion_length,
    # num_train_epochs = 1, # Set to 1 for a full training run
    max_steps = 5000,
    save_steps = 250,
    save_total_limit = 10,
    report_to = "wandb",
    output_dir = "outputs",
)
```
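
The config above references `vllm_sampling_params`, `max_prompt_length`, and `max_completion_length` from the surrounding training script, which is not included in this card. Below is a minimal sketch of how those names might be defined and how the config could be wired into TRL's `GRPOTrainer`; the base model name, sampling values, reward function, and dataset are illustrative assumptions, not the exact setup used for this checkpoint.

```
from unsloth import FastLanguageModel   # Unsloth loader (this card is tagged `unsloth`)
from vllm import SamplingParams
from trl import GRPOConfig, GRPOTrainer
from datasets import load_dataset

# Assumed sequence budgets; pick values that fit the base model's context window.
max_prompt_length = 1024
max_completion_length = 2048

# Assumed base model; substitute the checkpoint this card was actually trained from.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Qwen2.5-3B-Instruct",
    max_seq_length = max_prompt_length + max_completion_length,
    load_in_4bit = True,
    fast_inference = True,   # vLLM generation backend that consumes vllm_sampling_params
)

# Sampling parameters forwarded to vLLM for the on-policy generations.
vllm_sampling_params = SamplingParams(
    min_p = 0.1,
    top_p = 1.0,
    top_k = -1,
    seed = 3407,
)

# Placeholder reward: prefers completions near 200 characters. Replace with task-specific rewards.
def length_reward(completions, **kwargs):
    return [-abs(len(c) - 200) / 200.0 for c in completions]

# Placeholder dataset with a "prompt" column, as GRPOTrainer expects.
dataset = load_dataset("trl-lib/tldr", split = "train")

trainer = GRPOTrainer(
    model            = model,
    processing_class = tokenizer,
    reward_funcs     = [length_reward],
    args             = training_args,   # the GRPOConfig shown above
    train_dataset    = dataset,
)
trainer.train()
```

In a typical Unsloth GRPO run, LoRA adapters would also be attached via `FastLanguageModel.get_peft_model` before constructing the trainer.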