# GRPO training arguments. `vllm_sampling_params`, `max_prompt_length`, and
# `max_completion_length` are assumed to be defined earlier in the script.
from trl import GRPOConfig

training_args = GRPOConfig(
    vllm_sampling_params = vllm_sampling_params,  # sampling settings forwarded to the vLLM generation backend
    # max_grad_norm = 0.1,
    # beta = 0.001,
    temperature = 1.0,                # sampling temperature for generated completions
    learning_rate = 1e-5,
    weight_decay = 0.01,
    warmup_ratio = 0.01,
    lr_scheduler_type = "linear",
    optim = "adamw_8bit",             # 8-bit AdamW to reduce optimizer memory
    logging_steps = 1,
    per_device_train_batch_size = 12,
    gradient_accumulation_steps = 1,
    num_generations = 4,              # completions sampled per prompt for the GRPO group
    # steps_per_generation = 16,
    max_prompt_length = max_prompt_length,
    max_completion_length = max_completion_length,
    # num_train_epochs = 1, # Set to 1 for a full training run
    max_steps = 1600,
    save_steps = 250,                 # checkpoint every 250 steps
    save_total_limit = 10,
    report_to = "wandb",              # log metrics to Weights & Biases
    output_dir = "outputs",
)
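
These arguments are consumed by TRL's GRPOTrainer. Below is a minimal wiring sketch, assuming a model, tokenizer, prompt dataset, and reward functions (the placeholder names `model`, `tokenizer`, `train_dataset`, and `reward_funcs`) have been prepared earlier in the script:

from trl import GRPOTrainer

# Sketch only: model, tokenizer, reward_funcs, and train_dataset are
# placeholders assumed to be defined beforehand.
trainer = GRPOTrainer(
    model = model,
    processing_class = tokenizer,
    reward_funcs = reward_funcs,      # reward function(s) used to score completions
    args = training_args,             # the GRPOConfig defined above
    train_dataset = train_dataset,
)
trainer.train()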