---
license: apache-2.0
base_model: Qwen/Qwen2.5-1.5B-Instruct
datasets:
- open-r1/DAPO-Math-17k-Processed
library_name: transformers
tags:
- grpo
- reinforcement-learning
- reasoning
- qwen
---

# Qwen2.5-1.5B-Instruct_BF16_open-r1-DAPO-Math-17k-Processed_882_FlashRL_G4-L2048_new

This repository contains a checkpoint trained with GRPO on `open-r1/DAPO-Math-17k-Processed`, starting from `Qwen/Qwen2.5-1.5B-Instruct`.
This snapshot corresponds to training step `882`.

Contents include:

- Model weights (`.safetensors`)
- Config files (`config.json`, `generation_config.json`)
- Tokenizer files (`tokenizer.json`, `tokenizer_config.json`, `vocab.json`, `merges.txt`, `special_tokens_map.json`, `added_tokens.json`)
- Optional chat template (`chat_template.jinja`)

Training artifacts (optimizer/scheduler states and RNG state) have been intentionally excluded.
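
## Usage

A minimal loading sketch with the `transformers` library. The repository id below is an assumption based on this card's title; replace it with the actual Hub id (including the owning namespace) or a local path to the downloaded checkpoint, and note that the example prompt is purely illustrative.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repo id; swap in "<your-namespace>/<repo>" or a local checkpoint directory.
model_id = "Qwen2.5-1.5B-Instruct_BF16_open-r1-DAPO-Math-17k-Processed_882_FlashRL_G4-L2048_new"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # the checkpoint name indicates BF16 weights
    device_map="auto",
)

# Format the prompt with the bundled chat template.
messages = [{"role": "user", "content": "Solve: what is 17 * 24?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

Because optimizer and scheduler states are not included, this checkpoint is intended for inference or as a starting point for fresh fine-tuning rather than for resuming the original GRPO run.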