---
language:
- en
- zh
license: apache-2.0
tags:
- qwen2.5
- dpo
- human-like
- conversational
base_model: Qwen/Qwen2.5-7B-Instruct
library_name: transformers
---
|
|
|
|
|
# Qwen2.5-7B Human-like DPO |
|
|
|
|
|
Fine-tuned [Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) with DPO for more natural, empathetic conversations. |
|
|
|
|
|
## Quick Start
|
|
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load the model in bfloat16 and let accelerate place it on available devices
model = AutoModelForCausalLM.from_pretrained(
    "ruijiewang45401/qwen2.5-7b-humanlike-dpo",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("ruijiewang45401/qwen2.5-7b-humanlike-dpo")

messages = [
    {"role": "system", "content": "You are a warm and helpful assistant."},
    {"role": "user", "content": "Hello!"},
]

# Build the prompt with the model's chat template and move inputs to the model's device
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=200,
    temperature=0.8,
    top_p=0.9,
    do_sample=True,
)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
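Since the model is tuned for conversation, the same API extends naturally to multi-turn chat by keeping the full history in `messages`. The loop below is an illustrative sketch (the `chat` helper is not part of the model's API); it reuses `model` and `tokenizer` from the snippet above and decodes only the newly generated tokens.

```python
# Illustrative multi-turn chat loop; assumes `model` and `tokenizer` from above.
def chat(messages, user_input, max_new_tokens=200):
    messages.append({"role": "user", "content": user_input})
    text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    inputs = tokenizer(text, return_tensors="pt").to(model.device)
    outputs = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        temperature=0.8,
        top_p=0.9,
        do_sample=True,
    )
    # Decode only the new tokens, skipping the echoed prompt.
    reply = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
    messages.append({"role": "assistant", "content": reply})
    return reply

history = [{"role": "system", "content": "You are a warm and helpful assistant."}]
print(chat(history, "Hello!"))
print(chat(history, "What did I just say?"))
```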
|
|
|
|
|
## Training Details
|
|
|
|
|
- **Base Model**: Qwen2.5-7B-Instruct |
|
|
- **Method**: DPO with LoRA (r=16, alpha=32) |
|
|
- **Dataset**: [HumanLLMs/Human-Like-DPO-Dataset](https://huggingface.co/datasets/HumanLLMs/Human-Like-DPO-Dataset) |
|
|
- **Training Samples**: 10,339 |
|
|
- **Hardware**: NVIDIA A100 40GB |
|
|
- **Training Time**: ~1.5 hours |
|
|
|
|
|
### Hyperparameters |
|
|
|
|
|
| Parameter | Value |
|-----------|-------|
| Beta | 0.1 |
| Learning Rate | 5e-5 |
| Batch Size | 2 × 8 |
| Max Length | 1024 |
| Epochs | 1 |
| Optimizer | AdamW |
| Scheduler | Cosine |
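For reference, a run with these hyperparameters could be reproduced roughly as sketched below with TRL's `DPOTrainer` and PEFT. This is not the released training script: the `target_modules` list, LoRA dropout, and the assumption that `2 × 8` means per-device batch size 2 with gradient accumulation 8 are guesses, and the `processing_class` argument requires a recent TRL version.

```python
# Rough reproduction sketch using TRL DPOTrainer + LoRA (not the exact script).
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_name = "Qwen/Qwen2.5-7B-Instruct"
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="bfloat16")
tokenizer = AutoTokenizer.from_pretrained(model_name)

# The dataset exposes prompt/chosen/rejected columns, as DPOTrainer expects.
dataset = load_dataset("HumanLLMs/Human-Like-DPO-Dataset", split="train")

peft_config = LoraConfig(   # r=16, alpha=32 as in the table above
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,                                        # assumption
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumption
    task_type="CAUSAL_LM",
)

args = DPOConfig(
    output_dir="qwen2.5-7b-humanlike-dpo",
    beta=0.1,
    learning_rate=5e-5,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,   # effective batch 2 x 8
    num_train_epochs=1,
    max_length=1024,
    lr_scheduler_type="cosine",      # AdamW is the default optimizer
    bf16=True,
)

trainer = DPOTrainer(
    model=model,
    args=args,
    train_dataset=dataset,
    processing_class=tokenizer,  # `tokenizer=` on older TRL versions
    peft_config=peft_config,
)
trainer.train()
```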
|
|
|
|
|
### Results |
|
|
|
|
|
- **Rewards/Margins**: 6.87
- **Reward Accuracy**: 1.0 (chosen response scored above rejected on every eval pair)
- **Validation Loss**: ~0.00003
|
|
|
|
|
## Generation Parameters

Recommended sampling settings for natural-sounding replies:

```python
outputs = model.generate(
    **inputs,
    max_new_tokens=200,   # 150-300 works well
    temperature=0.8,      # 0.7-0.9; higher = more natural
    top_p=0.9,
    repetition_penalty=1.1,
    do_sample=True,
)
```
|
|
|
|
|
## Citation

```bibtex
@misc{qwen25-humanlike-dpo,
  author = {ruijiewang45401},
  title = {Qwen2.5-7B Human-like DPO},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/ruijiewang45401/qwen2.5-7b-humanlike-dpo}
}
```
|
|
|
|
|
## Credits
|
|
|
|
|
- [Qwen Team](https://huggingface.co/Qwen) |
|
|
- [HumanLLMs Dataset](https://huggingface.co/datasets/HumanLLMs/Human-Like-DPO-Dataset) |
|
|
- [TRL Library](https://github.com/huggingface/trl) |
|
|
|