---
language:
- en
- zh
license: apache-2.0
tags:
- qwen2.5
- dpo
- human-like
- conversational
base_model: Qwen/Qwen2.5-7B-Instruct
library_name: transformers
---
# Qwen2.5-7B Human-like DPO
A [Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) model fine-tuned with Direct Preference Optimization (DPO) to hold more natural, empathetic conversations.
## 🚀 Quick Start
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ruijiewang45401/qwen2.5-7b-humanlike-dpo"
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",  # place weights on the available GPU(s) automatically
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

messages = [
    {"role": "system", "content": "You are a warm and helpful assistant."},
    {"role": "user", "content": "Hello!"},
]
# Render the conversation with the chat template and open an assistant turn
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=200,
    temperature=0.8,
    top_p=0.9,
    do_sample=True,
)
# Decode only the newly generated tokens, not the echoed prompt
new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```
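To show roughly what `apply_chat_template(..., add_generation_prompt=True)` hands to the model, the sketch below approximates the ChatML layout that Qwen2.5 chat templates use. `render_chatml` is a hypothetical helper for illustration only; the tokenizer's bundled template is authoritative (it also handles tool calls and inserts a default system prompt when none is given, which this sketch does not).

```python
# Approximate ChatML rendering, as used by Qwen2.5 chat templates.
# Illustration only: the tokenizer's own template is the source of truth.

def render_chatml(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts into a ChatML-style string."""
    parts = []
    for message in messages:
        parts.append(f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n")
    if add_generation_prompt:
        # Open an assistant turn so generation continues as the assistant.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


messages = [
    {"role": "system", "content": "You are a warm and helpful assistant."},
    {"role": "user", "content": "Hello!"},
]
print(render_chatml(messages))
```

The open `<|im_start|>assistant` turn at the end is why the Quick Start slices off the prompt tokens before decoding: everything after it is the model's reply.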
## 📊 Training Details
- **Base Model**: Qwen2.5-7B-Instruct
- **Method**: DPO with LoRA (r=16, alpha=32)
- **Dataset**: [HumanLLMs/Human-Like-DPO-Dataset](https://huggingface.co/datasets/HumanLLMs/Human-Like-DPO-Dataset)
- **Training Samples**: 10,339
- **Hardware**: NVIDIA A100 40GB
- **Training Time**: ~1.5 hours
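As a rough illustration of what `r=16` and `alpha=32` control, the sketch below implements a LoRA-adapted linear layer in plain Python, assuming PEFT's default scaling of `alpha / r` (2.0 with these settings). The helper names and matrix sizes are hypothetical toy values; the card does not state which modules the adapters were attached to.

```python
# Minimal pure-Python sketch of a LoRA-adapted linear layer.
# Hypothetical helpers and toy sizes, not Qwen2.5's real layer shapes.

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, r, alpha):
    """Compute W x + (alpha / r) * B (A x).

    W: d_out x d_in frozen base weight
    A: r x d_in trainable down-projection
    B: d_out x r trainable up-projection (initialised to zeros)
    """
    scaling = alpha / r  # assumed default scaling; r=16, alpha=32 gives 2.0
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))
    return [b + scaling * d for b, d in zip(base, delta)]


def lora_param_count(d_in, d_out, r):
    """Trainable parameters LoRA adds to one d_out x d_in weight matrix."""
    return r * (d_in + d_out)


r, alpha = 16, 32
d_in, d_out = 4, 3  # toy sizes
W = [[1.0 if i == j else 0.0 for j in range(d_in)] for i in range(d_out)]
A = [[0.01 * (i + j) for j in range(d_in)] for i in range(r)]
B = [[0.0] * r for _ in range(d_out)]  # zero init: the adapter starts as a no-op
x = [1.0, 2.0, 3.0, 4.0]

print(lora_forward(W, A, B, x, r, alpha))  # equals W x while B is zero
print(lora_param_count(4096, 4096, r))     # a hypothetical 4096 x 4096 projection
```

Only `A` and `B` are trained, so each adapted matrix adds `r * (d_in + d_out)` parameters instead of `d_in * d_out`, while the base weights stay frozen.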
### Hyperparameters
| Parameter | Value |
|-----------|-------|
| Beta | 0.1 |
| Learning Rate | 5e-5 |
| Batch Size | 2 × 8 |
| Max Length (tokens) | 1024 |
| Epochs | 1 |
| Optimizer | AdamW |
| Scheduler | Cosine |
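The table does not say how the batch size splits or how many optimizer steps one epoch takes. Assuming `2 × 8` means a per-device batch of 2 with 8 gradient-accumulation steps on a single GPU, and assuming zero warmup (none is listed), the arithmetic and a cosine learning-rate curve look roughly like this. A real Trainer run may differ by a step depending on how a partial final accumulation window is handled.

```python
import math

# Assumptions (not stated in the card): "2 x 8" means a per-device batch of 2
# with 8 gradient-accumulation steps on one GPU, and warmup is zero.
per_device_batch = 2
grad_accum_steps = 8
num_samples = 10_339
base_lr = 5e-5

effective_batch = per_device_batch * grad_accum_steps
total_steps = math.ceil(num_samples / effective_batch)  # one epoch


def cosine_lr(step, total_steps, base_lr, warmup_steps=0):
    """Linear warmup, then cosine decay from base_lr down to 0."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


print(effective_batch, total_steps)  # 16 647
for step in (0, total_steps // 2, total_steps):
    print(step, f"{cosine_lr(step, total_steps, base_lr):.2e}")
```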
### Results
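- **Rewards/Margins**: 6.87 (mean gap between the implicit rewards of chosen and rejected responses)
- **Reward Accuracy**: 1.0 (fraction of preference pairs in which the chosen response receives the higher implicit reward)
- **Validation Loss**: ~0.00003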
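These metric names match the `rewards/margins` and `rewards/accuracies` values logged by TRL's `DPOTrainer`. As a sketch of how they relate to `beta`, the function below computes the sigmoid DPO loss, the mean reward margin, and the reward accuracy from per-sequence log-probabilities of the policy and the frozen reference model. `dpo_metrics` is a hypothetical helper, and the log-probabilities in the example are made up for illustration, not taken from this training run.

```python
import math

BETA = 0.1  # matches the Beta hyperparameter above


def log_sigmoid(z):
    """Numerically stable log(sigmoid(z))."""
    return -math.log1p(math.exp(-z)) if z >= 0 else z - math.log1p(math.exp(z))


def dpo_metrics(batch, beta=BETA):
    """batch: list of (policy_chosen, ref_chosen, policy_rejected, ref_rejected)
    sequence log-probabilities. Returns (mean loss, mean margin, reward accuracy)."""
    losses, margins, wins = [], [], []
    for pol_c, ref_c, pol_r, ref_r in batch:
        # Implicit reward: beta * (log pi_theta(y|x) - log pi_ref(y|x))
        chosen_reward = beta * (pol_c - ref_c)
        rejected_reward = beta * (pol_r - ref_r)
        margin = chosen_reward - rejected_reward
        losses.append(-log_sigmoid(margin))  # DPO loss: -log sigmoid(margin)
        margins.append(margin)
        wins.append(1.0 if chosen_reward > rejected_reward else 0.0)
    n = len(batch)
    return sum(losses) / n, sum(margins) / n, sum(wins) / n


# Hypothetical sequence log-probabilities for two preference pairs.
example_batch = [
    (-40.0, -60.0, -80.0, -50.0),  # chosen gets the higher reward: margin 5.0
    (-55.0, -50.0, -45.0, -50.0),  # rejected gets the higher reward: margin -1.0
]
loss, margin, accuracy = dpo_metrics(example_batch)
print(f"loss={loss:.4f} margin={margin:.2f} accuracy={accuracy:.2f}")
```

Because the loss is `-log sigmoid(margin)`, it shrinks toward zero as margins grow, which is why a large mean margin and a reward accuracy of 1.0 go together with a near-zero loss.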
## πŸ’‘ Generation Parameters
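```python
# Recommended settings; `model` and `inputs` come from the Quick Start above.
outputs = model.generate(
    **inputs,
    max_new_tokens=200,      # recommended range: 150-300
    temperature=0.8,         # recommended range: 0.7-0.9; higher = more natural
    top_p=0.9,
    repetition_penalty=1.1,
    do_sample=True,
)
```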
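To illustrate what each of these settings does to the next-token distribution, here is a simplified pure-Python version of roughly the order `transformers` applies them when sampling: repetition penalty, then temperature scaling, then top-p (nucleus) filtering. It runs on a made-up five-token vocabulary and is a sketch, not the library's implementation, which works on batched tensors and handles more edge cases. In `transformers` the repetition penalty applies to every token already in the sequence, prompt included.

```python
import math
import random


def apply_repetition_penalty(logits, seen_ids, penalty):
    """CTRL-style penalty: divide positive logits, multiply negative ones."""
    out = list(logits)
    for token_id in set(seen_ids):
        score = out[token_id]
        out[token_id] = score / penalty if score > 0 else score * penalty
    return out


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p):
    """Keep the smallest set of most likely tokens whose mass reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}  # renormalise the nucleus


def sample_next(logits, seen_ids, temperature=0.8, top_p=0.9,
                repetition_penalty=1.1, rng=random):
    logits = apply_repetition_penalty(logits, seen_ids, repetition_penalty)
    probs = softmax([z / temperature for z in logits])  # higher T = flatter
    nucleus = top_p_filter(probs, top_p)
    ids, weights = zip(*nucleus.items())
    return rng.choices(ids, weights=weights, k=1)[0]


vocab_logits = [2.0, 1.5, 0.5, -1.0, -3.0]  # toy 5-token vocabulary
seen = [0]  # token 0 already appears in the sequence
rng = random.Random(0)
print([sample_next(vocab_logits, seen, rng=rng) for _ in range(10)])
```

Raising `temperature` flattens the distribution before the nucleus is cut, so more tokens survive `top_p` and outputs vary more; `repetition_penalty` only reshapes tokens that have already appeared.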
## πŸ“œ Citation
```bibtex
@misc{qwen25-humanlike-dpo,
author = {ruijiewang45401},
title = {Qwen2.5-7B Human-like DPO},
year = {2025},
publisher = {Hugging Face},
url = {https://huggingface.co/ruijiewang45401/qwen2.5-7b-humanlike-dpo}
}
```
## πŸ™ Credits
- [Qwen Team](https://huggingface.co/Qwen)
- [HumanLLMs Dataset](https://huggingface.co/datasets/HumanLLMs/Human-Like-DPO-Dataset)
- [TRL Library](https://github.com/huggingface/trl)