# Qwen2.5-7B Human-like DPO

Qwen2.5-7B-Instruct fine-tuned with Direct Preference Optimization (DPO) for more natural, empathetic conversations.
## Quick Start
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load the model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    "ruijiewang45401/qwen2.5-7b-humanlike-dpo",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("ruijiewang45401/qwen2.5-7b-humanlike-dpo")

# Build a chat prompt with the model's chat template
messages = [
    {"role": "system", "content": "You are a warm and helpful assistant."},
    {"role": "user", "content": "Hello!"},
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=200,
    temperature=0.8,
    top_p=0.9,
    do_sample=True,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
## Training Details
- Base Model: Qwen2.5-7B-Instruct
- Method: DPO with LoRA (r=16, alpha=32)
- Dataset: HumanLLMs/Human-Like-DPO-Dataset
- Training Samples: 10,339
- Hardware: NVIDIA A100 40GB
- Training Time: ~1.5 hours
### Hyperparameters
| Parameter | Value |
|---|---|
| Beta | 0.1 |
| Learning Rate | 5e-5 |
| Batch Size | 2 × 8 |
| Max Length | 1024 |
| Epochs | 1 |
| Optimizer | AdamW |
| Scheduler | Cosine |
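
For reference, below is a minimal sketch of how a DPO + LoRA run with these hyperparameters could be set up using TRL's `DPOTrainer`. This is an illustrative reconstruction, not the exact training script: it assumes the "2 × 8" batch size means per-device batch 2 with 8 gradient-accumulation steps, relies on the Trainer's default AdamW optimizer, and argument names may differ slightly across TRL versions.

```python
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "Qwen/Qwen2.5-7B-Instruct"
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(base)

# Preference pairs (prompt / chosen / rejected) expected by DPOTrainer
dataset = load_dataset("HumanLLMs/Human-Like-DPO-Dataset", split="train")

peft_config = LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM")

args = DPOConfig(
    output_dir="qwen2.5-7b-humanlike-dpo",
    beta=0.1,                        # DPO temperature
    learning_rate=5e-5,
    per_device_train_batch_size=2,   # assumed split of the 2 x 8 batch size
    gradient_accumulation_steps=8,
    max_length=1024,
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    bf16=True,
)

trainer = DPOTrainer(
    model=model,
    args=args,
    train_dataset=dataset,
    processing_class=tokenizer,      # `tokenizer=` in older TRL versions
    peft_config=peft_config,
)
trainer.train()
```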
### Results
- Rewards/Margins: 6.87
- Accuracy: 1.0
- Val Loss: ~0.00003
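
For context, `rewards/margins` in TRL's DPO logging is the average gap between the implicit rewards of the chosen and rejected responses, where the implicit reward is β times the policy-vs-reference log-probability ratio (this is the standard DPO formulation, not something specific to this run):

$$
r_\theta(x, y) = \beta \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)},
\qquad
\text{margins} = \mathbb{E}\left[\, r_\theta(x, y_w) - r_\theta(x, y_l) \,\right]
$$

The accuracy of 1.0 is most likely the reward accuracy (fraction of pairs where the chosen response receives the higher implicit reward), which is consistent with the near-zero validation loss.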
## Generation Parameters
Recommended sampling settings:

```python
outputs = model.generate(
    **inputs,
    max_new_tokens=200,        # 150-300 works well
    temperature=0.8,           # 0.7-0.9; higher = more natural
    top_p=0.9,
    repetition_penalty=1.1,
    do_sample=True,
)
```
## Citation
```bibtex
@misc{qwen25-humanlike-dpo,
  author    = {ruijiewang45401},
  title     = {Qwen2.5-7B Human-like DPO},
  year      = {2025},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/ruijiewang45401/qwen2.5-7b-humanlike-dpo}
}
```
## Credits

- Base model: Qwen2.5-7B-Instruct by the Qwen team
- Training data: HumanLLMs/Human-Like-DPO-Dataset