---
language:
- en
- zh
license: apache-2.0
tags:
- qwen2.5
- dpo
- human-like
- conversational
base_model: Qwen/Qwen2.5-7B-Instruct
library_name: transformers
---
|
|
|
|
|
# Qwen2.5-7B Human-like DPO |
|
|
|
|
|
Fine-tuned [Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) with DPO for more natural, empathetic conversations. |
|
|
|
|
|
## Quick Start
|
|
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load the model in bfloat16 and let accelerate place it on available devices
model = AutoModelForCausalLM.from_pretrained(
    "ruijiewang45401/qwen2.5-7b-humanlike-dpo",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("ruijiewang45401/qwen2.5-7b-humanlike-dpo")

messages = [
    {"role": "system", "content": "You are a warm and helpful assistant."},
    {"role": "user", "content": "Hello!"},
]

# Build the prompt with the model's chat template and move inputs to the model's device
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=200,
    temperature=0.8,
    top_p=0.9,
    do_sample=True,
)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
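Since the model is tuned for conversation, the same API extends naturally to multi-turn chat by keeping the full history in `messages`. The loop below is an illustrative sketch (the `chat` helper is not part of the model's API); it reuses `model` and `tokenizer` from the snippet above and decodes only the newly generated tokens.

```python
# Illustrative multi-turn chat loop; assumes `model` and `tokenizer` from above.
def chat(messages, user_input, max_new_tokens=200):
    messages.append({"role": "user", "content": user_input})
    text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    inputs = tokenizer(text, return_tensors="pt").to(model.device)
    outputs = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        temperature=0.8,
        top_p=0.9,
        do_sample=True,
    )
    # Decode only the new tokens, skipping the echoed prompt.
    reply = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
    messages.append({"role": "assistant", "content": reply})
    return reply

history = [{"role": "system", "content": "You are a warm and helpful assistant."}]
print(chat(history, "Hello!"))
print(chat(history, "What did I just say?"))
```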
|
|
|
|
|
## Training Details
|
|
|
|
|
- **Base Model**: Qwen2.5-7B-Instruct |
|
|
- **Method**: DPO with LoRA (r=16, alpha=32) |
|
|
- **Dataset**: [HumanLLMs/Human-Like-DPO-Dataset](https://huggingface.co/datasets/HumanLLMs/Human-Like-DPO-Dataset) |
|
|
- **Training Samples**: 10,339 |
|
|
- **Hardware**: NVIDIA A100 40GB |
|
|
- **Training Time**: ~1.5 hours |
|
|
|
|
|
### Hyperparameters |
|
|
|
|
|
| Parameter | Value |
|-----------|-------|
| Beta | 0.1 |
| Learning Rate | 5e-5 |
| Batch Size | 2 × 8 |
| Max Length | 1024 |
| Epochs | 1 |
| Optimizer | AdamW |
| Scheduler | Cosine |
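For reference, a run with these hyperparameters could be reproduced roughly as sketched below with TRL's `DPOTrainer` and PEFT. This is not the released training script: the `target_modules` list, LoRA dropout, and the assumption that `2 × 8` means per-device batch size 2 with gradient accumulation 8 are guesses, and the `processing_class` argument requires a recent TRL version.

```python
# Rough reproduction sketch using TRL DPOTrainer + LoRA (not the exact script).
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_name = "Qwen/Qwen2.5-7B-Instruct"
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="bfloat16")
tokenizer = AutoTokenizer.from_pretrained(model_name)

# The dataset exposes prompt/chosen/rejected columns, as DPOTrainer expects.
dataset = load_dataset("HumanLLMs/Human-Like-DPO-Dataset", split="train")

peft_config = LoraConfig(   # r=16, alpha=32 as in the table above
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,                                        # assumption
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumption
    task_type="CAUSAL_LM",
)

args = DPOConfig(
    output_dir="qwen2.5-7b-humanlike-dpo",
    beta=0.1,
    learning_rate=5e-5,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,   # effective batch 2 x 8
    num_train_epochs=1,
    max_length=1024,
    lr_scheduler_type="cosine",      # AdamW is the default optimizer
    bf16=True,
)

trainer = DPOTrainer(
    model=model,
    args=args,
    train_dataset=dataset,
    processing_class=tokenizer,  # `tokenizer=` on older TRL versions
    peft_config=peft_config,
)
trainer.train()
```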
|
|
|
|
|
### Results |
|
|
|
|
|
- **Rewards/Margins**: 6.87
- **Reward Accuracy**: 1.0 (chosen response scored above rejected on every eval pair)
- **Validation Loss**: ~0.00003
|
|
|
|
|
## Generation Parameters

Recommended sampling settings for natural-sounding replies:

```python
outputs = model.generate(
    **inputs,
    max_new_tokens=200,   # 150-300 works well
    temperature=0.8,      # 0.7-0.9; higher = more natural
    top_p=0.9,
    repetition_penalty=1.1,
    do_sample=True,
)
```
|
|
|
|
|
## Citation

```bibtex
@misc{qwen25-humanlike-dpo,
  author = {ruijiewang45401},
  title = {Qwen2.5-7B Human-like DPO},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/ruijiewang45401/qwen2.5-7b-humanlike-dpo}
}
```
|
|
|
|
|
## Credits
|
|
|
|
|
- [Qwen Team](https://huggingface.co/Qwen) |
|
|
- [HumanLLMs Dataset](https://huggingface.co/datasets/HumanLLMs/Human-Like-DPO-Dataset) |
|
|
- [TRL Library](https://github.com/huggingface/trl) |
|
|
|