# Qwen2.5-7B Human-like DPO

Qwen2.5-7B-Instruct fine-tuned with Direct Preference Optimization (DPO) for more natural, empathetic conversations.
## Quick Start
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load the model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    "ruijiewang45401/qwen2.5-7b-humanlike-dpo",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("ruijiewang45401/qwen2.5-7b-humanlike-dpo")

# Build a chat prompt with the model's chat template
messages = [
    {"role": "system", "content": "You are a warm and helpful assistant."},
    {"role": "user", "content": "Hello!"},
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=200,
    temperature=0.8,
    top_p=0.9,
    do_sample=True,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
## Training Details
- Base Model: Qwen2.5-7B-Instruct
- Method: DPO with LoRA (r=16, alpha=32)
- Dataset: HumanLLMs/Human-Like-DPO-Dataset
- Training Samples: 10,339
- Hardware: NVIDIA A100 40GB
- Training Time: ~1.5 hours
### Hyperparameters
| Parameter | Value |
|---|---|
| Beta | 0.1 |
| Learning Rate | 5e-5 |
| Batch Size | 2 × 8 |
| Max Length | 1024 |
| Epochs | 1 |
| Optimizer | AdamW |
| Scheduler | Cosine |
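
For reference, below is a minimal sketch of how a DPO + LoRA run with these hyperparameters could be set up using TRL's `DPOTrainer`. This is an illustrative reconstruction, not the exact training script: it assumes the "2 × 8" batch size means per-device batch 2 with 8 gradient-accumulation steps, relies on the Trainer's default AdamW optimizer, and argument names may differ slightly across TRL versions.

```python
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "Qwen/Qwen2.5-7B-Instruct"
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(base)

# Preference pairs (prompt / chosen / rejected) expected by DPOTrainer
dataset = load_dataset("HumanLLMs/Human-Like-DPO-Dataset", split="train")

peft_config = LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM")

args = DPOConfig(
    output_dir="qwen2.5-7b-humanlike-dpo",
    beta=0.1,                        # DPO temperature
    learning_rate=5e-5,
    per_device_train_batch_size=2,   # assumed split of the 2 x 8 batch size
    gradient_accumulation_steps=8,
    max_length=1024,
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    bf16=True,
)

trainer = DPOTrainer(
    model=model,
    args=args,
    train_dataset=dataset,
    processing_class=tokenizer,      # `tokenizer=` in older TRL versions
    peft_config=peft_config,
)
trainer.train()
```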
### Results
- Rewards/Margins: 6.87
- Accuracy: 1.0
- Val Loss: ~0.00003
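
For context, `rewards/margins` in TRL's DPO logging is the average gap between the implicit rewards of the chosen and rejected responses, where the implicit reward is β times the policy-vs-reference log-probability ratio (this is the standard DPO formulation, not something specific to this run):

$$
r_\theta(x, y) = \beta \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)},
\qquad
\text{margins} = \mathbb{E}\left[\, r_\theta(x, y_w) - r_\theta(x, y_l) \,\right]
$$

The accuracy of 1.0 is most likely the reward accuracy (fraction of pairs where the chosen response receives the higher implicit reward), which is consistent with the near-zero validation loss.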
## Generation Parameters
Recommended sampling settings:

```python
outputs = model.generate(
    **inputs,
    max_new_tokens=200,        # 150-300 works well
    temperature=0.8,           # 0.7-0.9; higher = more natural
    top_p=0.9,
    repetition_penalty=1.1,
    do_sample=True,
)
```
## Citation
```bibtex
@misc{qwen25-humanlike-dpo,
  author    = {ruijiewang45401},
  title     = {Qwen2.5-7B Human-like DPO},
  year      = {2025},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/ruijiewang45401/qwen2.5-7b-humanlike-dpo}
}
```
## Credits

- Base model: Qwen2.5-7B-Instruct by the Qwen team
- Training data: HumanLLMs/Human-Like-DPO-Dataset