Qwen2.5-7B Human-like DPO

Fine-tuned Qwen2.5-7B-Instruct with DPO for more natural, empathetic conversations.

🚀 Quick Start

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model = AutoModelForCausalLM.from_pretrained(
    "ruijiewang45401/qwen2.5-7b-humanlike-dpo",
    torch_dtype=torch.bfloat16,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("ruijiewang45401/qwen2.5-7b-humanlike-dpo")

messages = [
    {"role": "system", "content": "You are a warm and helpful assistant."},
    {"role": "user", "content": "Hello!"}
]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=200,
    temperature=0.8,
    top_p=0.9,
    do_sample=True
)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
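By default the decoded string includes the prompt as well as the completion. To print only the assistant's reply, slice off the prompt tokens before decoding (continuing the snippet above):

new_tokens = outputs[0][inputs["input_ids"].shape[1]:]  # keep only generated tokens
print(tokenizer.decode(new_tokens, skip_special_tokens=True))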

📊 Training Details

  • Base Model: Qwen2.5-7B-Instruct
  • Method: DPO with LoRA (r=16, alpha=32)
  • Dataset: HumanLLMs/Human-Like-DPO-Dataset
  • Training Samples: 10,339
  • Hardware: NVIDIA A100 40GB
  • Training Time: ~1.5 hours

Hyperparameters

  • Beta: 0.1
  • Learning Rate: 5e-5
  • Batch Size: 2 × 8
  • Max Length: 1024
  • Epochs: 1
  • Optimizer: AdamW
  • Scheduler: Cosine
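For reference, below is a minimal sketch of how this recipe could be reproduced with TRL's DPOTrainer and PEFT, plugging in the hyperparameters above. The use of TRL, the exact argument names (e.g. processing_class, which differs across TRL releases), and the reading of "2 × 8" as per-device batch size 2 with 8 gradient-accumulation steps are assumptions, not confirmed details of the original run.

import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "Qwen/Qwen2.5-7B-Instruct"
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(base)

# Preference pairs; expected to provide prompt/chosen/rejected columns
dataset = load_dataset("HumanLLMs/Human-Like-DPO-Dataset", split="train")

# LoRA adapter as described above (r=16, alpha=32)
peft_config = LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM")

# Hyperparameters from the list above; "2 x 8" interpreted as
# per-device batch size 2 with 8 gradient-accumulation steps (assumption)
training_args = DPOConfig(
    output_dir="qwen2.5-7b-humanlike-dpo",
    beta=0.1,
    learning_rate=5e-5,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
    max_length=1024,
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    bf16=True,
)

trainer = DPOTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    processing_class=tokenizer,
    peft_config=peft_config,
)
trainer.train()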

Results

  • Rewards/Margins: 6.87
  • Accuracy: 1.0
  • Val Loss: ~0.00003

💡 Generation Parameters

model.generate(
    **inputs,
    max_new_tokens=200,      # 150-300 works well
    temperature=0.8,         # 0.7-0.9; higher = more natural
    top_p=0.9,
    repetition_penalty=1.1,
    do_sample=True
)
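For interactive chat, the same sampling settings can be combined with a streamer so the reply prints token by token as it is generated. This uses the standard transformers TextStreamer API; the sampling settings themselves are unchanged.

from transformers import TextStreamer

# Stream the reply to stdout, skipping the prompt and special tokens
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

model.generate(
    **inputs,
    max_new_tokens=200,
    temperature=0.8,
    top_p=0.9,
    repetition_penalty=1.1,
    do_sample=True,
    streamer=streamer,
)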

📜 Citation

@misc{qwen25-humanlike-dpo,
  author = {ruijiewang45401},
  title = {Qwen2.5-7B Human-like DPO},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/ruijiewang45401/qwen2.5-7b-humanlike-dpo}
}

πŸ™ Credits
