HistoryGPT

A Vietnamese-history AI assistant fine-tuned with reinforcement learning from human feedback (RLHF) using PPO.

Training Details

  • Base Model: khanhrill/HistoryGPT
  • Fine-tuning: PPO with human feedback from OpenWebUI
  • Last Updated: 2025-12-12
  • Version: 20251212_0806

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("khanhrill/HistoryGPT")
tokenizer = AutoTokenizer.from_pretrained("khanhrill/HistoryGPT")

# "Tell me about the history of Vietnam"
prompt = "Hãy kể về lịch sử Việt Nam"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
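The call above uses greedy decoding, which can become repetitive on long answers. A hedged sketch of sampling settings that can be passed to `generate` (the values are illustrative defaults, not tuned for this model):

```python
# Illustrative sampling settings (assumed, not tuned for this model).
gen_kwargs = dict(
    max_new_tokens=200,
    do_sample=True,          # sample instead of greedy decoding
    temperature=0.7,         # soften the token distribution
    top_p=0.9,               # nucleus sampling: keep the top 90% probability mass
    repetition_penalty=1.1,  # mildly discourage verbatim repeats
)
# outputs = model.generate(**inputs, **gen_kwargs)
```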

Training Pipeline

This model was trained using an automated RLHF pipeline:

  1. Collect user feedback from OpenWebUI
  2. Train reward model from preference pairs
  3. Fine-tune with PPO using the reward model
  4. Deploy to HuggingFace Hub
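Step 2 is typically a pairwise comparison: given a preferred and a rejected answer to the same prompt, the reward model is trained so the preferred one scores higher. A minimal, framework-free sketch of the standard Bradley-Terry loss on hypothetical scalar reward scores (in the real pipeline these scores presumably come from the reward model itself):

```python
import math

def preference_loss(chosen_score: float, rejected_score: float) -> float:
    """Bradley-Terry pairwise loss: -log(sigmoid(r_chosen - r_rejected)).

    Small when the chosen answer already outscores the rejected one,
    large when the ranking is inverted.
    """
    margin = chosen_score - rejected_score
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# A correctly ranked pair incurs less loss than an inverted one:
print(preference_loss(2.0, 0.0))  # ~0.127
print(preference_loss(0.0, 2.0))  # ~2.127
```

In practice the scores come from a learned scalar head on the language model, and the loss is averaged over a batch of preference pairs before the PPO stage uses the trained reward model.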
Model Details

  • Model size: 4B params (Safetensors)
  • Tensor type: BF16
