JokeGPT
JokeGPT is a fine-tuned language model designed to generate humorous content. It is built on the Qwen/Qwen3-8B base model and trained in three stages: Supervised Fine-Tuning (SFT), Reward Modeling, and Reinforcement Learning from Human Feedback (RLHF) via PPO.
Repository Structure
This repository contains the following models:
- sft_final: The Supervised Fine-Tuned model. This model has been trained on a dataset of jokes to understand the structure and style of humorous text.
- reward_model_final: The Reward Model. This model is trained to predict a "humor score" for a given text, used to guide the PPO training.
- ppo_model: The final PPO-aligned model. This model uses the SFT model as a base and is further optimized using the Reward Model to maximize humor generation.
Usage
You can load these models with the `transformers` and `peft` libraries.
Loading the PPO Model (Recommended)
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model_name = "Qwen/Qwen3-8B"
adapter_path = "JokeGPT-Model/ppo_model"  # Path to the PPO adapter

# Load the base model
model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Load the PPO adapter on top of the base model
model = PeftModel.from_pretrained(model, adapter_path)

# Generate a joke
tokenizer = AutoTokenizer.from_pretrained(base_model_name)
prompt = "User: Tell me a joke about AI.\nAssistant:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=100)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
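Scoring Jokes with the Reward Model
The reward model can also be loaded to assign a humor score to a piece of text. The snippet below is a minimal sketch, assuming reward_model_final was saved as a PEFT adapter on top of a single-logit sequence-classification head and lives at `JokeGPT-Model/reward_model_final`; adjust the path and head type to match how the adapter was actually trained and saved.
```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from peft import PeftModel

base_model_name = "Qwen/Qwen3-8B"
reward_adapter_path = "JokeGPT-Model/reward_model_final"  # assumed path to the reward adapter

# Assumption: the reward model is a single-logit classification head on the base model.
reward_model = AutoModelForSequenceClassification.from_pretrained(
    base_model_name,
    num_labels=1,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
reward_model = PeftModel.from_pretrained(reward_model, reward_adapter_path)
reward_model.eval()

tokenizer = AutoTokenizer.from_pretrained(base_model_name)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
reward_model.config.pad_token_id = tokenizer.pad_token_id

text = "Why did the neural network cross the road? It was trained to."
inputs = tokenizer(text, return_tensors="pt").to(reward_model.device)

with torch.no_grad():
    humor_score = reward_model(**inputs).logits[0, 0].item()
print(f"Humor score: {humor_score:.3f}")
```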
Training Pipeline
- SFT: The base model is fine-tuned on high-quality joke datasets (Reddit Jokes, Ruozhiba) to learn the structure and style of humorous text (see the LoRA sketch below).
- Reward Modeling: A reward model is trained on comparison data (more humorous vs. less humorous responses) to learn a scalar humor reward (see the pairwise-loss sketch below).
- PPO: The SFT model is optimized against the Reward Model to encourage more humorous outputs (see the reward-shaping sketch below).
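The training scripts themselves are not reproduced here, so the snippets below are illustrative sketches of each stage rather than the exact code used. First, a LoRA-based SFT setup with `transformers` and `peft`; the dataset file, hyperparameters, and LoRA target modules are assumptions.
```python
import torch
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)
from peft import LoraConfig, get_peft_model

base_model_name = "Qwen/Qwen3-8B"
tokenizer = AutoTokenizer.from_pretrained(base_model_name)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(
    base_model_name, torch_dtype=torch.bfloat16, device_map="auto"
)

# Hypothetical LoRA configuration; rank and target modules are illustrative.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

# Hypothetical joke dataset with a "text" column of formatted prompt/response pairs.
dataset = load_dataset("json", data_files="jokes_sft.jsonl", split="train")

def tokenize(example):
    return tokenizer(example["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="sft_final",
        per_device_train_batch_size=2,
        gradient_accumulation_steps=8,
        num_train_epochs=1,
        learning_rate=2e-4,
        bf16=True,
        logging_steps=10,
    ),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
trainer.save_model("sft_final")
```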
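Second, the core of reward modeling is a pairwise ranking objective: the reward head should score the funnier ("chosen") response above the less funny ("rejected") one. A minimal sketch of that loss in plain PyTorch, assuming a single-logit reward head as above:
```python
import torch
import torch.nn.functional as F

def pairwise_reward_loss(chosen_scores: torch.Tensor, rejected_scores: torch.Tensor) -> torch.Tensor:
    """Bradley-Terry style loss: push chosen (funnier) scores above rejected ones."""
    return -F.logsigmoid(chosen_scores - rejected_scores).mean()

# Toy example: scalar scores produced by the reward head for a batch of comparison pairs.
chosen = torch.tensor([1.2, 0.3, 2.1])     # scores for the more humorous responses
rejected = torch.tensor([0.4, -0.5, 1.9])  # scores for the less humorous responses
print(pairwise_reward_loss(chosen, rejected).item())
```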
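Third, during PPO the training signal is typically the reward model's humor score minus a KL penalty that keeps the policy close to the frozen SFT reference model; whether this repository used exactly this shaping is an assumption, but it is the standard recipe for PPO-based RLHF. A schematic sketch of the per-token reward computation:
```python
import torch

def shaped_rewards(
    policy_logprobs: torch.Tensor,  # log-probs of generated tokens under the PPO policy
    ref_logprobs: torch.Tensor,     # log-probs of the same tokens under the frozen SFT reference
    humor_score: float,             # scalar score from the reward model for the full response
    kl_coef: float = 0.1,           # assumed KL penalty coefficient
) -> torch.Tensor:
    """Per-token rewards: a KL penalty on every token, plus the humor score on the final token."""
    kl_penalty = -kl_coef * (policy_logprobs - ref_logprobs)
    rewards = kl_penalty.clone()
    rewards[-1] += humor_score
    return rewards

# Toy example with a 5-token response.
policy_lp = torch.tensor([-1.0, -0.8, -1.2, -0.5, -0.9])
ref_lp = torch.tensor([-1.1, -0.9, -1.0, -0.6, -1.0])
print(shaped_rewards(policy_lp, ref_lp, humor_score=2.0))
```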