PPOpt-Llama-3.1-8B-Instruct-LoRA

A LoRA adapter for Llama-3.1-8B-Instruct, fine-tuned for the prompt optimization task.

Model Description

This model is trained to optimize user prompts based on their interaction history and preferences. Given a user's conversation history and current query, it rewrites the query into a clearer, more specific, and better-structured prompt.

Training Pipeline

  • Stage 1: SFT (Supervised Fine-Tuning) - Trained on curated prompt optimization examples
  • Stage 2: GRPO (Group Relative Policy Optimization) - Reinforcement learning with GPT-4o-mini as the judge (see the sketch below)
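
A minimal sketch of what the Stage 2 GRPO step could look like with TRL's GRPOTrainer. The dataset file, reward function, and hyperparameters below are illustrative assumptions; the actual training code is not published in this card:

from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

# Hypothetical JSONL file with a "prompt" column of raw user queries.
train_dataset = load_dataset("json", data_files="prompt_opt_train.jsonl", split="train")

def toy_reward(completions, **kwargs):
    # Stand-in for the GPT-4o-mini judge: a real run would query the judge
    # model and return one scalar score per rewritten prompt.
    return [min(len(c.split()) / 50.0, 1.0) for c in completions]

trainer = GRPOTrainer(
    model="meta-llama/Llama-3.1-8B-Instruct",
    reward_funcs=toy_reward,
    args=GRPOConfig(output_dir="ppopt-grpo"),
    train_dataset=train_dataset,
)
trainer.train()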

LoRA Configuration

Parameter        Value
r (rank)         32
lora_alpha       32
target_modules   all-linear
lora_dropout     0
bias             none
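
The table above corresponds to the following peft LoraConfig; a sketch for anyone reproducing the adapter setup (task_type is an assumption, standard for causal LMs):

from peft import LoraConfig

lora_config = LoraConfig(
    r=32,
    lora_alpha=32,
    target_modules="all-linear",
    lora_dropout=0.0,
    bias="none",
    task_type="CAUSAL_LM",
)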

Usage

Quick Start

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load base model
base_model_id = "meta-llama/Llama-3.1-8B-Instruct"
model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(base_model_id)

# Load LoRA adapter
model = PeftModel.from_pretrained(model, "HowieHwong/ppopt")

Inference Example

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load model
base_model_id = "meta-llama/Llama-3.1-8B-Instruct"
model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(base_model_id)
model = PeftModel.from_pretrained(model, "HowieHwong/ppopt")

# Prepare input
conversation_history = """User: How do I center a div?
Assistant: You can use flexbox: display: flex; justify-content: center; align-items: center;
User: What about grid?
Assistant: With grid: display: grid; place-items: center;"""

current_query = "how to make it responsive"

prompt = f"""Based on the conversation history and user preferences, optimize the following query into a clearer, more specific prompt.

Conversation History:
{conversation_history}

Current Query: {current_query}

Optimized Prompt:"""

# Generate
messages = [{"role": "user", "content": prompt}]
# add_generation_prompt=True appends the assistant header so the model
# produces a reply instead of continuing the user turn
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(
    input_ids,
    max_new_tokens=256,
    temperature=0.7,
    do_sample=True,
    pad_token_id=tokenizer.eos_token_id
)

response = tokenizer.decode(outputs[0][input_ids.shape[1]:], skip_special_tokens=True)
print(response)
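
For deterministic, reproducible rewrites, you can disable sampling instead of using temperature 0.7; do_sample is a standard transformers generation argument:

# Greedy decoding: same inputs as above, deterministic output
outputs = model.generate(
    input_ids,
    max_new_tokens=256,
    do_sample=False,
    pad_token_id=tokenizer.eos_token_id
)
response = tokenizer.decode(outputs[0][input_ids.shape[1]:], skip_special_tokens=True)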

Merge LoRA (Optional)

If you want to merge the adapter weights into the base model (e.g., for deployment without peft):

merged_model = model.merge_and_unload()
merged_model.save_pretrained("merged_ppopt_llama8b")
tokenizer.save_pretrained("merged_ppopt_llama8b")
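
Once saved, the merged checkpoint loads like any standalone model, with no peft dependency (the directory name matches the save calls above):

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the merged model directly, no adapter attachment needed
merged = AutoModelForCausalLM.from_pretrained(
    "merged_ppopt_llama8b",
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("merged_ppopt_llama8b")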

Intended Use

This model is designed for:

  • Prompt optimization/rewriting systems
  • Personalized query enhancement based on user history
  • Research on prompt engineering automation

License

This model is released under the Apache 2.0 license.
