Enhanced QLoRA Adapter for Phi-3-mini: A Technical SQL Assistant (2 Epochs)
This repository contains an improved, high-performance QLoRA adapter for the microsoft/Phi-3-mini-4k-instruct model.
This version has been fine-tuned for two full epochs on a Text-to-SQL task, resulting in enhanced performance and reliability compared to single-epoch versions.
The model is designed to function as a technical assistant, capable of generating accurate SQL queries from natural language questions based on a provided database schema. This project was developed for an engineering and deployment course, with a focus on creating a robust, reproducible, and practical AI artifact.
Key Improvements in This Version
- Enhanced Reliability: Training for two epochs has significantly improved the model's ability to consistently adhere to the required chat template format, reducing parsing errors in production.
- Maintained Accuracy: The model maintains its high accuracy in generating syntactically correct and logically sound SQL queries.
- Robust Loading: The usage instructions below follow best practices to ensure reliable loading across different environments.
How to Use
First, ensure you have a compatible environment by installing these specific library versions:
pip install transformers==4.38.2 peft==0.10.0 accelerate==0.28.0 bitsandbytes==0.43.0 torch
The following code provides the most robust method for loading and running inference with this adapter.
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel
import torch
# --- 1. Configuration ---
base_model_id = "microsoft/Phi-3-mini-4k-instruct"
# IMPORTANT: Replace with your new model's ID on the Hugging Face Hub
adapter_id = "YourUsername/YourNewModelName"
# --- 2. Load the Quantized Base Model ---
# This is required to fit the model in memory-constrained environments like Colab
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(base_model_id, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
# --- 3. Load and Apply the LoRA Adapter ---
model = PeftModel.from_pretrained(base_model, adapter_id)
print("Successfully loaded quantized base model and applied adapter.")
# --- 4. Prepare for Inference ---
context = "CREATE TABLE employees (name VARCHAR, department VARCHAR, salary INTEGER)"
question = "What are the names of employees in the 'Engineering' department with a salary over 80000?"
prompt = f"""<|user|>
Given the database schema:
{context}
Generate the SQL query for the following request:
{question}<|end|>
<|assistant|>
"""
# --- 5. Generate the Response ---
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)
outputs = model.generate(input_ids=input_ids, max_new_tokens=100, do_sample=False)
# Decode only the newly generated tokens; skip_special_tokens=True strips
# markers such as <|assistant|> and <|end|>, so splitting the full decoded
# string on them is unreliable.
generated_sql = tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True).strip()
print(f"\nGenerated SQL: {generated_sql}")
# Expected output: SELECT name FROM employees WHERE department = 'Engineering' AND salary > 80000
Training Procedure
Dataset
The model was fine-tuned on a 10,000-sample subset of the b-mc2/sql-create-context dataset, split 90/10 for training and validation.
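As a rough illustration (not the exact preprocessing script used for training), the subset and split could be reproduced with the datasets library along these lines; the shuffle seed is an assumption:
from datasets import load_dataset

# Load the full dataset, take a 10,000-sample subset, then split 90/10.
# The shuffle seed is an assumed value for illustration only.
dataset = load_dataset("b-mc2/sql-create-context", split="train")
subset = dataset.shuffle(seed=42).select(range(10_000))
splits = subset.train_test_split(test_size=0.1, seed=42)
train_ds, eval_ds = splits["train"], splits["test"]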
Fine-tuning Configuration (QLoRA)
- Quantization: 4-bit NormalFloat (NF4) with bfloat16 compute dtype.
- LoRA Rank (r): 8
- LoRA Alpha (lora_alpha): 16
- Target Modules: All linear layers in the Phi-3 architecture (qkv_proj, o_proj, gate_up_proj, down_proj).
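For reference, a peft configuration consistent with these settings would look roughly like the sketch below; the dropout value is an assumption, and target_modules="all-linear" is simply the shortcut for targeting every linear layer as described above:
from peft import LoraConfig

# LoRA settings matching the list above; lora_dropout is an assumed value.
# "all-linear" targets every linear layer (qkv_proj, o_proj, gate_up_proj,
# down_proj in Phi-3), matching the description of the target modules.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules="all-linear",
)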
Training Hyperparameters
- Learning Rate: 2e-4
- Epochs: 2
- Effective Batch Size: 8
- Optimizer: Paged AdamW (32-bit)
- LR Scheduler: Cosine
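A TrainingArguments sketch consistent with these hyperparameters is shown below; the per-device batch size / gradient-accumulation split, warm-up ratio, and output directory are assumptions for illustration:
from transformers import TrainingArguments

# Hyperparameters from the list above. The per-device batch size and
# gradient accumulation steps are assumed values that multiply to the
# effective batch size of 8.
training_args = TrainingArguments(
    output_dir="./phi3-mini-sql-qlora",   # placeholder path
    num_train_epochs=2,
    learning_rate=2e-4,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,        # 2 x 4 = effective batch size 8
    optim="paged_adamw_32bit",
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,                    # assumed, not documented above
    logging_steps=25,
    bf16=True,
)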
Evaluation and Results
Qualitative evaluation on a held-out test set confirms that the model consistently generates correct SQL queries.
The extended training to two epochs has successfully addressed the primary limitation of the single-epoch version: inconsistent formatting.
This model now reliably generates the <|assistant|> token, making it more suitable for automated parsing and deployment.
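One lightweight way to automate such checks, shown here purely as an illustration rather than the project's actual evaluation harness, is to replay each schema into an in-memory SQLite database and verify that the generated query at least executes:
import sqlite3

def sql_executes(schema: str, query: str) -> bool:
    """Return True if query runs without error against schema in SQLite."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(schema)  # e.g. the CREATE TABLE statement from the usage section
        conn.execute(query)
        return True
    except sqlite3.Error:
        return False
    finally:
        conn.close()

# Check the example from the usage section above.
print(sql_executes(
    "CREATE TABLE employees (name VARCHAR, department VARCHAR, salary INTEGER)",
    "SELECT name FROM employees WHERE department = 'Engineering' AND salary > 80000",
))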
Deployment & Optimization Considerations
For deployment in a production environment, consider the following optimizations:
- Merge Adapter Weights: Before deploying, merge the adapter weights into the base model to create a single, standalone model. This eliminates the overhead of dynamically applying the adapter during inference and can improve performance.
# After loading the model and adapter:
# Note: merging into a 4-bit quantized base can introduce rounding error;
# for deployment, consider reloading the base model in fp16/bf16 before merging.
merged_model = model.merge_and_unload()
# Use 'merged_model' for all subsequent 'generate' calls.
- Further Quantization: For CPU-based deployment or even more efficient GPU usage, the merged model can be further quantized into formats like GGUF (for use with llama.cpp) or AWQ/GPTQ.
- API Serving: Wrap the model in a high-performance web server like FastAPI or use a dedicated LLM serving framework like vLLM for optimal throughput and batching; a minimal FastAPI sketch follows this list.
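The sketch below is illustrative only: the route name and request fields are assumptions, and model and tokenizer are taken to be the objects set up in the usage section (ideally the merged model). Production serving would add batching, authentication, and timeouts:
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class SQLRequest(BaseModel):
    context: str    # the CREATE TABLE schema
    question: str   # the natural-language request

@app.post("/generate-sql")
def generate_sql(req: SQLRequest):
    # model and tokenizer come from the loading code in the usage section.
    prompt = (
        f"<|user|>\nGiven the database schema:\n{req.context}\n"
        f"Generate the SQL query for the following request:\n{req.question}<|end|>\n<|assistant|>\n"
    )
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)
    outputs = model.generate(input_ids=input_ids, max_new_tokens=100, do_sample=False)
    sql = tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True).strip()
    return {"sql": sql}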
Limitations and Responsible AI
- Generalization: The model is specialized for the Text-to-SQL task and the schema styles seen in its training data. It may not perform well on highly complex or esoteric SQL dialects.
- Security: This model is a proof-of-concept and has not been hardened against SQL injection attacks. All generated SQL should be treated as untrusted input and must be sanitized or executed in a sandboxed, read-only environment (see the sketch after this list).
- Bias: The training data is the source of the model's knowledge. Any biases present in the sql-create-context dataset may be reflected in the model's outputs.
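As one minimal illustration of the read-only execution advice above (the database path is a placeholder), SQLite can be opened in read-only mode via a URI so that any generated statement attempting to write simply fails:
import sqlite3

# Read-only connection: INSERT/UPDATE/DELETE/DROP statements raise an error.
conn = sqlite3.connect("file:production.db?mode=ro", uri=True)  # placeholder path
try:
    rows = conn.execute(generated_sql).fetchall()  # generated_sql from the usage section
    print(rows)
except sqlite3.OperationalError as err:
    print(f"Rejected or failed query: {err}")
finally:
    conn.close()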