Enhanced QLoRA Adapter for Phi-3-mini: A Technical SQL Assistant (2 Epochs)

This repository contains an improved, high-performance QLoRA adapter for the microsoft/Phi-3-mini-4k-instruct model. This version has been fine-tuned for two full epochs on a Text-to-SQL task, resulting in enhanced performance and reliability compared to single-epoch versions.

The model is designed to function as a technical assistant, capable of generating accurate SQL queries from natural language questions based on a provided database schema. This project was developed for an engineering and deployment course, with a focus on creating a robust, reproducible, and practical AI artifact.

Key Improvements in This Version

  • Enhanced Reliability: Training for two epochs has significantly improved the model's ability to consistently adhere to the required chat template format, reducing parsing errors in production.
  • Maintained Accuracy: The model maintains its high accuracy in generating syntactically correct and logically sound SQL queries.
  • Robust Loading: The usage instructions below follow best practices to ensure reliable loading across different environments.

How to Use

First, ensure you have a compatible environment by installing these specific library versions:

pip install transformers==4.38.2 peft==0.10.0 accelerate==0.28.0 bitsandbytes==0.43.0 torch

The following code provides a robust method for loading and running inference with this adapter.

from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel
import torch

# --- 1. Configuration ---
base_model_id = "microsoft/Phi-3-mini-4k-instruct"
# The adapter published in this repository
adapter_id = "manuelaschrittwieser/phi-3-mini-sql-assistant"

# --- 2. Load the Quantized Base Model ---
# This is required to fit the model in memory-constrained environments like Colab
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(base_model_id, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token  # Phi-3 defines no pad token; reuse EOS

# --- 3. Load and Apply the LoRA Adapter ---
model = PeftModel.from_pretrained(base_model, adapter_id)
print("Successfully loaded quantized base model and applied adapter.")

# --- 4. Prepare for Inference ---
context = "CREATE TABLE employees (name VARCHAR, department VARCHAR, salary INTEGER)"
question = "What are the names of employees in the 'Engineering' department with a salary over 80000?"

prompt = f"""<|user|>
Given the database schema:
{context}

Generate the SQL query for the following request:
{question}<|end|>
<|assistant|>
"""

# --- 5. Generate the Response ---
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100, do_sample=False)
# Decode only the newly generated tokens. Splitting the full decoded text on
# "<|assistant|>" would fail here, because skip_special_tokens=True strips it.
generated_sql = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True
).strip()

print(f"\nGenerated SQL: {generated_sql}")
# Expected output: SELECT name FROM employees WHERE department = 'Engineering' AND salary > 80000

Training Procedure

Dataset

The model was fine-tuned on a 10,000-sample subset of the b-mc2/sql-create-context dataset, split 90/10 for training and validation.
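For reference, a minimal sketch of this preparation step using the datasets library is shown below. The shuffle seed is an assumption (it is not stated in this card), and the field names (question, context, answer) follow the b-mc2/sql-create-context schema.

from datasets import load_dataset

# Load the full dataset (it ships with a single "train" split)
dataset = load_dataset("b-mc2/sql-create-context", split="train")

# Take a 10,000-sample subset and split it 90/10 into train/validation
subset = dataset.shuffle(seed=42).select(range(10_000))  # seed is an assumption
splits = subset.train_test_split(test_size=0.1, seed=42)
train_ds, eval_ds = splits["train"], splits["test"]

print(train_ds[0])  # {'question': ..., 'context': ..., 'answer': ...}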

Fine-tuning Configuration (QLoRA)

  • Quantization: 4-bit NormalFloat (NF4) with bfloat16 compute dtype.
  • LoRA Rank (r): 8
  • LoRA Alpha (lora_alpha): 16
  • Target Modules: All linear layers in the Phi-3 architecture (qkv_proj, o_proj, gate_up_proj, down_proj); a configuration sketch follows this list.
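The adapter configuration above corresponds roughly to the following PEFT LoraConfig. This is a sketch, not the exact training script: the "all-linear" shortcut stands in for the "all linear layers" description, and the dropout value is an assumption, since it is not stated in this card.

from peft import LoraConfig

lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules="all-linear",  # targets every linear layer (excluding the LM head)
    lora_dropout=0.05,            # assumption: dropout not stated in this card
    bias="none",
    task_type="CAUSAL_LM",
)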

Training Hyperparameters

  • Learning Rate: 2e-4
  • Epochs: 2
  • Effective Batch Size: 8
  • Optimizer: Paged AdamW (32-bit)
  • LR Scheduler: Cosine (a TrainingArguments sketch follows this list)
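These settings map onto Hugging Face TrainingArguments roughly as follows. This is a hedged reconstruction: only the effective batch size of 8 is stated above, so the split between per-device batch size and gradient accumulation, as well as the output_dir and logging cadence, are illustrative assumptions.

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="phi3-mini-sql-qlora",  # assumption: illustrative path
    learning_rate=2e-4,                # as listed above
    num_train_epochs=2,
    per_device_train_batch_size=2,     # assumption: 2 x 4 accumulation steps
    gradient_accumulation_steps=4,     #   = effective batch size of 8
    optim="paged_adamw_32bit",         # paged AdamW (32-bit)
    lr_scheduler_type="cosine",
    logging_steps=25,                  # assumption: cadence not stated
    bf16=True,                         # matches the bfloat16 compute dtype
)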

Evaluation and Results

Qualitative evaluation on a held-out test set confirms that the model consistently generates correct SQL queries. The extended training to two epochs has addressed the primary limitation of the single-epoch version: inconsistent formatting. The model now reliably follows the Phi-3 chat template and terminates its responses with the <|end|> token, making its output suitable for automated parsing and deployment.

Deployment & Optimization Considerations

For deployment in a production environment, consider the following optimizations:

  1. Merge Adapter Weights: Before deploying, merge the adapter weights into the base model to create a single, standalone model, as shown below. This eliminates the overhead of dynamically applying the adapter during inference and can improve performance.
# After loading the model and adapter:
# Note: for a clean merge, load the base model in full or half precision
# (not 4-bit) before applying the adapter; merging into a quantized model
# forces PEFT to dequantize the weights and can cost accuracy.
merged_model = model.merge_and_unload()
# Use 'merged_model' for all subsequent 'generate' calls.
  2. Further Quantization: For CPU-based deployment or even more efficient GPU usage, the merged model can be further quantized into formats such as GGUF (for use with llama.cpp) or AWQ/GPTQ.
  3. API Serving: Wrap the model in a high-performance web server such as FastAPI, or use a dedicated LLM serving framework such as vLLM for optimal throughput and batching; a minimal sketch follows this list.
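As a minimal illustration of point 3, the snippet below wraps the merged model in a FastAPI endpoint. It assumes the tokenizer and merged_model objects from the earlier snippets are in scope; the route path and request fields are illustrative, not part of this repository.

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class SQLRequest(BaseModel):
    context: str   # CREATE TABLE statement(s) describing the schema
    question: str  # natural-language request

@app.post("/generate-sql")  # illustrative route, not part of this repo
def generate_sql(req: SQLRequest):
    prompt = (
        "<|user|>\nGiven the database schema:\n"
        f"{req.context}\n\nGenerate the SQL query for the following request:\n"
        f"{req.question}<|end|>\n<|assistant|>\n"
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(merged_model.device)
    outputs = merged_model.generate(**inputs, max_new_tokens=100, do_sample=False)
    sql = tokenizer.decode(
        outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True
    ).strip()
    return {"sql": sql}

Run it with, for example, uvicorn app:app --host 0.0.0.0 --port 8000.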

Limitations and Responsible AI

  • Generalization: The model is specialized for the Text-to-SQL task and the schema styles seen in its training data. It may not perform well on highly complex queries or uncommon SQL dialects.
  • Security: This model is a proof of concept and has not been hardened against adversarial or injection-style inputs. Treat all generated SQL as untrusted: sanitize it, or execute it only in a sandboxed, read-only environment, as sketched after this list.
  • Bias: The training data is the source of the model's knowledge. Any biases present in the sql-create-context dataset may be reflected in the model's outputs.
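To make the sandboxing advice concrete, here is a minimal sketch that executes generated SQL against a SQLite database opened in read-only mode; the database file name is illustrative.

import sqlite3

# Open the database read-only via SQLite's URI mode, so even a malicious
# generated query cannot modify data. "company.db" is an illustrative name.
conn = sqlite3.connect("file:company.db?mode=ro", uri=True)
try:
    rows = conn.execute(generated_sql).fetchall()  # treat generated_sql as untrusted
    print(rows)
except sqlite3.Error as exc:
    print(f"Rejected or invalid SQL: {exc}")
finally:
    conn.close()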