Mistral-7B-text-to-sql-flash-attention-2-dataeval
This model is a fine-tuned version of mistralai/Mistral-7B-Instruct-v0.3 on the generator dataset. It achieves the following results on the evaluation set:
- Loss: 0.4605
Perplexity of 10.40
Perplexity: Perplexity is a measure of how uncertain or surprised the model is about its predictions. It's derived from the probabilities the model assigns to different words or tokens.
Perplexity Article: https://www.jmlr.org/papers/volume3/bengio03a/bengio03a.pdf https://medium.com/@AyushmanPranav/perplexity-calculation-in-nlp-0699fbda4594
The perplexity of 10.40 achieved on the dataset indicates that the fine-tuned Mistral-7B model reasonably understands natural language and SQL syntax. However, further evaluation using task-specific metrics is necessary to assess the model's effectiveness in real-world scenarios. By combining quantitative metrics like perplexity with qualitative analysis of generated queries, we can comprehensively understand the model's strengths and weaknesses, ultimately leading to improved performance and more reliable text-to-SQL translation capabilities.
Dataset : b-mc2/sql-create-context
Model description
Article: https://medium.com/@frankmorales_91352/fine-tuning-the-llm-mistral-7b-instruct-v0-3-249c1814ceaf
Training and evaluation data
Fine Tuning and Evaluation: https://github.com/frank-morales2020/MLxDL/blob/main/FineTuning_LLM_Mistral_7B_Instruct_v0_1_for_text_to_SQL_EVALDATA.ipynb
Evaluation: https://github.com/frank-morales2020/MLxDL/blob/main/Evaluator_Mistral_7B_text_to_sql.ipynb
Evaluation article with Chromadb: https://medium.com/@frankmorales_91352/a-comprehensive-evaluation-of-a-fine-tuned-text-to-sql-model-from-code-to-results-with-7ea59943b0a1
Evaluation article with Chromadb, PostgreSQL and the โgretelai/synthetic_text_to_sqlโ dataset: https://medium.com/@frankmorales_91352/evaluating-the-performance-of-a-fine-tuned-text-to-sql-model-6b7d61dcfef5 The article discusses evaluating this fine-tuned text-to-SQL model, a type of artificial intelligence that translates natural language into SQL queries.
The model was trained on the "b-mc2/sql-create-context" dataset and evaluated using the "gretelai/synthetic_text_to_sql" dataset.
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0002
- train_batch_size: 3
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 24
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant
- lr_scheduler_warmup_ratio: 0.03
- lr_scheduler_warmup_steps: 15
- num_epochs: 3
from transformers import TrainingArguments
args = TrainingArguments( output_dir="Mistral-7B-text-to-sql-flash-attention-2-dataeval",
num_train_epochs=3, # number of training epochs
per_device_train_batch_size=3, # batch size per device during training
gradient_accumulation_steps=8, #2 # number of steps before performing a backward/update pass
gradient_checkpointing=True, # use gradient checkpointing to save memory
optim="adamw_torch_fused", # use fused adamw optimizer
logging_steps=10, # log every ten steps
#save_strategy="epoch", # save checkpoint every epoch
learning_rate=2e-4, # learning rate, based on QLoRA paper
bf16=True, # use bfloat16 precision
tf32=True, # use tf32 precision
max_grad_norm=0.3, # max gradient norm based on QLoRA paper
warmup_ratio=0.03, # warmup ratio based on QLoRA paper
weight_decay=0.01,
lr_scheduler_type="constant", # use constant learning rate scheduler
push_to_hub=True, # push model to hub
report_to="tensorboard", # report metrics to tensorboard
hub_token=access_token_write, # Add this line
load_best_model_at_end=True,
logging_dir="/content/drive/MyDrive/model/Mistral-7B-text-to-sql-flash-attention-2-dataeval/logs",
evaluation_strategy="steps",
eval_steps=10,
save_strategy="steps",
save_steps=10,
metric_for_best_model = "loss",
warmup_steps=15,
)
H2E Safety Tool with ChromaDB Expert Retrieval
import torch
import numpy as np
import chromadb
import sqlite3
import re
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
from sklearn.metrics.pairwise import cosine_similarity
from datasets import load_dataset
# 1. INITIALIZATION: Mistral-7B + LoRA Expert DNA
base_model_id = "mistralai/Mistral-7B-Instruct-v0.3"
adapter_id = "frankmorales2020/Mistral-7B-text-to-sql-flash-attention-2-dataeval"
print("Initializing Mistral-7B with LoRA Expert DNA...")
tokenizer = AutoTokenizer.from_pretrained(base_model_id)
base_model = AutoModelForCausalLM.from_pretrained(base_model_id, dtype=torch.bfloat16, device_map="auto")
# CRITICAL FIX: Resize embeddings to 32770 to match your paper's adapter
base_model.resize_token_embeddings(32770)
model = PeftModel.from_pretrained(base_model, adapter_id)
model.config.pad_token_id = tokenizer.eos_token_id
# 2. NEZ VAULT & DYNAMIC METADATA INGESTION
chroma_client = chromadb.Client()
try: chroma_client.delete_collection("expert_dna_vault")
except: pass
nez_collection = chroma_client.create_collection(name="expert_dna_vault")
def add_expert_dna_to_nez(description, prompt_example, schema_sql):
"""Encodes intent into NEZ and stores schema in metadata."""
inputs = tokenizer(prompt_example, return_tensors="pt").to(model.device)
with torch.no_grad():
outputs = model(**inputs, output_hidden_states=True)
# Extract intent vector
vector = outputs.hidden_states[-1][0, -1, :].float().cpu().numpy().tolist()
# Metadata stores the Schema linked to the Intent
nez_collection.add(
embeddings=[vector],
documents=[description],
metadatas=[{"schema": schema_sql}],
ids=[f"exp_{nez_collection.count()}"]
)
# 3. SROI: DYNAMIC VALIDATION (NO HARD-CODED SAMPLES)
def clean_sql_output(raw_text):
"""Isolates the raw SQL query."""
parts = re.split(r'assistant', raw_text, flags=re.IGNORECASE)
clean_query = parts[-1].strip().split('system')[0].split('user')[0].strip()
return re.sub(r'```sql|```', '', clean_query).strip()
def validate_sql_dynamic(sql_query, retrieved_schema):
"""Builds a dynamic sandbox based on retrieved metadata."""
try:
temp_db = sqlite3.connect(":memory:")
temp_db.execute(retrieved_schema)
temp_db.execute(f"EXPLAIN QUERY PLAN {sql_query}")
return True, None
except Exception as e:
return False, str(e)
def clean_sql_final_v2(raw_text):
"""
Prevents chat hallucination 'system/user/assistant' from entering SQLite.
1. SEGMENTATION: Isolates text appearing after the '### SQL:' prompt marker.
2. SANITIZATION: Removes Markdown code block formatting (```sql).
3. DETERMINISTIC TRUNCATION: Uses Regex to cut off the string at the
first sign of a chat tag (system, user, assistant) or double newline.
"""
# 1. Capture everything after the prompt marker
sql_part = raw_text.split("SQL:")[-1] if "SQL:" in raw_text else raw_text
# 2. Remove any markdown blocks
sql_part = re.sub(r'```sql|```', '', sql_part)
# 3. STOPS: Truncate at the first sign of a new chat tag or noise
# This prevents the "near 'system': syntax error" by stopping the string early.
sql_part = re.split(r'system|user|assistant|###|\n\n', sql_part, flags=re.IGNORECASE)[0]
return sql_part.strip()
# 4. DYNAMIC H2E INDUSTRIAL GATE
def h2e_industrial_gate(user_input):
inputs_search = tokenizer(user_input, return_tensors="pt").to(model.device)
with torch.no_grad():
outputs = model(**inputs_search, output_hidden_states=True)
current_intent = outputs.hidden_states[-1][0, -1, :].float().cpu().numpy().reshape(1, -1)
res = nez_collection.query(query_embeddings=[current_intent[0].tolist()], n_results=1, include=["embeddings", "metadatas"])
retrieved_schema = res['metadatas'][0][0]['schema']
expert_target = np.array(res['embeddings'][0]).reshape(1, -1)
similarity = cosine_similarity(current_intent, expert_target)[0][0]
sroi_score = min(similarity * 12.5, 1.0)
# STEP C: Generation with injected schema
prompt = f"### Schema:\n{retrieved_schema}\n### Question:\n{user_input}\n### SQL:\n"
inputs_gen = tokenizer(prompt, return_tensors="pt").to(model.device)
# Use 'eos_token_id' to encourage the model to stop after the query
gen_tokens = model.generate(**inputs_gen, max_new_tokens=80, eos_token_id=tokenizer.eos_token_id)
raw_output = tokenizer.decode(gen_tokens[0], skip_special_tokens=True)
generated_sql = clean_sql_final_v2(raw_output)
# Validation against retrieved schema
is_valid, error = validate_sql_dynamic(generated_sql, retrieved_schema)
if is_valid:
print(f"โ
[SAFE] SROI: {sroi_score:.4f} | Logic: Verified")
return generated_sql
return f"โ [BLOCKED] Functional Failure: {error}\nDEBUG: {generated_sql}"
# 5. EXECUTION: High-Fidelity Test
print("Expanding NEZ with 120 Dynamic Expert DNA records...")
dataset = load_dataset("b-mc2/sql-create-context", split='train', streaming=True)
valid_samples = []
for i, ex in enumerate(dataset):
if i >= 120: break
add_expert_dna_to_nez(ex['question'], f"Table: {ex['context']}. Question: {ex['question']}", ex['context'])
valid_samples.append(ex)
print(f"โ
NEZ Ready. Vault Size: {nez_collection.count()}")
# PICK A SENSE-MAKING SAMPLE: Let's use the first actual record we ingested
sample_ex = valid_samples[0]
test_query = sample_ex['question']
print(f"\nTesting with Real Vault Question: {test_query}")
# Now the Query and the Schema will have "Sense"
print(f"\nResult:\n{h2e_industrial_gate(test_query)}")
Expanding NEZ with 120 Dynamic Expert DNA records...
README.md:โโ4.43k/?โ[00:00<00:00,โ340kB/s]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
โ
NEZ Ready. Vault Size: 120
Testing with Real Vault Question: How many heads of the departments are older than 56 ?
โ
[SAFE] SROI: 1.0000 | Logic: Verified
Result:
SELECT COUNT(*) FROM head WHERE age > 56
FULL CODE: https://github.com/frank-morales2020/MLxDL/blob/main/FT_H2E.ipynb
H2E ARTICLE IN MEDIUM : https://medium.com/ai-simplified-in-plain-english/the-h2e-framework-engineering-accountability-into-the-industrial-ai-era-7019524e9713
Training results
| Training Loss | Epoch | Step | Validation Loss |
|---|---|---|---|
| 1.8612 | 0.4020 | 10 | 0.6092 |
| 0.5849 | 0.8040 | 20 | 0.5307 |
| 0.4937 | 1.2060 | 30 | 0.4887 |
| 0.4454 | 1.6080 | 40 | 0.4670 |
| 0.425 | 2.0101 | 50 | 0.4544 |
| 0.3498 | 2.4121 | 60 | 0.4717 |
| 0.3439 | 2.8141 | 70 | 0.4605 |
Framework versions
- PEFT 0.11.1
- Transformers 4.41.2
- Pytorch 2.3.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
- Downloads last month
- 80
Model tree for frankmorales2020/Mistral-7B-text-to-sql-flash-attention-2-dataeval
Base model
mistralai/Mistral-7B-v0.3