# LuminoLex-14B (Verbarex)
LuminoLex-14B is a fine-tuned variant of Qwen3-14B, optimized for reasoning, mathematics, and coding performance. The model targets reliable zero-shot reasoning quality while keeping training cost low through parameter-efficient fine-tuning (LoRA).
This release is intended for research, evaluation, and advanced reasoning workloads.
## Highlights
- Strong HumanEval performance for code reasoning
- Competitive results on ARC and HellaSwag
- Fine-tuned using LoRA / QLoRA
- Evaluated using BF16 compute
- Supports both high-end and low-VRAM inference setups
- Designed for reasoning- and coding-heavy workloads
## Evaluation Results (lm-eval-harness)
All results are zero-shot evaluations.
| Benchmark | Metric | Score |
|---|---|---|
| HumanEval | pass@1 | 0.585 |
| ARC-Challenge | acc_norm | 0.623 |
| ARC-Easy | acc_norm | 0.851 |
| HellaSwag | acc_norm | 0.799 |
These scores place LuminoLex-14B competitively among modern 14B-class open models, especially for reasoning and programming tasks.
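The multiple-choice scores can be reproduced approximately with lm-eval-harness. The snippet below is a minimal sketch, assuming a recent `lm_eval` release whose `HFLM` wrapper accepts a `peft` argument for adapter loading; exact kwargs and task names may differ between versions, and HumanEval additionally requires enabling code execution in the harness.

```python
# Minimal zero-shot evaluation sketch with lm-eval-harness (assumed: pip install lm-eval).
import lm_eval
from lm_eval.models.huggingface import HFLM

# Base model + LoRA adapter, BF16 compute to match the benchmark configuration.
lm = HFLM(
    pretrained="Qwen/Qwen3-14B",
    peft="verbarex/LuminoLex-14B-Verbarex",
    dtype="bfloat16",
    batch_size=8,  # assumption: adjust to available GPU memory
)

results = lm_eval.simple_evaluate(
    model=lm,
    tasks=["arc_challenge", "arc_easy", "hellaswag"],
    num_fewshot=0,  # zero-shot, matching the table above
)
print(results["results"])
```

ARC and HellaSwag are log-likelihood tasks and should track the table closely, whereas HumanEval pass@1 also depends on the generation settings used.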
## Training Summary
LuminoLex-14B was fine-tuned on top of Qwen/Qwen3-14B using parameter-efficient LoRA adapters. Illustrative (non-authoritative) sketches of the data formatting and of an equivalent PEFT configuration are shown after the lists below.
### Training data
- GSM8K – mathematical reasoning
- MetaMathQA – structured math reasoning
- MBPP – Python programming tasks
- Evol-Instruct – instruction-following data
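The preprocessing pipeline is not published in this repository. Purely as an illustration (dataset IDs and field names below are assumptions, not the released recipe), sources like these can be pulled from the Hugging Face Hub and rendered into the same `### Instruction:` / `### Response:` framing used by the inference examples further down:

```python
# Illustrative only: fetch two of the listed sources and format them Alpaca-style.
from datasets import load_dataset

def to_alpaca(instruction: str, response: str) -> dict:
    # Mirrors the prompt framing used by build_instruct_prompt() in the usage examples.
    return {"text": f"### Instruction:\n{instruction}\n\n### Response:\n{response}\n"}

gsm8k = load_dataset("gsm8k", "main", split="train")  # fields: question / answer
mbpp = load_dataset("mbpp", split="train")            # fields: text / code

formatted = [to_alpaca(r["question"], r["answer"]) for r in gsm8k] + \
            [to_alpaca(r["text"], r["code"]) for r in mbpp]
print(formatted[0]["text"][:200])
```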
### Training configuration
- Base model: Qwen/Qwen3-14B
- Fine-tuning method: LoRA (QLoRA-style)
- Weight precision: 4-bit
- Compute precision: BF16
- Optimizer: Paged AdamW
- Max sequence length: 1024
- Gradient checkpointing: enabled
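The LoRA hyperparameters themselves (rank, alpha, dropout, target modules) are not listed above, so the following is only a rough sketch of a QLoRA-style setup consistent with the stated settings; every value marked as an assumption is illustrative rather than the released recipe.

```python
# Rough QLoRA-style configuration matching the listed settings (4-bit weights, BF16 compute,
# paged AdamW, gradient checkpointing). LoRA rank/alpha/targets are assumptions.
import torch
from transformers import BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # 4-bit weight precision
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,  # BF16 compute precision
)

lora_config = LoraConfig(
    r=16,                                   # assumption
    lora_alpha=32,                          # assumption
    lora_dropout=0.05,                      # assumption
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumption
    task_type="CAUSAL_LM",
)

training_args = TrainingArguments(
    output_dir="luminolex-14b-lora",        # arbitrary
    optim="paged_adamw_32bit",              # Paged AdamW
    bf16=True,
    gradient_checkpointing=True,
    per_device_train_batch_size=1,          # assumption
    gradient_accumulation_steps=16,         # assumption
)
# The 1024-token limit is applied at tokenization / SFT-trainer level, not here.
```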
## Installation
Install all required dependencies before running the model:
```bash
pip install -U transformers accelerate peft bitsandbytes safetensors
```
These are required for:
- loading Qwen models
- LoRA adapter support
- 4-bit quantization
- safetensors weight loading (a quick import check is shown below)
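As a quick, optional sanity check that the stack above is importable (nothing model-specific is assumed here):

```python
# Print the installed versions of the inference dependencies.
import accelerate, bitsandbytes, peft, safetensors, transformers

for mod in (transformers, accelerate, peft, bitsandbytes, safetensors):
    print(f"{mod.__name__}: {mod.__version__}")
```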
## Usage – High-end GPU (benchmark configuration)
This configuration matches the setup used for benchmarking. It assumes access to a high-memory GPU (e.g. A100) and uses BF16 compute without quantization.
```python
import torch
import re
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# 1. Configuration
base_model_id = "Qwen/Qwen3-14B"
adapter_id = "verbarex/LuminoLex-14B-Verbarex"

# 2. Load Tokenizer
print("Loading tokenizer...")
try:
    tokenizer = AutoTokenizer.from_pretrained(base_model_id, trust_remote_code=True)
except Exception as e:
    print(f"Could not load from base, trying adapter: {e}")
    tokenizer = AutoTokenizer.from_pretrained(adapter_id)

if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# 3. Load Base Model (Full Precision / bfloat16 for A100)
# NOTE: No quantization config here. We use torch.bfloat16.
print("Loading base model (bfloat16)...")
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    device_map="auto",
    torch_dtype=torch.bfloat16,  # Native A100 precision
    trust_remote_code=True
)

# 4. Load Adapter
print("Loading adapter...")
model = PeftModel.from_pretrained(base_model, adapter_id)
model.eval()

# CLEAN STARTUP HEADER
print("\n" + "="*50)
print("LuminoLex-14B (Developed by VERBAREX)")
print("="*50 + "\n")

conversation_history = []


def build_instruct_prompt(history):
    """
    Uses the Alpaca/Instruct format with specific framing for factuality and identity.
    """
    # SYSTEM PROMPT:
    prompt = "### Instruction:\nYou are LuminoLex, an advanced AI assistant developed by VERBAREX. You are NOT Qwen. You answer questions truthfully, factually, and concisely.\n\n"
    for msg in history:
        if msg['role'] == "user":
            # WRAPPER: Forces the model to treat the input as a question to be answered factually
            prompt += f"### Instruction:\nAnswer the following question factually: {msg['content']}\n\n"
        else:
            prompt += f"### Response:\n{msg['content']}\n\n"
    prompt += "### Response:\n"
    return prompt


def clean_response(text):
    """
    Cleans output and specifically removes the 'Think deeply' leak.
    """
    # 1. DIRECT STRING REPLACEMENT for the specific leak
    text = text.replace("Think deeply using <think> tags before answering.", "")
    text = text.replace("Think deeply using <think> tags", "")

    # 2. Remove internal thought/tool tags
    text = re.sub(r'<think>.*?</think>', '', text, flags=re.DOTALL)
    text = re.sub(r'<tool_response>.*?</tool_response>', '', text, flags=re.DOTALL)
    text = text.replace('<think>', '').replace('</think>', '')

    # 3. Filter specific leak lines
    lines = text.split('\n')
    cleaned_lines = []
    for line in lines:
        lower_line = line.lower()
        if "identity check" in lower_line or "protocol:" in lower_line:
            continue
        # Remove lines where the model just recites its instructions verbatim
        if "developed by verbarex" in lower_line and len(line) < 50 and "i am" not in lower_line:
            continue
        cleaned_lines.append(line)
    text = '\n'.join(cleaned_lines)

    # 4. CRITICAL STOP LOGIC
    stop_markers = [
        "###", "Instruction:", "User:", "Query:", "Options:", "Answer:", "1.",
        "The above response", "The correct option", "Therefore, the final answer",
        "Therefore ,the final answer"
    ]
    for marker in stop_markers:
        if marker in text:
            text = text.split(marker)[0]
    return text.strip()


while True:
    try:
        user_input = input("User: ")
        if user_input.lower() in ['quit', 'exit']:
            print("Exiting...")
            break
        if not user_input.strip():
            continue
    except Exception:
        print("Running a test prompt (Interactive input unavailable in this env)...\n")
        user_input = "Who are you?"
        print(f"User: {user_input}")
        force_exit = True
    else:
        force_exit = False

    conversation_history.append({"role": "user", "content": user_input})
    prompt_str = build_instruct_prompt(conversation_history)
    inputs = tokenizer(prompt_str, return_tensors="pt").to(model.device)

    # STATUS MESSAGE
    print("\nLuminoLex: Generating response... (Please wait)", end="\r", flush=True)

    with torch.inference_mode():
        # ADJUSTED SAMPLING
        outputs = model.generate(
            **inputs,
            max_new_tokens=512,
            min_new_tokens=2,
            do_sample=True,
            temperature=0.2,
            top_p=0.9,
            repetition_penalty=1.1,
            pad_token_id=tokenizer.pad_token_id,
            eos_token_id=tokenizer.eos_token_id
        )

    new_tokens = outputs[0][inputs['input_ids'].shape[1]:]
    raw_response = tokenizer.decode(new_tokens, skip_special_tokens=True)
    final_response = clean_response(raw_response)
    if not final_response.strip():
        final_response = raw_response

    # Clearing space
    print(" " * 60, end="\r")
    print(f"LuminoLex: {final_response.strip()}")
    print("-" * 30 + "\n")

    conversation_history.append({"role": "assistant", "content": final_response.strip()})

    if force_exit:
        break
```
## Low-VRAM Inference (8–16 GB GPUs)
This configuration uses 4-bit NF4 quantization while preserving reasoning quality. Recommended for consumer GPUs.
```python
import torch
import re
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

# 1. Configuration
base_model_id = "Qwen/Qwen3-14B"
adapter_id = "verbarex/LuminoLex-14B-Verbarex"

# 2. Quantization Config (Enabled for Low VRAM)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# 3. Load Tokenizer
print("Loading tokenizer...")
try:
    tokenizer = AutoTokenizer.from_pretrained(base_model_id, trust_remote_code=True)
except Exception as e:
    print(f"Could not load from base, trying adapter: {e}")
    tokenizer = AutoTokenizer.from_pretrained(adapter_id)

if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# 4. Load Base Model
print("Loading base model...")
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    device_map="auto",
    quantization_config=bnb_config,
    trust_remote_code=True
)

# 5. Load Adapter
print("Loading adapter...")
model = PeftModel.from_pretrained(base_model, adapter_id)
model.eval()
```
From this point on, the script is identical to the high-end example above: reuse the same `build_instruct_prompt` and `clean_response` helpers and the same interactive generation loop, unchanged.
## Notes
- This repository contains LoRA adapter weights, not a standalone base model; a sketch for merging the adapter into the base weights appears after this list.
- The base model (Qwen/Qwen3-14B) is downloaded automatically at runtime.
- 4-bit inference uses bitsandbytes (NF4).
- BF16 refers to compute precision, not stored weight format.
- Benchmark results may vary slightly depending on decoding settings and hardware.
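If a standalone checkpoint is preferred over attaching the adapter at runtime, PEFT's `merge_and_unload()` can fold the LoRA weights into the BF16 base model. A minimal sketch (the output directory name is arbitrary, and enough memory to hold the full BF16 model is required):

```python
# Merge the LoRA adapter into the base weights and save a standalone model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-14B", torch_dtype=torch.bfloat16, device_map="auto", trust_remote_code=True
)
merged = PeftModel.from_pretrained(base, "verbarex/LuminoLex-14B-Verbarex").merge_and_unload()

merged.save_pretrained("LuminoLex-14B-merged")  # arbitrary output path
AutoTokenizer.from_pretrained("Qwen/Qwen3-14B").save_pretrained("LuminoLex-14B-merged")
```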
## License
Apache-2.0