LuminoLex-14B (Verbarex)

LuminoLex-14B is a fine-tuned variant of Qwen3-14B, optimized for strong reasoning, mathematics, and coding performance. The model focuses on reliable zero-shot reasoning quality while remaining efficient through parameter-efficient fine-tuning.

This release is intended for research, evaluation, and advanced reasoning workloads.


Highlights

  • Strong HumanEval performance for code reasoning
  • Competitive results on ARC and HellaSwag
  • Fine-tuned using LoRA / QLoRA
  • Evaluated using BF16 compute
  • Supports both high-end and low-VRAM inference setups
  • Designed for reasoning- and coding-heavy workloads

Evaluation Results (lm-eval-harness)

All results are zero-shot evaluations.

Benchmark        Metric     Score
HumanEval        pass@1     0.585
ARC-Challenge    acc_norm   0.623
ARC-Easy         acc_norm   0.851
HellaSwag        acc_norm   0.799

These scores place LuminoLex-14B competitively among modern 14B-class open models, especially for reasoning and programming tasks.
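
These zero-shot numbers can typically be reproduced with lm-eval-harness. The snippet below is a minimal sketch rather than the exact benchmark command; the peft model_args key, batch size, and task selection are assumptions about a recent harness version (HumanEval additionally requires enabling code execution in the harness and is omitted here).

# Requires: pip install lm-eval
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args=(
        "pretrained=Qwen/Qwen3-14B,"
        "peft=verbarex/LuminoLex-14B-Verbarex,"  # load the LoRA adapter on top of the base
        "dtype=bfloat16,trust_remote_code=True"
    ),
    tasks=["arc_challenge", "arc_easy", "hellaswag"],
    num_fewshot=0,          # zero-shot, as reported above
    batch_size=8,           # placeholder; adjust to available VRAM
)

for task, metrics in results["results"].items():
    print(task, metrics)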


Training Summary

LuminoLex-14B was fine-tuned on top of Qwen/Qwen3-14B using parameter-efficient LoRA adapters; an illustrative code sketch of this setup follows the configuration list below.

Training data

  • GSM8K – mathematical reasoning
  • MetaMathQA – structured math reasoning
  • MBPP – Python programming tasks
  • Evol-Instruct – instruction-following data

Training configuration

  • Base model: Qwen/Qwen3-14B
  • Fine-tuning method: LoRA (QLoRA-style)
  • Weight precision: 4-bit
  • Compute precision: BF16
  • Optimizer: Paged AdamW
  • Max sequence length: 1024
  • Gradient checkpointing: enabled
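
For illustration only, the configuration above roughly corresponds to a QLoRA-style setup like the sketch below. The LoRA rank, alpha, dropout, target modules, learning rate, batch size, and epoch count are placeholders, not the values used for the actual run.

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

base_model_id = "Qwen/Qwen3-14B"

# 4-bit NF4 weight storage with BF16 compute, matching the listed precisions
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)
model = prepare_model_for_kbit_training(model)  # also enables gradient checkpointing by default

# Hypothetical LoRA hyperparameters (rank/alpha/targets are placeholders)
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

training_args = TrainingArguments(
    output_dir="luminolex-14b-lora",
    optim="paged_adamw_32bit",       # Paged AdamW
    bf16=True,                       # BF16 compute precision
    gradient_checkpointing=True,
    per_device_train_batch_size=1,   # placeholder
    gradient_accumulation_steps=16,  # placeholder
    learning_rate=2e-4,              # placeholder
    num_train_epochs=1,              # placeholder
)
# Training examples would be tokenized and truncated to the 1024-token max sequence
# length before being passed to a Trainer together with training_args.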

Installation

Install all required dependencies before running the model:

pip install -U transformers accelerate peft bitsandbytes safetensors

These are required for:

  • loading Qwen models
  • LoRA adapter support
  • 4-bit quantization
  • safe tensor loading
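
As a quick sanity check that the environment is ready, the imports below should succeed; the printed versions are informational only and no specific versions are implied.

import accelerate
import bitsandbytes
import peft
import safetensors
import transformers

print("transformers:", transformers.__version__)
print("peft:", peft.__version__)
print("bitsandbytes:", bitsandbytes.__version__)
print("accelerate:", accelerate.__version__)
print("safetensors:", safetensors.__version__)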

Usage – High-end GPU (benchmark configuration)

This configuration matches the setup used for benchmarking. It assumes access to a high-memory GPU (e.g. A100) and uses BF16 compute without quantization.

import torch
import re
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# 1. Configuration
base_model_id = "Qwen/Qwen3-14B" 
adapter_id = "verbarex/LuminoLex-14B-Verbarex"

# 2. Load Tokenizer
print("Loading tokenizer...")
try:
    tokenizer = AutoTokenizer.from_pretrained(base_model_id, trust_remote_code=True)
except Exception as e:
    print(f"Could not load from base, trying adapter: {e}")
    tokenizer = AutoTokenizer.from_pretrained(adapter_id)

if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# 3. Load Base Model (Full Precision / bfloat16 for A100)
# NOTE: No quantization config here. We use torch.bfloat16.
print("Loading base model (bfloat16)...")
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    device_map="auto",
    torch_dtype=torch.bfloat16,  # Native A100 precision
    trust_remote_code=True
)

# 4. Load Adapter
print("Loading adapter...")
model = PeftModel.from_pretrained(base_model, adapter_id)
model.eval()

# CLEAN STARTUP HEADER
print("\n" + "="*50)
print("LuminoLex-14B (Developed by VERBAREX)")
print("="*50 + "\n")

conversation_history = []

def build_instruct_prompt(history):
    """
    Uses the Alpaca/Instruct format with specific framing for factuality and identity.
    """
    # SYSTEM PROMPT:
    prompt = "### Instruction:\nYou are LuminoLex, an advanced AI assistant developed by VERBAREX. You are NOT Qwen. You answer questions truthfully, factually, and concisely.\n\n"
    
    for msg in history:
        if msg['role'] == "user":
            # WRAPPER: Forces the model to treat the input as a question to be answered factually
            prompt += f"### Instruction:\nAnswer the following question factually: {msg['content']}\n\n"
        else:
            prompt += f"### Response:\n{msg['content']}\n\n"
    
    prompt += "### Response:\n"
    return prompt

def clean_response(text):
    """
    Cleans output and specifically removes the 'Think deeply' leak.
    """
    # 1. DIRECT STRING REPLACEMENT for the specific leak
    text = text.replace("Think deeply using <think> tags before answering.", "")
    text = text.replace("Think deeply using <think> tags", "")
    
    # 2. Remove internal thought/tool tags
    text = re.sub(r'<think>.*?</think>', '', text, flags=re.DOTALL)
    text = re.sub(r'<tool_response>.*?</tool_response>', '', text, flags=re.DOTALL)
    text = text.replace('<think>', '').replace('</think>', '')
    
    # 3. Filter specific leak lines
    lines = text.split('\n')
    cleaned_lines = []
    for line in lines:
        lower_line = line.lower()
        if "identity check" in lower_line or "protocol:" in lower_line:
            continue
        # Remove lines where the model just recites its instructions verbatim
        if "developed by verbarex" in lower_line and len(line) < 50 and "i am" not in lower_line:
            continue
        cleaned_lines.append(line)
    text = '\n'.join(cleaned_lines)

    # 4. CRITICAL STOP LOGIC
    stop_markers = [
        "###", "Instruction:", "User:", "Query:", "Options:", "Answer:", "1.", 
        "The above response", "The correct option", "Therefore, the final answer",
        "Therefore ,the final answer"
    ]
    
    for marker in stop_markers:
        if marker in text:
            text = text.split(marker)[0]

    return text.strip()

while True:
    try:
        user_input = input("User: ")
        if user_input.lower() in ['quit', 'exit']:
            print("Exiting...")
            break
        if not user_input.strip():
            continue
    except Exception:
        print("Running a test prompt (Interactive input unavailable in this env)...\n")
        user_input = "Who are you?"
        print(f"User: {user_input}")
        force_exit = True
    else:
        force_exit = False

    conversation_history.append({"role": "user", "content": user_input})
    prompt_str = build_instruct_prompt(conversation_history)
    inputs = tokenizer(prompt_str, return_tensors="pt").to(model.device)

    # STATUS MESSAGE
    print("\nLuminoLex: Generating response... (Please wait)", end="\r", flush=True)
    
    with torch.inference_mode():
        # ADJUSTED SAMPLING
        outputs = model.generate(
            **inputs,
            max_new_tokens=512,
            min_new_tokens=2,
            do_sample=True,         
            temperature=0.2,        
            top_p=0.9,
            repetition_penalty=1.1, 
            pad_token_id=tokenizer.pad_token_id,
            eos_token_id=tokenizer.eos_token_id
        )
    
    new_tokens = outputs[0][inputs['input_ids'].shape[1]:]
    raw_response = tokenizer.decode(new_tokens, skip_special_tokens=True)
    final_response = clean_response(raw_response)
    
    if not final_response.strip():
        final_response = raw_response

    # Clearing space
    print(" " * 60, end="\r")
    print(f"LuminoLex: {final_response.strip()}")
    print("-" * 30 + "\n")
    
    conversation_history.append({"role": "assistant", "content": final_response.strip()})

    if force_exit:
        break

Low-VRAM Inference (8–16 GB GPUs)

This configuration uses 4-bit NF4 quantization while preserving reasoning quality. Recommended for consumer GPUs.

import torch
import re
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

# 1. Configuration
base_model_id = "Qwen/Qwen3-14B" 
adapter_id = "verbarex/LuminoLex-14B-Verbarex"

# 2. Quantization Config (Enabled for Low VRAM)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# 3. Load Tokenizer
print("Loading tokenizer...")
try:
    tokenizer = AutoTokenizer.from_pretrained(base_model_id, trust_remote_code=True)
except Exception as e:
    print(f"Could not load from base, trying adapter: {e}")
    tokenizer = AutoTokenizer.from_pretrained(adapter_id)

if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# 4. Load Base Model
print("Loading base model...")
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    device_map="auto",
    quantization_config=bnb_config,
    trust_remote_code=True
)

# 5. Load Adapter
print("Loading adapter...")
model = PeftModel.from_pretrained(base_model, adapter_id)
model.eval()

# From this point on, the script is identical to the high-end GPU example above:
# reuse the startup banner, conversation_history, build_instruct_prompt(),
# clean_response(), and the interactive chat loop unchanged.

Notes

  • This repository contains LoRA adapter weights, not a standalone base model (see the merge sketch after this list).
  • The base model (Qwen/Qwen3-14B) is downloaded automatically at runtime.
  • 4-bit inference uses bitsandbytes (NF4).
  • BF16 refers to compute precision, not stored weight format.
  • Benchmark results may vary slightly depending on decoding settings and hardware.
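
If a single standalone checkpoint is preferred over keeping the adapter separate, the LoRA weights can be folded into the base model with PEFT. The sketch below is illustrative (the output directory is arbitrary); merging should be done against an unquantized BF16 base, not the 4-bit loading path, and needs enough memory to hold the full model.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model_id = "Qwen/Qwen3-14B"
adapter_id = "verbarex/LuminoLex-14B-Verbarex"

# Load the base model in full bf16 precision (do not quantize before merging).
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_id, torch_dtype=torch.bfloat16, trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained(base_model_id, trust_remote_code=True)

# Attach the LoRA adapter, then fold its weights into the base layers.
model = PeftModel.from_pretrained(base_model, adapter_id)
merged = model.merge_and_unload()

# Save a standalone checkpoint (output path is an example).
merged.save_pretrained("LuminoLex-14B-merged", safe_serialization=True)
tokenizer.save_pretrained("LuminoLex-14B-merged")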

License

Apache-2.0
