“Predict the next token”

not

“Obey the instruction”

QLoRA Instruction Tuning on Pythia-1B

This repository provides a Hugging Face–compatible LoRA adapter trained via QLoRA (4-bit quantization + LoRA adapters) on the EleutherAI Pythia-1B-deduped base model.

The project focuses on producing and publishing a reusable LoRA adapter using a modern, memory-efficient instruction-tuning pipeline built with Hugging Face Transformers, PEFT, and BitsAndBytes. It is designed for learning, experimentation, and small-GPU environments (e.g. Colab).


✨ Key Features (Adapter-Centric)

  • 🔒 Frozen base model: Pythia-1B-deduped (not included in this repository)
  • 🧠 QLoRA training with 4-bit NF4 quantization
  • 🧩 Only the LoRA adapters are trainable (<1% of parameters)
  • 💾 Optimized for low GPU memory usage
  • 📚 Clear, minimal pipeline for understanding instruction tuning

🧠 What This Adapter Represents

This adapter demonstrates how to:

  • Load a 4-bit quantized causal language model
  • Prepare it for k-bit training
  • Apply LoRA adapters for parameter-efficient fine-tuning
  • Perform instruction tuning using causal LM loss
  • Train using the Hugging Face Trainer API

Formally, training follows:

Frozen Base Model (4-bit)
+ Trainable LoRA ΔW
→ Instruction-following behavior
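
The composition above can be made concrete with a small sketch. It assumes the standard LoRA formulation, in which a frozen weight W is augmented by a scaled low-rank product, y = W x + (alpha / r) · B (A x), with B starting at zero. The toy dimensions and the helper names `matvec` and `lora_forward` are illustrative and are not taken from this repository.

```python
# Minimal pure-Python sketch of a LoRA-augmented linear layer (illustrative only).
# y = W x + (alpha / r) * B (A x), where W stays frozen and only A and B are trained.

def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]

def lora_forward(W, A, B, x, alpha, r):
    """Frozen base projection plus the scaled low-rank update."""
    base = matvec(W, x)                # frozen path (stored in 4-bit in QLoRA)
    delta = matvec(B, matvec(A, x))    # trainable path: B (d_out x r) times A (r x d_in)
    scale = alpha / r
    return [b + scale * d for b, d in zip(base, delta)]

# Toy sizes: d_in = d_out = 3, rank r = 2.
W = [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 3.0]]
A = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]        # r x d_in, usually random at init
B = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]      # d_out x r, zero at init
x = [1.0, 1.0, 1.0]

y_init = lora_forward(W, A, B, x, alpha=2, r=2)     # identical to W x because B is zero

B[0][0] = 1.0                                       # pretend training moved one entry of B
y_trained = lora_forward(W, A, B, x, alpha=2, r=2)  # only the first output changes
```

Because B starts at zero, the adapted model reproduces the base model exactly before training; any change in behavior comes from the learned A and B alone.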

🏗️ Model & Training Setup

Base Model

  • Model: EleutherAI/pythia-1B-deduped
  • Architecture: Decoder-only Transformer
  • Quantization: 4-bit NF4 (BitsAndBytes)

LoRA Configuration

| Parameter | Value | Description |
|---|---|---|
| r | 32 | LoRA rank (expressiveness) |
| lora_alpha | 32 | Scaling factor (updates are scaled by lora_alpha / r) |
| lora_dropout | 0.05 | Regularization |
| bias | none | Bias terms are not trained |
| task_type | CAUSAL_LM | Causal language modeling |

Only LoRA parameters are trainable; all base model weights remain frozen.
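
To give a feel for what the rank setting costs, the sketch below counts the parameters a single LoRA pair adds to one linear layer. The 2048 x 2048 layer shape is a hypothetical example chosen for illustration, and `lora_param_count` is a helper introduced here; the share of the whole model that ends up trainable depends on which modules are targeted.

```python
def lora_param_count(d_in, d_out, r):
    """Trainable parameters added by one LoRA pair: A is (r x d_in), B is (d_out x r)."""
    return r * (d_in + d_out)

# Hypothetical layer shape, used only to illustrate the arithmetic.
d_in, d_out, r = 2048, 2048, 32

full = d_in * d_out                        # frozen weights in the original layer
added = lora_param_count(d_in, d_out, r)   # trainable weights added by LoRA
fraction = added / full                    # adapter size relative to this one layer

lora_alpha = 32
scaling = lora_alpha / r                   # with r = 32 and lora_alpha = 32 the update is unscaled

print(f"frozen={full:,} trainable={added:,} fraction={fraction:.4f} scaling={scaling}")
```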


📦 Dataset

  • Type: Instruction-formatted text dataset

  • Format: Each example contains a text field

  • Tokenization:

    • Max length: 512
    • Padding: max_length
    • Truncation enabled

Loss is the standard causal language modeling loss computed over the whole sequence, so the model learns to predict every token of both the instruction and the response rather than the response alone.
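
A minimal sketch of how labels for this kind of full-sequence loss are commonly built is shown below. How this repository constructs its labels is not shown, so the helpers `pad_to_length` and `build_labels`, the toy token ids, and the short max length are all assumptions. The sketch masks padding through the attention mask rather than by token id, which matters when the pad token is the EOS token: masking by id would also hide the real end-of-sequence token from the loss.

```python
IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy loss ignores by default

def pad_to_length(token_ids, max_length, pad_id):
    """Truncate, then right-pad to max_length; returns (input_ids, attention_mask)."""
    ids = token_ids[:max_length]
    mask = [1] * len(ids)
    padding = max_length - len(ids)
    return ids + [pad_id] * padding, mask + [0] * padding

def build_labels(input_ids, attention_mask):
    """Copy input_ids to labels and ignore padded positions."""
    return [tok if keep == 1 else IGNORE_INDEX
            for tok, keep in zip(input_ids, attention_mask)]

# Toy example: hypothetical token ids, max_length 8 instead of 512,
# and id 0 standing in for the shared pad/EOS token.
EOS_ID = 0
example = [11, 12, 13, 14, EOS_ID]   # instruction + response + a real EOS token
input_ids, attention_mask = pad_to_length(example, max_length=8, pad_id=EOS_ID)
labels = build_labels(input_ids, attention_mask)

print(input_ids)
print(labels)
```

Hugging Face causal LM heads shift the labels by one position internally, so the labels can simply mirror the input ids.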


🚀 Adapter Training & Usage Pipeline

1. Load tokenizer and model

  • Load Pythia tokenizer
  • Set pad_token = eos_token
  • Load model with 4-bit quantization

2. Prepare for QLoRA training

  • Enable gradient checkpointing
  • Cast non-quantized layers (such as layer norms) to FP32 for numerical stability
  • Freeze base model parameters

3. Apply LoRA adapters

  • Inject LoRA modules into attention and MLP layers
  • Print trainable parameter count
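
Steps 1 to 3 can be sketched roughly as follows. This is a reconstruction under stated assumptions, not the repository's training script: the function name `build_qlora_model`, the compute dtype, and the `target_modules` list (the GPT-NeoX linear layer names for attention and MLP) are assumptions, while the LoRA values come from the configuration listed earlier. Imports are kept inside the function so the sketch can be read and defined without a GPU stack installed.

```python
# LoRA values taken from the configuration table in this README.
LORA_SETTINGS = dict(r=32, lora_alpha=32, lora_dropout=0.05, bias="none", task_type="CAUSAL_LM")

def build_qlora_model(model_name="EleutherAI/pythia-1B-deduped"):
    """Sketch of steps 1-3: tokenizer, 4-bit base model, k-bit preparation, LoRA injection."""
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
    from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

    # Step 1: tokenizer and 4-bit NF4 base model
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    tokenizer.pad_token = tokenizer.eos_token
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.float16,  # assumption: chosen to match FP16 training
    )
    model = AutoModelForCausalLM.from_pretrained(
        model_name, quantization_config=bnb_config, device_map="auto"
    )

    # Step 2: freeze base weights, upcast non-quantized layers, enable gradient checkpointing
    model = prepare_model_for_kbit_training(model, use_gradient_checkpointing=True)

    # Step 3: inject LoRA modules and report the trainable parameter count
    lora_config = LoraConfig(
        **LORA_SETTINGS,
        # assumption: GPT-NeoX names for the attention and MLP linear layers
        target_modules=["query_key_value", "dense", "dense_h_to_4h", "dense_4h_to_h"],
    )
    model = get_peft_model(model, lora_config)
    model.print_trainable_parameters()
    return tokenizer, model
```

Calling `build_qlora_model()` downloads the base model and requires a CUDA GPU with bitsandbytes installed.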

4. Training configuration

| Setting | Value |
|---|---|
| Epochs | 3 |
| Batch size | 6 |
| Gradient accumulation | 4 |
| Effective batch size | 24 |
| Learning rate | 2e-4 |
| Optimizer | paged_adamw_8bit |
| Precision | FP16 |
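
The settings above map onto Hugging Face `TrainingArguments` keyword names roughly as sketched below. The dictionary keys are real `TrainingArguments` parameters, but the helper functions, the single-device assumption, and the dataset size of 12,000 examples are hypothetical and only illustrate the arithmetic behind the effective batch size.

```python
import math

# Keyword arguments as they would be passed to transformers.TrainingArguments.
TRAINING_KWARGS = {
    "num_train_epochs": 3,
    "per_device_train_batch_size": 6,
    "gradient_accumulation_steps": 4,
    "learning_rate": 2e-4,
    "optim": "paged_adamw_8bit",
    "fp16": True,
}

def effective_batch_size(kwargs, num_devices=1):
    """Examples consumed per optimizer step."""
    return (kwargs["per_device_train_batch_size"]
            * kwargs["gradient_accumulation_steps"]
            * num_devices)

def approx_optimizer_steps(num_examples, kwargs, num_devices=1):
    """Approximate total optimizer steps; the Trainer's own rounding may differ slightly."""
    per_epoch = math.ceil(num_examples / effective_batch_size(kwargs, num_devices))
    return per_epoch * kwargs["num_train_epochs"]

batch = effective_batch_size(TRAINING_KWARGS)            # 6 * 4 * 1
steps = approx_optimizer_steps(12_000, TRAINING_KWARGS)  # hypothetical dataset size

# Usage (requires transformers; output_dir is a placeholder):
# args = TrainingArguments(output_dir="out", **TRAINING_KWARGS)
```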

5. Load the adapter and run inference

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model_name = "EleutherAI/pythia-1B-deduped"
lora_repo = "BEncoderRT/Pythia-QLoRA-Instruction-Tuning"

tokenizer = AutoTokenizer.from_pretrained(base_model_name)
tokenizer.pad_token = tokenizer.eos_token



# Load the base model in bfloat16 (no 4-bit quantization is applied for inference here)
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    device_map="auto",
    dtype=torch.bfloat16,  # recent Transformers releases accept `dtype`; older ones expect `torch_dtype`
)

# Load the LoRA adapter on top of the base model.
# Note: PEFT injects the LoRA layers into `base_model` in place.
model = PeftModel.from_pretrained(base_model, lora_repo)

import torch

# Ensure the model is in evaluation mode
model.eval()

# Function to format prompts consistently with training data
def format_prompt(instruction, context=None):
    if context:
        return f"Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\n{instruction}\n\n### Input:\n{context}\n\n### Response:\n"
    else:
        return f"Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction:\n{instruction}\n\n### Response:\n"

# Define a few test prompts
test_prompts = [
    {
        "instruction": "Explain the concept of photosynthesis in simple terms.",
        "context": None
    },
    {
        "instruction": "What is the capital of France?",
        "context": None
    },
    {
        "instruction": "Summarize the main idea of the following text:",
        "context": "The quick brown fox jumps over the lazy dog. This sentence is often used to display all letters of the English alphabet, making it a pangram."
    },
    {
        "instruction": "List three benefits of regular exercise.",
        "context": None
    }
]

# Additional, harder test prompts: creative writing, arithmetic reasoning,
# factual recall, explanation, and summarization
new_test_prompts = [
    {
        "instruction": "Write a short, imaginative story about a cat who discovers a secret portal to another dimension under its owner's bed.",
        "context": None
    },
    {
        "instruction": "If a train leaves New York at 10 AM traveling at 60 mph and another train leaves Chicago at 11 AM traveling at 50 mph, and the cities are 800 miles apart, at what time do they meet? (Assume they are traveling towards each other on the same track).",
        "context": None
    },
    {
        "instruction": "What is the capital of Australia?",
        "context": None
    },
    {
        "instruction": "Explain the difference between supervised and unsupervised learning in machine learning, and provide an example of when each would be used.",
        "context": None
    },
    {
        "instruction": "Summarize the following passage:",
        "context": "The advent of artificial intelligence has brought forth a new era of technological advancement, impacting various sectors from healthcare to finance. While AI promises increased efficiency and innovative solutions, it also raises ethical concerns regarding job displacement, privacy, and bias in algorithms. Societies worldwide are grappling with how to regulate and integrate AI responsibly, balancing progress with human values. This calls for a multidisciplinary approach involving policymakers, technologists, ethicists, and the public to shape a future where AI serves humanity's best interests."
    }
]
test_prompts.extend(new_test_prompts)

# Baseline: generate from the base model without the adapter.
# Caveat: PeftModel.from_pretrained() injects the LoRA layers into `base_model` in place,
# so if the adapter has already been loaded, run this loop inside
# `with model.disable_adapter():` to obtain true base-model outputs.
base_model.eval()

# format_prompt and test_prompts are reused from the cells above.

print("\n--- Generating Responses from BASE MODEL ---\n")
with torch.no_grad():
    for i, prompt_data in enumerate(test_prompts):
        instruction = prompt_data["instruction"]
        context = prompt_data["context"]

        formatted_input = format_prompt(instruction, context)

        # Tokenize the input prompt
        inputs = tokenizer(formatted_input, return_tensors="pt").to(base_model.device)

        # Generate response using the BASE MODEL
        outputs = base_model.generate(
            **inputs,
            max_new_tokens=150,
            do_sample=True,
            temperature=0.7,
            top_k=50,
            top_p=0.95,
            eos_token_id=tokenizer.eos_token_id,
            pad_token_id=tokenizer.pad_token_id
        )

        generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
        response = generated_text[len(formatted_input):].strip()

        print(f"### Test Prompt {i+1} (BASE MODEL): ###")
        print(f"Instruction: {instruction}")
        if context:
            print(f"Context: {context}")
        print(f"Base Model Response: {response}\n")

--- Generating Responses from BASE MODEL ---

### Test Prompt 1 (BASE MODEL): ###
Instruction: Explain the concept of photosynthesis in simple terms.
Base Model Response: Photosynthesis is the conversion of light energy to chemical energy in the form of water. It is a complex process which requires the presence of oxygen in the environment, and the presence of light energy. The energy required for the process is provided by the sun. The process requires two molecules of oxygen, one of which is the oxygen atom, and the other is a molecule of hydrogen. The oxygen molecule combines with the hydrogen molecule to form water. The process requires the presence of a compound of carbon called carbon dioxide. The carbon dioxide molecule combines with the oxygen molecule to form carbon dioxide. This is then converted to water. The process is the conversion of light energy into chemical energy. The carbon dioxide molecule is converted to carbon dioxide, and so on.

### Test Prompt 2 (BASE MODEL): ###
Instruction: What is the capital of France?
Base Model Response: Paris is the capital of France. It is located in the west of the country, in the department of Seine-et-Marne. The city is home to several historical and cultural attractions, including the Eiffel Tower, Notre Dame, the Louvre, the Jardin des Tuileries, the Musée Rodin, and the Pompidou Center. The city also has several shopping centers, including a department store called Les Halles, a clothing store called L'Usine, a pharmacy called The Rue des Archives, and a department store called Le Comptoir.

The city also hosts many cultural events, including the annual Les Halles market. In the summer, the city is also host to the

### Test Prompt 3 (BASE MODEL): ###
Instruction: Summarize the main idea of the following text:
Context: The quick brown fox jumps over the lazy dog. This sentence is often used to display all letters of the English alphabet, making it a pangram.
Base Model Response: The quick brown fox jumps over the lazy dog. This sentence is often used to display all letters of the English alphabet, making it a pangram.

The quick brown fox jumps over the lazy dog. This sentence is often used to display all letters of the English alphabet, making it a pangram.

The quick brown fox jumps over the lazy dog. This sentence is often used to display all letters of the English alphabet, making it a pangram.

The quick brown fox jumps over the lazy dog. This sentence is often used to display all letters of the English alphabet, making it a pangram.

The quick brown fox jumps over the lazy dog. This sentence is often used to display all

### Test Prompt 4 (BASE MODEL): ###
Instruction: List three benefits of regular exercise.
Base Model Response: 1. Regular exercise helps your body to maintain its health and fitness level.
2. Regular exercise helps your body to lose weight and maintain a healthy weight.
3. Regular exercise helps your body to reduce its risk of diseases such as heart disease, diabetes, and cancer.

It is important to choose a form of exercise that will benefit you and your health. For example, if you are trying to lose weight, you may choose to exercise for 30 minutes a day, 5 days a week. If you are trying to stay healthy, you may choose to exercise for 30 minutes a day, 5 days a week. If you want to build muscle, you may choose to exercise for 20 minutes a day, 5 days a week.

### Test Prompt 5 (BASE MODEL): ###
Instruction: Write a short, imaginative story about a cat who discovers a secret portal to another dimension under its owner's bed.
Base Model Response: When the owner of the house was a young boy, he used to sneak out of the house in the middle of the night. He would find himself in a new place, and he would try to figure out how to get back to his old house. One night, the cat woke up in a strange place and was confused about what she was seeing. It was a portal to another dimension, and she had no idea how she had gotten there. She was so confused that she was afraid to open the portal. However, she felt so safe and comfortable there that she didn't want to leave. It was a place where she could hide and feel safe. So, she opened the portal and entered the other dimension. She had no idea what happened

### Test Prompt 6 (BASE MODEL): ###
Instruction: If a train leaves New York at 10 AM traveling at 60 mph and another train leaves Chicago at 11 AM traveling at 50 mph, and the cities are 800 miles apart, at what time do they meet? (Assume they are traveling towards each other on the same track).
Base Model Response: Both trains will arrive at the same time, at 10:01 AM, in New York.

The two trains will arrive at the same time, at 10:02 AM, in Chicago.

Therefore, the time of arrival in Chicago is 10:02 AM, and the time of arrival in New York is 10:01 AM.

This is because, from the point of view of the other train, New York is still on the same track.

If the time of arrival in New York is later than 10:01 AM, the other train will arrive at the same time, at 10:01 AM, in Chicago.

The time of arrival in Chicago is 11:00 AM, and the time of

### Test Prompt 7 (BASE MODEL): ###
Instruction: What is the capital of Australia?
Base Model Response: Towns in Australia are Sydney, Melbourne, Brisbane, Adelaide, Perth, Darwin, and Hobart. Towns are also named after rivers, such as the Murray River in Melbourne and the Murray in Brisbane. Towns can also be named after famous people, such as Robert Burns in Perth and John F Kennedy in Darwin.

Australia is the second largest country in the world, with an area of almost 9,000,000 square miles (24,000,000 km2) and an average population of about 2.6 million people.

Australia is the world's fifth-most-populous country, and the most populous democracy. The population of Australia is estimated to be more than five times the size of the United States

### Test Prompt 8 (BASE MODEL): ###
Instruction: Explain the difference between supervised and unsupervised learning in machine learning, and provide an example of when each would be used.
Base Model Response: In supervised learning, the training data is used to find a specific solution to the problem.  In unsupervised learning, the data is not used to find a specific solution, but rather the data is used to find a general solution to the problem.  For example, in a speech recognition system, the input data may contain the acoustic features of the speaker.  The acoustic features are then used to train a model to recognize the speaker.  The speaker's voice is then used as the training data for the model.  In the case of speech recognition, the input data may contain the speaker's identity, their speech rate, and their pitch.  These features are then used to train the model to recognize the speaker.  The model may then

### Test Prompt 9 (BASE MODEL): ###
Instruction: Summarize the following passage:
Context: The advent of artificial intelligence has brought forth a new era of technological advancement, impacting various sectors from healthcare to finance. While AI promises increased efficiency and innovative solutions, it also raises ethical concerns regarding job displacement, privacy, and bias in algorithms. Societies worldwide are grappling with how to regulate and integrate AI responsibly, balancing progress with human values. This calls for a multidisciplinary approach involving policymakers, technologists, ethicists, and the public to shape a future where AI serves humanity's best interests.
Base Model Response: Artificial intelligence has revolutionized the way we live and work. It enables us to have better decision-making, to work more efficiently, to live better, and to make more money. But it also raises some ethical concerns, as well as questions about job displacement, privacy, and bias in algorithms.

A multidisciplinary approach involving policymakers, technologists, ethicists, and the public is needed to shape a future where AI serves humanity’s best interests. This calls for a societal dialogue to help us all navigate the future of AI and to find a way to integrate AI responsibly.

We need a new approach that combines the wisdom of scientists and technologists with the ethics of ethicists and policymakers. It will help us

# Ensure the LoRA-tuned model is in evaluation mode
model.eval()

# Reuse the format_prompt function and test_prompts

print("\n--- Generating Responses from LO-RA TUNED MODEL ---\n")
with torch.no_grad():
    for i, prompt_data in enumerate(test_prompts):
        instruction = prompt_data["instruction"]
        context = prompt_data["context"]

        formatted_input = format_prompt(instruction, context)

        # Tokenize the input prompt
        inputs = tokenizer(formatted_input, return_tensors="pt").to(model.device)

        # Generate response using the LoRA-tuned model
        outputs = model.generate(
            **inputs,
            max_new_tokens=150,
            do_sample=True,
            temperature=0.7,
            top_k=50,
            top_p=0.95,
            eos_token_id=tokenizer.eos_token_id,
            pad_token_id=tokenizer.pad_token_id
        )

        generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
        response = generated_text[len(formatted_input):].strip()

        print(f"### Test Prompt {i+1} (LO-RA MODEL): ###")
        print(f"Instruction: {instruction}")
        if context:
            print(f"Context: {context}")
        print(f"LoRA Model Response: {response}\n")

--- Generating Responses from LO-RA TUNED MODEL ---

### Test Prompt 1 (LO-RA MODEL): ###
Instruction: Explain the concept of photosynthesis in simple terms.
LoRA Model Response: Photosynthesis is a process whereby a source of energy is converted to a form of energy which is either used to generate or used to sustain cellular life.  In this process, plants use the energy of light as a source of energy to make sugars from carbon dioxide and water.  Photosynthesis produces sugars from carbon dioxide, water, and light.  The sugars produced are used in the process of making starch, which is used to make bread, rice, and many other foods.  The process is also used in the production of alcohol, which is used in making beer, wine, and other beverages.  The process also produces the green plant material that is used for making clothing and other products.  The process is also used to produce biodies

### Test Prompt 2 (LO-RA MODEL): ###
Instruction: What is the capital of France?
LoRA Model Response: Paris, France is the capital of France. The French also call it "the city of light" and they have a word that is pronounced "Fraa" - which is an easy pronunciation of the French word for Paris. Paris is also the country's largest city and the financial, cultural, and business centre of France. Paris is a great place to go on vacation or for business. The famous Eiffel Tower is the tallest building in Paris and the Louvre Museum is one of the world's most visited museums. The city is also known for its many cafés and restaurants. Paris is also the most expensive city to live in in the world and is the second most expensive city in the world after London. Paris is the

### Test Prompt 3 (LO-RA MODEL): ###
Instruction: Summarize the main idea of the following text:
Context: The quick brown fox jumps over the lazy dog. This sentence is often used to display all letters of the English alphabet, making it a pangram.
LoRA Model Response: The quick brown fox jumps over the lazy dog. This sentence is often used to display all letters of the English alphabet, making it a pangram.

The quick brown fox jumps over the lazy dog. This sentence is often used to display all letters of the English alphabet, making it a pangram.

The quick brown fox jumps over the lazy dog. This sentence is often used to display all letters of the English alphabet, making it a pangram.

The quick brown fox jumps over the lazy dog. This sentence is often used to display all letters of the English alphabet, making it a pangram.

The quick brown fox jumps over the lazy dog. This sentence is often used to display all

### Test Prompt 4 (LO-RA MODEL): ###
Instruction: List three benefits of regular exercise.
LoRA Model Response: Exercise provides a number of benefits, including building muscle, increasing metabolism, and reducing blood pressure.  Exercise also helps with weight management and helps prevent or slow the progression of chronic diseases like heart disease, diabetes, and high cholesterol.  Regular exercise can also help you manage stress and anxiety, and may even improve mood.  

Regular exercise can also help you manage the symptoms of several different conditions, including asthma, arthritis, depression, and anxiety.  Some conditions are more common in people who exercise, such as diabetes, and some are more common in people who are sedentary, such as heart disease.  It is also important to remember that if you are pregnant, it is important to exercise regularly.  Exercise can also improve

### Test Prompt 5 (LO-RA MODEL): ###
Instruction: Write a short, imaginative story about a cat who discovers a secret portal to another dimension under its owner's bed.
LoRA Model Response: "The Cat in the Bedroom" is a short, imaginative story about a cat who discovers a secret portal to another dimension under his owner's bed.

The story begins with the cat waking up in the middle of the night. It looks around the room, and finds a secret portal that leads to the hallway outside the room. The cat climbs through the portal and explores the hallway. It is dark, so it can't see the walls, and it is very loud. It is extremely frightened and worried, and it is trying to hide. Suddenly, it hears a noise and it looks out the window. It sees a cat sitting on the roof of the house, looking up at it. The cat seems to be enjoying the

### Test Prompt 6 (LO-RA MODEL): ###
Instruction: If a train leaves New York at 10 AM traveling at 60 mph and another train leaves Chicago at 11 AM traveling at 50 mph, and the cities are 800 miles apart, at what time do they meet? (Assume they are traveling towards each other on the same track).
LoRA Model Response: If the two trains were traveling on the same track, they would meet at the same time, at 10 AM in New York. However, if the trains were traveling towards each other on the same track, they would meet at 10 AM in Chicago and then both would meet at 11 AM in Chicago. If the two trains were traveling on different tracks, they would meet at the same time, at 11 AM in Chicago. However, if the trains were traveling towards each other on the same track, they would meet at 10 AM in New York and then at 11 AM in New York. If the two trains were traveling on different tracks, they would meet at the same time, at 11 AM in New York. However, if the trains were traveling towards

### Test Prompt 7 (LO-RA MODEL): ###
Instruction: What is the capital of Australia?
LoRA Model Response: Melbourne is the capital of Australia. It is located in the Southern Hemisphere and the largest city in Australia. Melbourne is the second largest city in Australia and is the largest city in Australia and the tenth largest city in the world. It is also the second largest city by population in Australia after Sydney. The city has a population of around 4.3 million people. It is located on the eastern coast of Australia and has a population density of 18,000 people per km2. Melbourne is situated on the Great Ocean Road which is a very popular tourist attraction. The city is also known for the Royal Park and the Melbourne Cup. The city also has a large number of museums. Melbourne has some of the most beautiful beaches in Australia. The

### Test Prompt 8 (LO-RA MODEL): ###
Instruction: Explain the difference between supervised and unsupervised learning in machine learning, and provide an example of when each would be used.
LoRA Model Response: In supervised learning, the input data is labeled, and the algorithm is given a description of the problem. The algorithm is then given an example of a dataset, and trained to perform the task. In unsupervised learning, the input data is not labeled, and the algorithm is not given an example of a dataset. The algorithm is trained to perform the task without an example. This type of algorithm is often used to identify patterns or relationships between the input data, but is not designed to produce a new dataset.

When a dataset is unlabeled, it is typically treated as a black box, and the algorithm is not given a clear description of how to perform the task. For example, the algorithm could be trained to predict the temperature of a

### Test Prompt 9 (LO-RA MODEL): ###
Instruction: Summarize the following passage:
Context: The advent of artificial intelligence has brought forth a new era of technological advancement, impacting various sectors from healthcare to finance. While AI promises increased efficiency and innovative solutions, it also raises ethical concerns regarding job displacement, privacy, and bias in algorithms. Societies worldwide are grappling with how to regulate and integrate AI responsibly, balancing progress with human values. This calls for a multidisciplinary approach involving policymakers, technologists, ethicists, and the public to shape a future where AI serves humanity's best interests.
LoRA Model Response: Artificial intelligence is the science and technology of developing new computer and cognitive systems capable of performing tasks previously performed by humans. The term AI is often used to refer to any system that is intelligent and capable of learning, reasoning, and performing complex tasks. However, the term AI is often used to refer to computer systems that are able to perform tasks performed by humans.

Artificial intelligence is a subfield of computer science that involves the development of systems that can perform tasks previously performed by humans. The main differences between artificial intelligence and natural intelligence are that artificial intelligence is more complex, is designed to learn from experience, and is a result of human intelligence. Artificial intelligence can be used for both intelligent systems and intelligent agents. Artificial intelligence can
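
Both generation loops recover the response by slicing `len(formatted_input)` characters off the decoded text, which works as long as decoding reproduces the prompt exactly. A slightly more defensive variant, sketched here with a hypothetical helper `extract_response`, splits on the `### Response:` marker used by the prompt template instead.

```python
RESPONSE_MARKER = "### Response:\n"

def extract_response(generated_text, marker=RESPONSE_MARKER):
    """Return the text after the first response marker, or the whole text if it is missing."""
    _, found, tail = generated_text.partition(marker)
    return tail.strip() if found else generated_text.strip()

# Toy check with a hand-written completion (not real model output).
prompt = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\nWhat is the capital of France?\n\n### Response:\n"
)
decoded = prompt + "Paris is the capital of France.\n"

answer = extract_response(decoded)
fallback = extract_response("  text without a marker  ")
```

Splitting on the first marker keeps the full response even if the model later emits another `### Response:` header of its own.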

