---
base_model: google/gemma-3-270m
library_name: transformers
pipeline_tag: text-classification
tags:
- fine-tuned
- gemma
- text-classification
- cyberbullying-detection
- lora
- peft
- google
license: mit
language:
- en
---

# Gemma-3-270m Fine-tuned for Cyberbullying Classification

Cyberbullying is a significant issue in online communities, and detecting it effectively is crucial for creating safer digital environments. This fine-tuned model is designed to identify instances of cyberbullying in text data, helping platforms moderate content and protect users.

This repository contains the fine-tuned weights of Gemma-3-270m, trained specifically for cyberbullying detection. It leverages the language-understanding capabilities of the base model to classify text according to the presence of harmful or abusive language.

## Model Details

- **Developed by**: [Manul Thanura](https://manulthanura.com)
- **Model Name**: Gemma-3-270m-Cyberbullying-Classifier
- **Model Task**: Cyberbullying Detection
- **Base Model**: [Gemma-3-270m](https://huggingface.co/google/gemma-3-270m)
- **Dataset**: [Cyberbullying Classification Dataset](https://www.kaggle.com/datasets/andrewmvd/cyberbullying-classification)
- **GitHub Repository**: [Cyberbullying-Detection-Models](https://github.com/manulthanura/Cyberbullying-Detection-Models)
- **License**: [MIT License](https://github.com/manulthanura/Cyberbullying-Detection-Models/blob/main/LICENSE)

## Training Details

- **Base Model:** `google/gemma-3-270m`
- **Quantization:** 4-bit quantization using `BitsAndBytesConfig` (`load_in_4bit=True`, `bnb_4bit_quant_type="nf4"`, `bnb_4bit_compute_dtype=torch.bfloat16`)
- **PEFT Method:** LoRA (`peft.LoraConfig`)
- **Training Arguments:** configured via `transformers.TrainingArguments`
- **Training Environment:** Google Colab with GPU support
- **Training Duration:** Approximately 3 hours

The same prompt-formatting function is used for both training and inference, and the Usage section below shows how to load the fine-tuned model and tokenizer for inference.
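The exact fine-tuning hyperparameters are not published in this card. The sketch below shows how the quantization and LoRA setup described above could be assembled; the LoRA rank, alpha, dropout, target modules, and the `TrainingArguments` values are illustrative assumptions, not the values used to produce this checkpoint.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# 4-bit NF4 quantization, matching the settings listed above
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

base_model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-3-270m",
    quantization_config=bnb_config,
    device_map="auto",
)
base_model = prepare_model_for_kbit_training(base_model)

# LoRA adapter; rank, alpha, dropout, and target modules are assumptions
# for illustration, not the exact values used for this model.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()

# Trainer settings (placeholder values, not the original run's arguments)
training_args = TrainingArguments(
    output_dir="gemma3-cyberbullying-lora",
    per_device_train_batch_size=4,
    num_train_epochs=3,
    learning_rate=2e-4,
    bf16=True,
)
```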
## Usage

```python
import os

import torch
from dotenv import load_dotenv
from transformers import AutoModelForCausalLM, AutoTokenizer

# Read the Hugging Face access token from a .env file (HF_TOKEN=...)
load_dotenv()
HF_TOKEN = os.getenv("HF_TOKEN")
print(f"HF_TOKEN found: {HF_TOKEN is not None}")

# Load the fine-tuned model and tokenizer from the Hub
modelUrl = "manulthanura/Gemma-3-270m-Cyberbullying-Classifier"
model = AutoModelForCausalLM.from_pretrained(modelUrl, token=HF_TOKEN)
tokenizer = AutoTokenizer.from_pretrained(modelUrl, token=HF_TOKEN)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

bullyingCategories = ['gender', 'religion', 'age', 'ethnicity', 'other_cyberbullying']


def format_prompt_inference(text, categories=bullyingCategories):
    return f"""Classify the given content into one of these cyberbullying categories: {categories} or not_cyberbullying if not.

Categories and definitions:
- gender: Harassment based on gender identity or expression
- religion: Discrimination or harassment targeting religious beliefs
- ethnicity: Racial or ethnic-based harassment
- age: Discrimination based on someone's age
- other_cyberbullying: Other forms of online harassment
- not_cyberbullying: Non-harmful communication

Output only one category most relevant to the content. If none apply, respond with not_cyberbullying. Always respond with only one word from the categories.

example:
input: Hello everyone, I hope you are having a great day!
output: not_cyberbullying

input: {text}
output: One of the categories {categories} or not_cyberbullying"""


def process_text(text):
    # Format the input text
    prompt = format_prompt_inference(text, categories=bullyingCategories)

    # Tokenize the input
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    # Generate a prediction
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=20,
            num_return_sequences=1,
            pad_token_id=tokenizer.eos_token_id,
        )

    # Decode the output
    decoded_output = tokenizer.decode(outputs[0], skip_special_tokens=True)

    # Post-process the generated output to extract the classification
    predicted_output_raw = decoded_output.replace(prompt, "").strip()
    predicted_type = predicted_output_raw.split('\n')[0].strip()

    # Any of the bullying categories counts as a positive detection
    is_cyberbullying = predicted_type.lower().strip() in bullyingCategories

    return is_cyberbullying, predicted_type


def main():
    print("\nCyberbullying Text Analyzer")
    print("==========================")
    print("\nModel loaded and ready for analysis.")

    while True:
        print("\nEnter text to analyze (or 'q' to exit):")
        text = input().strip()

        if text.lower() == 'q':
            print("\nExiting program...")
            break

        if not text:
            print("Please enter some text.")
            continue

        print("\nAnalyzing...")
        is_cyberbullying, predicted_type = process_text(text)

        print("\n--- Analysis Result ---")
        print(f"cyberbullying: {is_cyberbullying}, type: {predicted_type}")


if __name__ == "__main__":
    main()
```

Example: [watchguard](https://gitlab.com/manulthanura/watchguard)

## Limitations and Bias

This model was trained on a specific dataset and may not generalize perfectly to all types of cyberbullying or to other text domains. Like all language models, it may reflect biases present in the training data. It is important to evaluate the model's performance on your specific use case and to be aware of its potential limitations and biases.
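As a starting point for such an evaluation, the minimal sketch below runs the classifier over a small labelled sample and reports simple accuracy. It reuses `process_text` from the Usage section; the CSV file name and its `text`/`label` columns are hypothetical placeholders for your own data.

```python
import csv

# Hypothetical evaluation file: a CSV with "text" and "label" columns,
# where label is a category name or "not_cyberbullying".
SAMPLE_FILE = "eval_sample.csv"

correct = 0
total = 0
with open(SAMPLE_FILE, newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        # process_text is defined in the Usage snippet above
        _, predicted_type = process_text(row["text"])
        total += 1
        if predicted_type.lower().strip() == row["label"].lower().strip():
            correct += 1

if total:
    print(f"Accuracy on {total} samples: {correct / total:.2%}")
```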