Gemma-3-270m Fine-tuned for Cyberbullying Classification
Cyberbullying is a significant issue in online communities, and detecting it effectively is crucial for creating safer digital environments. This classifier is designed to identify instances of cyberbullying in text, helping platforms moderate content and protect users.
This repository contains the fine-tuned weights of Gemma-3-270m, trained specifically for cyberbullying detection. It leverages the capabilities of a compact language model to understand and classify text based on the presence of harmful or abusive language.
Model Details
- Developed by: Manul Thanura
- Model Name: Gemma-3-270m-Cyberbullying-Classifier
- Model Task: Cyberbullying Detection
- Base Model: Gemma-3-270m
- Dataset: Cyberbullying Classification Dataset
- GitHub Repository: Cyberbullying-Detection-Models
- License: MIT License
Training Details
- Base Model: google/gemma-3-270m
- Quantization: 4-bit quantization using BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4", bnb_4bit_compute_dtype=torch.bfloat16)
- PEFT Method: LoRA (peft.LoraConfig)
- Training Arguments: transformers.TrainingArguments (a setup sketch appears after this list)
- Training Environment: Google Colab with GPU support
- Training Duration: Approximately 3 hours
- See the GitHub repository for the formatting function used for both training and inference, and for the process of loading the fine-tuned model and tokenizer for inference.
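The quantized LoRA setup above can be reproduced roughly as follows. This is a minimal sketch: the BitsAndBytesConfig matches the card, but the LoRA hyperparameters (r, lora_alpha, target_modules) and the TrainingArguments values are illustrative assumptions, not the exact settings used to train this model.

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig, get_peft_model

# 4-bit NF4 quantization, as listed in the training details
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

base_model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-3-270m",
    quantization_config=bnb_config,
    device_map="auto",
)

# Hypothetical LoRA settings; the card does not publish the exact values
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
peft_model = get_peft_model(base_model, lora_config)

# Illustrative training arguments only
training_args = TrainingArguments(
    output_dir="gemma-3-270m-cyberbullying",
    per_device_train_batch_size=8,
    num_train_epochs=3,
    learning_rate=2e-4,
    bf16=True,
)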
Usage
import os
import torch
from dotenv import load_dotenv
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the Hugging Face token from a .env file
load_dotenv()
HF_TOKEN = os.getenv("HF_TOKEN")
print(f"HF_TOKEN found: {HF_TOKEN is not None}")

# Load the fine-tuned model and tokenizer from the Hub
modelUrl = "manulthanura/Gemma-3-270m-Cyberbullying-Classifier"
model = AutoModelForCausalLM.from_pretrained(modelUrl, token=HF_TOKEN)
tokenizer = AutoTokenizer.from_pretrained(modelUrl, token=HF_TOKEN)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

bullyingCategories = ['gender', 'religion', 'age', 'ethnicity', 'other_cyberbullying']

def format_prompt_inference(text, categories=bullyingCategories):
    # Same prompt template used during fine-tuning
    return f"""Classify the given content into one of these cyberbullying categories: {categories} or not_cyberbullying if not.
Categories and definitions:
- gender: Harassment based on gender identity or expression
- religion: Discrimination or harassment targeting religious beliefs
- ethnicity: Racial or ethnic-based harassment
- age: Discrimination based on someone's age
- other_cyberbullying: Other forms of online harassment
- not_cyberbullying: Non-harmful communication
Output only one category most relevant to the content. If none apply, respond with not_cyberbullying. Always respond with only one word from the categories.
example:
input: Hello everyone, I hope you are having a great day!
output: not_cyberbullying
input: {text}
output: One of the categories {categories} or not_cyberbullying"""

def process_text(text):
    # Format the input text
    prompt = format_prompt_inference(text, categories=bullyingCategories)

    # Tokenize the input
    input_ids = tokenizer(prompt, return_tensors="pt").to(model.device)

    # Generate a prediction
    with torch.no_grad():
        outputs = model.generate(
            **input_ids,
            max_new_tokens=20,
            num_return_sequences=1,
            pad_token_id=tokenizer.eos_token_id
        )

    # Decode the output
    decoded_output = tokenizer.decode(outputs[0], skip_special_tokens=True)

    # Post-process the generated output: strip the prompt and keep the first line
    predicted_output_raw = decoded_output.replace(prompt, "").strip()
    predicted_type = predicted_output_raw.split('\n')[0].strip()

    # The text is flagged if the predicted label is one of the bullying categories
    is_cyberbullying = predicted_type.lower().strip() in bullyingCategories
    return is_cyberbullying, predicted_type

def main():
    print("\nCyberbullying Text Analyzer")
    print("==========================")
    print("\nModel loaded and ready for analysis.")

    while True:
        print("\nEnter text to analyze (or 'q' to exit):")
        text = input().strip()

        if text.lower() == 'q':
            print("\nExiting program...")
            break
        if not text:
            print("Please enter some text.")
            continue

        print("\nAnalyzing...")
        is_cyberbullying, predicted_type = process_text(text)

        print("\n--- Analysis Result ---")
        print(f"cyberbullying: {is_cyberbullying}, type: {predicted_type}")

if __name__ == "__main__":
    main()
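Beyond the interactive loop, process_text can also be called directly for programmatic or batch use. The snippet below is an illustrative addition, not part of the original script; the sample inputs are made up for demonstration.

# Classify a few sample strings with the process_text() helper defined above
examples = [
    "Hello everyone, I hope you are having a great day!",
    "People your age have no business being online.",
]
for sample in examples:
    flagged, label = process_text(sample)
    print(f"{sample!r} -> {label} (cyberbullying={flagged})")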
Limitations and Bias
This model was trained on a specific dataset and may not generalize perfectly to all types of cyberbullying or different domains of text. Like all language models, it may reflect biases present in the training data. It's important to evaluate the model's performance on your specific use case and be aware of its potential limitations and biases.