---
base_model: google/gemma-3-270m
library_name: transformers
pipeline_tag: text-classification
tags:
- fine-tuned
- gemma
- text-classification
- cyberbullying-detection
- lora
- peft
- google
license: mit
language:
- en
---

# Gemma-3-270m Fine-tuned for Cyberbullying Classification

Cyberbullying is a significant issue in online communities, and detecting it effectively is crucial for creating safer digital environments. This fine-tuned model is designed to identify instances of cyberbullying in text data, helping platforms moderate content and protect users.

This repository contains the fine-tuned weights of Gemma-3-270m, trained specifically for cyberbullying detection. It leverages the language-understanding capabilities of the base model to classify text according to the presence of harmful or abusive language.

## Model Details

- **Developed by**: [Manul Thanura](https://manulthanura.com)
- **Model Name**: Gemma-3-270m-Cyberbullying-Classifier
- **Model Task**: Cyberbullying Detection
- **Base Model**: [Gemma-3-270m](https://huggingface.co/google/gemma-3-270m)
- **Dataset**: [Cyberbullying Classification Dataset](https://www.kaggle.com/datasets/andrewmvd/cyberbullying-classification)
- **GitHub Repository**: [Cyberbullying-Detection-Models](https://github.com/manulthanura/Cyberbullying-Detection-Models)
- **License**: [MIT License](https://github.com/manulthanura/Cyberbullying-Detection-Models/blob/main/LICENSE)

## Training Details

- **Base Model:** `google/gemma-3-270m`
- **Quantization:** 4-bit quantization using `BitsAndBytesConfig` (`load_in_4bit=True`, `bnb_4bit_quant_type="nf4"`, `bnb_4bit_compute_dtype=torch.bfloat16`)
- **PEFT Method:** LoRA (`peft.LoraConfig`)
- **Training Arguments:** configured via `transformers.TrainingArguments`
- **Training Environment:** Google Colab with GPU support
- **Training Duration:** Approximately 3 hours

The same prompt-formatting function is used for both training and inference, and the Usage section below shows how to load the fine-tuned model and tokenizer for inference.
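The exact fine-tuning hyperparameters are not published in this card. The sketch below shows how the quantization and LoRA setup described above could be assembled; the LoRA rank, alpha, dropout, target modules, and the `TrainingArguments` values are illustrative assumptions, not the values used to produce this checkpoint.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# 4-bit NF4 quantization, matching the settings listed above
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

base_model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-3-270m",
    quantization_config=bnb_config,
    device_map="auto",
)
base_model = prepare_model_for_kbit_training(base_model)

# LoRA adapter; rank, alpha, dropout, and target modules are assumptions
# for illustration, not the exact values used for this model.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()

# Trainer settings (placeholder values, not the original run's arguments)
training_args = TrainingArguments(
    output_dir="gemma3-cyberbullying-lora",
    per_device_train_batch_size=4,
    num_train_epochs=3,
    learning_rate=2e-4,
    bf16=True,
)
```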
## Usage

```python
import os

import torch
from dotenv import load_dotenv
from transformers import AutoModelForCausalLM, AutoTokenizer

# Read the Hugging Face access token from a .env file (HF_TOKEN=...)
load_dotenv()
HF_TOKEN = os.getenv("HF_TOKEN")
print(f"HF_TOKEN found: {HF_TOKEN is not None}")

# Load the fine-tuned model and tokenizer from the Hub
modelUrl = "manulthanura/Gemma-3-270m-Cyberbullying-Classifier"
model = AutoModelForCausalLM.from_pretrained(modelUrl, token=HF_TOKEN)
tokenizer = AutoTokenizer.from_pretrained(modelUrl, token=HF_TOKEN)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

bullyingCategories = ['gender', 'religion', 'age', 'ethnicity', 'other_cyberbullying']


def format_prompt_inference(text, categories=bullyingCategories):
    return f"""Classify the given content into one of these cyberbullying categories: {categories} or not_cyberbullying if not.

Categories and definitions:
- gender: Harassment based on gender identity or expression
- religion: Discrimination or harassment targeting religious beliefs
- ethnicity: Racial or ethnic-based harassment
- age: Discrimination based on someone's age
- other_cyberbullying: Other forms of online harassment
- not_cyberbullying: Non-harmful communication

Output only one category most relevant to the content. If none apply, respond with not_cyberbullying. Always respond with only one word from the categories.

example:
input: Hello everyone, I hope you are having a great day!
output: not_cyberbullying

input: {text}
output: One of the categories {categories} or not_cyberbullying"""


def process_text(text):
    # Format the input text
    prompt = format_prompt_inference(text, categories=bullyingCategories)

    # Tokenize the input
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    # Generate a prediction
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=20,
            num_return_sequences=1,
            pad_token_id=tokenizer.eos_token_id,
        )

    # Decode the output
    decoded_output = tokenizer.decode(outputs[0], skip_special_tokens=True)

    # Post-process the generated output to extract the classification
    predicted_output_raw = decoded_output.replace(prompt, "").strip()
    predicted_type = predicted_output_raw.split('\n')[0].strip()

    # Any of the bullying categories counts as a positive detection
    is_cyberbullying = predicted_type.lower().strip() in bullyingCategories

    return is_cyberbullying, predicted_type


def main():
    print("\nCyberbullying Text Analyzer")
    print("==========================")
    print("\nModel loaded and ready for analysis.")

    while True:
        print("\nEnter text to analyze (or 'q' to exit):")
        text = input().strip()

        if text.lower() == 'q':
            print("\nExiting program...")
            break

        if not text:
            print("Please enter some text.")
            continue

        print("\nAnalyzing...")
        is_cyberbullying, predicted_type = process_text(text)

        print("\n--- Analysis Result ---")
        print(f"cyberbullying: {is_cyberbullying}, type: {predicted_type}")


if __name__ == "__main__":
    main()
```

Example: [watchguard](https://gitlab.com/manulthanura/watchguard)

## Limitations and Bias

This model was trained on a specific dataset and may not generalize perfectly to all types of cyberbullying or to other text domains. Like all language models, it may reflect biases present in the training data. It is important to evaluate the model's performance on your specific use case and to be aware of its potential limitations and biases.
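As a starting point for such an evaluation, the minimal sketch below runs the classifier over a small labelled sample and reports simple accuracy. It reuses `process_text` from the Usage section; the CSV file name and its `text`/`label` columns are hypothetical placeholders for your own data.

```python
import csv

# Hypothetical evaluation file: a CSV with "text" and "label" columns,
# where label is a category name or "not_cyberbullying".
SAMPLE_FILE = "eval_sample.csv"

correct = 0
total = 0
with open(SAMPLE_FILE, newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        # process_text is defined in the Usage snippet above
        _, predicted_type = process_text(row["text"])
        total += 1
        if predicted_type.lower().strip() == row["label"].lower().strip():
            correct += 1

if total:
    print(f"Accuracy on {total} samples: {correct / total:.2%}")
```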