| | --- |
| | language: |
| | - en |
| | tags: |
| | - llama |
| | - llama-3 |
| | - lora |
| | - content-moderation |
| | - uncensored |
| | - text-generation |
| | license: mit |
| | base_model: meta-llama/Meta-Llama-3.1-8B-Instruct |
| | --- |
| | |
| | # Llama 3.1 Censorship LoRAs |
| |
|
| | This repository contains LoRA adapters for Meta's Llama 3.1 8B Instruct model, designed for censoring and uncensoring text content. |
| |
|
| | ## What are these LoRA adapters? |
| |
|
| | These LoRA adapters serve as fine-tuning tools for the Llama 3.1 model. They capture the differences between the original, more cautious Llama 3.1 and a version that has been adjusted to be less restrictive, [agentlans/Llama3.1-vodka](https://huggingface.co/agentlans/Llama3.1-vodka). These adapters adjust how the model handles potentially sensitive content. |
| |
|
| | ### The Basics |
| |
|
| | - **Base Model**: Llama 3.1 Instruct 8B |
| | - **Comparison Model**: [agentlans/Llama3.1-vodka](https://huggingface.co/agentlans/Llama3.1-vodka) |
| | - **Extraction Method**: LoRA (Low-Rank Adaptation) |
| |
|
| | ### Adapter Options |
| |
|
| | Different "strengths" of adaptation are available: 2, 4, 8, 16, 32, and 64. These can be thought of as dials for determining the extent of changes to the model's behaviour. |
| |
|
| | ### Applications |
| |
|
| | - Customizing Llama 3.1 for specific content needs |
| | - Adjusting the model's behaviour to align more closely with the censored or uncensored variant |
| | - Experimenting with various settings to identify the most effective configuration |
| |
|
| | ### Tips for Use |
| |
|
| | - Starting with lower ranks (2, 4, 8) allows for more subtle changes |
| | - Higher ranks (32, 64) enable larger adjustments but require more computational resources to apply to the model |
| | - Use the lowest rank that achieves the desired effect |
| | - For best results, use system prompts in conjunction with the LoRAs |
| | - Always use these adapters responsibly and ethically |
| |
|
| | ## Uses and Limitations |
| |
|
| | ### The Censor-LoRA |
| |
|
| | Designed for: |
| | - Maintaining family-friendly content |
| | - Removing explicit language |
| | - General content moderation |
| |
|
| | ### The Uncensor-LoRA |
| |
|
| | Intended for: |
| | - Restoring text that may have been excessively censored |
| | - Creative writing in more mature contexts |
| | - Generating realistic dialogue for adult-oriented content |
| |
|
| | ### Limitations |
| |
|
| | - These adapters may occasionally over-censor or under-censor content |
| | - They should not be the sole method for content moderation; human oversight remains crucial |
| | - The uncensoring adapter has the potential to generate inappropriate content, necessitating careful use |
| |
|
| | ## Ethical Considerations |
| |
|
| | The use of these adapters raises several ethical concerns: |
| |
|
| | - The censoring adapter may inadvertently suppress legitimate speech or artistic expression |
| | - The uncensoring adapter could be misused to produce harmful or offensive content |
| | - Both adapters may reflect and potentially amplify societal biases present in the training data |
| |
|
| | Careful consideration of the implications of deploying these models is necessary, along with the implementation of appropriate safeguards to ensure responsible usage. |