the_poli

the_poli is a transformer-based NLP classification model developed as part of the s0m3m0 research project.
The model is designed to analyse political and socio-political text, primarily from online and social media sources, and generate structured predictions for analytical and experimental purposes.

This repository contains only the trained model artifacts (weights, configuration, and tokenizer files).
The full data pipeline and application code are maintained separately.


Model Overview

  • Model type: Transformer-based text classification
  • Framework: Hugging Face Transformers
  • Primary language: English
  • Domain: Political and social media text
  • Use case: Research, analysis, and experimentation

The model is intended to help identify patterns and signals in text, not to make authoritative judgments.


Intended Use

The model is suitable for:

  • Academic and research-based NLP experiments
  • Political and social discourse analysis
  • Text classification pipeline prototyping
  • Educational demonstrations of NLP systems

Not Intended For

  • Political persuasion or targeting
  • Surveillance or profiling of individuals
  • Automated decision-making in real-world political contexts
  • High-stakes or safety-critical applications

Example Usage

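import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "d42kw01f/the_poli"

# Load the tokenizer and the classification model from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

text = "Example political statement for analysis"
inputs = tokenizer(text, return_tensors="pt", truncation=True)

# Run a forward pass without tracking gradients (inference only)
with torch.no_grad():
    outputs = model(**inputs)

# Raw, unnormalised scores with shape (batch_size, num_labels)
logits = outputs.logits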
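
The forward pass returns raw logits rather than probabilities. As a minimal sketch of one way to post-process them, the snippet below applies a softmax and ranks the labels by probability. The helper names `softmax` and `rank_labels`, the example logits, and the `LABEL_n` names are illustrative assumptions, not part of this repository; in practice the logits would come from something like `outputs.logits[0].detach().tolist()` and the label names from `model.config.id2label`.

```python
import math

def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def rank_labels(logits, id2label):
    """Return (label, probability) pairs sorted from most to least likely."""
    probs = softmax(logits)
    pairs = [(id2label[i], p) for i, p in enumerate(probs)]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)

# Placeholder values for illustration only: the real number of labels
# and their names depend on the model configuration.
example_logits = [0.2, 1.5, -0.7]
example_id2label = {0: "LABEL_0", 1: "LABEL_1", 2: "LABEL_2"}
ranked = rank_labels(example_logits, example_id2label)

for label, prob in ranked:
    print(f"{label}: {prob:.3f}")
```

Keeping this step in plain Python makes it easy to inspect the full distribution instead of only the top label.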

Training Data

  • Trained on curated datasets derived from publicly available sources
  • Data was preprocessed and filtered for research purposes
  • No private or sensitive data, and no data collected without consent, was intentionally included

Dataset details are intentionally limited to reduce misuse risk.


Limitations & Bias

  • Model performance depends on the quality and balance of the training data
  • May reflect biases present in source datasets
  • Not robust to domain shifts, sarcasm, or adversarial input
  • Outputs should be treated as probabilistic signals, not factual conclusions
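
One way to treat outputs as probabilistic signals is to abstain when the model is not confident. The sketch below is an illustration under stated assumptions: the function name `confident_label`, the 0.7 threshold, and the example scores are hypothetical and are not taken from the model or its evaluation. A suitable threshold would need to be chosen on held-out data.

```python
def confident_label(probabilities, threshold=0.7):
    """Return the most likely label, or None when confidence is too low.

    `probabilities` maps label names to probabilities.
    """
    if not probabilities:
        return None
    label, prob = max(probabilities.items(), key=lambda item: item[1])
    return label if prob >= threshold else None

# Placeholder scores for illustration only.
clear_case = {"LABEL_0": 0.91, "LABEL_1": 0.09}
borderline_case = {"LABEL_0": 0.55, "LABEL_1": 0.45}

print(confident_label(clear_case))       # LABEL_0
print(confident_label(borderline_case))  # None
```

Returning `None` for borderline inputs lets a downstream analysis route them to manual review instead of recording a weak prediction as a conclusion.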

Ethical Considerations

This model is released strictly for research and educational use. Users are responsible for:

  • Ensuring ethical deployment
  • Respecting platform terms of service
  • Avoiding harmful, misleading, or manipulative applications

Related Project


Author

Dakshitha Navodya Perera
AI • Cybersecurity • Data Engineering
Sri Lanka

Safetensors

  • Model size: 0.1B params
  • Tensor type: F32