---
license: apache-2.0
base_model: roberta-gru
tags:
- vietnamese
- hate-speech
- span-detection
- token-classification
- nlp
datasets:
- visolex/ViHOS
model-index:
- name: roberta-gru-hsd-span
  results:
  - task:
      type: token-classification
      name: Hate Speech Span Detection
    dataset:
      name: visolex/ViHOS
      type: visolex/ViHOS
    metrics:
    - type: f1
      value: 0.6326
    - type: precision
      value: 0.6494
    - type: recall
      value: 0.6305
    - type: exact_match
      value: 0.0000
---

# roberta-gru-hsd-span: Hate Speech Span Detection (Vietnamese)

This model is a fine-tuned version of [roberta-gru](https://huggingface.co/roberta-gru) for Vietnamese **Hate Speech Span Detection**.

## Model Details

- Base Model: `roberta-gru`
- Description: Vietnamese Hate Speech Span Detection
- Framework: HuggingFace Transformers
- Task: Hate Speech Span Detection (token/char-level spans)

### Hyperparameters

- Max sequence length: `64`
- Learning rate: `5e-6`
- Batch size: `32`
- Epochs: `100`
- Early stopping patience: `5`
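
The epoch count and the early stopping patience interact: training runs for at most 100 epochs but stops once the monitored validation score has not improved for 5 consecutive epochs. The sketch below illustrates that rule only. The function name `simulate_early_stopping`, the choice of a higher-is-better validation score, and the score values are all hypothetical and are not taken from the actual training run.

```python
from typing import Sequence, Tuple


def simulate_early_stopping(
    val_scores: Sequence[float], max_epochs: int = 100, patience: int = 5
) -> Tuple[int, float]:
    """Return (epochs_run, best_score) for a sequence of per-epoch validation scores.

    Assumes a higher score is better (for example, validation F1).
    """
    best = float("-inf")
    stale = 0  # epochs since the last improvement
    epochs_run = 0
    for epoch, score in enumerate(val_scores[:max_epochs], start=1):
        epochs_run = epoch
        if score > best:
            best, stale = score, 0
        else:
            stale += 1
            if stale >= patience:
                break  # no improvement for `patience` epochs in a row
    return epochs_run, best


# Hypothetical validation scores, for illustration only
scores = [0.50, 0.58, 0.61, 0.63, 0.62, 0.63, 0.62, 0.61, 0.60, 0.59, 0.64]
epochs_run, best_score = simulate_early_stopping(scores)
print(epochs_run, best_score)  # 9 0.63
```

In this example the best score is reached at epoch 4, the next five epochs do not exceed it, and training stops at epoch 9 without ever seeing the later value.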

## Results

- F1: `0.6326`
- Precision: `0.6494`
- Recall: `0.6305`
- Exact Match: `0.0000`
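
This card does not state how the scores above were computed or averaged. As a rough illustration of what span-level precision, recall, and F1 can mean, the sketch below scores predicted spans against gold spans by character overlap for a single example. The helper names `span_chars` and `char_prf` and the sample spans are hypothetical, and this may differ from the evaluation script that produced the reported numbers.

```python
from typing import Iterable, Set, Tuple

Span = Tuple[int, int]  # (start, end) character offsets, end exclusive


def span_chars(spans: Iterable[Span]) -> Set[int]:
    """Expand spans into the set of character indices they cover."""
    return {i for start, end in spans for i in range(start, end)}


def char_prf(pred: Iterable[Span], gold: Iterable[Span]) -> Tuple[float, float, float]:
    """Character-overlap precision, recall, and F1 for one example."""
    p, g = span_chars(pred), span_chars(gold)
    overlap = len(p & g)
    precision = overlap / len(p) if p else 0.0
    recall = overlap / len(g) if g else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical spans: the prediction covers one gold span too widely and misses the other
pred_spans = [(4, 12)]
gold_spans = [(4, 8), (16, 20)]
print(char_prf(pred_spans, gold_spans))  # (0.5, 0.5, 0.5)
```

Under this definition a prediction earns partial credit for overlapping a gold span, whereas an exact-match metric only rewards predictions whose spans are identical to the gold spans.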

## Usage

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification
import torch

model_name = "visolex/roberta-gru-hsd-span"
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)
model.eval()  # disable dropout for inference

text = "Ví dụ câu tiếng Việt có nội dung thù ghét ..."
# max_length=64 matches the maximum sequence length used during training
enc = tok(text, return_tensors="pt", truncation=True, max_length=64, is_split_into_words=False)
with torch.no_grad():
    logits = model(**enc).logits
# One predicted label id per token (special tokens included)
pred_ids = logits.argmax(-1)[0].tolist()
# TODO: convert pred_ids -> spans according to your label scheme (BIO/BILOU/char-offset)
```
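
The conversion left as a TODO above depends on the label scheme, which this card does not specify. As a minimal sketch, assuming a BIO scheme and per-token character offsets, the function below merges consecutive `B-`/`I-` tokens into character spans. The function name `bio_to_spans`, the label strings `B-T`, `I-T`, and `O`, and the sample offsets are hypothetical. In practice the label strings could be read from `model.config.id2label`, and the offsets could come from a fast tokenizer called with `return_offsets_mapping=True`, if one is available for this model.

```python
from typing import List, Tuple


def bio_to_spans(labels: List[str], offsets: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Merge BIO token labels into (start, end) character spans."""
    spans: List[Tuple[int, int]] = []
    current = None  # [start, end] of the span being built, if any
    for label, (start, end) in zip(labels, offsets):
        if start == end:
            # Zero-width offsets mark special tokens; treat them as outside any span
            label = "O"
        if label.startswith("B-") or (label.startswith("I-") and current is None):
            # A B- tag (or a stray I- tag) opens a new span
            if current is not None:
                spans.append((current[0], current[1]))
            current = [start, end]
        elif label.startswith("I-"):
            current[1] = end  # extend the open span
        else:
            if current is not None:
                spans.append((current[0], current[1]))
                current = None
    if current is not None:
        spans.append((current[0], current[1]))
    return spans


# Hypothetical labels and offsets for a 7-token sequence (first and last are special tokens)
labels = ["O", "O", "B-T", "I-T", "O", "B-T", "O"]
offsets = [(0, 0), (0, 3), (4, 8), (9, 12), (13, 15), (16, 20), (0, 0)]
example_spans = bio_to_spans(labels, offsets)
print(example_spans)  # [(4, 12), (16, 20)]
```

Each returned pair can be used to slice the original text, for example `text[start:end]`, to recover the flagged substring.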

## License

Apache-2.0

## Acknowledgments

- Base model: [roberta-gru](https://huggingface.co/roberta-gru)