XLM-RoBERTa-base fine-tuned for Vietnamese NLI

A Vietnamese Natural Language Inference (NLI) model that predicts the relation between a premise and a hypothesis as one of:

  • c (contradiction)
  • n (neutral)
  • e (entailment)

The model is xlm-roberta-base fine-tuned on a Vietnamese NLI dataset with a stratified 80/10/10 train/validation/test split, with a training setup optimized to run on a single GPU (Kaggle T4/P100).


Model Details

  • Developed by: Lê Lý (MoMo Talent 2025)
  • Model type: XLM-RoBERTa encoder for sequence classification (3 labels)
  • Languages: Vietnamese (vi)
  • License: Inherits from upstream xlm-roberta-base (set the model page license accordingly)
  • Finetuned from: xlm-roberta-base

Model Sources

  • Base model: XLM-RoBERTa (Conneau et al., 2020)
  • Training setup: Kaggle-ready single-GPU recipe; hyperparameters and procedure are summarized under Training Details below

Uses

Direct Use

  • Vietnamese NLI inference for research, demos, or as a component in larger systems (e.g., retrieval/ranking, dialog consistency checks).

Downstream Use

  • Fine-tune further on domain-specific VN NLI or related tasks (stance detection, contradiction detection in QA/assistants).

Out-of-Scope Use

  • Non-VN text without adaptation.
  • Safety-critical decisions without human oversight.
  • Open-domain factual verification (this is NLI, not a fact-checker).

Bias, Risks, and Limitations

  • Trained on a VN NLI dataset; distributional shift (domain, register, slang, figurative language) may degrade performance.
  • NLI labels can be sensitive to annotation style/instructions; avoid over-interpreting borderline cases.

Recommendations: Evaluate on your target domain; monitor confusion between n and e/c; consider calibration or a confidence threshold if the model feeds a pipeline (a thresholding sketch follows the quickstart code below).


How to Get Started

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_id = "YOUR_USERNAME/xlmr-vinli-finetune"  # replace with your repo id
tok = AutoTokenizer.from_pretrained(model_id)
mdl = AutoModelForSequenceClassification.from_pretrained(model_id)
mdl.eval()  # inference mode (disables dropout)

id2label = mdl.config.id2label  # {0:'c',1:'n',2:'e'}
text = {"premise": "Trời đang mưa rất to.", "hypothesis": "Bên ngoài khô ráo và không có mưa."}

# Encode the premise/hypothesis pair and classify it.
enc = tok(text["premise"], text["hypothesis"], return_tensors="pt", truncation=True, max_length=256)
with torch.no_grad():
    logits = mdl(**enc).logits
pred = logits.softmax(-1).argmax(-1).item()
print("Prediction:", id2label[pred])  # the example pair is a contradiction (c)

Training Details

Data

  • Path (Kaggle): /kaggle/input/nli-vietnam/full_data_true.json
  • Labels: {"c":0, "n":1, "e":2}
  • Split: Stratified ~80/10/10 (train/val/test)

Ensure JSON has fields: id, premise, hypothesis, label (labels in {c,n,e}).
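
A minimal sketch of the data loading and stratified split, assuming the JSON at the Kaggle path above is a flat list of such records; the random_state and the two-step train_test_split are illustrative choices, not necessarily the original recipe.

import json
from sklearn.model_selection import train_test_split

# Load the raw NLI records (fields: id, premise, hypothesis, label in {c, n, e}).
with open("/kaggle/input/nli-vietnam/full_data_true.json", encoding="utf-8") as f:
    records = json.load(f)

label2id = {"c": 0, "n": 1, "e": 2}
labels = [label2id[r["label"]] for r in records]

# Stratified 80/10/10: carve off 20%, then halve it into validation and test.
train, rest, y_train, y_rest = train_test_split(
    records, labels, test_size=0.2, stratify=labels, random_state=42)
val, test, y_val, y_test = train_test_split(
    rest, y_rest, test_size=0.5, stratify=y_rest, random_state=42)
print(len(train), len(val), len(test))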

Procedure

Preprocessing

  • Tokenizer: XLMRobertaTokenizerFast (max_length=256, truncation)
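
A sketch of the tokenization step, continuing from the split above (train/val/test lists and label2id); the helper name to_dataset is an assumption.

from datasets import Dataset
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("xlm-roberta-base")  # resolves to XLMRobertaTokenizerFast

def to_dataset(split):
    # split: list of dicts with id/premise/hypothesis/label, as in the previous sketch.
    ds = Dataset.from_list(split)
    ds = ds.map(lambda ex: {"labels": label2id[ex["label"]]})
    ds = ds.map(lambda ex: tok(ex["premise"], ex["hypothesis"],
                               truncation=True, max_length=256), batched=True)
    return ds.remove_columns(["id", "premise", "hypothesis", "label"])

train_ds, val_ds, test_ds = (to_dataset(s) for s in (train, val, test))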

Hyperparameters

  • Epochs: 4
  • Optim: AdamW (via HF Trainer)
  • LR: 2e-5
  • Weight decay: 0.01
  • Warmup ratio: 0.06
  • Scheduler: linear
  • Batch: per_device_train_batch_size=8, per_device_eval_batch_size=32
  • Grad Accumulation: 2 (effective train batch ~16)
  • Precision: bf16 if available (Ampere+), else fp16
  • Label smoothing: 0.05
  • Early stopping: patience 2
  • Gradient checkpointing: enabled
  • save_safetensors=True, load_best_model_at_end=True on f1_macro
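
A sketch of the corresponding TrainingArguments/Trainer setup, continuing from the tokenized datasets above. The values mirror the list above; the output_dir, the per-epoch eval/save strategy, the bf16 detection, and the compute_metrics helper are assumptions made for illustration.

import numpy as np
import torch
from sklearn.metrics import accuracy_score, f1_score
from transformers import (AutoModelForSequenceClassification, DataCollatorWithPadding,
                          EarlyStoppingCallback, Trainer, TrainingArguments)

id2label = {0: "c", 1: "n", 2: "e"}
model = AutoModelForSequenceClassification.from_pretrained(
    "xlm-roberta-base", num_labels=3,
    id2label=id2label, label2id={v: k for k, v in id2label.items()})

def compute_metrics(eval_pred):
    preds = np.argmax(eval_pred.predictions, axis=-1)
    return {"accuracy": accuracy_score(eval_pred.label_ids, preds),
            "f1_macro": f1_score(eval_pred.label_ids, preds, average="macro")}

bf16_ok = torch.cuda.is_available() and torch.cuda.is_bf16_supported()
args = TrainingArguments(
    output_dir="xlmr-vinli-finetune",   # assumed output path
    num_train_epochs=4,
    learning_rate=2e-5,
    weight_decay=0.01,
    warmup_ratio=0.06,
    lr_scheduler_type="linear",
    per_device_train_batch_size=8,
    per_device_eval_batch_size=32,
    gradient_accumulation_steps=2,      # effective train batch ~16
    bf16=bf16_ok,
    fp16=not bf16_ok and torch.cuda.is_available(),
    label_smoothing_factor=0.05,
    gradient_checkpointing=True,
    eval_strategy="epoch",
    save_strategy="epoch",
    save_safetensors=True,
    load_best_model_at_end=True,
    metric_for_best_model="f1_macro",
    greater_is_better=True,
    dataloader_num_workers=2,
    dataloader_pin_memory=True,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_ds,
    eval_dataset=val_ds,
    data_collator=DataCollatorWithPadding(tok),
    compute_metrics=compute_metrics,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=2)],
)
trainer.train()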

Compute

  • Hardware: Single NVIDIA T4/P100 16GB (Kaggle)
  • dataloader_num_workers=2, pin_memory=True

Speeds, Sizes, Times

  • Checkpoint size: standard xlm-roberta-base backbone plus a linear classifier head (≈0.3B parameters, stored as F32 safetensors)
  • Exact wall-clock time depends on the GPU; the full 4-epoch run completes within a standard Kaggle session's time limit.

Evaluation

Metrics & Factors

  • Metrics: Accuracy, Macro F1
  • Factors: Per-label performance (c, n, e)
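
The per-label numbers and the confusion matrix below can be produced with scikit-learn on the held-out test split; a short sketch, assuming the trainer and test_ds from the training sketch above.

from sklearn.metrics import classification_report, confusion_matrix

# Predict on the held-out test split and report per-label precision/recall/F1.
test_out = trainer.predict(test_ds)
y_true = test_out.label_ids
y_pred = test_out.predictions.argmax(-1)

print(classification_report(y_true, y_pred, target_names=["c", "n", "e"], digits=4))
print(confusion_matrix(y_true, y_pred))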

Results (Test)

Accuracy: 0.9901
Macro F1: 0.9878
Support: 1113 samples (c=429, n=108, e=576)

Classification Report:

              precision    recall  f1-score   support

           c     0.9930    0.9883    0.9907       429
           n     0.9815    0.9815    0.9815       108
           e     0.9896    0.9931    0.9913       576

weighted avg     0.9901    0.9901    0.9901      1113

Confusion Matrix (rows = true label, columns = predicted label; order c, n, e):

 [[424   0   5]
  [  1 106   1]
  [  2   2 572]]

Note: Replicated numbers may vary slightly due to randomness and hardware differences.

Environmental Impact

  • Hardware: Single T4/P100 16GB (Kaggle)
  • Cloud Provider/Region: Kaggle (unspecified)
  • Hours used: Not logged
  • Carbon Emitted: Not estimated

Technical Specifications

Architecture & Objective

  • Backbone: XLM-RoBERTa Base
  • Head: Linear classification (3 labels)
  • Objective: Cross-entropy with label smoothing (0.05); optional class weighting (off by default)
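
Class weighting is off by default; if it were enabled (e.g. to counter the rarity of n), one option is a Trainer subclass that overrides the loss. A hedged sketch, not the original implementation; the weight values are placeholders.

import torch
from torch import nn
from transformers import Trainer

class WeightedLossTrainer(Trainer):
    # Illustrative Trainer variant: class-weighted cross-entropy with the same
    # 0.05 label smoothing as the default objective.
    def __init__(self, *args, class_weights=None, **kwargs):
        super().__init__(*args, **kwargs)
        self.class_weights = class_weights  # e.g. torch.tensor([1.0, 3.0, 1.0]) for c/n/e

    def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
        labels = inputs.pop("labels")
        outputs = model(**inputs)
        logits = outputs.logits
        weight = self.class_weights.to(logits.device) if self.class_weights is not None else None
        loss_fct = nn.CrossEntropyLoss(weight=weight, label_smoothing=0.05)
        loss = loss_fct(logits.view(-1, logits.size(-1)), labels.view(-1))
        return (loss, outputs) if return_outputs else loss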

Software

  • transformers==4.43.3
  • datasets==2.21.0
  • accelerate==0.33.0
  • evaluate==0.4.2
  • scikit-learn==1.5.1
  • torch (CUDA)

Citation

XLM-RoBERTa

@inproceedings{conneau2020unsupervised,
  title={Unsupervised Cross-lingual Representation Learning at Scale},
  author={Conneau, Alexis and Khandelwal, Kartikay and Goyal, Naman and Chaudhary, Vishrav and Wenzek, Guillaume and Guzm{\'a}n, Francisco and Grave, Edouard and Ott, Myle and Zettlemoyer, Luke and Stoyanov, Veselin},
  booktitle={ACL},
  year={2020}
}

Contact

Author: Lê Lý
