XLM-RoBERTa-base fine-tuned for Vietnamese NLI

A Vietnamese Natural Language Inference (NLI) model that predicts the relation between a premise and a hypothesis as one of:

  • c (contradiction)
  • n (neutral)
  • e (entailment)

The model is xlm-roberta-base fine-tuned on a Vietnamese NLI dataset with a stratified 80/10/10 train/validation/test split, with a training setup optimized to run on a single GPU (Kaggle T4/P100).


Model Details

  • Developed by: Lê Lý (MoMo Talent 2025)
  • Model type: XLM-RoBERTa encoder for sequence classification (3 labels)
  • Languages: Vietnamese (vi)
  • License: Inherits from upstream xlm-roberta-base (set the model page license accordingly)
  • Finetuned from: xlm-roberta-base

Model Sources

  • Base model: XLM-RoBERTa (Conneau et al., 2020)
  • Training setup: Kaggle-ready single-GPU recipe; hyperparameters and procedure are summarized under Training Details below

Uses

Direct Use

  • Vietnamese NLI inference for research, demos, or as a component in larger systems (e.g., retrieval/ranking, dialog consistency checks).

Downstream Use

  • Fine-tune further on domain-specific VN NLI or related tasks (stance detection, contradiction detection in QA/assistants).

Out-of-Scope Use

  • Non-VN text without adaptation.
  • Safety-critical decisions without human oversight.
  • Open-domain factual verification (this is NLI, not a fact-checker).

Bias, Risks, and Limitations

  • Trained on a VN NLI dataset; distributional shift (domain, register, slang, figurative language) may degrade performance.
  • NLI labels can be sensitive to annotation style/instructions; avoid over-interpreting borderline cases.

Recommendations: Evaluate on your target domain; monitor confusion between n and e/c; consider calibration or a confidence threshold if the model feeds a pipeline (a thresholding sketch follows the quickstart code below).


How to Get Started

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_id = "YOUR_USERNAME/xlmr-vinli-finetune"  # replace with your repo id
tok = AutoTokenizer.from_pretrained(model_id)
mdl = AutoModelForSequenceClassification.from_pretrained(model_id)
mdl.eval()  # inference mode (disables dropout)

id2label = mdl.config.id2label  # {0:'c',1:'n',2:'e'}
text = {"premise": "Trời đang mưa rất to.", "hypothesis": "Bên ngoài khô ráo và không có mưa."}

# Encode the premise/hypothesis pair and classify it.
enc = tok(text["premise"], text["hypothesis"], return_tensors="pt", truncation=True, max_length=256)
with torch.no_grad():
    logits = mdl(**enc).logits
pred = logits.softmax(-1).argmax(-1).item()
print("Prediction:", id2label[pred])  # the example pair is a contradiction (c)

Training Details

Data

  • Path (Kaggle): /kaggle/input/nli-vietnam/full_data_true.json
  • Labels: {"c":0, "n":1, "e":2}
  • Split: Stratified ~80/10/10 (train/val/test)

Ensure JSON has fields: id, premise, hypothesis, label (labels in {c,n,e}).
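
A minimal sketch of the data loading and stratified split, assuming the JSON at the Kaggle path above is a flat list of such records; the random_state and the two-step train_test_split are illustrative choices, not necessarily the original recipe.

import json
from sklearn.model_selection import train_test_split

# Load the raw NLI records (fields: id, premise, hypothesis, label in {c, n, e}).
with open("/kaggle/input/nli-vietnam/full_data_true.json", encoding="utf-8") as f:
    records = json.load(f)

label2id = {"c": 0, "n": 1, "e": 2}
labels = [label2id[r["label"]] for r in records]

# Stratified 80/10/10: carve off 20%, then halve it into validation and test.
train, rest, y_train, y_rest = train_test_split(
    records, labels, test_size=0.2, stratify=labels, random_state=42)
val, test, y_val, y_test = train_test_split(
    rest, y_rest, test_size=0.5, stratify=y_rest, random_state=42)
print(len(train), len(val), len(test))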

Procedure

Preprocessing

  • Tokenizer: XLMRobertaTokenizerFast (max_length=256, truncation)
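
A sketch of the tokenization step, continuing from the split above (train/val/test lists and label2id); the helper name to_dataset is an assumption.

from datasets import Dataset
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("xlm-roberta-base")  # resolves to XLMRobertaTokenizerFast

def to_dataset(split):
    # split: list of dicts with id/premise/hypothesis/label, as in the previous sketch.
    ds = Dataset.from_list(split)
    ds = ds.map(lambda ex: {"labels": label2id[ex["label"]]})
    ds = ds.map(lambda ex: tok(ex["premise"], ex["hypothesis"],
                               truncation=True, max_length=256), batched=True)
    return ds.remove_columns(["id", "premise", "hypothesis", "label"])

train_ds, val_ds, test_ds = (to_dataset(s) for s in (train, val, test))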

Hyperparameters

  • Epochs: 4
  • Optim: AdamW (via HF Trainer)
  • LR: 2e-5
  • Weight decay: 0.01
  • Warmup ratio: 0.06
  • Scheduler: linear
  • Batch: per_device_train_batch_size=8, per_device_eval_batch_size=32
  • Grad Accumulation: 2 (effective train batch ~16)
  • Precision: bf16 if available (Ampere+), else fp16
  • Label smoothing: 0.05
  • Early stopping: patience 2
  • Gradient checkpointing: enabled
  • save_safetensors=True, load_best_model_at_end=True on f1_macro
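
A sketch of the corresponding TrainingArguments/Trainer setup, continuing from the tokenized datasets above. The values mirror the list above; the output_dir, the per-epoch eval/save strategy, the bf16 detection, and the compute_metrics helper are assumptions made for illustration.

import numpy as np
import torch
from sklearn.metrics import accuracy_score, f1_score
from transformers import (AutoModelForSequenceClassification, DataCollatorWithPadding,
                          EarlyStoppingCallback, Trainer, TrainingArguments)

id2label = {0: "c", 1: "n", 2: "e"}
model = AutoModelForSequenceClassification.from_pretrained(
    "xlm-roberta-base", num_labels=3,
    id2label=id2label, label2id={v: k for k, v in id2label.items()})

def compute_metrics(eval_pred):
    preds = np.argmax(eval_pred.predictions, axis=-1)
    return {"accuracy": accuracy_score(eval_pred.label_ids, preds),
            "f1_macro": f1_score(eval_pred.label_ids, preds, average="macro")}

bf16_ok = torch.cuda.is_available() and torch.cuda.is_bf16_supported()
args = TrainingArguments(
    output_dir="xlmr-vinli-finetune",   # assumed output path
    num_train_epochs=4,
    learning_rate=2e-5,
    weight_decay=0.01,
    warmup_ratio=0.06,
    lr_scheduler_type="linear",
    per_device_train_batch_size=8,
    per_device_eval_batch_size=32,
    gradient_accumulation_steps=2,      # effective train batch ~16
    bf16=bf16_ok,
    fp16=not bf16_ok and torch.cuda.is_available(),
    label_smoothing_factor=0.05,
    gradient_checkpointing=True,
    eval_strategy="epoch",
    save_strategy="epoch",
    save_safetensors=True,
    load_best_model_at_end=True,
    metric_for_best_model="f1_macro",
    greater_is_better=True,
    dataloader_num_workers=2,
    dataloader_pin_memory=True,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_ds,
    eval_dataset=val_ds,
    data_collator=DataCollatorWithPadding(tok),
    compute_metrics=compute_metrics,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=2)],
)
trainer.train()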

Compute

  • Hardware: Single NVIDIA T4/P100 16GB (Kaggle)
  • dataloader_num_workers=2, pin_memory=True

Speeds, Sizes, Times

  • Checkpoint size: standard xlm-roberta-base backbone plus a linear classifier head (≈0.3B parameters, stored as F32 safetensors)
  • Exact wall-clock time depends on the GPU; the full 4-epoch run completes within a standard Kaggle session's time limit.

Evaluation

Metrics & Factors

  • Metrics: Accuracy, Macro F1
  • Factors: Per-label performance (c, n, e)
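
The per-label numbers and the confusion matrix below can be produced with scikit-learn on the held-out test split; a short sketch, assuming the trainer and test_ds from the training sketch above.

from sklearn.metrics import classification_report, confusion_matrix

# Predict on the held-out test split and report per-label precision/recall/F1.
test_out = trainer.predict(test_ds)
y_true = test_out.label_ids
y_pred = test_out.predictions.argmax(-1)

print(classification_report(y_true, y_pred, target_names=["c", "n", "e"], digits=4))
print(confusion_matrix(y_true, y_pred))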

Results (Test)

Accuracy: 0.9901
Macro F1: 0.9878
Support: 1113 samples (c=429, n=108, e=576)

Classification Report:

              precision    recall  f1-score   support

           c     0.9930    0.9883    0.9907       429
           n     0.9815    0.9815    0.9815       108
           e     0.9896    0.9931    0.9913       576

weighted avg     0.9901    0.9901    0.9901      1113

Confusion Matrix (rows = true label, columns = predicted label; order c, n, e):

 [[424   0   5]
  [  1 106   1]
  [  2   2 572]]

Note: Replicated numbers may vary slightly due to randomness and hardware differences.

Environmental Impact

  • Hardware: Single T4/P100 16GB (Kaggle)
  • Cloud Provider/Region: Kaggle (unspecified)
  • Hours used: Not logged
  • Carbon Emitted: Not estimated

Technical Specifications

Architecture & Objective

  • Backbone: XLM-RoBERTa Base
  • Head: Linear classification (3 labels)
  • Objective: Cross-entropy with label smoothing (0.05); optional class weighting (off by default)
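
Class weighting is off by default; if it were enabled (e.g. to counter the rarity of n), one option is a Trainer subclass that overrides the loss. A hedged sketch, not the original implementation; the weight values are placeholders.

import torch
from torch import nn
from transformers import Trainer

class WeightedLossTrainer(Trainer):
    # Illustrative Trainer variant: class-weighted cross-entropy with the same
    # 0.05 label smoothing as the default objective.
    def __init__(self, *args, class_weights=None, **kwargs):
        super().__init__(*args, **kwargs)
        self.class_weights = class_weights  # e.g. torch.tensor([1.0, 3.0, 1.0]) for c/n/e

    def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
        labels = inputs.pop("labels")
        outputs = model(**inputs)
        logits = outputs.logits
        weight = self.class_weights.to(logits.device) if self.class_weights is not None else None
        loss_fct = nn.CrossEntropyLoss(weight=weight, label_smoothing=0.05)
        loss = loss_fct(logits.view(-1, logits.size(-1)), labels.view(-1))
        return (loss, outputs) if return_outputs else loss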

Software

  • transformers==4.43.3
  • datasets==2.21.0
  • accelerate==0.33.0
  • evaluate==0.4.2
  • scikit-learn==1.5.1
  • torch (CUDA)

Citation

XLM-RoBERTa

@inproceedings{conneau2020unsupervised,
  title={Unsupervised Cross-lingual Representation Learning at Scale},
  author={Conneau, Alexis and Khandelwal, Kartikay and Goyal, Naman and Chaudhary, Vishrav and Wenzek, Guillaume and Guzm{\'a}n, Francisco and Grave, Edouard and Ott, Myle and Zettlemoyer, Luke and Stoyanov, Veselin},
  booktitle={ACL},
  year={2020}
}

Contact

Author: Lê Lý
