PhoBERT ESG Topic Classifier for Vietnamese Banking Annual Reports

Model description

This model is a Vietnamese text classification model fine-tuned from PhoBERT to classify sentences from banking annual reports into ESG-related topics. It is designed as Module 2 (ESG Topic Classification) in an ESG-washing analysis pipeline, where downstream modules assess actionability, evidence support, and report-level ESG-washing risk.

The model predicts one of six labels:

  • E (Environmental)
  • S_labor (Social – labor/workforce)
  • S_community (Social – community/CSR)
  • S_product (Social – product/customer)
  • G (Governance)
  • Non_ESG (not ESG-related)

Note: The model focuses on textual disclosure topic classification, not factual verification of ESG claims.
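For downstream code it helps to pin the label set down as a mapping; a minimal sketch (the integer ordering here is an assumption — the authoritative mapping is the fine-tuned model config's `id2label`):

```python
# Assumed id-to-label mapping for the six-way scheme; the real ordering
# is defined by the model's config (id2label), not by this file.
ID2LABEL = {0: "E", 1: "S_labor", 2: "S_community", 3: "S_product", 4: "G", 5: "Non_ESG"}
LABEL2ID = {label: i for i, label in ID2LABEL.items()}

# The five ESG labels vs. the catch-all Non_ESG class.
ESG_LABELS = {label for label in ID2LABEL.values() if label != "Non_ESG"}
```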


Intended use

Primary intended use

  • Filtering and categorizing ESG-related sentences in Vietnamese banking annual reports.
  • Supporting ESG-washing analysis pipelines (e.g., actionability classification and evidence linking).

Example downstream usage

  • Keep only ESG sentences (E, S_*, G) and discard Non_ESG for later actionability/evidence modules.
  • Aggregate predicted topics by bank-year to analyze disclosure patterns across ESG pillars.
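The two usage patterns above can be sketched like this (the prediction records and the bank/year fields are illustrative, not part of the model's output):

```python
from collections import Counter

# Illustrative prediction records: (bank, year, predicted_label)
predictions = [
    ("BankA", 2024, "E"),
    ("BankA", 2024, "Non_ESG"),
    ("BankA", 2024, "S_labor"),
    ("BankB", 2024, "G"),
]

# 1) Keep only ESG sentences (E, S_*, G); drop Non_ESG before
#    the actionability/evidence modules.
esg_only = [p for p in predictions if p[2] != "Non_ESG"]

# 2) Aggregate predicted topics per bank-year to study disclosure patterns.
by_bank_year = Counter((bank, year, label) for bank, year, label in esg_only)
```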

Out-of-scope use

  • Determining whether a bank is actually “greenwashing/ESG-washing” in the real world.
  • Use on domains far from banking annual reports (e.g., social media) without re-validation.
  • Legal, compliance, or investment decision-making without human review.

Training data

The model was trained using a hybrid labeling strategy:

  • LLM pre-labels (teacher) to bootstrap semantic topic boundaries
  • Weak labeling rules (filter) to override trivial non-ESG content with high precision
  • A manually annotated gold set used for calibration and evaluation
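A minimal sketch of how the two label sources might be combined per sample (the exact merge policy is not specified on this card, so this function is an assumption consistent with the description above):

```python
def merge_labels(llm_label, weak_label=None):
    """Combine an LLM pre-label with an optional weak-rule label.

    Hypothetical policy: agreement yields a higher-confidence sample;
    a high-precision weak rule overrides trivial non-ESG content;
    otherwise the LLM pre-label stands.
    """
    if weak_label is None:            # no weak signal: keep the LLM label
        return llm_label, "llm"
    if weak_label == llm_label:       # both sources agree: higher confidence
        return llm_label, "llm_weak_agree"
    if weak_label == "Non_ESG":       # weak filter overrides trivial content
        return "Non_ESG", "weak_override"
    return llm_label, "llm"           # disagreement: fall back to the LLM
```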

Hybrid label sources:

  • llm: 2,897 samples (LLM-only)
  • llm_weak_agree: 2,083 samples (LLM + weak labels agree, higher confidence)

Total labeled samples for training/validation: 4,980

  • Train: 4,233
  • Validation: 747

Gold set (manual) for final test: 500 samples, balanced across labels.
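The counts above fit together; a quick arithmetic check:

```python
# Hybrid label sources
llm_only, llm_weak_agree = 2897, 2083
# Train/validation split
train, val = 4233, 747

assert llm_only + llm_weak_agree == 4980 == train + val
# The 500-sample gold set is held out separately for the final test.
```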


Training procedure

  • Base model: PhoBERT, fine-tuned with a six-class classification head.
  • Objective: cross-entropy loss with a class-balancing strategy.
  • Context-aware input: sentence-level classification, with a local context window (previous sentence + target sentence + next sentence) used where the corpus block type provides it.
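The context-window construction can be sketched as follows (the window policy and the plain space-join separator are assumptions; the card does not specify how segments are joined):

```python
def build_input(sentences, i, use_context=True):
    """Build the classifier input for sentence i.

    Assumed policy: prepend/append the immediate neighbours when the
    block type allows it; the separator here is a simple space.
    """
    sent = sentences[i]
    if not use_context:
        return sent
    prev_s = sentences[i - 1] if i > 0 else ""
    next_s = sentences[i + 1] if i < len(sentences) - 1 else ""
    return " ".join(part for part in (prev_s, sent, next_s) if part)
```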

Evaluation results

Validation set (747 samples)

  • Macro-F1: 0.8598
  • Micro-F1: 0.8635
  • Weighted-F1: 0.8628

Per-class (validation):

Label         Precision   Recall   F1       Support
E             0.8310      0.8806   0.8551   67
S_labor       0.9000      0.8675   0.8834   83
S_community   0.8732      0.8611   0.8671   72
S_product     0.8426      0.8922   0.8667   102
G             0.8372      0.7606   0.7970   142
Non_ESG       0.8785      0.9004   0.8893   281
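As a consistency check, the reported macro-F1 is the unweighted mean of the six per-class F1 scores:

```python
# Per-class validation F1 (E, S_labor, S_community, S_product, G, Non_ESG)
per_class_f1 = [0.8551, 0.8834, 0.8671, 0.8667, 0.7970, 0.8893]
macro_f1 = sum(per_class_f1) / len(per_class_f1)
assert round(macro_f1, 4) == 0.8598  # matches the reported validation macro-F1
```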

Gold test set (500 samples)

  • Macro-F1: 0.9665
  • Micro-F1: 0.9660

Per-class (gold):

Label         Precision   Recall   F1       Support
E             0.9872      0.9625   0.9747   80
S_labor       0.9873      0.9750   0.9811   80
S_community   0.9634      0.9875   0.9753   80
S_product     0.9506      0.9625   0.9565   80
G             0.9659      0.9444   0.9551   90
Non_ESG       0.9457      0.9667   0.9560   90

Note: The gold test set is balanced and may not reflect real-world class frequencies in annual reports. Always validate on your target corpus.


How to use

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_id = "YOUR_USERNAME/YOUR_MODEL_REPO"  # replace with the actual repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

labels = ["E", "S_labor", "S_community", "S_product", "G", "Non_ESG"]

# "The bank implemented an emissions-reduction and energy-saving programme in 2024."
text = "Ngân hàng đã triển khai chương trình giảm phát thải và tiết kiệm năng lượng trong năm 2024."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=256)

with torch.no_grad():
    logits = model(**inputs).logits

probs = torch.softmax(logits, dim=-1).squeeze()
pred_id = int(probs.argmax())
print(labels[pred_id], float(probs[pred_id]))
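For scoring many sentences at once, a batched helper along these lines can reuse the model and tokenizer loaded above (reading labels from `model.config.id2label` instead of a hard-coded list assumes that mapping is populated in the checkpoint's config):

```python
import torch

def classify_batch(texts, model, tokenizer, max_length=256):
    """Return a (label, confidence) pair for each input sentence."""
    inputs = tokenizer(
        texts, return_tensors="pt", padding=True,
        truncation=True, max_length=max_length,
    )
    with torch.no_grad():
        logits = model(**inputs).logits
    probs = torch.softmax(logits, dim=-1)
    conf, idx = probs.max(dim=-1)
    # id2label comes from the fine-tuned model's config
    return [(model.config.id2label[i.item()], c.item())
            for i, c in zip(idx, conf)]
```

For example, `classify_batch(sentences, model, tokenizer)` can score all sentences of a report before the Non_ESG filtering step.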

Limitations

The model is trained on Vietnamese banking annual report language and structure; performance may degrade on other domains.

ESG boundaries can be ambiguous; some governance-related financial-risk text may be misclassified without domain adaptation.

The model does not verify the truthfulness of ESG claims; it only categorizes topics based on text.


Model size

0.4B parameters (F32 tensors, Safetensors format).