# PhoBERT ESG Topic Classifier for Vietnamese Banking Annual Reports

## Model description
This model is a Vietnamese text classification model fine-tuned from PhoBERT to classify sentences from banking annual reports into ESG-related topics. It is designed as Module 2 (ESG Topic Classification) in an ESG-washing analysis pipeline, where downstream modules assess actionability, evidence support, and report-level ESG-washing risk.
The model predicts one of six labels:
- `E` (Environmental)
- `S_labor` (Social – labor/workforce)
- `S_community` (Social – community/CSR)
- `S_product` (Social – product/customer)
- `G` (Governance)
- `Non_ESG` (not ESG-related)
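In Hugging Face terms, this label set corresponds to the usual `id2label`/`label2id` mappings in the model config. The index order below is illustrative only; check the model's `config.json` for the actual mapping:

```python
# Illustrative label order; the real order lives in config.json.
labels = ["E", "S_labor", "S_community", "S_product", "G", "Non_ESG"]

id2label = {i: name for i, name in enumerate(labels)}
label2id = {name: i for i, name in enumerate(labels)}

print(id2label[0])         # "E"
print(label2id["Non_ESG"]) # 5
```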
Note: The model focuses on textual disclosure topic classification, not factual verification of ESG claims.
## Intended use

### Primary intended use
- Filtering and categorizing ESG-related sentences in Vietnamese banking annual reports.
- Supporting ESG-washing analysis pipelines (e.g., actionability classification and evidence linking).
### Example downstream usage
- Keep only ESG sentences (`E`, `S_*`, `G`) and discard `Non_ESG` for later actionability/evidence modules.
- Aggregate predicted topics by bank-year to analyze disclosure patterns across ESG pillars.
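Both downstream steps are simple post-processing over the model's predictions. A minimal sketch (the `predictions` triples and bank names are hypothetical):

```python
from collections import Counter

# Hypothetical (bank, year, predicted_label) triples from the classifier.
predictions = [
    ("BankA", 2024, "E"),
    ("BankA", 2024, "Non_ESG"),
    ("BankA", 2024, "S_labor"),
    ("BankB", 2024, "G"),
]

# Step 1: keep only ESG sentences for the actionability/evidence modules.
esg_only = [p for p in predictions if p[2] != "Non_ESG"]

# Step 2: aggregate topic counts per bank-year to study disclosure patterns.
topic_counts = Counter((bank, year, label) for bank, year, label in esg_only)

print(len(esg_only))                         # 3
print(topic_counts[("BankA", 2024, "E")])    # 1
```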
### Out-of-scope use
- Determining whether a bank is actually “greenwashing/ESG-washing” in the real world.
- Use on domains far from banking annual reports (e.g., social media) without re-validation.
- Legal, compliance, or investment decision-making without human review.
## Training data
The model was trained using a hybrid labeling strategy:
- LLM pre-labels (teacher) to bootstrap semantic topic boundaries
- Weak labeling rules (filter) to override trivial non-ESG content with high precision
- A manually annotated gold set used for calibration and evaluation
Hybrid label sources:
- `llm`: 2,897 samples (LLM-only)
- `llm_weak_agree`: 2,083 samples (LLM + weak labels agree, higher confidence)
Total labeled samples for training/validation: 4,980
- Train: 4,233
- Validation: 747
Gold set (manual) for final test: 500 samples, balanced across labels.
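One plausible reading of the hybrid strategy is a precedence rule like the following. This is a sketch under stated assumptions: the exact weak rules and the override condition are not published here, and `resolve_label` is an illustrative helper, not the authors' code.

```python
def resolve_label(llm_label, weak_label):
    """Combine an LLM pre-label with an optional weak-rule label (assumption:
    weak rules fire only on trivial non-ESG content, with high precision)."""
    # High-precision weak rules override the LLM on trivial non-ESG content.
    if weak_label == "Non_ESG":
        return "Non_ESG", "weak_override"
    # Teacher and rules agree: a higher-confidence training sample.
    if weak_label is not None and weak_label == llm_label:
        return llm_label, "llm_weak_agree"
    # Otherwise fall back to the LLM pre-label alone.
    return llm_label, "llm"

print(resolve_label("E", "Non_ESG"))  # ('Non_ESG', 'weak_override')
print(resolve_label("G", "G"))        # ('G', 'llm_weak_agree')
print(resolve_label("E", None))       # ('E', 'llm')
```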
## Training procedure
- Base model: PhoBERT fine-tuning with a 6-class classification head.
- Objective: Cross-entropy loss (with class-balancing strategy).
- Context-aware input: sentence-level classification with a local context window (`prev + sent + next`) where available in the corpus, depending on block type.
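The `prev + sent + next` context window can be sketched as below. This is an assumption about the preprocessing: the actual separator token and the block-type logic that decides when context is available are not published.

```python
def build_input(sentences, i, use_context=True):
    """Return sentence i, optionally wrapped with its neighbors
    (prev + sent + next). Separator is an illustrative single space."""
    if not use_context:
        return sentences[i]
    prev_s = sentences[i - 1] if i > 0 else ""
    next_s = sentences[i + 1] if i + 1 < len(sentences) else ""
    return " ".join(part for part in (prev_s, sentences[i], next_s) if part)

sents = ["Câu trước.", "Câu cần phân loại.", "Câu sau."]
print(build_input(sents, 1))  # "Câu trước. Câu cần phân loại. Câu sau."
print(build_input(sents, 0))  # "Câu trước. Câu cần phân loại."
```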
## Evaluation results

### Validation set (747 samples)
- Macro-F1: 0.8598
- Micro-F1: 0.8635
- Weighted-F1: 0.8628
Per-class (validation):
| Label | Precision | Recall | F1 | Support |
|---|---|---|---|---|
| E | 0.8310 | 0.8806 | 0.8551 | 67 |
| S_labor | 0.9000 | 0.8675 | 0.8834 | 83 |
| S_community | 0.8732 | 0.8611 | 0.8671 | 72 |
| S_product | 0.8426 | 0.8922 | 0.8667 | 102 |
| G | 0.8372 | 0.7606 | 0.7970 | 142 |
| Non_ESG | 0.8785 | 0.9004 | 0.8893 | 281 |
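As a sanity check, the reported macro-F1 is the unweighted mean of the per-class F1 scores in the table above:

```python
# Per-class F1 scores from the validation table.
per_class_f1 = {
    "E": 0.8551, "S_labor": 0.8834, "S_community": 0.8671,
    "S_product": 0.8667, "G": 0.7970, "Non_ESG": 0.8893,
}

# Macro-F1 averages classes equally, regardless of support.
macro_f1 = sum(per_class_f1.values()) / len(per_class_f1)
print(round(macro_f1, 4))  # 0.8598
```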
### Gold test set (500 samples)
- Macro-F1: 0.9665
- Micro-F1: 0.9660
Per-class (gold):
| Label | Precision | Recall | F1 | Support |
|---|---|---|---|---|
| E | 0.9872 | 0.9625 | 0.9747 | 80 |
| S_labor | 0.9873 | 0.9750 | 0.9811 | 80 |
| S_community | 0.9634 | 0.9875 | 0.9753 | 80 |
| S_product | 0.9506 | 0.9625 | 0.9565 | 80 |
| G | 0.9659 | 0.9444 | 0.9551 | 90 |
| Non_ESG | 0.9457 | 0.9667 | 0.9560 | 90 |
Note: The gold test set is balanced and may not reflect real-world class frequencies in annual reports. Always validate on your target corpus.
## How to use
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_id = "YOUR_USERNAME/YOUR_MODEL_REPO"  # replace with the actual repo ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

labels = ["E", "S_labor", "S_community", "S_product", "G", "Non_ESG"]

text = "Ngân hàng đã triển khai chương trình giảm phát thải và tiết kiệm năng lượng trong năm 2024."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=256)

with torch.no_grad():
    logits = model(**inputs).logits

# Softmax over the 6 classes; report the top label and its probability.
probs = torch.softmax(logits, dim=-1).squeeze()
pred = labels[int(probs.argmax())]
print(pred, float(probs.max()))
```
## Limitations
- The model is trained on the language and structure of Vietnamese banking annual reports; performance may degrade on other domains.
- ESG boundaries can be ambiguous; some governance-related financial-risk text may be misclassified without domain adaptation.
- The model does not verify the truthfulness of ESG claims; it only categorizes topics based on text.