PhoBERT ESG Topic Classifier for Vietnamese Banking Annual Reports

Model description

This model is a Vietnamese text classification model fine-tuned from PhoBERT to classify sentences from banking annual reports into ESG-related topics. It is designed as Module 2 (ESG Topic Classification) in an ESG-washing analysis pipeline, where downstream modules assess actionability, evidence support, and report-level ESG-washing risk.

The model predicts one of six labels:

  • E (Environmental)
  • S_labor (Social – labor/workforce)
  • S_community (Social – community/CSR)
  • S_product (Social – product/customer)
  • G (Governance)
  • Non_ESG (not ESG-related)

Note: The model focuses on textual disclosure topic classification, not factual verification of ESG claims.
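For downstream code it helps to pin the label set down as a mapping; a minimal sketch (the integer ordering here is an assumption — the authoritative mapping is the fine-tuned model config's `id2label`):

```python
# Assumed id-to-label mapping for the six-way scheme; the real ordering
# is defined by the model's config (id2label), not by this file.
ID2LABEL = {0: "E", 1: "S_labor", 2: "S_community", 3: "S_product", 4: "G", 5: "Non_ESG"}
LABEL2ID = {label: i for i, label in ID2LABEL.items()}

# The five ESG labels vs. the catch-all Non_ESG class.
ESG_LABELS = {label for label in ID2LABEL.values() if label != "Non_ESG"}
```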


Intended use

Primary intended use

  • Filtering and categorizing ESG-related sentences in Vietnamese banking annual reports.
  • Supporting ESG-washing analysis pipelines (e.g., actionability classification and evidence linking).

Example downstream usage

  • Keep only ESG sentences (E, S_*, G) and discard Non_ESG for later actionability/evidence modules.
  • Aggregate predicted topics by bank-year to analyze disclosure patterns across ESG pillars.
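The two usage patterns above can be sketched like this (the prediction records and the bank/year fields are illustrative, not part of the model's output):

```python
from collections import Counter

# Illustrative prediction records: (bank, year, predicted_label)
predictions = [
    ("BankA", 2024, "E"),
    ("BankA", 2024, "Non_ESG"),
    ("BankA", 2024, "S_labor"),
    ("BankB", 2024, "G"),
]

# 1) Keep only ESG sentences (E, S_*, G); drop Non_ESG before
#    the actionability/evidence modules.
esg_only = [p for p in predictions if p[2] != "Non_ESG"]

# 2) Aggregate predicted topics per bank-year to study disclosure patterns.
by_bank_year = Counter((bank, year, label) for bank, year, label in esg_only)
```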

Out-of-scope use

  • Determining whether a bank is actually “greenwashing/ESG-washing” in the real world.
  • Use on domains far from banking annual reports (e.g., social media) without re-validation.
  • Legal, compliance, or investment decision-making without human review.

Training data

The model was trained using a hybrid labeling strategy:

  • LLM pre-labels (teacher) to bootstrap semantic topic boundaries
  • Weak labeling rules (filter) to override trivial non-ESG content with high precision
  • A manually annotated gold set used for calibration and evaluation
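A minimal sketch of how the two label sources might be combined per sample (the exact merge policy is not specified on this card, so this function is an assumption consistent with the description above):

```python
def merge_labels(llm_label, weak_label=None):
    """Combine an LLM pre-label with an optional weak-rule label.

    Hypothetical policy: agreement yields a higher-confidence sample;
    a high-precision weak rule overrides trivial non-ESG content;
    otherwise the LLM pre-label stands.
    """
    if weak_label is None:            # no weak signal: keep the LLM label
        return llm_label, "llm"
    if weak_label == llm_label:       # both sources agree: higher confidence
        return llm_label, "llm_weak_agree"
    if weak_label == "Non_ESG":       # weak filter overrides trivial content
        return "Non_ESG", "weak_override"
    return llm_label, "llm"           # disagreement: fall back to the LLM
```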

Hybrid label sources:

  • llm: 2,897 samples (LLM-only)
  • llm_weak_agree: 2,083 samples (LLM + weak labels agree, higher confidence)

Total labeled samples for training/validation: 4,980

  • Train: 4,233
  • Validation: 747

Gold set (manual) for final test: 500 samples, balanced across labels.
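The counts above fit together; a quick arithmetic check:

```python
# Hybrid label sources
llm_only, llm_weak_agree = 2897, 2083
# Train/validation split
train, val = 4233, 747

assert llm_only + llm_weak_agree == 4980 == train + val
# The 500-sample gold set is held out separately for the final test.
```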


Training procedure

  • Base model: PhoBERT, fine-tuned with a six-class classification head.
  • Objective: cross-entropy loss with a class-balancing strategy.
  • Context-aware input: sentence-level classification, with a local context window (previous sentence + target sentence + next sentence) used where the corpus block type provides it.
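The context-window construction can be sketched as follows (the window policy and the plain space-join separator are assumptions; the card does not specify how segments are joined):

```python
def build_input(sentences, i, use_context=True):
    """Build the classifier input for sentence i.

    Assumed policy: prepend/append the immediate neighbours when the
    block type allows it; the separator here is a simple space.
    """
    sent = sentences[i]
    if not use_context:
        return sent
    prev_s = sentences[i - 1] if i > 0 else ""
    next_s = sentences[i + 1] if i < len(sentences) - 1 else ""
    return " ".join(part for part in (prev_s, sent, next_s) if part)
```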

Evaluation results

Validation set (747 samples)

  • Macro-F1: 0.8598
  • Micro-F1: 0.8635
  • Weighted-F1: 0.8628

Per-class (validation):

Label         Precision   Recall   F1       Support
E             0.8310      0.8806   0.8551   67
S_labor       0.9000      0.8675   0.8834   83
S_community   0.8732      0.8611   0.8671   72
S_product     0.8426      0.8922   0.8667   102
G             0.8372      0.7606   0.7970   142
Non_ESG       0.8785      0.9004   0.8893   281
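As a consistency check, the reported macro-F1 is the unweighted mean of the six per-class F1 scores:

```python
# Per-class validation F1 (E, S_labor, S_community, S_product, G, Non_ESG)
per_class_f1 = [0.8551, 0.8834, 0.8671, 0.8667, 0.7970, 0.8893]
macro_f1 = sum(per_class_f1) / len(per_class_f1)
assert round(macro_f1, 4) == 0.8598  # matches the reported validation macro-F1
```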

Gold test set (500 samples)

  • Macro-F1: 0.9665
  • Micro-F1: 0.9660

Per-class (gold):

Label         Precision   Recall   F1       Support
E             0.9872      0.9625   0.9747   80
S_labor       0.9873      0.9750   0.9811   80
S_community   0.9634      0.9875   0.9753   80
S_product     0.9506      0.9625   0.9565   80
G             0.9659      0.9444   0.9551   90
Non_ESG       0.9457      0.9667   0.9560   90

Note: The gold test set is balanced and may not reflect real-world class frequencies in annual reports. Always validate on your target corpus.


How to use

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_id = "YOUR_USERNAME/YOUR_MODEL_REPO"  # replace with the actual repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

labels = ["E", "S_labor", "S_community", "S_product", "G", "Non_ESG"]

# "The bank implemented an emissions-reduction and energy-saving programme in 2024."
text = "Ngân hàng đã triển khai chương trình giảm phát thải và tiết kiệm năng lượng trong năm 2024."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=256)

with torch.no_grad():
    logits = model(**inputs).logits

probs = torch.softmax(logits, dim=-1).squeeze()
pred_id = int(probs.argmax())
print(labels[pred_id], float(probs[pred_id]))
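For scoring many sentences at once, a batched helper along these lines can reuse the model and tokenizer loaded above (reading labels from `model.config.id2label` instead of a hard-coded list assumes that mapping is populated in the checkpoint's config):

```python
import torch

def classify_batch(texts, model, tokenizer, max_length=256):
    """Return a (label, confidence) pair for each input sentence."""
    inputs = tokenizer(
        texts, return_tensors="pt", padding=True,
        truncation=True, max_length=max_length,
    )
    with torch.no_grad():
        logits = model(**inputs).logits
    probs = torch.softmax(logits, dim=-1)
    conf, idx = probs.max(dim=-1)
    # id2label comes from the fine-tuned model's config
    return [(model.config.id2label[i.item()], c.item())
            for i, c in zip(idx, conf)]
```

For example, `classify_batch(sentences, model, tokenizer)` can score all sentences of a report before the Non_ESG filtering step.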

Limitations

The model is trained on Vietnamese banking annual report language and structure; performance may degrade on other domains.

ESG boundaries can be ambiguous; some governance-related financial-risk text may be misclassified without domain adaptation.

The model does not verify the truthfulness of ESG claims; it only categorizes topics based on text.


Model size

0.4B parameters (F32 tensors, Safetensors format).