CBDC-Discourse
CBDC-Discourse is a BERT-based sentence classifier fine-tuned to categorize central bank digital currency (CBDC) discourse into three conceptually distinct classes: Feature, Risk-Benefit, and Process.
This model enables structured analysis of CBDC-related policy and research texts by separating design attributes, evaluative outcomes, and procedural activities.
| Class | Description |
|---|---|
| Feature | A sentence that specifies a concrete design element or operational mechanism of CBDC. Examples include: wallet/card modality; programmability/smart contracts; privacy model; interoperability requirements; legal tender status; distribution via intermediaries; holding limits/caps; interest-bearing/remuneration (incl. negative rates); rulebook/scheme rules; settlement architecture (DLT/RPS/RTGS links). |
| Risk-Benefit | A sentence that asserts or implies outcomes, effects, or trade-offs (positive or negative) from a CBDC feature or its introduction, including policy/equilibrium impacts. Examples include: faster/cheaper/more transparent cross-border payments; financial inclusion; regional cooperation; competition/innovation; sovereignty/autonomy; efficiency/productivity gains. Also, negative concerns such as bank disintermediation; cyber/operational risk; crisis flight from deposits; privacy harms; monetary/fiscal dominance concerns; “too successful” crowd-out; legal/regulatory fragility. |
| Process | A sentence about research, consultations, pilots, governance, timeline, or agenda-setting, without specifying a concrete feature or claiming effects/trade-offs. Examples include: public consultations; surveys/focus groups; task forces; phases (investigation/preparation/pilot); rulebook drafting as an activity (absent specifics); reports/citations; statements of interest/attention; open questions; goal/timeline setting (e.g., “medium-term goal”). |
Base Model
This classifier is built on top of bilalzafar/CentralBank-BERT, a domain-adapted BERT model pretrained on over 2 million sentences (~66M tokens) from BIS central bank speeches (1996–2024).
Because it was pretrained on text covering monetary policy, financial regulation, and central banking discourse, CentralBank-BERT is a well-suited foundation for downstream CBDC-related text classification.
Dataset
The model was fine-tuned on a manually annotated dataset of CBDC-related sentences extracted from Bank for International Settlements (BIS) central bank speeches (1996–2024). The dataset is balanced across the three discourse classes, with 2,886 sentences in total (962 per class).
Intended Use
This model is designed for the automatic classification of CBDC discourse in policy, research, and financial communications. It enables researchers, analysts, and practitioners to distinguish whether a sentence describes procedural aspects, design features, or evaluative outcomes of central bank digital currencies. Such categorization supports policy analysis, thematic mapping of central bank communication, and structured NLP-based research in the fields of finance, monetary economics, and economic policy.
Training Details
- Tokenization: WordPiece (CentralBank-BERT tokenizer)
- Maximum sequence length: 256 tokens
- Dynamic padding (DataCollatorWithPadding)
- Train/Val/Test split: 80/10/10 stratified by label
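As a rough illustration of what an 80/10/10 split stratified by label looks like, the sketch below partitions indices class by class in plain Python. It is not the author's splitting code: the function name `stratified_split`, the seed, and the rounding rule (remainder goes to the test set) are assumptions, and the labels are placeholders that only mirror the reported class balance.

```python
import random
from collections import defaultdict

def stratified_split(labels, ratios=(0.8, 0.1, 0.1), seed=42):
    """Return (train, val, test) index lists, splitting each class separately."""
    rng = random.Random(seed)  # seed is an arbitrary choice for reproducibility
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    train, val, test = [], [], []
    for idxs in by_label.values():
        rng.shuffle(idxs)
        n_train = int(ratios[0] * len(idxs))
        n_val = int(ratios[1] * len(idxs))
        train += idxs[:n_train]
        val += idxs[n_train:n_train + n_val]
        test += idxs[n_train + n_val:]  # whatever is left goes to the test set
    return train, val, test

# Placeholder labels mirroring the reported balance of 962 sentences per class.
labels = ["Feature"] * 962 + ["Process"] * 962 + ["Risk-Benefit"] * 962
train_idx, val_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(val_idx), len(test_idx))
```

Because each class is split on its own, every subset keeps the same class proportions as the full dataset. scikit-learn's `train_test_split` with the `stratify` argument is a common alternative; exact subset sizes depend on how fractional counts are rounded.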
| Parameter | Value |
|---|---|
| Base model | bilalzafar/CentralBank-BERT |
| Epochs | 6 |
| Train batch size (per device) | 8 |
| Eval batch size (per device) | 16 |
| Gradient accumulation | 2 |
| Effective batch size | 16 |
| Learning rate | 2e-5 |
| Weight decay | 0.01 |
| Warmup ratio | 0.06 |
| Scheduler | Cosine |
| Mixed precision (fp16) | Enabled |
- Environment: Google Colab
- GPU: Tesla T4 (16GB)
- Framework: PyTorch 2.8.0 + Hugging Face Transformers
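To make the batch-size and scheduler settings in the table concrete, the following sketch computes the effective batch size and a cosine learning-rate curve with linear warmup. It is a plain-Python approximation written for illustration, not the training code itself: `TOTAL_STEPS` is a hypothetical value (the real number of optimizer steps depends on the training-set size, batch size, and epochs), and the function name `lr_at` and the warmup rounding are assumptions.

```python
import math

PER_DEVICE_BATCH = 8   # train batch size per device (from the table)
GRAD_ACCUM = 2         # gradient accumulation steps (from the table)
BASE_LR = 2e-5         # peak learning rate (from the table)
WARMUP_RATIO = 0.06    # fraction of steps used for warmup (from the table)
TOTAL_STEPS = 1000     # hypothetical total number of optimizer steps

# On a single GPU, gradients from 2 batches of 8 are accumulated per update.
effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM

def lr_at(step, total_steps=TOTAL_STEPS, base_lr=BASE_LR, warmup_ratio=WARMUP_RATIO):
    """Learning rate at a given optimizer step: linear warmup, then cosine decay."""
    warmup_steps = math.ceil(warmup_ratio * total_steps)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

print("effective batch size:", effective_batch)
for step in (0, 30, 60, 500, 1000):
    print(step, f"{lr_at(step):.2e}")
```

The learning rate rises linearly from zero during the first 6% of steps, peaks at 2e-5, and then follows half a cosine wave down toward zero at the final step.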
Evaluation Results
| Split | Accuracy | Macro-F1 | Weighted-F1 | Class | Precision | Recall | F1 |
|---|---|---|---|---|---|---|---|
| Validation | 0.851 | 0.839 | 0.852 | – | – | – | – |
| Test | 0.823 | 0.803 | 0.825 | Feature | 0.759 | 0.782 | 0.770 |
| | | | | Process | 0.927 | 0.845 | 0.884 |
| | | | | Risk-Benefit | 0.700 | 0.817 | 0.754 |
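For readers who want to see how the per-class and macro figures relate, the short sketch below recomputes F1 from the test-set precision and recall values in the table. The helper name `f1_score` is ours, and the inputs are the rounded values as reported, so results agree with the table only up to rounding.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# (precision, recall) per class, copied from the test-set rows of the table.
test_metrics = {
    "Feature": (0.759, 0.782),
    "Process": (0.927, 0.845),
    "Risk-Benefit": (0.700, 0.817),
}

per_class_f1 = {cls: f1_score(p, r) for cls, (p, r) in test_metrics.items()}
# Macro-F1 is the unweighted mean of the per-class F1 scores.
macro_f1 = sum(per_class_f1.values()) / len(per_class_f1)

for cls, value in per_class_f1.items():
    print(f"{cls}: F1 = {value:.3f}")
print(f"Macro-F1 = {macro_f1:.3f}")
```

Rounded to three decimals, this gives 0.770, 0.884, and 0.754 for the three classes and 0.803 for macro-F1, matching the reported test results.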
Other CBDC Models
This model is part of the CentralBank-BERT / CBDC model family, a suite of domain-adapted classifiers for analyzing central-bank communication.
| Model | Purpose | Intended Use | Link |
|---|---|---|---|
| bilalzafar/CentralBank-BERT | Domain-adaptive masked LM trained on BIS speeches (1996–2024). | Base encoder for CBDC downstream tasks; fill-mask tasks. | CentralBank-BERT |
| bilalzafar/CBDC-BERT | Binary classifier: CBDC vs. Non-CBDC. | Flagging CBDC-related discourse in large corpora. | CBDC-BERT |
| bilalzafar/CBDC-Stance | 3-class stance model (Pro, Wait-and-See, Anti). | Research on policy stances and discourse monitoring. | CBDC-Stance |
| bilalzafar/CBDC-Sentiment | 3-class sentiment model (Positive, Neutral, Negative). | Tone analysis in central bank communications. | CBDC-Sentiment |
| bilalzafar/CBDC-Type | Classifies Retail, Wholesale, General CBDC mentions. | Distinguishing policy focus (retail vs wholesale). | CBDC-Type |
| bilalzafar/CBDC-Discourse | 3-class discourse classifier (Feature, Process, Risk-Benefit). | Structured categorization of CBDC communications. | CBDC-Discourse |
| bilalzafar/CentralBank-NER | Named Entity Recognition (NER) model for central banking discourse. | Identifying institutions, persons, and policy entities in speeches. | CentralBank-NER |
Repository and Replication Package
All training pipelines, preprocessing scripts, evaluation notebooks, and result outputs are available in the companion GitHub repository:
🔗 https://github.com/bilalezafar/CentralBank-BERT
How to Use
from transformers import pipeline
# Load the classifier from the Hugging Face Hub
classifier = pipeline("text-classification", model="bilalzafar/CBDC-Discourse")
# Example sentences, one per class
sentences = [
    "The central bank launched a pilot project for CBDC cross-border settlement.",  # Process
    "Programmability in CBDC allows conditional payments.",  # Feature
    "CBDC may increase risks of bank disintermediation.",  # Risk-Benefit
]
# Predict: each call returns a one-element list holding the top-scoring label
for s in sentences:
    result = classifier(s)[0]
    print(f"{s}\n → {result['label']} (score={result['score']:.4f})\n")
# Example output (blank lines omitted)
# The central bank launched a pilot project for CBDC cross-border settlement.
#  → Process (score=0.9989)
# Programmability in CBDC allows conditional payments.
#  → Feature (score=0.9991)
# CBDC may increase risks of bank disintermediation.
#  → Risk-Benefit (score=0.9986)
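When the classifier is applied to many sentences, a common next step is to summarize how often each class occurs. The sketch below is one possible way to do that and is not part of the model's API: the helper `label_shares` and its `min_score` threshold are hypothetical, and the input simply reuses the three predictions from the example output above in the `label`/`score` dictionary form the pipeline returns.

```python
from collections import Counter

def label_shares(predictions, min_score=0.0):
    """Share of each label among predictions scoring at least min_score."""
    kept = [p["label"] for p in predictions if p["score"] >= min_score]
    counts = Counter(kept)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: n / total for label, n in counts.items()}

# Predictions in pipeline output form, taken from the example output above.
predictions = [
    {"label": "Process", "score": 0.9989},
    {"label": "Feature", "score": 0.9991},
    {"label": "Risk-Benefit", "score": 0.9986},
]

print(label_shares(predictions))                   # all predictions kept
print(label_shares(predictions, min_score=0.999))  # only very confident ones
```

Raising `min_score` trades coverage for confidence: fewer sentences are counted, but borderline predictions no longer influence the class shares.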
Citation
If you use this model, please cite as:
Zafar, M. B. (2025). CentralBank-BERT: Machine Learning Evidence on Central Bank Digital Currency Discourse. SSRN. https://papers.ssrn.com/abstract=5404456
@article{zafar2025centralbankbert,
title={CentralBank-BERT: Machine Learning Evidence on Central Bank Digital Currency Discourse},
author={Zafar, Muhammad Bilal},
year={2025},
journal={SSRN Electronic Journal},
url={https://papers.ssrn.com/abstract=5404456}
}