# Model Card for ThunderBERT-de-v1

## Model Summary
ThunderBERT-de-v1 is a BERT-based language model for multi-class classification of German-language texts that describe thunderstorm events.
Its purpose is to automatically detect and differentiate between varying intensities of thunderstorm-related content in historical and contemporary German sources.
The model was developed and trained during Phase B of the Thunderstorm Project at the University of Freiburg.
## Model Details

### Model Description
- Model name: ThunderBERT-de-v1
- Developed by: Franck Schätz, University of Freiburg
- Model type: Encoder-based transformer for multi-class sequence classification
- Language(s): German (`de`)
- License: Apache-2.0
- Fine-tuned from: `bert-base-german-cased`
- Number of labels: 4
- Labels:
  - GW0: no thunderstorm
  - GW1: light thunderstorm
  - GW2: strong thunderstorm
  - GW3: severe thunderstorm
- Intended granularity: document- or passage-level classification

The model uses a SentencePiece-based tokenizer configuration (`spm.model`, `tokenizer_config.json`, etc.).
For compatibility with the training setup, it may be necessary to load the tokenizer with `use_fast=False`.
### Model Sources
- Repository: https://huggingface.co/<namespace>/ThunderBERT-de-v1 (replace `<namespace>` with your actual HF username or organization)
- Paper: (to be added if/when available)
- Demo: (optional, e.g. a Spaces link if created)
## Uses

### Direct Use
This model is designed for automatic classification of thunderstorm intensities in German-language texts.
It assigns each input text (sentence, passage, or short paragraph) to one of four categories:
| Label | Description |
|---|---|
| GW0 | no thunderstorm |
| GW1 | light thunderstorm |
| GW2 | strong thunderstorm |
| GW3 | severe thunderstorm |
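The four-way scheme above maps directly onto the `id2label`/`label2id` mappings a classification checkpoint carries in its config. The following sketch assumes the numeric ids follow label order (GW0 → 0 … GW3 → 3); the actual ids are defined by the model's `config.json`.

```python
# Sketch of the label mappings implied by the table above.
# Assumption: ids follow label order (GW0 -> 0 ... GW3 -> 3).
id2label = {
    0: "GW0",  # no thunderstorm
    1: "GW1",  # light thunderstorm
    2: "GW2",  # strong thunderstorm
    3: "GW3",  # severe thunderstorm
}
label2id = {label: idx for idx, label in id2label.items()}

print(label2id["GW2"])  # index of "strong thunderstorm" in the classifier head
```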
Typical applications include:
- Identifying and categorizing thunderstorm mentions in:
- historical weather chronicles,
- meteorological observations,
- diaries, letters, or reports describing severe weather.
- Supporting large-scale extraction of climatological events for environmental or historical studies.
### Downstream Use

The model can be integrated into:
- event extraction or climate knowledge graph pipelines,
- cross-referencing with temporal and geographic entities (e.g. for spatio-temporal storm mapping),
- semi-automated annotation or corpus enrichment.
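For pipelines like those listed above, classifier output is typically wrapped in a structured record before being joined with temporal and geographic entities. A minimal sketch of such a record follows; the class name, fields, and sample values are illustrative assumptions, not part of the model or the project.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical record for feeding predictions into an event-extraction or
# knowledge-graph pipeline; all field names and sample values are illustrative.
@dataclass
class ThunderstormMention:
    text: str                   # classified passage
    label: str                  # one of GW0..GW3
    source_id: str              # e.g. archive or chronicle identifier
    year: Optional[int] = None  # temporal entity, if resolved
    place: Optional[str] = None # geographic entity, if resolved

mention = ThunderstormMention(
    text="Am Abend entluden sich schwere Gewitter über dem Schwarzwald.",
    label="GW3",
    source_id="chronicle-042",
    year=1896,
    place="Schwarzwald",
)
print(mention.label, mention.place)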
### Out-of-Scope Use

The model is not intended for:
- Operational forecasting or nowcasting of thunderstorms,
- Real-time meteorological decision-making,
- Safety-critical or legal decision contexts,
- General-purpose sentiment or topic classification.
## Bias, Risks, and Limitations

### Limitations
- Domain specificity: Performance is optimized for historical and descriptive sources; results may degrade on short, modern, informal, or technical texts.
- Language coverage: Only German is supported.
- Label granularity: The model captures thunderstorm intensity levels but not duration, geographic location, or damage.
- Data transparency: The dataset is internal to the project and was annotated by domain experts.
### Potential Biases
- Possible overrepresentation of specific regions, historical periods, or linguistic styles.
- Ambiguities in wording (e.g., metaphorical "storm") can cause misclassification.
- OCR errors or historical orthography can reduce recall.
### Recommendations
- Use as a research and discovery tool, not as an operational detector.
- Combine automated classification with manual validation or expert review.
- For other domains, additional fine-tuning is recommended.
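One way to combine automated classification with manual validation is a simple confidence-based triage rule: keep high-confidence predictions, queue the rest for expert review. The threshold and function below are a hypothetical sketch, not part of the released model, and any cut-off would need tuning on held-out data.

```python
# Hypothetical triage rule for the "manual validation" recommendation:
# auto-accept confident predictions, queue the rest for expert review.
REVIEW_THRESHOLD = 0.8  # assumed cut-off; tune on held-out annotated data

def route(label: str, confidence: float) -> str:
    """Decide whether a prediction needs human review."""
    if confidence >= REVIEW_THRESHOLD:
        return "auto-accept"
    return "expert-review"

print(route("GW3", 0.93))  # auto-accept
print(route("GW1", 0.55))  # expert-review
```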
## How to Get Started with the Model
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "<namespace>/ThunderBERT-de-v1"  # e.g. "Stickmu/ThunderBERT-de-v1"

# Add use_fast=False if the fast tokenizer fails to load (see tokenizer note above).
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

# "In the evening, severe thunderstorms broke over the Black Forest."
text = "Am Abend entluden sich schwere Gewitter über dem Schwarzwald."
inputs = tokenizer(text, return_tensors="pt")
outputs = model(**inputs)

# Pick the label (GW0-GW3) with the highest logit.
predicted_class_id = int(outputs.logits.argmax(dim=-1))
print(model.config.id2label[predicted_class_id])
```
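Raw logits can also be turned into per-class probabilities, which is useful when routing low-confidence predictions to manual review. The sketch below uses a plain-Python softmax on dummy logits standing in for `outputs.logits`; the values and the GW0..GW3 ordering are illustrative assumptions.

```python
import math

def softmax(logits):
    # Numerically stable softmax over one row of logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Dummy logits standing in for outputs.logits[0] from the snippet above
# (class order GW0..GW3 is an assumption based on the label table).
logits = [0.1, 0.2, 2.5, 0.3]
probs = softmax(logits)
pred = probs.index(max(probs))
print(pred, round(probs[pred], 3))  # 2 0.768
```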
Model tree for Stickmu/ThunderstormBERT-de-v1
- Base model: microsoft/mdeberta-v3-base