Model Card for ThunderBERT-de-v1

Model Summary

ThunderBERT-de-v1 is a BERT-based language model for multi-class classification of German-language texts that describe thunderstorm events.
Its purpose is to automatically detect and differentiate between varying intensities of thunderstorm-related content in historical and contemporary German sources.
The model was developed and trained during Phase B of the Thunderstorm Project at the University of Freiburg.


Model Details

Model Description

  • Model name: ThunderBERT-de-v1
  • Developed by: Franck Schätz, University of Freiburg
  • Model type: Encoder-based transformer for multi-class sequence classification
  • Language(s): German (de)
  • License: Apache-2.0
  • Fine-tuned from: bert-base-german-cased
  • Number of labels: 4
  • Labels:
    • GW0: no thunderstorm
    • GW1: light thunderstorm
    • GW2: strong thunderstorm
    • GW3: severe thunderstorm
  • Intended granularity: Document- or passage-level classification

The model uses a SentencePiece-based tokenizer configuration (spm.model, tokenizer_config.json, etc.).
For compatibility with the training setup, it may be necessary to load the tokenizer with use_fast=False.
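A defensive loading pattern can make this transparent to callers. The helper name below is hypothetical; it simply tries the fast tokenizer first and falls back to the slow (SentencePiece-backed) implementation, mirroring the note above:

```python
from transformers import AutoTokenizer

def load_thunderbert_tokenizer(model_id: str):
    """Try the fast tokenizer first; fall back to the slow SentencePiece
    implementation if the fast conversion is incompatible."""
    try:
        return AutoTokenizer.from_pretrained(model_id)
    except (ValueError, OSError):
        return AutoTokenizer.from_pretrained(model_id, use_fast=False)
```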

Model Sources

  • Repository: https://huggingface.co/<namespace>/ThunderBERT-de-v1
    (replace <namespace> with your actual HF username or organization)
  • Paper: (to be added if/when available)
  • Demo: (optional, e.g. Spaces link if created)

Uses

Direct Use

This model is designed for automatic classification of thunderstorm intensities in German-language texts.
It assigns each input text (sentence, passage, or short paragraph) to one of four categories:

  Label  Description
  GW0    no thunderstorm
  GW1    light thunderstorm
  GW2    strong thunderstorm
  GW3    severe thunderstorm
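For downstream scripts, the label set can be expressed as plain mappings. The index order GW0–GW3 shown here is an assumption; verify it against `model.config.id2label` before relying on it:

```python
# Assumed label order; confirm against model.config.id2label.
ID2LABEL = {0: "GW0", 1: "GW1", 2: "GW2", 3: "GW3"}
LABEL2ID = {label: idx for idx, label in ID2LABEL.items()}

LABEL_DESCRIPTIONS = {
    "GW0": "no thunderstorm",
    "GW1": "light thunderstorm",
    "GW2": "strong thunderstorm",
    "GW3": "severe thunderstorm",
}
```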

Typical applications include:

  • Identifying and categorizing thunderstorm mentions in:
    • historical weather chronicles,
    • meteorological observations,
    • diaries, letters, or reports describing severe weather.
  • Supporting large-scale extraction of climatological events for environmental or historical studies.

Downstream Use

The model can be integrated into:

  • event extraction or climate knowledge graph pipelines,
  • cross-referencing with temporal and geographic entities (e.g. for spatio-temporal storm mapping),
  • semi-automated annotation or corpus enrichment.
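As a minimal sketch of such an integration, model predictions could be packaged into structured records for an event-extraction or knowledge-graph pipeline. The `StormEvent` record and field names below are hypothetical, not part of the released model:

```python
from dataclasses import dataclass

@dataclass
class StormEvent:
    """Hypothetical record linking a classified passage to its source."""
    text: str    # the classified passage
    label: str   # predicted intensity label, e.g. "GW0".."GW3"
    source: str  # provenance of the passage

def to_event(text: str, label: str, source: str = "chronicle") -> StormEvent:
    """Wrap a classification result in a pipeline-ready record."""
    return StormEvent(text=text, label=label, source=source)

event = to_event("Schweres Gewitter über Freiburg.", "GW3")
```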

Out-of-Scope Use

Not intended for:

  • Operational forecasting or nowcasting of thunderstorms,
  • Real-time meteorological decision-making,
  • Safety-critical or legal decision contexts,
  • General-purpose sentiment or topic classification.

Bias, Risks, and Limitations

Limitations

  • Domain specificity: Performance is optimized for historical and descriptive sources.
    Results may degrade for short, modern, informal, or technical texts.
  • Language coverage: Only German is supported.
  • Label granularity: The model captures thunderstorm intensity levels but not duration, geographic location, or damage.
  • Data transparency: The training dataset is internal to the project and has not been publicly released; annotations were produced by domain experts.

Potential Biases

  • Possible overrepresentation of specific regions, historical periods, or linguistic styles.
  • Ambiguities in wording (e.g., metaphorical "storm") can cause misclassification.
  • OCR errors or historical orthography can reduce recall.

Recommendations

  • Use as a research and discovery tool, not as an operational detector.
  • Combine automated classification with manual validation or expert review.
  • For other domains, additional fine-tuning is recommended.
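Continued fine-tuning on a new domain can follow the standard Trainer recipe. This is a sketch under the assumption that the datasets are already tokenized and carry a `labels` column; the helper name and hyperparameters are illustrative, not the project's training configuration:

```python
from transformers import (
    AutoModelForSequenceClassification,
    Trainer,
    TrainingArguments,
)

def build_domain_trainer(model_id, train_dataset, eval_dataset,
                         output_dir="thunderbert-ft"):
    """Assemble a Trainer for continued fine-tuning on a new domain.
    Datasets are assumed to be pre-tokenized with a `labels` column."""
    model = AutoModelForSequenceClassification.from_pretrained(
        model_id, num_labels=4
    )
    args = TrainingArguments(
        output_dir=output_dir,
        num_train_epochs=3,              # illustrative hyperparameters
        per_device_train_batch_size=16,
    )
    return Trainer(model=model, args=args,
                   train_dataset=train_dataset, eval_dataset=eval_dataset)
```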

How to Get Started with the Model

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "<namespace>/ThunderBERT-de-v1"  # e.g. "Stickmu/ThunderBERT-de-v1"

# If the fast tokenizer is incompatible with the training setup,
# pass use_fast=False here (see "Model Details" above).
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

text = "Am Abend entluden sich schwere Gewitter über dem Schwarzwald."
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

predicted_class_id = int(outputs.logits.argmax(dim=-1))
print(model.config.id2label[predicted_class_id])
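Beyond the top label, a calibrated-looking confidence score can be derived by applying a softmax to the raw logits. The snippet below is self-contained for illustration: the logits list stands in for `outputs.logits[0].tolist()` from the example above, and the label order is assumed (check `model.config.id2label`):

```python
import math

# Assumed label order; confirm against model.config.id2label.
ID2LABEL = {0: "GW0", 1: "GW1", 2: "GW2", 3: "GW3"}

def softmax(logits):
    """Numerically stable softmax over a list of raw logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict_with_confidence(logits):
    """Return (label, probability) for the highest-scoring class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[best], probs[best]

# Hypothetical logits, e.g. outputs.logits[0].tolist() from the snippet above.
label, confidence = predict_with_confidence([0.1, 0.2, 2.5, 0.3])
```

Note that softmax probabilities are not guaranteed to be well calibrated; treat the score as a ranking signal rather than a true probability.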