Model Card for ThunderBERT-de-v1

Model Summary

ThunderBERT-de-v1 is a BERT-based language model for multi-class classification of German-language texts that describe thunderstorm events.
Its purpose is to automatically detect and differentiate between varying intensities of thunderstorm-related content in historical and contemporary German sources.
The model was developed and trained during Phase B of the Thunderstorm Project at the University of Freiburg.


Model Details

Model Description

  • Model name: ThunderBERT-de-v1
  • Developed by: Franck Schätz, University of Freiburg
  • Model type: Encoder-based transformer for multi-class sequence classification
  • Language(s): German (de)
  • License: Apache-2.0
  • Fine-tuned from: bert-base-german-cased
  • Number of labels: 4
  • Labels:
    • GW0: no thunderstorm
    • GW1: light thunderstorm
    • GW2: strong thunderstorm
    • GW3: severe thunderstorm
  • Intended granularity: Document- or passage-level classification

The model uses a SentencePiece-based tokenizer configuration (spm.model, tokenizer_config.json, etc.).
For compatibility with the training setup, it may be necessary to load the tokenizer with use_fast=False.
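A defensive loading pattern can make this transparent to callers. The helper name below is hypothetical; it simply tries the fast tokenizer first and falls back to the slow (SentencePiece-backed) implementation, mirroring the note above:

```python
from transformers import AutoTokenizer

def load_thunderbert_tokenizer(model_id: str):
    """Try the fast tokenizer first; fall back to the slow SentencePiece
    implementation if the fast conversion is incompatible."""
    try:
        return AutoTokenizer.from_pretrained(model_id)
    except (ValueError, OSError):
        return AutoTokenizer.from_pretrained(model_id, use_fast=False)
```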

Model Sources

  • Repository: https://huggingface.co/<namespace>/ThunderBERT-de-v1
    (replace <namespace> with your actual HF username or organization)
  • Paper: (to be added if/when available)
  • Demo: (optional, e.g. Spaces link if created)

Uses

Direct Use

This model is designed for automatic classification of thunderstorm intensities in German-language texts.
It assigns each input text (sentence, passage, or short paragraph) to one of four categories:

  Label  Description
  GW0    no thunderstorm
  GW1    light thunderstorm
  GW2    strong thunderstorm
  GW3    severe thunderstorm
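For downstream scripts, the label set can be expressed as plain mappings. The index order GW0–GW3 shown here is an assumption; verify it against `model.config.id2label` before relying on it:

```python
# Assumed label order; confirm against model.config.id2label.
ID2LABEL = {0: "GW0", 1: "GW1", 2: "GW2", 3: "GW3"}
LABEL2ID = {label: idx for idx, label in ID2LABEL.items()}

LABEL_DESCRIPTIONS = {
    "GW0": "no thunderstorm",
    "GW1": "light thunderstorm",
    "GW2": "strong thunderstorm",
    "GW3": "severe thunderstorm",
}
```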

Typical applications include:

  • Identifying and categorizing thunderstorm mentions in:
    • historical weather chronicles,
    • meteorological observations,
    • diaries, letters, or reports describing severe weather.
  • Supporting large-scale extraction of climatological events for environmental or historical studies.

Downstream Use

The model can be integrated into:

  • event extraction or climate knowledge graph pipelines,
  • cross-referencing with temporal and geographic entities (e.g. for spatio-temporal storm mapping),
  • semi-automated annotation or corpus enrichment.
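As a minimal sketch of such an integration, model predictions could be packaged into structured records for an event-extraction or knowledge-graph pipeline. The `StormEvent` record and field names below are hypothetical, not part of the released model:

```python
from dataclasses import dataclass

@dataclass
class StormEvent:
    """Hypothetical record linking a classified passage to its source."""
    text: str    # the classified passage
    label: str   # predicted intensity label, e.g. "GW0".."GW3"
    source: str  # provenance of the passage

def to_event(text: str, label: str, source: str = "chronicle") -> StormEvent:
    """Wrap a classification result in a pipeline-ready record."""
    return StormEvent(text=text, label=label, source=source)

event = to_event("Schweres Gewitter über Freiburg.", "GW3")
```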

Out-of-Scope Use

Not intended for:

  • Operational forecasting or nowcasting of thunderstorms,
  • Real-time meteorological decision-making,
  • Safety-critical or legal decision contexts,
  • General-purpose sentiment or topic classification.

Bias, Risks, and Limitations

Limitations

  • Domain specificity: Performance is optimized for historical and descriptive sources.
    Results may degrade for short, modern, informal, or technical texts.
  • Language coverage: Only German is supported.
  • Label granularity: The model captures thunderstorm intensity levels but not duration, geographic location, or damage.
  • Data transparency: The training dataset is internal to the project and has not been publicly released; annotations were produced by domain experts.

Potential Biases

  • Possible overrepresentation of specific regions, historical periods, or linguistic styles.
  • Ambiguities in wording (e.g., metaphorical "storm") can cause misclassification.
  • OCR errors or historical orthography can reduce recall.

Recommendations

  • Use as a research and discovery tool, not as an operational detector.
  • Combine automated classification with manual validation or expert review.
  • For other domains, additional fine-tuning is recommended.
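Continued fine-tuning on a new domain can follow the standard Trainer recipe. This is a sketch under the assumption that the datasets are already tokenized and carry a `labels` column; the helper name and hyperparameters are illustrative, not the project's training configuration:

```python
from transformers import (
    AutoModelForSequenceClassification,
    Trainer,
    TrainingArguments,
)

def build_domain_trainer(model_id, train_dataset, eval_dataset,
                         output_dir="thunderbert-ft"):
    """Assemble a Trainer for continued fine-tuning on a new domain.
    Datasets are assumed to be pre-tokenized with a `labels` column."""
    model = AutoModelForSequenceClassification.from_pretrained(
        model_id, num_labels=4
    )
    args = TrainingArguments(
        output_dir=output_dir,
        num_train_epochs=3,              # illustrative hyperparameters
        per_device_train_batch_size=16,
    )
    return Trainer(model=model, args=args,
                   train_dataset=train_dataset, eval_dataset=eval_dataset)
```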

How to Get Started with the Model

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "<namespace>/ThunderBERT-de-v1"  # e.g. "Stickmu/ThunderBERT-de-v1"

# If the fast tokenizer is incompatible with the training setup,
# pass use_fast=False here (see "Model Details" above).
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

text = "Am Abend entluden sich schwere Gewitter über dem Schwarzwald."
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

predicted_class_id = int(outputs.logits.argmax(dim=-1))
print(model.config.id2label[predicted_class_id])
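Beyond the top label, a calibrated-looking confidence score can be derived by applying a softmax to the raw logits. The snippet below is self-contained for illustration: the logits list stands in for `outputs.logits[0].tolist()` from the example above, and the label order is assumed (check `model.config.id2label`):

```python
import math

# Assumed label order; confirm against model.config.id2label.
ID2LABEL = {0: "GW0", 1: "GW1", 2: "GW2", 3: "GW3"}

def softmax(logits):
    """Numerically stable softmax over a list of raw logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict_with_confidence(logits):
    """Return (label, probability) for the highest-scoring class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[best], probs[best]

# Hypothetical logits, e.g. outputs.logits[0].tolist() from the snippet above.
label, confidence = predict_with_confidence([0.1, 0.2, 2.5, 0.3])
```

Note that softmax probabilities are not guaranteed to be well calibrated; treat the score as a ranking signal rather than a true probability.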