ModernBERT Dutch Base Wide
A ModernBERT model pretrained on the Dutch mc4_nl_cleaned dataset. It has 22 layers (like ModernBERT-base) but a wider hidden size (1024 instead of 768), placing it between base and large in parameter count (230M).
Model Details
- Architecture: ModernBERT (Answer.AI/LightOn)
- Layers: 22
- Hidden size: 1024
- Attention heads: 16
- Intermediate size: 1536
- Vocab size: 32,128
- Parameters: 230M
- Tokenizer: yhavinga/dutch-llama-tokenizer (SentencePiece, Dutch-optimized)
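For orientation, these hyperparameters correspond roughly to the following Transformers `ModernBertConfig`. This is an illustrative sketch only; the authoritative values are in the checkpoint's `config.json`, and any field not listed above is left at its library default.

```python
from transformers import ModernBertConfig

# Illustrative config matching the figures above; fields not listed in the
# card (e.g. positional-embedding settings) are left at their defaults.
config = ModernBertConfig(
    vocab_size=32_128,
    hidden_size=1024,
    num_hidden_layers=22,
    num_attention_heads=16,
    intermediate_size=1536,
)
```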
Training
- Dataset: yhavinga/mc4_nl_cleaned (full config)
- Steps: 2,000,000
- Batch size: 8 per device (multi-host TPU v4)
- Learning rate: 3e-5 with cosine decay to 1e-6
- Warmup steps: 20,000
- Weight decay: 0.01
- Sequence length: 1024
- Precision: bfloat16
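Spelling out the schedule above: assuming linear warmup (the card only states the warmup length), the per-step learning rate can be reconstructed with the plain-Python sketch below. This is a reconstruction of the stated schedule, not the training script itself.

```python
import math

PEAK_LR, FINAL_LR = 3e-5, 1e-6          # peak and floor from the card
WARMUP_STEPS, TOTAL_STEPS = 20_000, 2_000_000

def learning_rate(step: int) -> float:
    """Linear warmup to the peak, then cosine decay to the floor."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    progress = min(1.0, (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS))
    return FINAL_LR + 0.5 * (PEAK_LR - FINAL_LR) * (1.0 + math.cos(math.pi * progress))
```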
Usage
```python
import torch
from transformers import AutoTokenizer, ModernBertForMaskedLM

model = ModernBertForMaskedLM.from_pretrained("yhavinga/modernbert-dutch-base-wide")
tokenizer = AutoTokenizer.from_pretrained("yhavinga/modernbert-dutch-base-wide")

# Masked language modeling
inputs = tokenizer(f"Amsterdam is de{tokenizer.mask_token} van Nederland.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
# Locate the mask position and decode the highest-scoring prediction
mask_index = (inputs.input_ids == tokenizer.mask_token_id)[0].nonzero(as_tuple=True)[0]
prediction = tokenizer.decode(outputs.logits[0, mask_index].argmax(-1))
# Expected: "hoofdstad" (capital)
```
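The same check can also be run with the Transformers fill-mask pipeline, which locates the mask token and ranks candidates for you. This is a convenience sketch; the mask literal must match the tokenizer's mask token.

```python
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="yhavinga/modernbert-dutch-base-wide")
for candidate in fill_mask("Amsterdam is de<mask> van Nederland.", top_k=5):
    print(candidate["token_str"], round(candidate["score"], 3))
```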
Citation
If you use this model, please cite:
@misc{modernbert_dutch_base_wide,
  title={ModernBERT Dutch Base Wide},
  author={Havinga, Y.},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/yhavinga/modernbert-dutch-base-wide}
}