ModernBERT Dutch Base Wide

A ModernBERT model pretrained on the Dutch mc4_nl_cleaned dataset. This model has 22 layers (like ModernBERT-base) but a wider hidden dimension (1024 instead of 768), placing it between the base and large variants in parameter count (230M).

Model Details

  • Architecture: ModernBERT (Answer.AI/LightOn)
  • Layers: 22
  • Hidden size: 1024
  • Attention heads: 16
  • Intermediate size: 1536
  • Vocab size: 32,128
  • Parameters: 230M
  • Tokenizer: yhavinga/dutch-llama-tokenizer (SentencePiece, Dutch-optimized)
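
These dimensions roughly account for the stated parameter count. A back-of-envelope sketch (assuming the usual ModernBERT GeGLU layout, where the MLP up-projection produces twice the intermediate width, and ignoring layer norms and the small MLM head):

hidden, layers, intermediate, vocab = 1024, 22, 1536, 32128

embeddings = vocab * hidden                                # token embeddings (tied with the MLM output)
attention = 4 * hidden * hidden                            # Wq, Wk, Wv, Wo per layer
mlp = hidden * (2 * intermediate) + intermediate * hidden  # GeGLU up-projection + down-projection
total = embeddings + layers * (attention + mlp)
print(f"{total / 1e6:.0f}M parameters")                    # ~229M, in line with the stated 230M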

Training

  • Dataset: yhavinga/mc4_nl_cleaned (full config)
  • Steps: 2,000,000
  • Batch size: 8 per device (multi-host TPU v4)
  • Learning rate: 3e-5 with cosine decay to 1e-6 (schedule sketched after this list)
  • Warmup steps: 20,000
  • Weight decay: 0.01
  • Sequence length: 1024
  • Precision: bfloat16
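
The schedule above, written out as a minimal sketch (linear warmup to the peak rate is an assumption; only the peak, the 1e-6 floor, the 20,000 warmup steps and the 2,000,000 total steps come from the listing):

import math

def learning_rate(step, peak=3e-5, floor=1e-6, warmup_steps=20_000, total_steps=2_000_000):
    # Linear warmup from 0 to the peak rate, then cosine decay down to the floor.
    if step < warmup_steps:
        return peak * step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return floor + 0.5 * (peak - floor) * (1 + math.cos(math.pi * progress))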

Usage

from transformers import AutoTokenizer, ModernBertForMaskedLM

model = ModernBertForMaskedLM.from_pretrained("yhavinga/modernbert-dutch-base-wide")
tokenizer = AutoTokenizer.from_pretrained("yhavinga/modernbert-dutch-base-wide")

# Masked language modeling
inputs = tokenizer("Amsterdam is de<mask> van Nederland.", return_tensors="pt")
outputs = model(**inputs)

# Locate the masked position and decode the highest-scoring token for it
mask_index = (inputs.input_ids == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
prediction = tokenizer.decode(outputs.logits[0, mask_index].argmax(dim=-1))
# Expected: "hoofdstad" (capital)
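
The same check can also be run through the fill-mask pipeline, which handles mask lookup and ranking itself (a short usage sketch):

from transformers import pipeline

fill_mask = pipeline("fill-mask", model="yhavinga/modernbert-dutch-base-wide")
# Print the top candidate tokens for the masked position with their scores
for candidate in fill_mask("Amsterdam is de<mask> van Nederland."):
    print(candidate["token_str"], candidate["score"])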

Citation

If you use this model, please cite:

@misc{modernbert_dutch_base_wide,
  title={ModernBERT Dutch Base Wide},
  author={Havinga, Y.},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/yhavinga/modernbert-dutch-base-wide}
}