German BabyLM Collection
Data that can be used for developing developmentally plausible language models in German.
This is a German BabyLM model trained on a 5M-token subset of the German BabyLM corpus.
If you use this model, please cite the following publication:
@inproceedings{bunzeck-etal-2025-construction,
title = "Do Construction Distributions Shape Formal Language Learning In {G}erman {B}aby{LM}s?",
author = "Bunzeck, Bastian and
Duran, Daniel and
Zarrie{\ss}, Sina",
editor = "Boleda, Gemma and
Roth, Michael",
booktitle = "Proceedings of the 29th Conference on Computational Natural Language Learning",
month = jul,
year = "2025",
address = "Vienna, Austria",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2025.conll-1.12/",
doi = "10.18653/v1/2025.conll-1.12",
pages = "169--186",
ISBN = "979-8-89176-271-8",
}