---
datasets:
- bbunzeck/babylm-german
- bbunzeck/german-babylm-5m-subsets
language:
- de
pipeline_tag: text-generation
---

This is a German BabyLM model trained on a [5M token subset](https://huggingface.co/datasets/bbunzeck/german-babylm-5m-subsets) of the [German BabyLM corpus](https://huggingface.co/datasets/bbunzeck/babylm-german).

If you use this model, please cite the following publication:

```
@inproceedings{bunzeck-etal-2025-construction,
    title = "Do Construction Distributions Shape Formal Language Learning In {G}erman {B}aby{LM}s?",
    author = "Bunzeck, Bastian  and
      Duran, Daniel  and
      Zarrie{\ss}, Sina",
    editor = "Boleda, Gemma  and
      Roth, Michael",
    booktitle = "Proceedings of the 29th Conference on Computational Natural Language Learning",
    month = jul,
    year = "2025",
    address = "Vienna, Austria",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.conll-1.12/",
    doi = "10.18653/v1/2025.conll-1.12",
    pages = "169--186",
    ISBN = "979-8-89176-271-8",
}
```