---
datasets:
- nilq/babylm-10M
language:
- en
---

This autoregressive model belongs to a series of rather small language models trained on the [BabyLM](https://babylm.github) data:
- the [baby_llama](https://huggingface.co/bbunzeck/baby_llama) model has few parameters and was trained on a small data set (10M tokens)
- the [**t**eenie_llama](https://huggingface.co/bbunzeck/teenie_llama) model has the same number of parameters but was trained on more **t**okens of text (100M)
- the [**w**eenie_llama](https://huggingface.co/bbunzeck/weenie_llama) model was trained on the small data set, but has more parameters/**w**eights
- the [**tw**eenie_llama](https://huggingface.co/bbunzeck/tweenie_llama) model features both -- more **t**okens (the larger data set) and more **w**eights (*viz.* parameters)
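
The four checkpoints listed above can presumably be loaded like any other causal language model on the Hub. The following is a minimal sketch, not an official usage recipe: it assumes each repository ships weights and a tokenizer that the `transformers` `Auto*` classes can resolve, and the names `MODEL_IDS`, `load` and `complete` are hypothetical helpers introduced only for illustration. The repository IDs are taken from the links in the list.

```python
# Hub repository IDs, taken from the model links in the list above.
MODEL_IDS = {
    "baby_llama": "bbunzeck/baby_llama",
    "teenie_llama": "bbunzeck/teenie_llama",
    "weenie_llama": "bbunzeck/weenie_llama",
    "tweenie_llama": "bbunzeck/tweenie_llama",
}


def load(name):
    """Return (tokenizer, model) for one of the four checkpoints.

    Assumption: the repository can be resolved by the transformers
    Auto* classes. Files are downloaded on first use.
    """
    # Imported lazily so MODEL_IDS can be inspected without transformers.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    repo_id = MODEL_IDS[name]
    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    model = AutoModelForCausalLM.from_pretrained(repo_id)
    return tokenizer, model


def complete(name, prompt, max_new_tokens=20):
    """Greedy continuation of `prompt` (hypothetical helper)."""
    tokenizer, model = load(name)
    inputs = tokenizer(prompt, return_tensors="pt")
    # Only the token IDs are passed on, so extra tokenizer outputs
    # cannot be rejected by generate().
    output_ids = model.generate(
        inputs["input_ids"],
        max_new_tokens=max_new_tokens,
        do_sample=False,
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `complete("baby_llama", "The child")` would then return the prompt plus a short greedy continuation. Given how small these models are, the output is best read as a probe of what was learned rather than as fluent text.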

| | baby_llama | teenie_llama | weenie_llama | tweenie_llama |
|-----------------|-----------|-------------|-------------|--------------|
| Parameters | 2.97M | 2.97M | 11.44M | 11.44M |
| Hidden layers | 8 | 8 | 16 | 16 |
| Attention heads | 8 | 8 | 16 | 16 |
| Embedding size | 128 | 128 | 256 | 256 |
| Context size | 128 | 128 | 256 | 256 |
| Vocab size | 16k | 16k | 16k | 16k |
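
One detail the table implies but does not state: the larger models double both the embedding size and the number of attention heads, so the per-head dimension would stay the same. The sketch below makes that arithmetic explicit. It assumes the embedding is split evenly across heads, as in standard multi-head attention; the `Size` class and the `SIZES` mapping are hypothetical names, not part of the released models.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Size:
    """Architecture figures copied from the table above."""

    hidden_layers: int
    attention_heads: int
    embedding_size: int
    context_size: int

    @property
    def head_dim(self) -> int:
        # Assumption: the embedding is split evenly across attention heads.
        assert self.embedding_size % self.attention_heads == 0
        return self.embedding_size // self.attention_heads


SMALL = Size(hidden_layers=8, attention_heads=8, embedding_size=128, context_size=128)
LARGE = Size(hidden_layers=16, attention_heads=16, embedding_size=256, context_size=256)

SIZES = {
    "baby_llama": SMALL,
    "teenie_llama": SMALL,
    "weenie_llama": LARGE,
    "tweenie_llama": LARGE,
}

for name, size in SIZES.items():
    print(f"{name}: head_dim={size.head_dim}, context={size.context_size}")
```

Under that assumption, all four models use 16 dimensions per head; the larger variants differ in depth, width and context length instead.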

If you use this model in your research, please cite the following publication:

```
@inproceedings{bunzeck-zarriess-2024-fifty,
    title = "Fifty shapes of {BL}i{MP}: syntactic learning curves in language models are not uniform, but sometimes unruly",
    author = "Bunzeck, Bastian and
      Zarrie{\ss}, Sina",
    editor = "Qiu, Amy and
      Noble, Bill and
      Pagmar, David and
      Maraev, Vladislav and
      Ilinykh, Nikolai",
    booktitle = "Proceedings of the 2024 CLASP Conference on Multimodality and Interaction in Language Learning",
    month = oct,
    year = "2024",
    address = "Gothenburg, Sweden",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.clasp-1.7",
    pages = "39--55",
}
```