Add Model Card details
README.md
CHANGED
|
@@ -1,3 +1,343 @@
|
|
| 1 |
---
|
| 2 |
license: apache-2.0
|
| 3 |
---
|
| 1 |
---
|
| 2 |
license: apache-2.0
|
| 3 |
+
datasets:
|
| 4 |
+
- CohereForAI/xP3x
|
| 5 |
+
- CohereForAI/aya_dataset
|
| 6 |
+
- CohereForAI/aya_collection
|
| 7 |
+
- DataProvenanceInitiative/Commercially-Verified-Licenses
|
| 8 |
+
- CohereForAI/aya_evaluation_suite
|
| 9 |
+
language:
|
| 10 |
+
- afr
|
| 11 |
+
- amh
|
| 12 |
+
- ara
|
| 13 |
+
- aze
|
| 14 |
+
- bel
|
| 15 |
+
- ben
|
| 16 |
+
- bul
|
| 17 |
+
- cat
|
| 18 |
+
- ceb
|
| 19 |
+
- ces
|
| 20 |
+
- cym
|
| 21 |
+
- dan
|
| 22 |
+
- deu
|
| 23 |
+
- ell
|
| 24 |
+
- eng
|
| 25 |
+
- epo
|
| 26 |
+
- est
|
| 27 |
+
- eus
|
| 28 |
+
- fin
|
| 29 |
+
- fil
|
| 30 |
+
- fra
|
| 31 |
+
- fry
|
| 32 |
+
- gla
|
| 33 |
+
- gle
|
| 34 |
+
- glg
|
| 35 |
+
- guj
|
| 36 |
+
- hat
|
| 37 |
+
- hau
|
| 38 |
+
- heb
|
| 39 |
+
- hin
|
| 40 |
+
- hun
|
| 41 |
+
- hye
|
| 42 |
+
- ibo
|
| 43 |
+
- ind
|
| 44 |
+
- isl
|
| 45 |
+
- ita
|
| 46 |
+
- jav
|
| 47 |
+
- jpn
|
| 48 |
+
- kan
|
| 49 |
+
- kat
|
| 50 |
+
- kaz
|
| 51 |
+
- khm
|
| 52 |
+
- kir
|
| 53 |
+
- kor
|
| 54 |
+
- kur
|
| 55 |
+
- lao
|
| 56 |
+
- lav
|
| 57 |
+
- lat
|
| 58 |
+
- lit
|
| 59 |
+
- ltz
|
| 60 |
+
- mal
|
| 61 |
+
- mar
|
| 62 |
+
- mkd
|
| 63 |
+
- mlg
|
| 64 |
+
- mlt
|
| 65 |
+
- mon
|
| 66 |
+
- mri
|
| 67 |
+
- msa
|
| 68 |
+
- mya
|
| 69 |
+
- nep
|
| 70 |
+
- nld
|
| 71 |
+
- nor
|
| 72 |
+
- nso
|
| 73 |
+
- nya
|
| 74 |
+
- ory
|
| 75 |
+
- pan
|
| 76 |
+
- pes
|
| 77 |
+
- pol
|
| 78 |
+
- por
|
| 79 |
+
- pus
|
| 80 |
+
- ron
|
| 81 |
+
- rus
|
| 82 |
+
- sin
|
| 83 |
+
- slk
|
| 84 |
+
- slv
|
| 85 |
+
- smo
|
| 86 |
+
- sna
|
| 87 |
+
- snd
|
| 88 |
+
- som
|
| 89 |
+
- sot
|
| 90 |
+
- spa
|
| 91 |
+
- sqi
|
| 92 |
+
- srp
|
| 93 |
+
- sun
|
| 94 |
+
- swa
|
| 95 |
+
- swe
|
| 96 |
+
- tam
|
| 97 |
+
- tel
|
| 98 |
+
- tgk
|
| 99 |
+
- tha
|
| 100 |
+
- tur
|
| 101 |
+
- twi
|
| 102 |
+
- ukr
|
| 103 |
+
- urd
|
| 104 |
+
- uzb
|
| 105 |
+
- vie
|
| 106 |
+
- xho
|
| 107 |
+
- yid
|
| 108 |
+
- yor
|
| 109 |
+
- zho
|
| 110 |
+
- zul
|
| 111 |
+
metrics:
|
| 112 |
+
- accuracy
|
| 113 |
+
- bleu
|
| 114 |
---
|
| 115 |
+
|
| 116 |
+
<img src="aya-fig1.png" alt="Aya model summary image" width="800" style="margin-left:auto; margin-right:auto; display:block"/>
|
| 117 |
+
|
| 118 |
+
# Model Card for Aya Model
|
| 119 |
+
|
| 120 |
+
## Model Summary
|
| 121 |
+
|
| 122 |
+
> The Aya model is a massively multilingual generative language model that follows instructions in 101 languages.
|
| 123 |
+
> Aya outperforms [mT0](https://huggingface.co/bigscience/mt0-xxl) and [BLOOMZ](https://huggingface.co/bigscience/bloomz) on a wide variety of automatic and human evaluations despite covering double the number of languages.
|
| 124 |
+
> The Aya model is trained using [xP3x](https://huggingface.co/datasets/CohereForAI/xP3x), the [Aya Dataset](https://huggingface.co/datasets/CohereForAI/aya_dataset), the [Aya Collection](https://huggingface.co/datasets/CohereForAI/aya_collection), a subset of the [DataProvenance collection](https://huggingface.co/datasets/DataProvenanceInitiative/Commercially-Verified-Licenses), and ShareGPT-Command.
|
| 125 |
+
> We release the checkpoints under an Apache-2.0 license to further our mission of multilingual technologies empowering a
|
| 126 |
+
> multilingual world.
|
| 127 |
+
|
| 128 |
+
- **Developed by:** Cohere For AI
|
| 129 |
+
- **Model type:** Transformer-style, autoregressive, massively multilingual language model.
|
| 130 |
+
- **Paper**: [Aya Model: An Instruction Finetuned Open-Access Multilingual Language Model](arxiv.com)
|
| 131 |
+
- **Point of Contact**: [Ahmet Ustun](mailto:ahmet@cohere.com)
|
| 132 |
+
- **Languages**: Refer to the list of languages in the `language` section of this model card.
|
| 133 |
+
- **License**: Apache-2.0
|
| 134 |
+
- **Model**: [Aya](https://huggingface.co/CohereForAI/aya)
|
| 135 |
+
- **Model Size**: 13 billion parameters
|
| 136 |
+
- **Datasets**: [xP3x](https://huggingface.co/datasets/CohereForAI/xP3x), [Aya Dataset](https://huggingface.co/datasets/CohereForAI/aya_dataset), [Aya Collection](https://huggingface.co/datasets/CohereForAI/aya_collection), [DataProvenance collection](https://huggingface.co/datasets/DataProvenanceInitiative/Commercially-Verified-Licenses), ShareGPT-Command.
|
| 137 |
+
|
| 138 |
+
## Use
|
| 139 |
+
|
| 140 |
+
```python
|
| 141 |
+
# pip install -q transformers
|
| 142 |
+
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
|
| 143 |
+
|
| 144 |
+
checkpoint = "CohereForAI/aya_model"
|
| 145 |
+
|
| 146 |
+
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
|
| 147 |
+
aya_model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)
|
| 148 |
+
|
| 149 |
+
inputs = tokenizer.encode("Translate to English: Je t’aime.", return_tensors="pt")
|
| 150 |
+
outputs = aya_model.generate(inputs, max_new_tokens=128)  # cap output length explicitly
|
| 151 |
+
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
|
| 152 |
+
```
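
The snippet above handles a single prompt. As a minimal sketch of how several prompts could be processed in one call, the helper below pads the batch, generates, and decodes. The function name `generate_batch` and the default `max_new_tokens=128` are illustrative assumptions, not part of the released model card; `model` and `tokenizer` are expected to be objects like `aya_model` and `tokenizer` from the snippet above.

```python
from typing import Any, List


def generate_batch(model: Any, tokenizer: Any, prompts: List[str],
                   max_new_tokens: int = 128) -> List[str]:
    """Generate one completion per prompt (hypothetical helper, not an official API)."""
    # Pad so that prompts of different lengths fit into one tensor batch.
    inputs = tokenizer(prompts, return_tensors="pt", padding=True)
    # Cap the number of newly generated tokens explicitly.
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Strip padding and end-of-sequence tokens from the decoded text.
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)
```

With the objects loaded above, a call might look like `generate_batch(aya_model, tokenizer, ["Translate to English: Je t’aime."])`.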
|
| 153 |
+
|
| 154 |
+
## Model Details
|
| 155 |
+
|
| 156 |
+
### Training
|
| 157 |
+
|
| 158 |
+
- Architecture: Same as [mt5-xxl](https://huggingface.co/google/mt5-xxl)
|
| 159 |
+
- Finetuning Steps: 25000
|
| 160 |
+
- Hardware: TPUv4-128
|
| 161 |
+
- Software: T5X, JAX
|
| 162 |
+
|
| 163 |
+
### Data Sources
|
| 164 |
+
|
| 165 |
+
The Aya model is trained on the following datasets:
|
| 166 |
+
|
| 167 |
+
- [xP3x](https://huggingface.co/datasets/CohereForAI/xP3x)
|
| 168 |
+
- [Aya Dataset](https://huggingface.co/datasets/CohereForAI/aya_dataset)
|
| 169 |
+
- [Aya Collection](https://huggingface.co/datasets/CohereForAI/aya_collection)
|
| 170 |
+
- [DataProvenance collection](https://huggingface.co/datasets/DataProvenanceInitiative/Commercially-Verified-Licenses)
|
| 171 |
+
- ShareGPT-Command
|
| 172 |
+
|
| 173 |
+
All datasets are restricted to the 101 languages supported by [mT5](https://huggingface.co/google/mt5-xxl). See the [paper](arxiv.com) for details about filtering and pruning.
|
| 174 |
+
|
| 175 |
+
## Evaluation
|
| 176 |
+
|
| 177 |
+
<!-- This section describes the evaluation protocols and provides the results. -->
|
| 178 |
+
|
| 179 |
+
> We introduce extensive new evaluation suites that broaden the state of the art for multilingual evaluation across 99 languages, including discriminative and generative tasks, human evaluation, and simulated win rates that cover both held-out tasks and
|
| 180 |
+
> in-distribution performance.
|
| 181 |
+
|
| 182 |
+
Below, we provide evaluation results for the Aya model on unseen discriminative tasks and in-distribution generative tasks, compared to mT0, BLOOMZ, Bactrian-X 13B, and mT0x. To ensure a fair comparison with the Aya model in terms of language coverage, we finetune a new variant of mT5, which we call mT0x. It is trained on the original datasets from the xP3 collection, extended to 101 languages (xP3x).
|
| 183 |
+
|
| 184 |
+
For Multilingual MMLU and simulated and human win rates, please refer to the [paper](arxiv.com).
|
| 185 |
+
|
| 186 |
+
### Discriminative Tasks
|
| 187 |
+
|
| 188 |
+
| Model | Base Model | IFT Mixture | XCOPA (Acc %) | XNLI (Acc %) | XSC (Acc %) | XWG (Acc %) | **<u>Avg</u>** |
|
| 189 |
+
| :---------------- | :--------- | :---------: | :-----------: | :----------: | :---------: | :---------: | :------------: |
|
| 190 |
+
| **46 Languages** | | | | | | | |
|
| 191 |
+
| mT0 | mT5 13B | xP3 | 75.6 | 55.3 | 87.2 | 73.6 | 72.9 |
|
| 192 |
+
| BLOOMZ | BLOOM 176B | xP3 | 64.3 | 52.0 | 82.6 | 63.3 | 65.5 |
|
| 193 |
+
| **52 Languages** | | | | | | | |
|
| 194 |
+
| Bactrian-X 13B | Llama 13B | Bactrian-X | 52.4 | 34.5 | 51.8 | 50.5 | 47.3 |
|
| 195 |
+
| **101 Languages** | | | | | | | |
|
| 196 |
+
| mT0x | mT5 13B | xP3x | 71.7 | 45.9 | 85.1 | 60.6 | 65.8 |
|
| 197 |
+
| Aya model | mT5 13B | All Mixture | 76.7 | 58.3 | 90.0 | 70.7 | 73.9 |
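
The **Avg** column appears to be the unweighted mean of the four task accuracies. As a rough sanity check under that assumption, the sketch below recomputes each mean from the rounded table values and compares it with the reported average. The tolerance of 0.06 is an assumption chosen to absorb rounding, since the reported averages were presumably computed from unrounded scores.

```python
# Per-task accuracies (XCOPA, XNLI, XSC, XWG) and reported averages,
# copied from the table above.
scores = {
    "mT0": ([75.6, 55.3, 87.2, 73.6], 72.9),
    "BLOOMZ": ([64.3, 52.0, 82.6, 63.3], 65.5),
    "Bactrian-X 13B": ([52.4, 34.5, 51.8, 50.5], 47.3),
    "mT0x": ([71.7, 45.9, 85.1, 60.6], 65.8),
    "Aya model": ([76.7, 58.3, 90.0, 70.7], 73.9),
}


def check_averages(table, tolerance=0.06):
    """Recompute unweighted means; raise if one is far from the reported value."""
    recomputed = {}
    for model, (task_scores, reported) in table.items():
        avg = sum(task_scores) / len(task_scores)
        if abs(avg - reported) > tolerance:
            raise ValueError(f"{model}: recomputed {avg:.3f}, reported {reported}")
        recomputed[model] = avg
    return recomputed


recomputed = check_averages(scores)
# Model with the highest recomputed average across the four tasks.
best = max(recomputed, key=recomputed.get)
print(best)
```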
|
| 198 |
+
|
| 199 |
+
### Generative Tasks
|
| 200 |
+
|
| 201 |
+
| Model             | Base Model | IFT Mixture | FLORES-200 (spBLEU) | FLORES-200 (spBLEU) | XLSum (RougeLsum) | TyDi QA (F1) |
|
| 202 |
+
| :---------------- | :--------: | :---------- | :-----------------: | :-----------------: | :---------------: | :----------: |
|
| 203 |
+
|                   |            |             |       X → En        |       En → X        |                   |              |
|
| 204 |
+
| **101 Languages** | | | | | | |
|
| 205 |
+
| mT0x | mT5 13B | xP3x | 20.2 | 14.5 | 21.4 | 76.1 |
|
| 206 |
+
| Aya model         |  mT5 13B   | All Mixture |        29.1         |        19.0         |       22.0        |     77.8     |
|
| 207 |
+
|
| 208 |
+
Note: We cannot compare against mT0 and BLOOMZ on the above generative tasks because the validation splits are part of their training data.
|
| 209 |
+
|
| 210 |
+
## Bias, Risks, and Limitations
|
| 211 |
+
|
| 212 |
+
As with any base or fine-tuned language model without safety filtering, it is relatively easy for a user to prompt the Aya model into generating harmful or otherwise sensitive content.
|
| 213 |
+
The Aya model, as released, does not include any safety filtering.
|
| 214 |
+
We hope that the release of the Aya model will make community-based red-teaming efforts possible by providing an open-source, massively multilingual model for community research.
|
| 215 |
+
|
| 216 |
+
For a detailed overview of our efforts at safety mitigation and at benchmarking toxicity and bias across multiple languages, we refer readers to Sections 6 and 7 of our paper: [Aya Model: An Instruction Finetuned Open-Access Multilingual Language Model](arxiv.com).
|
| 217 |
+
|
| 218 |
+
## Citation
|
| 219 |
+
|
| 220 |
+
**BibTeX:**
|
| 221 |
+
|
| 222 |
+
```
|
| 223 |
+
@article{,
|
| 224 |
+
title={},
|
| 225 |
+
author={},
|
| 226 |
+
journal={Preprint},
|
| 227 |
+
year={2024}
|
| 228 |
+
}
|
| 229 |
+
```
|
| 230 |
+
|
| 231 |
+
**APA:**
|
| 232 |
+
|
| 233 |
+
## Languages Covered
|
| 234 |
+
|
| 235 |
+
Below is the list of languages used in finetuning the Aya Model. We group languages into higher-, mid-, and lower-resourced categories based on the language classification by [Joshi et al., 2020](https://microsoft.github.io/linguisticdiversity/). For further details, refer to our [paper]().
|
| 236 |
+
|
| 237 |
+
| ISO Code | Language Name | Script | Family | Subgrouping | Resourcedness |
|
| 238 |
+
| :------- | :-------------- | :----------: | :-------------: | :---------------: | :-----------: |
|
| 239 |
+
| afr | Afrikaans | Latin | Indo-European | Germanic | Mid |
|
| 240 |
+
| amh | Amharic | Ge'ez | Afro-Asiatic | Semitic | Low |
|
| 241 |
+
| ara | Arabic | Arabic | Afro-Asiatic | Semitic | High |
|
| 242 |
+
| aze | Azerbaijani | Arabic/Latin | Turkic | Common Turkic | Low |
|
| 243 |
+
| bel | Belarusian | Cyrillic | Indo-European | Balto-Slavic | Mid |
|
| 244 |
+
| ben | Bengali | Bengali | Indo-European | Indo-Aryan | Mid |
|
| 245 |
+
| bul | Bulgarian | Cyrillic | Indo-European | Balto-Slavic | Mid |
|
| 246 |
+
| cat | Catalan | Latin | Indo-European | Italic | High |
|
| 247 |
+
| ceb | Cebuano | Latin | Austronesian | Malayo-Polynesian | Mid |
|
| 248 |
+
| ces | Czech | Latin | Indo-European | Balto-Slavic | High |
|
| 249 |
+
| cym | Welsh | Latin | Indo-European | Celtic | Low |
|
| 250 |
+
| dan | Danish | Latin | Indo-European | Germanic | Mid |
|
| 251 |
+
| deu | German | Latin | Indo-European | Germanic | High |
|
| 252 |
+
| ell | Greek | Greek | Indo-European | Graeco-Phrygian | Mid |
|
| 253 |
+
| eng | English | Latin | Indo-European | Germanic | High |
|
| 254 |
+
| epo | Esperanto | Latin | Constructed | Esperantic | Low |
|
| 255 |
+
| est | Estonian | Latin | Uralic | Finnic | Mid |
|
| 256 |
+
| eus | Basque | Latin | Basque | - | High |
|
| 257 |
+
| fin | Finnish | Latin | Uralic | Finnic | High |
|
| 258 |
+
| fil | Tagalog | Latin | Austronesian | Malayo-Polynesian | Mid |
|
| 259 |
+
| fra | French | Latin | Indo-European | Italic | High |
|
| 260 |
+
| fry | Western Frisian | Latin | Indo-European | Germanic | Low |
|
| 261 |
+
| gla | Scottish Gaelic | Latin | Indo-European | Celtic | Low |
|
| 262 |
+
| gle | Irish | Latin | Indo-European | Celtic | Low |
|
| 263 |
+
| glg | Galician | Latin | Indo-European | Italic | Mid |
|
| 264 |
+
| guj | Gujarati | Gujarati | Indo-European | Indo-Aryan | Low |
|
| 265 |
+
| hat | Haitian Creole | Latin | Indo-European | Italic | Low |
|
| 266 |
+
| hau | Hausa | Latin | Afro-Asiatic | Chadic | Low |
|
| 267 |
+
| heb | Hebrew | Hebrew | Afro-Asiatic | Semitic | Mid |
|
| 268 |
+
| hin | Hindi | Devanagari | Indo-European | Indo-Aryan | High |
|
| 269 |
+
| hun | Hungarian | Latin | Uralic | - | High |
|
| 270 |
+
| hye | Armenian | Armenian | Indo-European | Armenic | Low |
|
| 271 |
+
| ibo | Igbo | Latin | Atlantic-Congo | Benue-Congo | Low |
|
| 272 |
+
| ind | Indonesian | Latin | Austronesian | Malayo-Polynesian | Mid |
|
| 273 |
+
| isl | Icelandic | Latin | Indo-European | Germanic | Low |
|
| 274 |
+
| ita | Italian | Latin | Indo-European | Italic | High |
|
| 275 |
+
| jav | Javanese | Latin | Austronesian | Malayo-Polynesian | Low |
|
| 276 |
+
| jpn | Japanese | Japanese | Japonic | Japanesic | High |
|
| 277 |
+
| kan | Kannada | Kannada | Dravidian | South Dravidian | Low |
|
| 278 |
+
| kat | Georgian | Georgian | Kartvelian | Georgian-Zan | Mid |
|
| 279 |
+
| kaz | Kazakh | Cyrillic | Turkic | Common Turkic | Mid |
|
| 280 |
+
| khm | Khmer | Khmer | Austroasiatic | Khmeric | Low |
|
| 281 |
+
| kir | Kyrgyz | Cyrillic | Turkic | Common Turkic | Low |
|
| 282 |
+
| kor | Korean | Hangul | Koreanic | Korean | High |
|
| 283 |
+
| kur | Kurdish | Latin | Indo-European | Iranian | Low |
|
| 284 |
+
| lao | Lao | Lao | Tai-Kadai | Kam-Tai | Low |
|
| 285 |
+
| lav | Latvian | Latin | Indo-European | Balto-Slavic | Mid |
|
| 286 |
+
| lat | Latin | Latin | Indo-European | Italic | Mid |
|
| 287 |
+
| lit | Lithuanian | Latin | Indo-European | Balto-Slavic | Mid |
|
| 288 |
+
| ltz | Luxembourgish | Latin | Indo-European | Germanic | Low |
|
| 289 |
+
| mal | Malayalam | Malayalam | Dravidian | South Dravidian | Low |
|
| 290 |
+
| mar | Marathi | Devanagari | Indo-European | Indo-Aryan | Low |
|
| 291 |
+
| mkd | Macedonian | Cyrillic | Indo-European | Balto-Slavic | Low |
|
| 292 |
+
| mlg | Malagasy | Latin | Austronesian | Malayo-Polynesian | Low |
|
| 293 |
+
| mlt | Maltese | Latin | Afro-Asiatic | Semitic | Low |
|
| 294 |
+
| mon | Mongolian | Cyrillic | Mongolic-Khitan | Mongolic | Low |
|
| 295 |
+
| mri | Maori | Latin | Austronesian | Malayo-Polynesian | Low |
|
| 296 |
+
| msa | Malay | Latin | Austronesian | Malayo-Polynesian | Mid |
|
| 297 |
+
| mya | Burmese | Myanmar | Sino-Tibetan | Burmo-Qiangic | Low |
|
| 298 |
+
| nep | Nepali | Devanagari | Indo-European | Indo-Aryan | Low |
|
| 299 |
+
| nld | Dutch | Latin | Indo-European | Germanic | High |
|
| 300 |
+
| nor | Norwegian | Latin | Indo-European | Germanic | Low |
|
| 301 |
+
| nso | Northern Sotho | Latin | Atlantic-Congo | Benue-Congo | Low |
|
| 302 |
+
| nya | Chichewa | Latin | Atlantic-Congo | Benue-Congo | Low |
|
| 303 |
+
| ory | Oriya | Oriya | Indo-European | Indo-Aryan | Low |
|
| 304 |
+
| pan | Punjabi | Gurmukhi | Indo-European | Indo-Aryan | Low |
|
| 305 |
+
| pes | Persian | Arabic | Indo-European | Iranian | High |
|
| 306 |
+
| pol | Polish | Latin | Indo-European | Balto-Slavic | High |
|
| 307 |
+
| por | Portuguese | Latin | Indo-European | Italic | High |
|
| 308 |
+
| pus | Pashto | Arabic | Indo-European | Iranian | Low |
|
| 309 |
+
| ron | Romanian | Latin | Indo-European | Italic | Mid |
|
| 310 |
+
| rus | Russian | Cyrillic | Indo-European | Balto-Slavic | High |
|
| 311 |
+
| sin | Sinhala | Sinhala | Indo-European | Indo-Aryan | Low |
|
| 312 |
+
| slk | Slovak | Latin | Indo-European | Balto-Slavic | Mid |
|
| 313 |
+
| slv | Slovenian | Latin | Indo-European | Balto-Slavic | Mid |
|
| 314 |
+
| smo | Samoan | Latin | Austronesian | Malayo-Polynesian | Low |
|
| 315 |
+
| sna      | Shona           |    Latin     | Atlantic-Congo  |    Benue-Congo    |      Low      |
|
| 316 |
+
| snd | Sindhi | Arabic | Indo-European | Indo-Aryan | Low |
|
| 317 |
+
| som | Somali | Latin | Afro-Asiatic | Cushitic | Low |
|
| 318 |
+
| sot | Southern Sotho | Latin | Atlantic-Congo | Benue-Congo | Low |
|
| 319 |
+
| spa | Spanish | Latin | Indo-European | Italic | High |
|
| 320 |
+
| sqi | Albanian | Latin | Indo-European | Albanian | Low |
|
| 321 |
+
| srp | Serbian | Cyrillic | Indo-European | Balto-Slavic | High |
|
| 322 |
+
| sun | Sundanese | Latin | Austronesian | Malayo-Polynesian | Low |
|
| 323 |
+
| swa | Swahili | Latin | Atlantic-Congo | Benue-Congo | Low |
|
| 324 |
+
| swe | Swedish | Latin | Indo-European | Germanic | High |
|
| 325 |
+
| tam | Tamil | Tamil | Dravidian | South Dravidian | Mid |
|
| 326 |
+
| tel | Telugu | Telugu | Dravidian | South Dravidian | Low |
|
| 327 |
+
| tgk | Tajik | Cyrillic | Indo-European | Iranian | Low |
|
| 328 |
+
| tha | Thai | Thai | Tai-Kadai | Kam-Tai | Mid |
|
| 329 |
+
| tur | Turkish | Latin | Turkic | Common Turkic | High |
|
| 330 |
+
| twi | Twi | Latin | Atlantic-Congo | Niger-Congo | Low |
|
| 331 |
+
| ukr | Ukrainian | Cyrillic | Indo-European | Balto-Slavic | Mid |
|
| 332 |
+
| urd | Urdu | Arabic | Indo-European | Indo-Aryan | Mid |
|
| 333 |
+
| uzb | Uzbek | Latin | Turkic | Common Turkic | Mid |
|
| 334 |
+
| vie | Vietnamese | Latin | Austroasiatic | Vietic | High |
|
| 335 |
+
| xho | Xhosa | Latin | Atlantic-Congo | Benue-Congo | Low |
|
| 336 |
+
| yid | Yiddish | Hebrew | Indo-European | Germanic | Low |
|
| 337 |
+
| yor | Yoruba | Latin | Atlantic-Congo | Benue-Congo | Low |
|
| 338 |
+
| zho | Chinese | Han | Sino-Tibetan | Sinitic | High |
|
| 339 |
+
| zul | Zulu | Latin | Atlantic-Congo | Benue-Congo | Low |
|
| 340 |
+
|
| 341 |
+
## Model Card Contact
|
| 342 |
+
|
| 343 |
+
For errors in this model card, contact Ahmet or Viraat, `{ahmet, viraat} at cohere dot com`.
|