Part of the **main releases** collection: small models aiming at language modeling without system prompts.
This is a passthrough of arco with an experimental model. It improves on ARC Challenge, falling only 1.2 points short of modern 3b baseline performance.

If you prefer answering multilingual, general-knowledge, trivially simple questions, choose qwen or llama. If you prefer solving trivially simple English tasks at half the size, choose arco.

No system prompt is set, intentionally.
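Since no system prompt is used, the model is fed plain text to continue, base-model style. A minimal usage sketch, assuming the standard Hugging Face `transformers` text-generation pipeline (the model id below is an illustrative placeholder):

```python
def generate(prompt: str, model_id: str = "appvoid/arco") -> str:
    """Continue plain text with no system prompt or chat template.

    model_id is a placeholder assumption; point it at the actual repo.
    Requires the `transformers` package and downloads the weights on first use.
    """
    from transformers import pipeline  # imported lazily so the sketch stays light

    pipe = pipeline("text-generation", model=model_id)
    return pipe(prompt, max_new_tokens=20)[0]["generated_text"]


# Example (downloads the model):
# print(generate("The sky is"))
```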
Zero-shot results from state-of-the-art small language models:
| Parameters | Model | MMLU | ARC-C | HellaSwag | PIQA | Winogrande | Average |
|---|---|---|---|---|---|---|---|
| 0.5b | qwen 2 | 44.13 | 28.92 | 49.05 | 69.31 | 56.99 | 49.68 |
| 0.3b | smollm | 25.52 | 37.71 | 56.41 | 71.93 | 59.27 | 50.17 |
| 0.5b | danube 3 | 24.81 | 36.18 | 60.46 | 73.78 | 61.01 | 51.25 |
| 0.5b | qwen 2.5 | 47.29 | 31.83 | 52.17 | 70.29 | 57.06 | 51.72 |
| 0.5b | arco | 26.17 | 37.29 | 62.88 | 74.37 | 62.27 | 52.60 |
| 0.5b | arco 2 | 25.51 | 38.82 | 63.02 | 74.70 | 61.25 | 52.66 |
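The Average column is the unweighted mean of the five benchmark scores. A quick sketch to reproduce it, with scores copied from the table above:

```python
# Per-model zero-shot scores: MMLU, ARC-C, HellaSwag, PIQA, Winogrande
scores = {
    "arco":   [26.17, 37.29, 62.88, 74.37, 62.27],
    "arco 2": [25.51, 38.82, 63.02, 74.70, 61.25],
}

# Unweighted mean, rounded to two decimals as in the table
averages = {name: round(sum(s) / len(s), 2) for name, s in scores.items()}
print(averages)  # {'arco': 52.6, 'arco 2': 52.66}
```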
arco also stands for "arc optimized", hence the focus on this reasoning-oriented benchmark.