---
library_name: transformers
license: apache-2.0
datasets:
- HuggingFaceFW/fineweb-edu
language:
- en
---

# Model Details

This model is a 1B-parameter Llama 3 model pretrained from scratch with torchtitan on fineweb-edu using the AdamW optimizer. Training followed the ~20x Chinchilla rule, for 20B tokens seen.

# How to use

```python
import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="kz919/llama3_1b_chinchilla_8132025",
)
print(pipe("The key to life is"))
```

# Downstream Eval

## ARC, Hellaswag, Lambada_OpenAI, OpenbookQA, PIQA

```shell
lm_eval --model hf \
  --model_args pretrained=kz919/llama3_1b_chinchilla_8142025,dtype="bfloat16",add_bos_token=True \
  --tasks lambada_openai,hellaswag,piqa,arc_easy,arc_challenge,openbookqa \
  --device cuda:7 \
  --batch_size 8
```

| Tasks |Version|Filter|n-shot| Metric | | Value | |Stderr|
|--------------|------:|------|-----:|----------|---|------:|---|-----:|
|arc_challenge | 1|none | 0|acc |↑ | 0.2688|± |0.0130|
| | |none | 0|acc_norm |↑ | 0.2875|± |0.0132|
|arc_easy | 1|none | 0|acc |↑ | 0.6082|± |0.0100|
| | |none | 0|acc_norm |↑ | 0.5412|± |0.0102|
|hellaswag | 1|none | 0|acc |↑ | 0.3459|± |0.0047|
| | |none | 0|acc_norm |↑ | 0.4169|± |0.0049|
|lambada_openai| 1|none | 0|acc |↑ | 0.3311|± |0.0066|
| | |none | 0|perplexity|↓ |38.2983|± |1.5427|
|openbookqa | 1|none | 0|acc |↑ | 0.2340|± |0.0190|
| | |none | 0|acc_norm |↑ | 0.3500|± |0.0214|
|piqa | 1|none | 0|acc |↑ | 0.6795|± |0.0109|
| | |none | 0|acc_norm |↑ | 0.6774|± |0.0109|

## MMLU

| Groups |Version|Filter|n-shot|Metric| |Value | |Stderr|
|------------------|------:|------|------|------|---|-----:|---|-----:|
|mmlu | 2|none | |acc |↑ |0.2529|± |0.0037|
| - humanities | 2|none | |acc |↑ |0.2459|± |0.0063|
| - other | 2|none | |acc |↑ |0.2424|± |0.0077|
| - social sciences| 2|none | |acc |↑ |0.2697|± |0.0080|
| - stem | 2|none | |acc |↑ |0.2572|± |0.0078|
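# Token Budget

As a sanity check on the training setup above, the Chinchilla compute-optimal rule of roughly 20 training tokens per model parameter gives the 20B-token budget for a 1B-parameter model. A minimal sketch of that arithmetic (the variable names here are illustrative, not from the training code):

```python
# Chinchilla-style compute-optimal budget: ~20 training tokens per parameter.
params = 1_000_000_000        # 1B model parameters
tokens_per_param = 20         # Chinchilla ratio
token_budget = params * tokens_per_param

print(f"Token budget: {token_budget / 1e9:.0f}B")  # Token budget: 20B
```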