An encoder-decoder (T5 architecture) pretrained with [nanoT5](https://github.com/pszemraj/nanoT5/tree/flan-dataset):

- tokenizer: sentencepiece BPE w/ byte fallback, 48k vocab (from [vocab scaling laws](https://hf.co/collections/sail/scaling-laws-with-vocabulary-6699e0cbd77a8b2870859bfe))
- data: `fineweb-edu-dedup` split of `HuggingFaceTB/smollm-corpus`
- context length: 1024 tokens

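For context, packing pretraining text into fixed windows of the stated 1024-token context length can be sketched as follows. This is an illustrative helper, not code from nanoT5:

```python
def chunk_token_ids(token_ids, context_length=1024):
    """Split a flat list of token ids into fixed-size windows,
    dropping the final partial window (a common pretraining choice)."""
    n_full = len(token_ids) // context_length
    return [
        token_ids[i * context_length:(i + 1) * context_length]
        for i in range(n_full)
    ]
```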
## details
Detailed info, including training logs, configs, and checkpoints, can be found under `checkpoints/` in this repo.

<details>
<summary><strong>Expand hyperparameter overview</strong></summary>

1. Model:
   - Dropout rate: 0.0

4. Hardware:
   - Device: RTX 4080
   - Precision: bfloat16, tf32

</details>
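The bfloat16 + tf32 setting above corresponds to the standard PyTorch precision knobs. A minimal sketch of those settings, not the actual nanoT5 training loop:

```python
import torch

# Allow TF32 tensor-core math for fp32 matmuls/convolutions (Ampere+ GPUs).
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True

device = "cuda" if torch.cuda.is_available() else "cpu"

# Run the forward pass in bfloat16 via autocast.
with torch.autocast(device_type=device, dtype=torch.bfloat16):
    x = torch.randn(2, 8, device=device)
    y = torch.nn.Linear(8, 4).to(device)(x)  # y.dtype is torch.bfloat16
```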
## plots

training loss



<details>
<summary><strong>Expand grad and weights L2 norm plots</strong></summary>

grad norm

![grad-norm plot](https://i.imgur.com/rvPcBZL.png)

weights norm

![wandb-weights-norm](https://i.imgur.com/Pr9BHlT.png)

</details>

---