Update README.md
README.md
CHANGED
@@ -26,6 +26,7 @@ Notes:
 <br>\- Reminder, this isn't a native 32K model. It has it's issues, but it's coherent and working well.

 Sanity Check // Needle in a Haystack Results:
+<br>\- This is not as complex as RULER or NIAN, but it's a basic evaluator. Some improper train examples had Haystack scores ranging from Red to Orange for most of the extended contexts.


 Wandb Run:
@@ -39,6 +40,7 @@ Relevant Axolotl Configurations:
 <br>\- 2M Rope Theta had the best loss results during training compared to other values.
 <br>\- Leaving it at 500K rope wasn't that much worse, but 4M and 8M Theta made the grad_norm values worsen even if loss drops fast.
 <br>\- Mixing in Pretraining Data was a PITA. Made it a lot worse with formatting. -> Tried at low value mixes, eg. <20% and lower.
+<br>\- Pretraining / Noise made it worse at Haystack too? It wasn't all Green, Mainly Oranges.
 <br>\- Improper / Bad Rope Theta shows in Grad_Norm exploding to thousands. It'll drop to low values alright, but it's a scary fast drop even with gradient clipping.

 ```
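For readers comparing the Rope Theta values named in these notes (500K, 2M, 4M, 8M), here is a rough sketch of what the base changes numerically. It uses the standard RoPE frequency formula and is not taken from the author's training setup: `head_dim = 128` and the helper names are assumptions made for illustration, while the 32K context length comes from the notes themselves.

```python
import math


def rope_wavelengths(theta: float, head_dim: int = 128) -> list[float]:
    """Wavelength in tokens of each RoPE frequency pair for a given base theta.

    head_dim=128 is an assumed value for illustration, not taken from the README.
    """
    # Standard RoPE: inv_freq_i = theta ** (-2i / head_dim), i = 0 .. head_dim/2 - 1
    inv_freqs = [theta ** (-2 * i / head_dim) for i in range(head_dim // 2)]
    return [2 * math.pi / f for f in inv_freqs]


def pairs_completing_a_rotation(theta: float, context_len: int, head_dim: int = 128) -> int:
    """Count frequency pairs whose wavelength fits inside the context window."""
    return sum(1 for w in rope_wavelengths(theta, head_dim) if w <= context_len)


if __name__ == "__main__":
    context_len = 32_768  # the 32K target mentioned in the notes
    for theta in (500_000, 2_000_000, 4_000_000, 8_000_000):
        longest = max(rope_wavelengths(theta))
        n = pairs_completing_a_rotation(theta, context_len)
        print(f"theta={theta:>9,}  longest wavelength ~ {longest:,.0f} tokens  "
              f"pairs with a full rotation inside 32K: {n}")
```

A larger theta stretches every wavelength, so fewer frequency pairs complete a full rotation inside the context window. This only shows the geometry; it does not by itself explain the loss or grad_norm behaviour reported in the notes.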