Pretrained models from the paper "Predicting the Order of Upcoming Tokens Improves Language Modeling"
Zayd Muhammad Kawakibi Zuhri
zaydzuhri
AI & ML interests
I really like watching loss go down
Recent Activity
Updated a model 9 days ago: zaydzuhri/dsmtp-7B-4096-model
Published a model 9 days ago: zaydzuhri/dsmtp-7B-4096-model
Models (85)
zaydzuhri/dsmtp-7B-4096-model • 7B • 293
zaydzuhri/dsmtp-7B-4096-batch8x2-steps200000-20251016-114748
zaydzuhri/mtp-math-1.8B-4096-model • 2B • 4
zaydzuhri/mtp-math-1B-4096-batch16x1-steps40000-20251004-223317
zaydzuhri/top-math-1.8B-4096-model • 2B • 5
zaydzuhri/top-math-1B-4096-batch16x1-steps40000-20251003-211951
zaydzuhri/vanilla-math-1B-4096-batch16x1-steps40000-20251003-204935
zaydzuhri/dsmtp-7B-4096-batch8x2-steps200000-20250909-122923
zaydzuhri/vanilla-math-1.8B-4096-model • 2B • 1
zaydzuhri/vanilla-math-1B-4096-batch16x1-steps40000-20250915-145922
Datasets (9)
zaydzuhri/OpenMathInstruct-2-Text • 22M • 182
zaydzuhri/stack-edu-python • 25.3M • 30
zaydzuhri/stack-edu • 99.5M • 202
zaydzuhri/kreyol-mt-cleaned • 903k • 160
zaydzuhri/the_pile_tokenized_5percent_truncated_packed_v2 • 2.46M • 57
zaydzuhri/the_pile_tokenized_5percent_truncated_packed • 2.11M • 174
zaydzuhri/the_pile_tokenized_5percent_truncated • 6M • 191
zaydzuhri/the_pile_tokenized_5percent • 6M • 20
zaydzuhri/the_pile_tokenized_6k • 6k • 6