Hugging Face
Georgios Smyrnis
gsmyrnis
5 followers · 3 following
AI & ML interests
None yet
Organizations
gsmyrnis's activity
New activity in
open-thoughts/OpenThoughts2-1M
8 months ago
Any rundown on the data sources?
2
5
#2 opened 8 months ago by
teknium
New activity in
mlfoundations/dclm-7b-it
about 1 year ago
Update config.json
1
#4 opened about 1 year ago by
sedrickkeh
New activity in
mlfoundations/dclm-pool-400m-1x
about 1 year ago
TypeError: Couldn't cast array of type
1
#1 opened about 1 year ago by
shizhediao2
New activity in
mlfoundations/dclm-baseline-1.0-parquet
about 1 year ago
Seems like WARC metadata is missing from this version?
1
#4 opened over 1 year ago by
yury-zyphra
New activity in
mlfoundations/dclm-baseline-1.0
over 1 year ago
Missing files
3
#2 opened over 1 year ago by
pengyuan
Were the documents shuffled before the dataset was split into shards?
3
#5 opened over 1 year ago by
yury-zyphra
Would you share the 0.28T token dataset for achieve highest scores in 7B-2x experiment?
2
#6 opened over 1 year ago by
Mars2050
How many rows are there in the dataset?
1
#4 opened over 1 year ago by
yury-zyphra
New activity in
mlfoundations/datacomp_small
about 2 years ago
Reproduce the clip score
1
#1 opened over 2 years ago by
zhangjc404