Economies of Open Intelligence: Tracing Power & Participation in the Model Ecosystem Paper • 2512.03073 • Published Nov 27, 2025 • 4
The German Commons - 154 Billion Tokens of Openly Licensed Text for German Language Models Paper • 2510.13996 • Published Oct 15, 2025 • 8
Post: The folks at Foursquare released a dataset of 104.5 million places of interest (foursquare/fsq-os-places), and here are all of them on a plot.
Post: The Lichess database of games, puzzles, and engine evaluations is now on the Hub under the Lichess organization: billions of chess data points to download, query, and stream. We're excited to see what you'll build with it! ♟️ 🤗
- https://huggingface.co/collections/Lichess/positions-datasets-66f50837db5cd3287d60d489
- https://huggingface.co/collections/Lichess/games-datasets-66f508df78f4b43e1bb2d353
DyVo: Dynamic Vocabularies for Learned Sparse Retrieval with Entities Paper • 2410.07722 • Published Oct 10, 2024 • 15
FollowIR: Evaluating and Teaching Information Retrieval Models to Follow Instructions Paper • 2403.15246 • Published Mar 22, 2024 • 11
Post: Data map of the languages of https://huggingface.co/datasets/CohereForAI/aya_dataset