-
Reasoning Introduces New Poisoning Attacks Yet Makes Them More Complicated
Paper • 2509.05739 • Published • 2 -
Loong: Synthesize Long Chain-of-Thoughts at Scale through Verifiers
Paper • 2509.03059 • Published • 24 -
Universal Deep Research: Bring Your Own Model and Strategy
Paper • 2509.00244 • Published • 13 -
<think> So let's replace this phrase with insult... </think> Lessons learned from generation of toxic texts with LLMs
Paper • 2509.08358 • Published • 13
Collections
Collections including paper arXiv:2506.20920
-
Lingshu: A Generalist Foundation Model for Unified Multimodal Medical Understanding and Reasoning
Paper • 2506.07044 • Published • 113 -
ReasonMed: A 370K Multi-Agent Generated Dataset for Advancing Medical Reasoning
Paper • 2506.09513 • Published • 98 -
AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time
Paper • 2505.24863 • Published • 97 -
Seedance 1.0: Exploring the Boundaries of Video Generation Models
Paper • 2506.09113 • Published • 102
-
FineWeb2: One Pipeline to Scale Them All -- Adapting Pre-Training Data Processing to Every Language
Paper • 2506.20920 • Published • 75 -
SmolVLM: Redefining small and efficient multimodal models
Paper • 2504.05299 • Published • 200 -
YourBench: Easy Custom Evaluation Sets for Everyone
Paper • 2504.01833 • Published • 22 -
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model
Paper • 2502.02737 • Published • 248
-
FineWeb2: One Pipeline to Scale Them All -- Adapting Pre-Training Data Processing to Every Language
Paper • 2506.20920 • Published • 75 -
HuggingFaceFW/fineweb-2
Viewer • Updated • 4.48B • 95.5k • 684 -
80
Scaling FineWeb to 1000+ languages: Step 1: finding signal in 100s of evaluation tasks
Evaluate multilingual models using FineTasks
-
FineWeb2: One Pipeline to Scale Them All -- Adapting Pre-Training Data Processing to Every Language
Paper • 2506.20920 • Published • 75 -
SmolVLM: Redefining small and efficient multimodal models
Paper • 2504.05299 • Published • 200 -
The BigScience ROOTS Corpus: A 1.6TB Composite Multilingual Dataset
Paper • 2303.03915 • Published • 7 -
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model
Paper • 2502.02737 • Published • 248
-
ShareGPT-4o-Image: Aligning Multimodal Models with GPT-4o-Level Image Generation
Paper • 2506.18095 • Published • 66 -
FreedomIntelligence/ShareGPT-4o-Image
Viewer • Updated • 92.3k • 1.17k • 88 -
FineWeb2: One Pipeline to Scale Them All -- Adapting Pre-Training Data Processing to Every Language
Paper • 2506.20920 • Published • 75
-
MUVERA: Multi-Vector Retrieval via Fixed Dimensional Encodings
Paper • 2405.19504 • Published • 3 -
HiWave: Training-Free High-Resolution Image Generation via Wavelet-Based Diffusion Sampling
Paper • 2506.20452 • Published • 19 -
FineWeb2: One Pipeline to Scale Them All -- Adapting Pre-Training Data Processing to Every Language
Paper • 2506.20920 • Published • 75 -
The Geometry of LLM Quantization: GPTQ as Babai's Nearest Plane Algorithm
Paper • 2507.18553 • Published • 40
-
The Curse of Depth in Large Language Models
Paper • 2502.05795 • Published • 40 -
Transformers without Normalization
Paper • 2503.10622 • Published • 170 -
Parallel Scaling Law for Language Models
Paper • 2505.10475 • Published • 83 -
Learning to Skip the Middle Layers of Transformers
Paper • 2506.21103 • Published • 18
-
ahmedheakl/resume-atlas
Viewer • Updated • 13.4k • 276 • 10 -
FineWeb2: One Pipeline to Scale Them All -- Adapting Pre-Training Data Processing to Every Language
Paper • 2506.20920 • Published • 75 -
279
Infinite Dataset Hub
♾ Search and save datasets generated with a LLM in real time
-
IntrEx: A Dataset for Modeling Engagement in Educational Conversations
Paper • 2509.06652 • Published • 24