Collections
Collections including paper arxiv:2211.05100
- BLOOM: A 176B-Parameter Open-Access Multilingual Language Model
  Paper • 2211.05100 • Published • 34
- IP-Adapter: Text Compatible Image Prompt Adapter for Text-to-Image Diffusion Models
  Paper • 2308.06721 • Published • 33
- LEDITS++: Limitless Image Editing using Text-to-Image Models
  Paper • 2311.16711 • Published • 24

- Nemotron-4 15B Technical Report
  Paper • 2402.16819 • Published • 46
- Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models
  Paper • 2402.19427 • Published • 56
- RWKV: Reinventing RNNs for the Transformer Era
  Paper • 2305.13048 • Published • 19
- Reformer: The Efficient Transformer
  Paper • 2001.04451 • Published

- Mistral 7B
  Paper • 2310.06825 • Published • 55
- BloombergGPT: A Large Language Model for Finance
  Paper • 2303.17564 • Published • 27
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
  Paper • 1810.04805 • Published • 23
- DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
  Paper • 1910.01108 • Published • 21

- BLOOM: A 176B-Parameter Open-Access Multilingual Language Model
  Paper • 2211.05100 • Published • 34
- FlauBERT: Unsupervised Language Model Pre-training for French
  Paper • 1912.05372 • Published
- CroissantLLM: A Truly Bilingual French-English Language Model
  Paper • 2402.00786 • Published • 26
- AION-1: Omnimodal Foundation Model for Astronomical Sciences
  Paper • 2510.17960 • Published • 28

- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
  Paper • 1810.04805 • Published • 23
- RoBERTa: A Robustly Optimized BERT Pretraining Approach
  Paper • 1907.11692 • Published • 9
- Language Models are Few-Shot Learners
  Paper • 2005.14165 • Published • 17
- OPT: Open Pre-trained Transformer Language Models
  Paper • 2205.01068 • Published • 2

- BLOOM: A 176B-Parameter Open-Access Multilingual Language Model
  Paper • 2211.05100 • Published • 34
- Contrastive Language-Image Pre-training for the Italian Language
  Paper • 2108.08688 • Published • 2
- IT5: Large-scale Text-to-text Pretraining for Italian Language Understanding and Generation
  Paper • 2203.03759 • Published • 5
- Spanish Pre-trained BERT Model and Evaluation Data
  Paper • 2308.02976 • Published • 3

- Attention Is All You Need
  Paper • 1706.03762 • Published • 96
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
  Paper • 1810.04805 • Published • 23
- RoBERTa: A Robustly Optimized BERT Pretraining Approach
  Paper • 1907.11692 • Published • 9
- DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
  Paper • 1910.01108 • Published • 21

- FineWeb2: One Pipeline to Scale Them All -- Adapting Pre-Training Data Processing to Every Language
  Paper • 2506.20920 • Published • 75
- SmolVLM: Redefining small and efficient multimodal models
  Paper • 2504.05299 • Published • 200
- YourBench: Easy Custom Evaluation Sets for Everyone
  Paper • 2504.01833 • Published • 22
- SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model
  Paper • 2502.02737 • Published • 249