Collections including paper arxiv:2304.01373

- Attention Is All You Need
  Paper • 1706.03762 • Published • 96
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
  Paper • 1810.04805 • Published • 23
- RoBERTa: A Robustly Optimized BERT Pretraining Approach
  Paper • 1907.11692 • Published • 9
- DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
  Paper • 1910.01108 • Published • 21

- MAP-Neo: Highly Capable and Transparent Bilingual Large Language Model Series
  Paper • 2405.19327 • Published • 48
- LLM360/K2
  Text Generation • 65B • Updated • 172 • 94
- OLMo: Accelerating the Science of Language Models
  Paper • 2402.00838 • Published • 85
- LLM360: Towards Fully Transparent Open-Source LLMs
  Paper • 2312.06550 • Published • 57

- Simple linear attention language models balance the recall-throughput tradeoff
  Paper • 2402.18668 • Published • 20
- Linear Transformers with Learnable Kernel Functions are Better In-Context Models
  Paper • 2402.10644 • Published • 81
- Repeat After Me: Transformers are Better than State Space Models at Copying
  Paper • 2402.01032 • Published • 24
- Zoology: Measuring and Improving Recall in Efficient Language Models
  Paper • 2312.04927 • Published • 2

- Training Compute-Optimal Large Language Models
  Paper • 2203.15556 • Published • 11
- Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
  Paper • 1909.08053 • Published • 3
- Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
  Paper • 1910.10683 • Published • 14
- Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling
  Paper • 2304.01373 • Published • 9

- Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling
  Paper • 2304.01373 • Published • 9
- Llama 2: Open Foundation and Fine-Tuned Chat Models
  Paper • 2307.09288 • Published • 247
- Mistral 7B
  Paper • 2310.06825 • Published • 55
- microsoft/phi-2
  Text Generation • 3B • Updated • 821k • 3.41k

- Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling
  Paper • 2304.01373 • Published • 9
- EleutherAI/pythia-14m
  Text Generation • 39.2M • Updated • 85.8k • 25
- EleutherAI/pythia-70m
  95.6M • Updated • 152k • 74
- EleutherAI/pythia-160m
  Text Generation • 0.2B • Updated • 101k • 35