Collections including paper arxiv:2304.01373

- Attention Is All You Need
  Paper • 1706.03762 • Published • 96
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
  Paper • 1810.04805 • Published • 23
- RoBERTa: A Robustly Optimized BERT Pretraining Approach
  Paper • 1907.11692 • Published • 9
- DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
  Paper • 1910.01108 • Published • 21

- MAP-Neo: Highly Capable and Transparent Bilingual Large Language Model Series
  Paper • 2405.19327 • Published • 48
- LLM360/K2
  Text Generation • 65B • Updated • 172 • 94
- OLMo: Accelerating the Science of Language Models
  Paper • 2402.00838 • Published • 85
- LLM360: Towards Fully Transparent Open-Source LLMs
  Paper • 2312.06550 • Published • 57

- Simple linear attention language models balance the recall-throughput tradeoff
  Paper • 2402.18668 • Published • 20
- Linear Transformers with Learnable Kernel Functions are Better In-Context Models
  Paper • 2402.10644 • Published • 81
- Repeat After Me: Transformers are Better than State Space Models at Copying
  Paper • 2402.01032 • Published • 24
- Zoology: Measuring and Improving Recall in Efficient Language Models
  Paper • 2312.04927 • Published • 2

- Training Compute-Optimal Large Language Models
  Paper • 2203.15556 • Published • 11
- Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
  Paper • 1909.08053 • Published • 3
- Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
  Paper • 1910.10683 • Published • 14
- Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling
  Paper • 2304.01373 • Published • 9

- Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling
  Paper • 2304.01373 • Published • 9
- Llama 2: Open Foundation and Fine-Tuned Chat Models
  Paper • 2307.09288 • Published • 247
- Mistral 7B
  Paper • 2310.06825 • Published • 55
- microsoft/phi-2
  Text Generation • 3B • Updated • 821k • 3.41k

- Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling
  Paper • 2304.01373 • Published • 9
- EleutherAI/pythia-14m
  Text Generation • 39.2M • Updated • 85.8k • 25
- EleutherAI/pythia-70m
  95.6M • Updated • 152k • 74
- EleutherAI/pythia-160m
  Text Generation • 0.2B • Updated • 101k • 35