Collections including paper arXiv:2402.17193

- Can Large Language Models Understand Context?
  Paper • 2402.00858 • Published • 23
- OLMo: Accelerating the Science of Language Models
  Paper • 2402.00838 • Published • 84
- Self-Rewarding Language Models
  Paper • 2401.10020 • Published • 151
- SemScore: Automated Evaluation of Instruction-Tuned LLMs based on Semantic Textual Similarity
  Paper • 2401.17072 • Published • 25

- DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models
  Paper • 2402.19481 • Published • 22
- FiT: Flexible Vision Transformer for Diffusion Model
  Paper • 2402.12376 • Published • 48
- When Scaling Meets LLM Finetuning: The Effect of Data, Model and Finetuning Method
  Paper • 2402.17193 • Published • 26
- ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment
  Paper • 2403.05135 • Published • 45

- The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
  Paper • 2402.17764 • Published • 625
- When Scaling Meets LLM Finetuning: The Effect of Data, Model and Finetuning Method
  Paper • 2402.17193 • Published • 26
- Training-Free Long-Context Scaling of Large Language Models
  Paper • 2402.17463 • Published • 24
- The Power of Scale for Parameter-Efficient Prompt Tuning
  Paper • 2104.08691 • Published • 10

- When Scaling Meets LLM Finetuning: The Effect of Data, Model and Finetuning Method
  Paper • 2402.17193 • Published • 26
- What Happened in LLMs Layers when Trained for Fast vs. Slow Thinking: A Gradient Perspective
  Paper • 2410.23743 • Published • 63
- Direct Preference Optimization Using Sparse Feature-Level Constraints
  Paper • 2411.07618 • Published • 17
- Transformer^2: Self-adaptive LLMs
  Paper • 2501.06252 • Published • 54

- Scaling Instruction-Finetuned Language Models
  Paper • 2210.11416 • Published • 7
- Mamba: Linear-Time Sequence Modeling with Selective State Spaces
  Paper • 2312.00752 • Published • 146
- Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context
  Paper • 2403.05530 • Published • 66
- Yi: Open Foundation Models by 01.AI
  Paper • 2403.04652 • Published • 65

- When Scaling Meets LLM Finetuning: The Effect of Data, Model and Finetuning Method
  Paper • 2402.17193 • Published • 26
- Same Task, More Tokens: the Impact of Input Length on the Reasoning Performance of Large Language Models
  Paper • 2402.14848 • Published • 20
- Instruction-tuned Language Models are Better Knowledge Learners
  Paper • 2402.12847 • Published • 26
- Aya Model: An Instruction Finetuned Open-Access Multilingual Language Model
  Paper • 2402.07827 • Published • 48

- StarCoder 2 and The Stack v2: The Next Generation
  Paper • 2402.19173 • Published • 149
- Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models
  Paper • 2402.19427 • Published • 56
- Simple linear attention language models balance the recall-throughput tradeoff
  Paper • 2402.18668 • Published • 20
- Priority Sampling of Large Language Models for Compilers
  Paper • 2402.18734 • Published • 19