Collections
Discover the best community collections!
Collections including paper arXiv:2405.09673
- Iterative Reasoning Preference Optimization
  Paper • 2404.19733 • Published • 49
- Better & Faster Large Language Models via Multi-token Prediction
  Paper • 2404.19737 • Published • 79
- ORPO: Monolithic Preference Optimization without Reference Model
  Paper • 2403.07691 • Published • 69
- KAN: Kolmogorov-Arnold Networks
  Paper • 2404.19756 • Published • 115

- RAFT: Adapting Language Model to Domain Specific RAG
  Paper • 2403.10131 • Published • 72
- InternLM2 Technical Report
  Paper • 2403.17297 • Published • 34
- LoRA Learns Less and Forgets Less
  Paper • 2405.09673 • Published • 89
- Open-RAG: Enhanced Retrieval-Augmented Reasoning with Open-Source Large Language Models
  Paper • 2410.01782 • Published • 10

- Simple and Scalable Strategies to Continually Pre-train Large Language Models
  Paper • 2403.08763 • Published • 51
- Jamba: A Hybrid Transformer-Mamba Language Model
  Paper • 2403.19887 • Published • 111
- Transformer-Lite: High-efficiency Deployment of Large Language Models on Mobile Phone GPUs
  Paper • 2403.20041 • Published • 34
- Advancing LLM Reasoning Generalists with Preference Trees
  Paper • 2404.02078 • Published • 46

- BitDelta: Your Fine-Tune May Only Be Worth One Bit
  Paper • 2402.10193 • Published • 22
- StructLM: Towards Building Generalist Models for Structured Knowledge Grounding
  Paper • 2402.16671 • Published • 29
- LoRA Learns Less and Forgets Less
  Paper • 2405.09673 • Published • 89
- NV-Embed: Improved Techniques for Training LLMs as Generalist Embedding Models
  Paper • 2405.17428 • Published • 19

- GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
  Paper • 2403.03507 • Published • 189
- Flora: Low-Rank Adapters Are Secretly Gradient Compressors
  Paper • 2402.03293 • Published • 6
- PRILoRA: Pruned and Rank-Increasing Low-Rank Adaptation
  Paper • 2401.11316 • Published • 1
- MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning
  Paper • 2405.12130 • Published • 50

- MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training
  Paper • 2403.09611 • Published • 129
- Evolutionary Optimization of Model Merging Recipes
  Paper • 2403.13187 • Published • 58
- MobileVLM V2: Faster and Stronger Baseline for Vision Language Model
  Paper • 2402.03766 • Published • 15
- LLM Agent Operating System
  Paper • 2403.16971 • Published • 72

- Beyond A*: Better Planning with Transformers via Search Dynamics Bootstrapping
  Paper • 2402.14083 • Published • 48
- Linear Transformers are Versatile In-Context Learners
  Paper • 2402.14180 • Published • 7
- Training-Free Long-Context Scaling of Large Language Models
  Paper • 2402.17463 • Published • 24
- The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
  Paper • 2402.17764 • Published • 625

- How to Train Data-Efficient LLMs
  Paper • 2402.09668 • Published • 42
- LLM2LLM: Boosting LLMs with Novel Iterative Data Enhancement
  Paper • 2403.15042 • Published • 27
- MAGID: An Automated Pipeline for Generating Synthetic Multi-modal Datasets
  Paper • 2403.03194 • Published • 15
- Orca-Math: Unlocking the potential of SLMs in Grade School Math
  Paper • 2402.14830 • Published • 25