-
Apple Intelligence Foundation Language Models
Paper • 2407.21075 • Published • 5 -
The Llama 3 Herd of Models
Paper • 2407.21783 • Published • 116 -
Nemotron-4 340B Technical Report
Paper • 2406.11704 • Published -
Gemma 2: Improving Open Language Models at a Practical Size
Paper • 2408.00118 • Published • 79
Collections
Discover the best community collections!
Collections including paper arXiv:2412.15115
-
The Impact of Positional Encoding on Length Generalization in Transformers
Paper • 2305.19466 • Published • 2 -
Qwen2 Technical Report
Paper • 2407.10671 • Published • 166 -
Round and Round We Go! What makes Rotary Positional Encodings useful?
Paper • 2410.06205 • Published • 2 -
ThunderKittens: Simple, Fast, and Adorable AI Kernels
Paper • 2410.20399 • Published • 2
-
Non-asymptotic oracle inequalities for the Lasso in high-dimensional mixture of experts
Paper • 2009.10622 • Published • 1 -
MoE-LLaVA: Mixture of Experts for Large Vision-Language Models
Paper • 2401.15947 • Published • 53 -
MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts
Paper • 2401.04081 • Published • 73 -
MoE-Infinity: Activation-Aware Expert Offloading for Efficient MoE Serving
Paper • 2401.14361 • Published • 2
-
AnimateLCM: Accelerating the Animation of Personalized Diffusion Models and Adapters with Decoupled Consistency Learning
Paper • 2402.00769 • Published • 22 -
LCM-LoRA: A Universal Stable-Diffusion Acceleration Module
Paper • 2311.05556 • Published • 87 -
LongAlign: A Recipe for Long Context Alignment of Large Language Models
Paper • 2401.18058 • Published • 22 -
Efficient Tool Use with Chain-of-Abstraction Reasoning
Paper • 2401.17464 • Published • 21
-
Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding
Paper • 2405.08748 • Published • 24 -
Grounding DINO 1.5: Advance the "Edge" of Open-Set Object Detection
Paper • 2405.10300 • Published • 30 -
Chameleon: Mixed-Modal Early-Fusion Foundation Models
Paper • 2405.09818 • Published • 131 -
OpenRLHF: An Easy-to-use, Scalable and High-performance RLHF Framework
Paper • 2405.11143 • Published • 41
-
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training
Paper • 2403.09611 • Published • 129 -
Evolutionary Optimization of Model Merging Recipes
Paper • 2403.13187 • Published • 58 -
MobileVLM V2: Faster and Stronger Baseline for Vision Language Model
Paper • 2402.03766 • Published • 15 -
LLM Agent Operating System
Paper • 2403.16971 • Published • 72
-
Qwen Technical Report
Paper • 2309.16609 • Published • 37 -
Qwen-VL: A Frontier Large Vision-Language Model with Versatile Abilities
Paper • 2308.12966 • Published • 11 -
Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models
Paper • 2311.07919 • Published • 10 -
Audio Dialogues: Dialogues dataset for audio and music understanding
Paper • 2404.07616 • Published • 16
-
Beyond Language Models: Byte Models are Digital World Simulators
Paper • 2402.19155 • Published • 53 -
Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models
Paper • 2402.19427 • Published • 56 -
VisionLLaMA: A Unified LLaMA Interface for Vision Tasks
Paper • 2403.00522 • Published • 46 -
Resonance RoPE: Improving Context Length Generalization of Large Language Models
Paper • 2403.00071 • Published • 24
-
openchat/openchat-3.5-1210
Text Generation • 7B • Updated • 3.01k • 279 -
MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts
Paper • 2401.04081 • Published • 73 -
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
Paper • 2402.03300 • Published • 130 -
Babelscape/rebel-large
0.4B • Updated • 34.8k • 230
-
Apple Intelligence Foundation Language Models
Paper • 2407.21075 • Published • 5 -
The Llama 3 Herd of Models
Paper • 2407.21783 • Published • 116 -
Nemotron-4 340B Technical Report
Paper • 2406.11704 • Published -
Gemma 2: Improving Open Language Models at a Practical Size
Paper • 2408.00118 • Published • 79
-
Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding
Paper • 2405.08748 • Published • 24 -
Grounding DINO 1.5: Advance the "Edge" of Open-Set Object Detection
Paper • 2405.10300 • Published • 30 -
Chameleon: Mixed-Modal Early-Fusion Foundation Models
Paper • 2405.09818 • Published • 131 -
OpenRLHF: An Easy-to-use, Scalable and High-performance RLHF Framework
Paper • 2405.11143 • Published • 41
-
The Impact of Positional Encoding on Length Generalization in Transformers
Paper • 2305.19466 • Published • 2 -
Qwen2 Technical Report
Paper • 2407.10671 • Published • 166 -
Round and Round We Go! What makes Rotary Positional Encodings useful?
Paper • 2410.06205 • Published • 2 -
ThunderKittens: Simple, Fast, and Adorable AI Kernels
Paper • 2410.20399 • Published • 2
-
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training
Paper • 2403.09611 • Published • 129 -
Evolutionary Optimization of Model Merging Recipes
Paper • 2403.13187 • Published • 58 -
MobileVLM V2: Faster and Stronger Baseline for Vision Language Model
Paper • 2402.03766 • Published • 15 -
LLM Agent Operating System
Paper • 2403.16971 • Published • 72
-
Qwen Technical Report
Paper • 2309.16609 • Published • 37 -
Qwen-VL: A Frontier Large Vision-Language Model with Versatile Abilities
Paper • 2308.12966 • Published • 11 -
Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models
Paper • 2311.07919 • Published • 10 -
Audio Dialogues: Dialogues dataset for audio and music understanding
Paper • 2404.07616 • Published • 16
-
Non-asymptotic oracle inequalities for the Lasso in high-dimensional mixture of experts
Paper • 2009.10622 • Published • 1 -
MoE-LLaVA: Mixture of Experts for Large Vision-Language Models
Paper • 2401.15947 • Published • 53 -
MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts
Paper • 2401.04081 • Published • 73 -
MoE-Infinity: Activation-Aware Expert Offloading for Efficient MoE Serving
Paper • 2401.14361 • Published • 2
-
Beyond Language Models: Byte Models are Digital World Simulators
Paper • 2402.19155 • Published • 53 -
Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models
Paper • 2402.19427 • Published • 56 -
VisionLLaMA: A Unified LLaMA Interface for Vision Tasks
Paper • 2403.00522 • Published • 46 -
Resonance RoPE: Improving Context Length Generalization of Large Language Models
Paper • 2403.00071 • Published • 24
-
AnimateLCM: Accelerating the Animation of Personalized Diffusion Models and Adapters with Decoupled Consistency Learning
Paper • 2402.00769 • Published • 22 -
LCM-LoRA: A Universal Stable-Diffusion Acceleration Module
Paper • 2311.05556 • Published • 87 -
LongAlign: A Recipe for Long Context Alignment of Large Language Models
Paper • 2401.18058 • Published • 22 -
Efficient Tool Use with Chain-of-Abstraction Reasoning
Paper • 2401.17464 • Published • 21
-
openchat/openchat-3.5-1210
Text Generation • 7B • Updated • 3.01k • 279 -
MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts
Paper • 2401.04081 • Published • 73 -
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
Paper • 2402.03300 • Published • 130 -
Babelscape/rebel-large
0.4B • Updated • 34.8k • 230