-
BlockFusion: Expandable 3D Scene Generation using Latent Tri-plane Extrapolation
Paper • 2401.17053 • Published • 33 -
Can Mamba Learn How to Learn? A Comparative Study on In-Context Learning Tasks
Paper • 2402.04248 • Published • 32 -
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
Paper • 2402.03300 • Published • 131 -
WebLINX: Real-World Website Navigation with Multi-Turn Dialogue
Paper • 2402.05930 • Published • 39
Collections
Discover the best community collections!
Collections including paper arxiv:2402.14905
-
Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads
Paper • 2401.10774 • Published • 59 -
APAR: LLMs Can Do Auto-Parallel Auto-Regressive Decoding
Paper • 2401.06761 • Published • 1 -
Infinite-LLM: Efficient LLM Service for Long Context with DistAttention and Distributed KVCache
Paper • 2401.02669 • Published • 16 -
MambaByte: Token-free Selective State Space Model
Paper • 2401.13660 • Published • 60
-
DeepSeek LLM: Scaling Open-Source Language Models with Longtermism
Paper • 2401.02954 • Published • 48 -
Qwen Technical Report
Paper • 2309.16609 • Published • 37 -
GPT-4 Technical Report
Paper • 2303.08774 • Published • 7 -
Gemini: A Family of Highly Capable Multimodal Models
Paper • 2312.11805 • Published • 47
-
A Picture is Worth More Than 77 Text Tokens: Evaluating CLIP-Style Models on Dense Captions
Paper • 2312.08578 • Published • 20 -
ZeroQuant(4+2): Redefining LLMs Quantization with a New FP6-Centric Strategy for Diverse Generative Tasks
Paper • 2312.08583 • Published • 11 -
Vision-Language Models as a Source of Rewards
Paper • 2312.09187 • Published • 14 -
StemGen: A music generation model that listens
Paper • 2312.08723 • Published • 49
-
UFOGen: You Forward Once Large Scale Text-to-Image Generation via Diffusion GANs
Paper • 2311.09257 • Published • 48 -
Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference
Paper • 2310.04378 • Published • 22 -
QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models
Paper • 2309.14717 • Published • 45 -
Exponentially Faster Language Modelling
Paper • 2311.10770 • Published • 119
-
Mobile-Agent: Autonomous Multi-Modal Mobile Device Agent with Visual Perception
Paper • 2401.16158 • Published • 20 -
MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases
Paper • 2402.14905 • Published • 134 -
Imp: Highly Capable Large Multimodal Models for Mobile Devices
Paper • 2405.12107 • Published • 29 -
Mobile-Agent-v2: Mobile Device Operation Assistant with Effective Navigation via Multi-Agent Collaboration
Paper • 2406.01014 • Published • 34
-
TheBloke/Nous-Capybara-34B-GGUF
34B • Updated • 2.91k • 167 -
NousResearch/Nous-Hermes-2-SOLAR-10.7B
Text Generation • 11B • Updated • 7.97k • 208 -
MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases
Paper • 2402.14905 • Published • 134 -
bartowski/Codestral-22B-v0.1-GGUF
Text Generation • 22B • Updated • 2.67k • 187
-
PIA: Your Personalized Image Animator via Plug-and-Play Modules in Text-to-Image Models
Paper • 2312.13964 • Published • 20 -
LLM in a flash: Efficient Large Language Model Inference with Limited Memory
Paper • 2312.11514 • Published • 260 -
StreamDiffusion: A Pipeline-level Solution for Real-time Interactive Generation
Paper • 2312.12491 • Published • 74 -
LLaVA-φ: Efficient Multi-Modal Assistant with Small Language Model
Paper • 2401.02330 • Published • 18
-
System 2 Attention (is something you might need too)
Paper • 2311.11829 • Published • 44 -
TPTU-v2: Boosting Task Planning and Tool Usage of Large Language Model-based Agents in Real-world Systems
Paper • 2311.11315 • Published • 8 -
Alignment for Honesty
Paper • 2312.07000 • Published • 16 -
Steering Llama 2 via Contrastive Activation Addition
Paper • 2312.06681 • Published • 15
-
Chain-of-Verification Reduces Hallucination in Large Language Models
Paper • 2309.11495 • Published • 39 -
Adapting Large Language Models via Reading Comprehension
Paper • 2309.09530 • Published • 81 -
CulturaX: A Cleaned, Enormous, and Multilingual Dataset for Large Language Models in 167 Languages
Paper • 2309.09400 • Published • 85 -
Language Modeling Is Compression
Paper • 2309.10668 • Published • 83
-
BlockFusion: Expandable 3D Scene Generation using Latent Tri-plane Extrapolation
Paper • 2401.17053 • Published • 33 -
Can Mamba Learn How to Learn? A Comparative Study on In-Context Learning Tasks
Paper • 2402.04248 • Published • 32 -
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
Paper • 2402.03300 • Published • 131 -
WebLINX: Real-World Website Navigation with Multi-Turn Dialogue
Paper • 2402.05930 • Published • 39
-
Mobile-Agent: Autonomous Multi-Modal Mobile Device Agent with Visual Perception
Paper • 2401.16158 • Published • 20 -
MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases
Paper • 2402.14905 • Published • 134 -
Imp: Highly Capable Large Multimodal Models for Mobile Devices
Paper • 2405.12107 • Published • 29 -
Mobile-Agent-v2: Mobile Device Operation Assistant with Effective Navigation via Multi-Agent Collaboration
Paper • 2406.01014 • Published • 34
-
Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads
Paper • 2401.10774 • Published • 59 -
APAR: LLMs Can Do Auto-Parallel Auto-Regressive Decoding
Paper • 2401.06761 • Published • 1 -
Infinite-LLM: Efficient LLM Service for Long Context with DistAttention and Distributed KVCache
Paper • 2401.02669 • Published • 16 -
MambaByte: Token-free Selective State Space Model
Paper • 2401.13660 • Published • 60
-
TheBloke/Nous-Capybara-34B-GGUF
34B • Updated • 2.91k • 167 -
NousResearch/Nous-Hermes-2-SOLAR-10.7B
Text Generation • 11B • Updated • 7.97k • 208 -
MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases
Paper • 2402.14905 • Published • 134 -
bartowski/Codestral-22B-v0.1-GGUF
Text Generation • 22B • Updated • 2.67k • 187
-
DeepSeek LLM: Scaling Open-Source Language Models with Longtermism
Paper • 2401.02954 • Published • 48 -
Qwen Technical Report
Paper • 2309.16609 • Published • 37 -
GPT-4 Technical Report
Paper • 2303.08774 • Published • 7 -
Gemini: A Family of Highly Capable Multimodal Models
Paper • 2312.11805 • Published • 47
-
PIA: Your Personalized Image Animator via Plug-and-Play Modules in Text-to-Image Models
Paper • 2312.13964 • Published • 20 -
LLM in a flash: Efficient Large Language Model Inference with Limited Memory
Paper • 2312.11514 • Published • 260 -
StreamDiffusion: A Pipeline-level Solution for Real-time Interactive Generation
Paper • 2312.12491 • Published • 74 -
LLaVA-φ: Efficient Multi-Modal Assistant with Small Language Model
Paper • 2401.02330 • Published • 18
-
A Picture is Worth More Than 77 Text Tokens: Evaluating CLIP-Style Models on Dense Captions
Paper • 2312.08578 • Published • 20 -
ZeroQuant(4+2): Redefining LLMs Quantization with a New FP6-Centric Strategy for Diverse Generative Tasks
Paper • 2312.08583 • Published • 11 -
Vision-Language Models as a Source of Rewards
Paper • 2312.09187 • Published • 14 -
StemGen: A music generation model that listens
Paper • 2312.08723 • Published • 49
-
System 2 Attention (is something you might need too)
Paper • 2311.11829 • Published • 44 -
TPTU-v2: Boosting Task Planning and Tool Usage of Large Language Model-based Agents in Real-world Systems
Paper • 2311.11315 • Published • 8 -
Alignment for Honesty
Paper • 2312.07000 • Published • 16 -
Steering Llama 2 via Contrastive Activation Addition
Paper • 2312.06681 • Published • 15
-
UFOGen: You Forward Once Large Scale Text-to-Image Generation via Diffusion GANs
Paper • 2311.09257 • Published • 48 -
Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference
Paper • 2310.04378 • Published • 22 -
QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models
Paper • 2309.14717 • Published • 45 -
Exponentially Faster Language Modelling
Paper • 2311.10770 • Published • 119
-
Chain-of-Verification Reduces Hallucination in Large Language Models
Paper • 2309.11495 • Published • 39 -
Adapting Large Language Models via Reading Comprehension
Paper • 2309.09530 • Published • 81 -
CulturaX: A Cleaned, Enormous, and Multilingual Dataset for Large Language Models in 167 Languages
Paper • 2309.09400 • Published • 85 -
Language Modeling Is Compression
Paper • 2309.10668 • Published • 83