- FollowIR: Evaluating and Teaching Information Retrieval Models to Follow Instructions
  Paper • 2403.15246 • Published • 11
- Noise-Aware Training of Layout-Aware Language Models
  Paper • 2404.00488 • Published • 10
- SnapKV: LLM Knows What You are Looking for Before Generation
  Paper • 2404.14469 • Published • 27

Collections including paper arxiv:2404.14469

- Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models
  Paper • 2310.04406 • Published • 10
- Chain-of-Thought Reasoning Without Prompting
  Paper • 2402.10200 • Published • 109
- ICDPO: Effectively Borrowing Alignment Capability of Others via In-context Direct Preference Optimization
  Paper • 2402.09320 • Published • 6
- Self-Discover: Large Language Models Self-Compose Reasoning Structures
  Paper • 2402.03620 • Published • 117

- Beyond Chinchilla-Optimal: Accounting for Inference in Language Model Scaling Laws
  Paper • 2401.00448 • Published • 30
- Improving Text Embeddings with Large Language Models
  Paper • 2401.00368 • Published • 82
- E^2-LLM: Efficient and Extreme Length Extension of Large Language Models
  Paper • 2401.06951 • Published • 26
- The Unreasonable Ineffectiveness of the Deeper Layers
  Paper • 2403.17887 • Published • 82

- Beyond Language Models: Byte Models are Digital World Simulators
  Paper • 2402.19155 • Published • 53
- Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models
  Paper • 2402.19427 • Published • 56
- VisionLLaMA: A Unified LLaMA Interface for Vision Tasks
  Paper • 2403.00522 • Published • 46
- Resonance RoPE: Improving Context Length Generalization of Large Language Models
  Paper • 2403.00071 • Published • 24

- BitDelta: Your Fine-Tune May Only Be Worth One Bit
  Paper • 2402.10193 • Published • 22
- OneBit: Towards Extremely Low-bit Large Language Models
  Paper • 2402.11295 • Published • 24
- BiLLM: Pushing the Limit of Post-Training Quantization for LLMs
  Paper • 2402.04291 • Published • 50
- GPTVQ: The Blessing of Dimensionality for LLM Quantization
  Paper • 2402.15319 • Published • 22