Models
Datasets
Spaces
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2410.00531

Inference Acceleration

BiTA: Bi-Directional Tuning for Lossless Acceleration in Large Language Models

Paper • 2401.12522 • Published Jan 23, 2024 • 12
Hydragen: High-Throughput LLM Inference with Shared Prefixes

Paper • 2402.05099 • Published Feb 7, 2024 • 20
BiLLM: Pushing the Limit of Post-Training Quantization for LLMs

Paper • 2402.04291 • Published Feb 6, 2024 • 50
Shortened LLaMA: A Simple Depth Pruning for Large Language Models

Paper • 2402.02834 • Published Feb 5, 2024 • 17

Inference Acceleration

BiTA: Bi-Directional Tuning for Lossless Acceleration in Large Language Models

Paper • 2401.12522 • Published Jan 23, 2024 • 12
Hydragen: High-Throughput LLM Inference with Shared Prefixes

Paper • 2402.05099 • Published Feb 7, 2024 • 20
BiLLM: Pushing the Limit of Post-Training Quantization for LLMs

Paper • 2402.04291 • Published Feb 6, 2024 • 50
Shortened LLaMA: A Simple Depth Pruning for Large Language Models

Paper • 2402.02834 • Published Feb 5, 2024 • 17

Previous
1
2
Next

Company

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs