---
license: apache-2.0
---

# ArcticSpeculator

Build the fastest open-source, vLLM-based speculative decoding system for your own model, using [ArcticTraining](https://github.com/snowflakedb/ArcticTraining) and [ArcticInference](https://github.com/snowflakedb/ArcticInference)!
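As a rough sketch of how the two pieces fit together, training a speculator with ArcticTraining and then serving it through vLLM with the ArcticInference plugin looks roughly like the following. The model name, install target, flags, and config keys below are illustrative assumptions, not verified commands; the getting-started guide linked below is the authoritative reference.

```shell
# Install vLLM together with the ArcticInference plugin.
# (Package name assumed; check the ArcticInference repo for exact instructions.)
pip install arctic-inference[vllm]

# Launch a vLLM server with speculative decoding enabled.
# The speculative-config JSON keys here are illustrative assumptions.
vllm serve meta-llama/Llama-3.1-70B-Instruct \
  --tensor-parallel-size 8 \
  --speculative-config '{"method": "arctic", "model": "<path-to-your-trained-speculator>", "num_speculative_tokens": 3}'
```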
<!--We compare the throughput (tokens/s) of existing vLLM-based speculative decoding systems for Llama3.1-70B-Instruct on 8xH100 below:

| method | ShareGPT | HumanEval |
|------------------------------|-----------|-----------|
| vLLM V1 Baseline | 84.1 | 84.1 |
| vLLM V1 Eagle | 102.2 | 112.0 |
| vLLM V1 Eagle3 | 77.7 | 85.3 |
| vLLM V0 MLP-Speculator (IBM) | 77.9 | 66.7 |
| ArcticSpeculator | **172.4** | **203.7** |
-->
|
For more details about ArcticSpeculator and how to use it:

* [Using Arctic-Inference and Arctic-Training for improving real-world speculative decoding performance (blog)]()
* [Getting started guide using ArcticTraining](https://github.com/snowflakedb/ArcticTraining/tree/mlp-variant-speculator/projects/mlp_variant_speculator)

See all of the speculators we have released in our [Speculators Collection](https://huggingface.co/collections/Snowflake/speculators-6812b07f3186d13e243022e4).