---
license: apache-2.0
---

# ArcticSpeculator

Build the fastest open-source, vLLM-based speculative decoding system for your own model, using [ArcticTraining](https://github.com/snowflakedb/ArcticTraining) and [ArcticInference](https://github.com/snowflakedb/ArcticInference)!
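As a rough sketch of how the two pieces fit together, training a speculator with ArcticTraining and then serving it through vLLM with the ArcticInference plugin looks roughly like the following. The model name, install target, flags, and config keys below are illustrative assumptions, not verified commands; the getting-started guide linked below is the authoritative reference.

```shell
# Install vLLM together with the ArcticInference plugin.
# (Package name assumed; check the ArcticInference repo for exact instructions.)
pip install arctic-inference[vllm]

# Launch a vLLM server with speculative decoding enabled.
# The speculative-config JSON keys here are illustrative assumptions.
vllm serve meta-llama/Llama-3.1-70B-Instruct \
  --tensor-parallel-size 8 \
  --speculative-config '{"method": "arctic", "model": "<path-to-your-trained-speculator>", "num_speculative_tokens": 3}'
```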
<!--We compare the throughput (tokens/s) of existing vLLM-based speculative decoding systems for Llama3.1-70B-Instruct on 8xH100 below:

| method | ShareGPT | HumanEval |
|------------------------------|-----------|-----------|
| vLLM V1 Baseline | 84.1 | 84.1 |
| vLLM V1 Eagle | 102.2 | 112.0 |
| vLLM V1 Eagle3 | 77.7 | 85.3 |
| vLLM V0 MLP-Speculator (IBM) | 77.9 | 66.7 |
| ArcticSpeculator | **172.4** | **203.7** |
-->
|
For more details about ArcticSpeculator and how to use it:

* [Using Arctic-Inference and Arctic-Training for improving real-world speculative decoding performance (blog)]()
* [Getting started guide using ArcticTraining](https://github.com/snowflakedb/ArcticTraining/tree/mlp-variant-speculator/projects/mlp_variant_speculator)

See all of the speculators we have released in our [Speculators Collection](https://huggingface.co/collections/Snowflake/speculators-6812b07f3186d13e243022e4).