Qwen3-235B-A22B-EAGLE3 (Speculators Format)

This is a conversion of lmsys/Qwen3-235B-A22B-EAGLE3 to the vLLM speculators format for use with Eagle3 speculative decoding.

Model Details

  • Base Model: Qwen/Qwen3-235B-A22B-Instruct-2507-FP8
  • Draft Model Architecture: Llama-based Eagle3 head
  • Original Model: lmsys/Qwen3-235B-A22B-EAGLE3
  • Format: vLLM Speculators v0.1.0.dev42

Model Configuration

  • Draft Vocabulary Size: 32,000 (a reduced vocabulary; see the mapping sketch after this list)
  • Target Vocabulary Size: 151,936
  • Hidden Size: 4,096
  • Intermediate Size: 24,576
  • Number of Layers: 1 (Eagle3 head layer)
  • Attention Heads: 64
  • KV Heads: 4
  • Auxiliary Hidden State Layers: [1, 46, 90]
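
The draft head scores only the reduced 32,000-token vocabulary, and vLLM maps its proposals back onto the target model's 151,936-token vocabulary during verification. A minimal inspection sketch; the d2t/t2d tensor names are an assumption based on other Eagle3 checkpoints, not confirmed for this file:

from safetensors import safe_open

# Hypothetical: peek at the draft-to-target vocabulary mapping tensors.
# "d2t" (int64) maps each draft id to its target id; "t2d" (bool) flags
# which target ids are representable in the draft vocabulary.
with safe_open("model.safetensors", framework="pt") as f:
    print(f.keys())            # all tensor names in the checkpoint
    d2t = f.get_tensor("d2t")  # assumed shape: [32000]
    t2d = f.get_tensor("t2d")  # assumed shape: [151936]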

Usage

This model is designed to be used with vLLM's Eagle3 speculative decoding implementation:

from vllm import LLM

llm = LLM(
    model="Qwen/Qwen3-235B-A22B-Instruct-2507-FP8",
    speculative_config={
        "method": "eagle3",
        "model": "nm-testing/Qwen3-235B-A22B-EAGLE3-converted-speculators-lmsys",
        "num_speculative_tokens": 3,
    },
    tensor_parallel_size=2,
)
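
Once the LLM is constructed, speculative decoding is applied transparently and generation works as usual. A minimal sketch (the prompt and sampling settings are illustrative):

from vllm import SamplingParams

# generate() returns ordinary outputs; decoding is simply faster when
# the draft model's proposed tokens are accepted by the target model.
prompts = ["Explain speculative decoding in one paragraph."]
sampling = SamplingParams(temperature=0.7, max_tokens=256)

outputs = llm.generate(prompts, sampling)
print(outputs[0].outputs[0].text)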

Alternatively, via vLLM's example script on the command line:

python examples/offline_inference/spec_decode.py \
  --method "eagle3" \
  --tp 2 \
  --model-dir "Qwen/Qwen3-235B-A22B-Instruct-2507-FP8" \
  --eagle-dir "nm-testing/Qwen3-235B-A22B-EAGLE3-converted-speculators-lmsys" \
  --num-spec-tokens 3
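
For online serving, the same settings can be passed to vllm serve as a JSON speculative config. A sketch; flag support varies across vLLM versions, so check vllm serve --help:

vllm serve Qwen/Qwen3-235B-A22B-Instruct-2507-FP8 \
  --tensor-parallel-size 2 \
  --speculative-config '{"method": "eagle3", "model": "nm-testing/Qwen3-235B-A22B-EAGLE3-converted-speculators-lmsys", "num_speculative_tokens": 3}'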

Conversion Details

The original Eagle3 config format has been converted to the vLLM speculators format with the following changes (an abbreviated sketch of the resulting config.json follows the list):

  1. Architecture: Changed from LlamaForCausalLMEagle3 to Eagle3Speculator
  2. Config Structure: Reorganized into transformer_layer_config and speculators_config sections
  3. Auxiliary Layers: Extracted from eagle_config.eagle_aux_hidden_state_layer_ids to top-level eagle_aux_hidden_state_layer_ids
  4. Verifier Config: Added explicit verifier model specification
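
Illustratively, the converted config.json has roughly this shape (a sketch assembled from the values above, not the verbatim file contents):

{
  "architectures": ["Eagle3Speculator"],
  "eagle_aux_hidden_state_layer_ids": [1, 46, 90],
  "speculators_config": {
    "verifier": {
      "name_or_path": "Qwen/Qwen3-235B-A22B-Instruct-2507-FP8"
    }
  },
  "transformer_layer_config": {
    "hidden_size": 4096,
    "intermediate_size": 24576,
    "num_attention_heads": 64,
    "num_key_value_heads": 4,
    "num_hidden_layers": 1,
    "vocab_size": 32000
  }
}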

Files

  • config.json: Model configuration in speculators format
  • model.safetensors: Model weights (unchanged from original)
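
vLLM downloads the repository automatically, but the files can also be fetched ahead of time with huggingface_hub (a minimal sketch):

from huggingface_hub import snapshot_download

# Pre-fetch the converted draft model; vLLM will reuse the local HF cache.
local_dir = snapshot_download(
    repo_id="nm-testing/Qwen3-235B-A22B-EAGLE3-converted-speculators-lmsys"
)
print(local_dir)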

Citation

If you use this model, please cite the original Eagle3 paper and the LMSYS team:

@article{li2025eagle3,
  title={EAGLE-3: Scaling up Inference Acceleration of Large Language Models via Training-Time Test},
  author={Li, Yuhui and Wei, Fangyun and Zhang, Chao and Zhang, Hongyang},
  journal={arXiv preprint arXiv:2503.01840},
  year={2025}
}

License

Same as the original model: lmsys/Qwen3-235B-A22B-EAGLE3
