# W8A8-FP Qwen/Qwen3-8B model

- Developed by: namgyu-youn
- License: apache-2.0
- Quantized from Model: Qwen/Qwen3-8B
- Quantization Method: W8A8-FP
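As a rough illustration of why W8A8-FP matters, FP8 weights take one byte each versus two for BF16, roughly halving the weight footprint. A back-of-the-envelope sketch (the parameter count is an assumed approximation, not an official figure, and activation/KV-cache memory is ignored):

```python
# Rough weight-memory estimate for W8A8-FP (FP8 weights) vs. BF16.
# PARAMS is an assumed approximate parameter count for Qwen3-8B.
PARAMS = 8.2e9
BYTES_BF16 = 2   # bytes per weight in BF16
BYTES_FP8 = 1    # bytes per weight in FP8

bf16_gib = PARAMS * BYTES_BF16 / 1024**3
fp8_gib = PARAMS * BYTES_FP8 / 1024**3

print(f"BF16 weights: ~{bf16_gib:.1f} GiB, FP8 weights: ~{fp8_gib:.1f} GiB")
```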
## Model Performance

### A. Accuracy (lm-eval)

```shell
# MMLU accuracy command (set `pretrained=` to the model under test)
lm_eval --model hf --model_args pretrained=Qwen/Qwen3-8B --tasks mmlu --device cuda:0 --batch_size 8 --limit 100
```
#### Original Model

| Groups | Version | Filter | n-shot | Metric |   | Value |   | Stderr |
|---|---|---|---|---|---|---|---|---|
| mmlu | 2 | none |  | acc | ↑ | 0.7542 | ± | 0.0055 |
| - humanities | 2 | none |  | acc | ↑ | 0.7577 | ± | 0.0112 |
| - other | 2 | none |  | acc | ↑ | 0.7408 | ± | 0.0116 |
| - social sciences | 2 | none |  | acc | ↑ | 0.8333 | ± | 0.0105 |
| - stem | 2 | none |  | acc | ↑ | 0.7111 | ± | 0.0101 |
#### Quantized Model

| Groups | Version | Filter | n-shot | Metric |   | Value |   | Stderr |
|---|---|---|---|---|---|---|---|---|
| mmlu | 2 | none |  | acc | ↑ | 0.7498 | ± | 0.0055 |
| - humanities | 2 | none |  | acc | ↑ | 0.7508 | ± | 0.0114 |
| - other | 2 | none |  | acc | ↑ | 0.7392 | ± | 0.0116 |
| - social sciences | 2 | none |  | acc | ↑ | 0.8267 | ± | 0.0107 |
| - stem | 2 | none |  | acc | ↑ | 0.7079 | ± | 0.0101 |
#### Summary

| Benchmark | Qwen/Qwen3-8B | namgyu-youn/Qwen3-8B-W8A8-FP |
|---|---|---|
| mmlu | 0.7542 | 0.7498 |
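To put the quantization loss in perspective: the overall MMLU drop is smaller than the reported standard error of the original score, a quick check one can reproduce from the numbers above:

```python
# Overall MMLU accuracy of original vs. quantized model,
# values copied from the tables above.
orig_acc = 0.7542
quant_acc = 0.7498
stderr = 0.0055  # reported stderr of the overall mmlu score

drop = orig_acc - quant_acc
print(f"Absolute drop: {drop:.4f} ({drop / orig_acc:.2%} relative)")
print(f"Within one stderr: {drop <= stderr}")
```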
### B. Throughput (vLLM)

#### Original Model

```shell
vllm bench throughput --model Qwen/Qwen3-8B --input-len 256 --output-len 256 --num-prompts 100
```

#### Quantized Model

```shell
vllm bench throughput --model namgyu-youn/Qwen3-8B-W8A8-FP --input-len 256 --output-len 256 --num-prompts 100
```
#### Summary

| Benchmark | Qwen/Qwen3-8B | namgyu-youn/Qwen3-8B-W8A8-FP |
|---|---|---|
| Throughput (tok/s) | - | - |
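The throughput cells are left blank pending a benchmark run. Once the run completes, overall tokens/s follows directly from the benchmark configuration (100 prompts, 256 input + 256 output tokens each) and the elapsed wall-clock time. A minimal sketch with a hypothetical elapsed time:

```python
def tokens_per_second(num_prompts: int, input_len: int, output_len: int,
                      elapsed_s: float) -> float:
    """Total (input + output) tokens processed per second of wall-clock time."""
    return num_prompts * (input_len + output_len) / elapsed_s

# Hypothetical example: the 100-prompt config above finishing in 20 s.
print(tokens_per_second(100, 256, 256, 20.0))  # 2560.0
```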
### C. Latency (vLLM)

#### Original Model

```shell
vllm bench latency --model Qwen/Qwen3-8B --input-len 256 --output-len 256 --batch-size 1
```

#### Quantized Model

```shell
vllm bench latency --model namgyu-youn/Qwen3-8B-W8A8-FP --input-len 256 --output-len 256 --batch-size 1
```
#### Summary

| Benchmark | Qwen/Qwen3-8B | namgyu-youn/Qwen3-8B-W8A8-FP |
|---|---|---|
| Latency (ms) | - | - |
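The latency cells likewise await a benchmark run. With `--batch-size 1`, dividing end-to-end latency by the number of generated tokens gives a rough per-token figure (prefill time is ignored, so this slightly overestimates the inter-token latency). A minimal sketch with a hypothetical measurement:

```python
def per_token_latency_ms(e2e_latency_s: float, output_len: int) -> float:
    """Approximate per-output-token latency in ms (prefill time ignored)."""
    return e2e_latency_s * 1000.0 / output_len

# Hypothetical example: 256 output tokens generated in 5.12 s end to end.
print(per_token_latency_ms(5.12, 256))  # 20.0
```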
## Resources