wenhua cheng
wenhuach
AI & ML interests
Model Compression, CV
Recent Activity
new activity
7 days ago
Intel/Ling-flash-2.0-gguf-q2ks-mixed-AutoRound:Practical performance feedback
reacted to their post
9 days ago
AutoRound (https://github.com/intel/auto-round) is now supported by SGLang!
After integrations with TorchAO, Transformers, and vLLM, AutoRound-quantized models are now officially compatible with SGLang, bringing faster and more flexible deployment to your LLM workflows.
We've also enhanced the RTN mode (--iters 0), cutting quantization costs significantly for low-resource users.
Star our repo and stay tuned for more exciting updates!
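For readers unfamiliar with the RTN mode mentioned in the post, the following is a minimal sketch of generic round-to-nearest quantization for a single weight group. It is not AutoRound's implementation: the function names `rtn_quantize` and `dequantize`, the symmetric int4 scheme, and the per-group scale are all assumptions chosen for illustration. The point is only that RTN needs no tuning iterations, which is why it is cheap.

```python
from typing import List, Tuple


def rtn_quantize(weights: List[float], bits: int = 4) -> Tuple[List[int], float]:
    """Symmetric round-to-nearest quantization of one weight group.

    Hypothetical helper for illustration; not part of any library API.
    """
    qmax = 2 ** (bits - 1) - 1   # 7 for 4-bit signed integers
    qmin = -(2 ** (bits - 1))    # -8 for 4-bit signed integers
    max_abs = max((abs(w) for w in weights), default=0.0)
    # One scale per group; fall back to 1.0 for an all-zero group.
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round each scaled weight to the nearest integer, then clamp to range.
    # Python's round() resolves exact .5 ties to the nearest even integer.
    quantized = [min(qmax, max(qmin, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized: List[int], scale: float) -> List[float]:
    """Map integer codes back to approximate floating-point weights."""
    return [q * scale for q in quantized]


if __name__ == "__main__":
    group = [0.7, -0.28, 0.14, 0.0]
    codes, scale = rtn_quantize(group)
    print(codes, scale)
    print(dequantize(codes, scale))
```

Each weight is rounded independently, so the reconstruction error per weight is at most half the scale. Tuning-based methods spend extra compute to reduce that error, which is the cost RTN avoids.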
new activity
10 days ago
Intel/Mistral-Small-3.2-24B-Instruct-2506-int4-AutoRound:Works good with vLLM, just no tool calling