---
license: apache-2.0
base_model:
- Qwen/Qwen3-Reranker-8B
pipeline_tag: text-classification
tags:
- transformers
---

# Qwen3-Reranker-8B-W4A16-G128

[Qwen/Qwen3-Reranker-8B](https://huggingface.co/Qwen/Qwen3-Reranker-8B) quantized with GPTQ, using Ultrachat, [THUIR/T2Ranking](https://huggingface.co/datasets/THUIR/T2Ranking) and [m-a-p/COIG-CQIA](https://huggingface.co/datasets/m-a-p/COIG-CQIA) as the calibration set.

## What's the benefit?

VRAM usage drops from more than 24 GB to `19624M` (without FA2, per the Embedding model's result), which makes the model fit on a single 3090/4090.

## What's the cost?

I expect less than `5%` accuracy loss; further evaluation is on the way. [The Embedding one](https://huggingface.co/boboliu/Qwen3-Embedding-4B-W4A16-G128#whats-the-cost) shows `~0.7%`.

## How to use it?

`pip install compressed-tensors optimum` plus one of `auto-gptq` / `gptqmodel`, then go to [the official usage guide](https://huggingface.co/Qwen/Qwen3-Reranker-0.6B#transformers-usage). A condensed sketch follows below.
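The sketch below condenses the official guide's transformers flow for this checkpoint. It assumes the repo id is `boboliu/Qwen3-Reranker-8B-W4A16-G128` (matching this card's title and the Embedding repo's namespace) and that `transformers` with `compressed-tensors` installed loads the W4A16 weights transparently through `AutoModelForCausalLM`; treat it as a starting point, not the canonical implementation.

```python
# Minimal loading-and-scoring sketch, condensed from the official
# Qwen3-Reranker transformers usage guide. The repo id below is an
# assumption; adjust it if this checkpoint lives elsewhere.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "boboliu/Qwen3-Reranker-8B-W4A16-G128"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id, padding_side="left")
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",  # let the quantization config pick the dtypes
    device_map="auto",
).eval()

# The reranker answers "yes"/"no"; relevance is P("yes") at the last position.
token_true_id = tokenizer.convert_tokens_to_ids("yes")
token_false_id = tokenizer.convert_tokens_to_ids("no")

prefix = (
    "<|im_start|>system\nJudge whether the Document meets the requirements "
    "based on the Query and the Instruct provided. Note that the answer can "
    'only be "yes" or "no".<|im_end|>\n<|im_start|>user\n'
)
suffix = "<|im_end|>\n<|im_start|>assistant\n<think>\n\n</think>\n\n"

def score(instruction: str, query: str, doc: str) -> float:
    """Return a 0..1 relevance score for a single query/document pair."""
    text = f"<Instruct>: {instruction}\n<Query>: {query}\n<Document>: {doc}"
    inputs = tokenizer(prefix + text + suffix, return_tensors="pt").to(model.device)
    with torch.no_grad():
        logits = model(**inputs).logits[0, -1, :]  # next-token logits
    pair = torch.stack([logits[token_false_id], logits[token_true_id]])
    return torch.softmax(pair, dim=0)[1].item()  # P("yes") over {"no", "yes"}

print(score(
    "Given a web search query, retrieve relevant passages that answer the query",
    "What is the capital of China?",
    "The capital of China is Beijing.",
))
if torch.cuda.is_available():  # rough check against the VRAM figure above
    print(f"peak VRAM: {torch.cuda.max_memory_allocated() / 2**20:.0f} MiB")
```

Under the hood the reranker is a causal LM: the score is just the softmax over the "yes" vs. "no" logits at the final position, and batched scoring works the same way as in the official guide.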