---
license: apache-2.0
tags:
- llmcompressor
- qwen
- quantized
- aworld
- moe
- qwen3
- 8-bit
- 8bit
- int8
- gptq
- awq
- fp8
- a8w8
model-index:
- name: Qwen3-32B-AWorld-W8A16
  results: []
language:
- en
- zh
base_model:
- inclusionAI/Qwen3-32B-AWorld
---

# Qwen3-32B-AWorld-W8A16

This is a W8A16 (8-bit weight, 16-bit activation) quantized version of the [inclusionAI/Qwen3-32B-AWorld](https://huggingface.co/inclusionAI/Qwen3-32B-AWorld) model, created using [LLM Compressor](https://github.com/vllm-project/llm-compressor).

## Model Details

- **Model Type**: Mixture-of-Experts (MoE) Language Model
- **Base Model**: [inclusionAI/Qwen3-32B-AWorld](https://huggingface.co/inclusionAI/Qwen3-32B-AWorld)
- **Quantization Method**: GPTQ with SmoothQuant preprocessing
- **Weight Precision**: 8-bit
- **Activation Precision**: 16-bit
- **Compression Ratio**: ~2x model size reduction relative to 16-bit weights

## Quantization Process

This model was quantized using the LLM Compressor library with the following key parameters:

- Algorithm: GPTQ with SmoothQuant preprocessing
- Protection: MoE gate layers kept at full precision
- Calibration: 512 samples from the ultrachat_200k dataset
- Sequence Length: 2048 tokens

Quantization is intended to preserve the quality of the original model while reducing its size by approximately 2x, making it better suited to deployment in resource-constrained environments.

## Usage

The model can be loaded using the standard Hugging Face transformers library:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "groxaxo/Qwen3-32B-AWorld-W8A16"
# device_map="auto" spreads the weights across the available devices
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)
```

## Original Model

This quantized model is derived from [inclusionAI/Qwen3-32B-AWorld](https://huggingface.co/inclusionAI/Qwen3-32B-AWorld). Please refer to the original model card for detailed information about the base model's capabilities, training process, and intended use.

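## W8A16 Sketch

To make the "8-bit weight, 16-bit activation" format concrete, the following is a minimal sketch of weight-only quantization in plain Python. It is an illustration under stated assumptions, not the procedure used to produce this checkpoint: it uses simple symmetric round-to-nearest with one scale per weight row, whereas this model was quantized with GPTQ and SmoothQuant through LLM Compressor. The function names `quantize_row_int8`, `dequantize_row`, and `linear_w8a16` are hypothetical and exist only for this example.

```python
from typing import List, Tuple


def quantize_row_int8(row: List[float]) -> Tuple[List[int], float]:
    """Symmetric round-to-nearest quantization of one weight row to int8."""
    max_abs = max(abs(w) for w in row)
    # One scale per row; fall back to 1.0 for an all-zero row.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(w / scale))) for w in row]
    return codes, scale


def dequantize_row(codes: List[int], scale: float) -> List[float]:
    """Recover approximate floating-point weights from int8 codes."""
    return [c * scale for c in codes]


def linear_w8a16(
    x: List[float], q_rows: List[List[int]], scales: List[float]
) -> List[float]:
    """Compute y = W x with W stored as int8 codes plus per-row scales.

    Activations stay in floating point (16-bit in a real deployment;
    ordinary Python floats in this sketch).
    """
    out = []
    for codes, scale in zip(q_rows, scales):
        w = dequantize_row(codes, scale)
        out.append(sum(wi * xi for wi, xi in zip(w, x)))
    return out


# Toy 2x3 weight matrix, purely illustrative.
weights = [[0.50, -1.27, 0.03], [0.10, 0.25, -0.40]]
quantized = [quantize_row_int8(row) for row in weights]
q_rows = [codes for codes, _ in quantized]
scales = [scale for _, scale in quantized]

y = linear_w8a16([1.0, 2.0, 3.0], q_rows, scales)
print(q_rows, scales, y)
```

Only the weights are stored in 8 bits; each stored code is multiplied by its row scale before the matrix product, so the rounding error per weight is at most half a scale step.
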
## License

This model is licensed under the Apache 2.0 license, inheriting the license from the original model.
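
## Estimated Weight Footprint

The approximately 2x size reduction stated in this card follows from halving the bits stored per weight. The sketch below works through that arithmetic. The parameter count of roughly 32.8 billion is an assumption taken from the "32B" model name, not a figure from this card, and the estimate ignores quantization scales and any layers kept at full precision, so the real checkpoint will be somewhat larger than the 8-bit figure. The helper `weight_gib` is hypothetical.

```python
def weight_gib(num_params: float, bits_per_weight: int) -> float:
    """Approximate weight storage in GiB for a given precision."""
    return num_params * bits_per_weight / 8 / 2**30


# Assumption: about 32.8 billion parameters (not stated in this card).
NUM_PARAMS = 32.8e9

size_16bit = weight_gib(NUM_PARAMS, 16)
size_8bit = weight_gib(NUM_PARAMS, 8)
ratio = size_16bit / size_8bit

print(f"16-bit weights: {size_16bit:.1f} GiB")
print(f"8-bit weights:  {size_8bit:.1f} GiB")
print(f"reduction:      {ratio:.1f}x")
```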