CTranslate2-compatible AWQ 4-bit quantization of Qwen/Qwen3-14B.

NOTES:

  • This model requires that pull request https://github.com/OpenNMT/CTranslate2/pull/1951 be merged into CTranslate2. The model will not work until that happens.
  • This model was converted from the AWQ version of the original Qwen3 model. That AWQ version, in turn, was produced with a custom fork of the AutoAWQ repository, since the original repository was archived in May 2025. The model has been tested, but feel free to message if you run into any issues.

VRAM Usage:

Model                 VRAM Usage
Qwen3-32B-ct2-awq     ~18.3 GB
Qwen3-14B-ct2-awq     ~9.5 GB
Qwen3-8B-ct2-awq      ~5.8 GB
Qwen3-4B-ct2-awq      ~2.6 GB
Qwen3-1.7B-ct2-awq    ~1.3 GB
Qwen3-0.6B-ct2-awq    ~0.6 GB
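As a rough aid for choosing a model size, the sketch below picks the largest model from the table above that fits in a given amount of free VRAM. The figures and the 1 GB safety margin are approximations; actual usage grows with context length.

```python
# Approximate VRAM figures (GB) from the table above.
VRAM_GB = {
    "Qwen3-32B-ct2-awq": 18.3,
    "Qwen3-14B-ct2-awq": 9.5,
    "Qwen3-8B-ct2-awq": 5.8,
    "Qwen3-4B-ct2-awq": 2.6,
    "Qwen3-1.7B-ct2-awq": 1.3,
    "Qwen3-0.6B-ct2-awq": 0.6,
}

def largest_fitting_model(free_vram_gb, margin_gb=1.0):
    """Return the largest model that fits within free_vram_gb, keeping a margin."""
    candidates = [m for m, gb in VRAM_GB.items() if gb + margin_gb <= free_vram_gb]
    return max(candidates, key=VRAM_GB.get, default=None)

print(largest_fitting_model(12.0))  # Qwen3-14B-ct2-awq
```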

Example Usage:

import ctranslate2
from huggingface_hub import snapshot_download
from transformers import AutoTokenizer

MODEL_ID = "CTranslate2HQ/Qwen3-14B-ct2-AWQ"

# Download the model from the Hugging Face Hub, then load it with CTranslate2
# (ctranslate2.Generator expects a local directory, not a Hub repo ID)
model_path = snapshot_download(MODEL_ID)
generator = ctranslate2.Generator(model_path, device="cuda")
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# Format prompt using chat template
messages = [
    {"role": "system", "content": "You are a helpful AI assistant."},
    {"role": "user", "content": "Write a short poem about a cat."}
]

prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False
)

# Tokenize and generate
tokens = tokenizer.convert_ids_to_tokens(tokenizer.encode(prompt))

# Note: with "ct2-AWQ" models, do not pass the "compute_type" parameter
results = generator.generate_batch(
    [tokens],
    max_length=8192,
    sampling_temperature=0.7,
    sampling_topk=50,
)

# Decode and print response
output_ids = results[0].sequences_ids[0]
response = tokenizer.decode(output_ids, skip_special_tokens=True)
print(response)
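Streaming Usage:

If you want tokens as they are produced rather than one final batch result, CTranslate2's Generator also exposes generate_tokens, which yields one step result per generated token. The helper below is a minimal sketch, assuming ctranslate2 >= 3.x, where each step result carries the generated token in its token attribute.

```python
def stream_response(generator, tokens, max_length=8192):
    """Yield generated token strings one at a time from a CTranslate2 Generator.

    Assumes generator.generate_tokens yields step results with a .token
    attribute, as in recent CTranslate2 releases.
    """
    for step in generator.generate_tokens(
        tokens,
        max_length=max_length,
        sampling_temperature=0.7,
        sampling_topk=50,
    ):
        yield step.token
```

Usage follows the example above: pass the same `generator` and `tokens`, and print each yielded piece as it arrives (note the yielded strings are raw tokenizer tokens, so you may want to detokenize them for display).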

Requirements:

ctranslate2
transformers
torch
huggingface_hub
