Qwen3 Collection
Qwen3 models converted to the CTranslate2 format.
CTranslate2-compatible AWQ 4-bit quantization of Qwen/Qwen3-14B.
- This model requires pull request https://github.com/OpenNMT/CTranslate2/pull/1951 to be accepted; it will only work after that happens.
- This model was made from the AWQ version of the original Qwen3 model. That AWQ version, in turn, was made with a custom fork of the AutoAWQ repository, since the original repository was archived in May 2025. Feel free to message if you run into any issues, but the model has been tested. A rough sketch of the general conversion flow is shown after this list.
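For reference, here is a minimal sketch of what such a conversion generally looks like. It is not the exact procedure used for these checkpoints (the AutoAWQ fork and the converter options used are not documented here); it assumes a CTranslate2 build that includes the AWQ support from the pull request above, and it uses a placeholder path for the AWQ source model.

```python
# Minimal sketch (not the exact procedure used for these checkpoints):
# convert an AWQ-quantized Hugging Face checkpoint to the CTranslate2 format.
# Assumes a CTranslate2 build with the AWQ support from the PR linked above.
import ctranslate2

# "path/to/qwen3-14b-awq" is a placeholder for a locally quantized AWQ model.
converter = ctranslate2.converters.TransformersConverter("path/to/qwen3-14b-awq")
converter.convert("Qwen3-14B-ct2-AWQ", force=True)
```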
| Model | VRAM Usage |
|---|---|
| Qwen3-32B-ct2-awq | ~18.3 GB |
| Qwen3-14B-ct2-awq | ~9.5 GB |
| Qwen3-8B-ct2-awq | ~5.8 GB |
| Qwen3-4B-ct2-awq | ~2.6 GB |
| Qwen3-1.7B-ct2-awq | ~1.3 GB |
| Qwen3-0.6B-ct2-awq | ~0.6 GB |
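The example below downloads the 14B model from the Hub, formats a prompt with the Qwen3 chat template, and generates a response.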
import ctranslate2
from huggingface_hub import snapshot_download
from transformers import AutoTokenizer

MODEL_ID = "CTranslate2HQ/Qwen3-14B-ct2-AWQ"

# Download the converted model from the Hugging Face Hub;
# ctranslate2.Generator expects a local model directory.
model_path = snapshot_download(MODEL_ID)
generator = ctranslate2.Generator(model_path, device="cuda")
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
# Format the prompt using the chat template
messages = [
    {"role": "system", "content": "You are a helpful AI assistant."},
    {"role": "user", "content": "Write a short poem about a cat."},
]
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False,
)
# Tokenize and generate
tokens = tokenizer.convert_ids_to_tokens(tokenizer.encode(prompt))
# Note: with the "ct2-AWQ" models, do not pass a "compute_type" argument when creating the Generator
results = generator.generate_batch(
    [tokens],
    max_length=8192,
    sampling_temperature=0.7,
    sampling_topk=50,
)
# Decode and print response
output_ids = results[0].sequences_ids[0]
response = tokenizer.decode(output_ids, skip_special_tokens=True)
print(response)
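If you prefer to stream tokens as they are produced, CTranslate2 also exposes Generator.generate_tokens. Below is a minimal sketch, reusing the generator, tokenizer, and tokens variables from the example above.

```python
# Minimal streaming sketch: Generator.generate_tokens yields one result per
# generated token, so output can be consumed as generation proceeds.
output_ids = []
for step in generator.generate_tokens(
    tokens,
    max_length=8192,
    sampling_temperature=0.7,
    sampling_topk=50,
):
    # Each step result carries the id of the newly generated token.
    output_ids.append(step.token_id)

print(tokenizer.decode(output_ids, skip_special_tokens=True))
```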
Requirements:
- ctranslate2
- transformers
- torch
- huggingface_hub
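These can typically be installed with `pip install ctranslate2 transformers torch huggingface_hub`. A CUDA-capable GPU is assumed by the `device="cuda"` setting in the example above.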