⚠️ Note

This model had issues with its original Chat Template, which prevented it from functioning properly.
In this GGUF quantization, all tool-use-related sections of the template have been removed, so the model can still run in basic chat mode.

  • General text generation and conversation: working properly.
  • Function calling and external tool integration: currently disabled.

If you know how to improve the Chat Template, please open a new Discussion to share your insights.
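For anyone experimenting with a fix: llama-cpp-python lets you supply your own Jinja2 chat template at load time, overriding whatever is embedded in the GGUF. Below is a minimal sketch of that mechanism only; the template string and special tokens are placeholders, not EXAONE's real format, so check tokenizer_config.json in the original repository for the actual template.

from llama_cpp import Llama
from llama_cpp.llama_chat_format import Jinja2ChatFormatter

# Placeholder template; replace it with a repaired EXAONE template.
template = (
    "{% for message in messages %}"
    "<|{{ message['role'] }}|>{{ message['content'] }}<|end|>"
    "{% endfor %}"
    "{% if add_generation_prompt %}<|assistant|>{% endif %}"
)
formatter = Jinja2ChatFormatter(
    template=template,
    eos_token="<|end|>",  # placeholder; use the model's real EOS token
    bos_token="",
)

llm = Llama.from_pretrained(
    repo_id="Lumia101/EXAONE-4.0.1-32B-GGUF-Q4_K_M",
    filename="EXAONE-4.0.1-Q4_K_M-ctemplate-removed.gguf",
    chat_handler=formatter.to_chat_handler(),  # overrides the embedded template
    n_gpu_layers=-1,
)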

Lumia101/EXAONE-4.0.1-32B-GGUF-Q4_K_M

This model was converted to GGUF format from LGAI-EXAONE/EXAONE-4.0.1-32B using llama.cpp release b6795.

Original model card: LGAI-EXAONE/EXAONE-4.0.1-32B

(I wanted to make other versions besides Q4_K_M, but I didn't have time because I'm a high school student...)

How to use this model

Please make sure that the environment in which you run this model has at least 24 GB of VRAM.
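If you are on an NVIDIA GPU, here is a small sketch for checking how much VRAM you have; it assumes nvidia-smi is available on your PATH.

import subprocess

# Ask nvidia-smi for the total memory of each GPU, in MiB.
out = subprocess.run(
    ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
    capture_output=True, text=True, check=True,
).stdout

for i, line in enumerate(out.strip().splitlines()):
    print(f"GPU {i}: {int(line) / 1024:.1f} GiB total VRAM")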

  1. Install llama-cpp-python with this command.
pip install llama-cpp-python --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu124
  2. Use the following code to run the model.
from llama_cpp import Llama

# Download the GGUF from the Hub and load it.
llm = Llama.from_pretrained(
    repo_id="Lumia101/EXAONE-4.0.1-32B-GGUF-Q4_K_M",
    filename="EXAONE-4.0.1-Q4_K_M-ctemplate-removed.gguf",
    n_gpu_layers=-1,  # -1 offloads every layer to the GPU
    n_ctx=8192,       # context window size
    verbose=False
)

prompt = "Tell me the reason why I need GPU to run a language model." # If you would like to ask this model another question, please edit it here.

# Request a streamed chat completion so output prints as it is generated.
output_stream = llm.create_chat_completion(
    messages = [
        {
            "role": "user",
            "content": prompt
        }
    ],
    temperature=0.6,
    top_p=0.95,
    presence_penalty=1.5,
    stream=True
)

for chunk in output_stream:
    # Each streamed chunk carries an incremental delta; pull out its text, if any.
    content = chunk.get('choices', [{}])[0].get('delta', {}).get('content', '')

    if content:
        print(content, end='', flush=True)

print()
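The snippet above is single-turn. To ask a follow-up question, append the assistant's reply to messages before calling the model again. Here is a minimal sketch reusing the llm object and prompt from above (non-streaming for brevity; the follow-up question is just an example).

# First turn: same sampling settings as above, but without streaming.
messages = [{"role": "user", "content": prompt}]
first = llm.create_chat_completion(
    messages=messages, temperature=0.6, top_p=0.95, presence_penalty=1.5
)
reply = first["choices"][0]["message"]["content"]
print(reply)

# Second turn: feed the assistant's answer back in so the model sees the history.
messages.append({"role": "assistant", "content": reply})
messages.append({"role": "user", "content": "Summarize that in one sentence."})
second = llm.create_chat_completion(
    messages=messages, temperature=0.6, top_p=0.95, presence_penalty=1.5
)
print(second["choices"][0]["message"]["content"])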