When does the GGUF version get released?
https://huggingface.co/huihui-ai/Huihui-Qwen3-VL-8B-Instruct-abliterated/tree/main/GGUF
Can this GGUF be uploaded to Ollama? I failed to create a custom model from the GGUF and mmproj files locally.
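For context, this is roughly what I tried. It is just the usual two-FROM Modelfile pattern used for other vision GGUFs; the filenames are placeholders for whatever quant and mmproj files sit in the GGUF folder, and I don't know whether Ollama's runtime supports this architecture yet.

# Sketch of the attempted import (filenames are placeholders)
cat > Modelfile <<'EOF'
FROM ./Huihui-Qwen3-VL-8B-Instruct-abliterated-Q4_K_M.gguf
FROM ./mmproj-Huihui-Qwen3-VL-8B-Instruct-abliterated-F16.gguf
EOF
ollama create huihui-qwen3-vl-8b-abliterated -f Modelfile
ollama run huihui-qwen3-vl-8b-abliterated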
This stuff does not work. It may work if you run it on Windows and get lucky that your system happens to support this hack job of an implementation. It's not huihui's fault, and I can't blame the llama.cpp forks for trying, but it's too early. I wasted many hours on this and it's useless. Wait until the main official repo implements it, because they at least know what they are doing.

Until then, just use the censored FP8 version and run it with sglang or vLLM; they don't need that much VRAM. Qwen's official FP8 quant runs this at 16.6 GB VRAM, not just at server launch but during actual use too. I doubt the GGUF will perform that efficiently. Once the visual side works as well as it does in sglang, GGUFs make sense, but until then, why waste the time when it will land in the official repo in a few days or next week anyway?

A note on the FP8 version: it runs SLOWER than the bf16 version of the same model, but it needs less VRAM. So I am waiting for real GGUF support and a model that's not censored.

For reference, this is the sglang launch command I use:
python -m sglang.launch_server \
  --model "$MODEL_PATH" \
  --host 127.0.0.1 \
  --port 30000 \
  --trust-remote-code \
  --context-length 90000 \
  --mem-fraction-static 0.55 \
  --tp 1 \
  --enable-multimodal \
  --chunked-prefill-size 4096 \
  --attention-backend fa3 \
  --moe-runner-backend auto
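Once the server is up, it speaks the OpenAI-compatible API on that port, so a quick smoke test looks like the sketch below. The model name and image URL are placeholders; the "model" field generally has to match the path you passed to --model.

curl http://127.0.0.1:30000/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{
        "model": "Qwen/Qwen3-VL-8B-Instruct-FP8",
        "messages": [
          {"role": "user", "content": [
            {"type": "image_url", "image_url": {"url": "https://example.com/test.jpg"}},
            {"type": "text", "text": "Describe this image."}
          ]}
        ],
        "max_tokens": 256
      }'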
Thank you very much, the solution works perfectly. I managed to convert one into an Ollama model.