CUDA error when prompt processing starts
I don't know why, but this error occurs with every model quantization I've tried.
```
/deploy/ai/ik_llama.cpp/ggml/src/ggml-cuda.cu:119: CUDA error
CUDA error: an illegal memory access was encountered
  current device: 0, in function launch_mul_mat_q at /deploy/ai/ik_llama.cpp/ggml/src/ggml-cuda/template-instances/../mmq.cuh:4122
  cudaFuncSetAttribute(mul_mat_q<type, mmq_x, 8, false>, cudaFuncAttributeMaxDynamicSharedMemorySize, shmem)
/deploy/ai/ik_llama.cpp/ggml/src/ggml-cuda.cu:119: CUDA error
```
Hrmm, you'll have to give more information including:
- what GPU(s) you have (e.g. are you trying to use multiple older P40s, as those can have issues with many quants)
- what OS (e.g. Linux and kernel version, or Windows), plus CUDA version and driver, etc.
- the exact command you're using to start it (e.g. are you using --jinja or not, -fmoe or not, etc.)
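For the first two points on Linux, pasting the output of the commands below is usually enough (this assumes the NVIDIA driver and CUDA toolkit are installed; adjust for your setup):

```bash
# GPU model(s), VRAM, driver version, and runtime CUDA version
nvidia-smi
# kernel and distro
uname -a
# CUDA toolkit the binaries were built with
nvcc --version
```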
My hunch is that you are using -ot to split the (gate|up) tensors across two different GPUs while still using -fmoe fused MoE ops or similar, which might cause this issue.
The quick thing to try is to run without -fmoe and also add --no-fused-up-gate to see if that works. Generally you do want -fmoe and don't want --no-fused-up-gate for the speed-up, but then you'll need to adjust your -ot.
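As a rough sketch (not your exact command; the binary path, model path, and layer ranges in the -ot regexes are just placeholders you'd adapt to your model and GPUs):

```bash
# 1) Quick test: fused paths disabled (-fmoe omitted on purpose),
#    plus whatever other flags you normally use
./build/bin/llama-server -m /path/to/model.gguf --no-fused-up-gate

# 2) If that is stable, re-enable -fmoe but adjust -ot so each layer's
#    ffn_gate/ffn_up tensors stay on the same GPU, e.g. whole layers per
#    device (hypothetical split: layers 0-29 on GPU 0, 30-99 on GPU 1)
./build/bin/llama-server -m /path/to/model.gguf -fmoe \
    -ot "blk\.[0-2]?[0-9]\.ffn_.*=CUDA0" \
    -ot "blk\.[3-9][0-9]\.ffn_.*=CUDA1"
```

If both runs still hit the illegal memory access, then the -ot split probably isn't the culprit, and the full command plus the system info above will be needed to dig further.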