ubergarm committed
Commit f956ece · 1 Parent(s): a23307a

Hitup discussions for command details

Files changed (1): README.md (+2 -2)
README.md CHANGED
@@ -47,8 +47,8 @@ Final estimate: PPL = 5.4403 +/- 0.03421 (wiki.test.raw, compare to Q8_0 at 5.31
 #### `ik_llama.cpp` API server for hybrid GPU+CPU inferencing
 ```bash
 # This example for 24GB VRAM + 96 GB RAM + 16 physical core CPU
-# Offload first ffn layers 0-11 on GPU VRAM.
-# Offload final ffn layers 12-93 on CPU RAM.
+# Offload first ffn layers 0-9 on GPU VRAM.
+# Leave remaining ffn layers on CPU RAM.
 ./build/bin/llama-server \
 --model ubergarm/Qwen3-235B-A22B-GGUF/Qwen3-235B-A22B-mix-IQ3_K-00001-of-00003.gguf \
 --alias ubergarm/Qwen3-235B-A22B-mix-IQ3_K \
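The updated comments describe which ffn layers land on GPU VRAM versus CPU RAM, but the command shown in the hunk is truncated before the offload flags. In `ik_llama.cpp` (as in upstream `llama.cpp`) this kind of split is typically expressed with `-ot`/`--override-tensor` `REGEX=BUFFER` rules. The sketch below is an illustration, not ubergarm's exact command: the buffer names (`CUDA0`, `CPU`) and regexes are assumptions, with a quick `grep` check that the GPU-side pattern covers layers 0-9 only.

```shell
# Hedged sketch: tensor-override rules matching the comments in the diff
# ("first ffn layers 0-9 on GPU VRAM", "remaining ffn layers on CPU RAM").
# Buffer names CUDA0/CPU and the exact regexes are assumptions.
GPU_FFN='blk\.[0-9]\.ffn.*=CUDA0'   # single-digit layer index: layers 0-9 only
CPU_FFN='blk\..*\.ffn.*=CPU'        # catch-all for the remaining ffn layers

# Sanity-check the GPU regex against GGUF-style tensor names:
# it should match layer 9 but not layer 10.
pattern="${GPU_FFN%%=*}"            # strip the "=BUFFER" suffix, keep the regex
echo 'blk.9.ffn_down.weight'  | grep -Eq "^${pattern}" && echo "layer 9: GPU"
echo 'blk.10.ffn_down.weight' | grep -Eq "^${pattern}" || echo "layer 10: CPU"
```

Override rules are checked in order, so a more specific GPU rule would be passed on the command line before the CPU catch-all; see the model card's discussion thread linked in the commit message for the full command.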