Hit up discussions for command details
README.md CHANGED

````diff
@@ -47,8 +47,8 @@ Final estimate: PPL = 5.4403 +/- 0.03421 (wiki.test.raw, compare to Q8_0 at 5.31
 #### `ik_llama.cpp` API server for hybrid GPU+CPU inferencing
 ```bash
 # This example for 24GB VRAM + 96 GB RAM + 16 physical core CPU
-# Offload first ffn layers 0-
-#
+# Offload first ffn layers 0-9 on GPU VRAM.
+# Leave remaining ffn layers on CPU RAM.
 ./build/bin/llama-server \
     --model ubergarm/Qwen3-235B-A22B-GGUF/Qwen3-235B-A22B-mix-IQ3_K-00001-of-00003.gguf \
     --alias ubergarm/Qwen3-235B-A22B-mix-IQ3_K \
````