# ACE-Step 1.5 GGUF
Pre-quantized GGUF models for acestep.cpp, a portable C++17 implementation of the ACE-Step 1.5 music generation pipeline using GGML.
Text + lyrics in, stereo 48kHz WAV out. Runs on CPU, CUDA, Metal, Vulkan.
## Quick start
```bash
git clone --recurse-submodules https://github.com/ServeurpersoCom/acestep.cpp
cd acestep.cpp
pip install huggingface_hub
./models.sh   # downloads Q8_0 turbo essentials (~7.7 GB)

mkdir build && cd build
cmake .. -DGGML_CUDA=ON
cmake --build . --config Release -j$(nproc)
cd ..

cat > /tmp/request.json << 'EOF'
{
  "caption": "Upbeat pop rock with driving guitars and catchy hooks",
  "inference_steps": 8,
  "shift": 3.0,
  "vocal_language": "fr"
}
EOF

# LLM: generate lyrics + audio codes (writes /tmp/request0.json, consumed below)
./build/ace-qwen3 \
    --request /tmp/request.json \
    --model models/acestep-5Hz-lm-4B-Q8_0.gguf

# DiT + VAE: synthesize audio
./build/dit-vae \
    --request /tmp/request0.json \
    --text-encoder models/Qwen3-Embedding-0.6B-Q8_0.gguf \
    --dit models/acestep-v15-turbo-Q8_0.gguf \
    --vae models/vae-BF16.gguf
```
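The build above targets CUDA. For the other backends listed at the top, the standard GGML CMake switches should apply; this is a sketch based on upstream GGML option names, so check the acestep.cpp README for the exact flags:

```bash
# CPU only: the default, no backend flag needed
cmake ..

# Apple Metal (macOS) -- GGML_METAL is the upstream GGML option name
cmake .. -DGGML_METAL=ON

# Vulkan
cmake .. -DGGML_VULKAN=ON
```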
## Download options
```bash
./models.sh                # Q8_0 turbo essentials (~7.7 GB)
./models.sh --all          # every model, every quant (~97 GB)
./models.sh --quant BF16   # full precision
./models.sh --quant Q6_K   # pick a quant
./models.sh --sft          # add SFT DiT variant
./models.sh --shifts       # add shift1/shift3/continuous variants
./models.sh --lm 0.6B      # smaller LM (fast, lower quality)
```
Or download individual files manually from the files tab.
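Manual downloads can also be scripted with the stock Hugging Face CLI. A sketch, assuming the GGUF files sit at the root of this repo; the four files below are presumably the same Q8_0 turbo essentials that models.sh fetches (their sizes in the tables below sum to roughly 7.7 GB):

```bash
# Fetch the Q8_0 turbo essentials into models/
huggingface-cli download Serveurperso/ACE-Step-1.5-GGUF \
    Qwen3-Embedding-0.6B-Q8_0.gguf \
    acestep-5Hz-lm-4B-Q8_0.gguf \
    acestep-v15-turbo-Q8_0.gguf \
    vae-BF16.gguf \
    --local-dir models
```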
## Available models
### Text encoder
| File | Quant | Size |
|---|---|---|
| Qwen3-Embedding-0.6B-BF16.gguf | BF16 | 1.2 GB |
| Qwen3-Embedding-0.6B-Q8_0.gguf | Q8_0 | 748 MB |
### LM (Qwen3 causal, audio code generation)
| File | Params | Quant | Size |
|---|---|---|---|
| acestep-5Hz-lm-4B-BF16.gguf | 4B | BF16 | 7.9 GB |
| acestep-5Hz-lm-4B-Q8_0.gguf | 4B | Q8_0 | 4.2 GB |
| acestep-5Hz-lm-4B-Q6_K.gguf | 4B | Q6_K | 3.3 GB |
| acestep-5Hz-lm-4B-Q5_K_M.gguf | 4B | Q5_K_M | 2.9 GB |
| acestep-5Hz-lm-1.7B-BF16.gguf | 1.7B | BF16 | 3.5 GB |
| acestep-5Hz-lm-1.7B-Q8_0.gguf | 1.7B | Q8_0 | 1.9 GB |
| acestep-5Hz-lm-0.6B-BF16.gguf | 0.6B | BF16 | 1.3 GB |
| acestep-5Hz-lm-0.6B-Q8_0.gguf | 0.6B | Q8_0 | 677 MB |
The small LMs (0.6B/1.7B) ship only as BF16 and Q8_0; they are too small to survive aggressive quantization. The 4B LM has no Q4_K_M because it breaks audio code generation.
### DiT (flow matching diffusion transformer)
Available for all 6 variants: turbo, sft, base, turbo-shift1, turbo-shift3, turbo-continuous.
| Quant | Size per variant |
|---|---|
| BF16 | 4.5 GB |
| Q8_0 | 2.4 GB |
| Q6_K | 1.9 GB |
| Q5_K_M | 1.6 GB |
| Q4_K_M | 1.4 GB |
Turbo preset: 8 steps, no CFG. SFT/Base preset: 32-50 steps, CFG 7.0.
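For the SFT/base presets, a request might look like the sketch below. Only caption, inference_steps, shift, and vocal_language appear in the quick-start example; the guidance_scale key for the CFG 7.0 setting is a hypothetical field name, so check the acestep.cpp documentation for the real one:

```bash
# Hypothetical request for a non-turbo DiT: more steps, CFG enabled.
# NOTE: "guidance_scale" is an assumed key, not confirmed by this card.
cat > /tmp/request-sft.json << 'EOF'
{
  "caption": "Upbeat pop rock with driving guitars and catchy hooks",
  "inference_steps": 32,
  "shift": 3.0,
  "guidance_scale": 7.0
}
EOF
```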
### VAE
| File | Size |
|---|---|
| vae-BF16.gguf | 322 MB |
Always BF16 (small, bandwidth-bound, quality-critical).
## Pipeline
### ace-qwen3 (Qwen3 causal LM, 0.6B/1.7B/4B)

- Phase 1 (if needed): CoT generates bpm, keyscale, timesignature, and lyrics
- Phase 2: audio codes (5Hz tokens, FSQ vocabulary)
- Both phases are batched: N sequences per forward pass, weights read once
- CFG with a dual KV cache per batch element (cond + uncond)
- Output: request0.json .. requestN-1.json
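After a batched run, each emitted request file can be fed to the dit-vae stage (described next). A sketch reusing the quick-start model paths:

```bash
# Synthesize every LM output in turn; the request[0-9]*.json glob
# deliberately skips the original /tmp/request.json input.
for req in /tmp/request[0-9]*.json; do
    ./build/dit-vae \
        --request "$req" \
        --text-encoder models/Qwen3-Embedding-0.6B-Q8_0.gguf \
        --dit models/acestep-v15-turbo-Q8_0.gguf \
        --vae models/vae-BF16.gguf
done
```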
### dit-vae

1. BPE tokenization
2. Qwen3-Embedding (28-layer text encoder)
3. CondEncoder (8-layer lyric + 4-layer timbre + text_proj)
4. FSQ detokenizer (audio codes -> source latents)
5. DiT (24-layer flow matching, Euler steps)
6. VAE (AutoencoderOobleck, tiled decode)
7. Stereo 48kHz WAV
Both stages support batching (`--batch N`) for parallel generation. LM batching produces different songs; DiT batching produces subtle variations of the same piece, since each batch element starts from different initial noise.
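Concretely, assuming the same flags as in the quick start:

```bash
# Four different songs from one prompt: batch at the LM stage
# (emits request0.json .. request3.json).
./build/ace-qwen3 --batch 4 \
    --request /tmp/request.json \
    --model models/acestep-5Hz-lm-4B-Q8_0.gguf

# Four takes on the same piece: batch at the DiT stage instead,
# where only the initial noise differs per batch element.
./build/dit-vae --batch 4 \
    --request /tmp/request0.json \
    --text-encoder models/Qwen3-Embedding-0.6B-Q8_0.gguf \
    --dit models/acestep-v15-turbo-Q8_0.gguf \
    --vae models/vae-BF16.gguf
```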
## Acknowledgements
Independent C++/GGML implementation based on ACE-Step 1.5 by ACE Studio and StepFun. All model weights are theirs; this is a native inference backend.
## Links
- [acestep.cpp](https://github.com/ServeurpersoCom/acestep.cpp) - source code
- ACE-Step 1.5 - original Python implementation
- ACE-Step model hub - original weights