ACE-Step 1.5 GGUF

Pre-quantized GGUF models for acestep.cpp, a portable C++17 implementation of the ACE-Step 1.5 music generation pipeline using GGML.

Text + lyrics in, stereo 48kHz WAV out. Runs on CPU, CUDA, Metal, Vulkan.

Quick start

git clone --recurse-submodules https://github.com/ServeurpersoCom/acestep.cpp
cd acestep.cpp

pip install huggingface_hub
./models.sh           # downloads Q8_0 turbo essentials (~7.7 GB)

mkdir build && cd build
cmake .. -DGGML_CUDA=ON   # omit for CPU-only; -DGGML_METAL=ON / -DGGML_VULKAN=ON for other backends
cmake --build . --config Release -j$(nproc)
cd ..

cat > /tmp/request.json << 'EOF'
{
    "caption": "Upbeat pop rock with driving guitars and catchy hooks",
    "inference_steps": 8,
    "shift": 3.0,
    "vocal_language": "fr"
}
EOF

# LLM: generate lyrics + audio codes
./build/ace-qwen3 \
    --request /tmp/request.json \
    --model models/acestep-5Hz-lm-4B-Q8_0.gguf

# DiT + VAE: synthesize audio
./build/dit-vae \
    --request /tmp/request0.json \
    --text-encoder models/Qwen3-Embedding-0.6B-Q8_0.gguf \
    --dit models/acestep-v15-turbo-Q8_0.gguf \
    --vae models/vae-BF16.gguf

Download options

./models.sh                # Q8_0 turbo essentials (~7.7 GB)
./models.sh --all          # every model, every quant (~97 GB)
./models.sh --quant BF16   # full precision
./models.sh --quant Q6_K   # pick a quant
./models.sh --sft          # add SFT DiT variant
./models.sh --shifts       # add shift1/shift3/continuous variants
./models.sh --lm 0.6B      # smaller LM (fast, lower quality)

Or download individual files manually from the files tab.
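If you fetch files by hand, a quick header check confirms the download is a valid GGUF container before you point the binaries at it. A minimal sketch (the tensor/KV counts in the synthetic header below are made up for the example; per the GGUF spec the layout is a 4-byte `GGUF` magic, uint32 version, uint64 tensor count, uint64 metadata KV count, all little-endian):

```python
import io
import struct

def read_gguf_header(f):
    # GGUF layout: magic "GGUF", then <uint32 version,
    # uint64 tensor_count, uint64 metadata_kv_count>, little-endian.
    magic = f.read(4)
    if magic != b"GGUF":
        raise ValueError("not a GGUF file")
    version, n_tensors, n_kv = struct.unpack("<IQQ", f.read(20))
    return version, n_tensors, n_kv

# Synthetic header standing in for a downloaded model file.
fake = io.BytesIO(b"GGUF" + struct.pack("<IQQ", 3, 310, 24))
print(read_gguf_header(fake))  # (3, 310, 24)
```

In practice you would pass `open("models/acestep-v15-turbo-Q8_0.gguf", "rb")` instead of the synthetic buffer.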

Available models

Text encoder

| File | Quant | Size |
|---|---|---|
| Qwen3-Embedding-0.6B-BF16.gguf | BF16 | 1.2 GB |
| Qwen3-Embedding-0.6B-Q8_0.gguf | Q8_0 | 748 MB |

LM (Qwen3 causal, audio code generation)

| File | Params | Quant | Size |
|---|---|---|---|
| acestep-5Hz-lm-4B-BF16.gguf | 4B | BF16 | 7.9 GB |
| acestep-5Hz-lm-4B-Q8_0.gguf | 4B | Q8_0 | 4.2 GB |
| acestep-5Hz-lm-4B-Q6_K.gguf | 4B | Q6_K | 3.3 GB |
| acestep-5Hz-lm-4B-Q5_K_M.gguf | 4B | Q5_K_M | 2.9 GB |
| acestep-5Hz-lm-1.7B-BF16.gguf | 1.7B | BF16 | 3.5 GB |
| acestep-5Hz-lm-1.7B-Q8_0.gguf | 1.7B | Q8_0 | 1.9 GB |
| acestep-5Hz-lm-0.6B-BF16.gguf | 0.6B | BF16 | 1.3 GB |
| acestep-5Hz-lm-0.6B-Q8_0.gguf | 0.6B | Q8_0 | 677 MB |

The small LMs (0.6B/1.7B) ship only in BF16 and Q8_0: they are too small to survive aggressive quantization. The 4B LM has no Q4_K_M because that quant breaks audio code generation.

DiT (flow matching diffusion transformer)

Available for all 6 variants: turbo, sft, base, turbo-shift1, turbo-shift3, turbo-continuous.

| Quant | Size per variant |
|---|---|
| BF16 | 4.5 GB |
| Q8_0 | 2.4 GB |
| Q6_K | 1.9 GB |
| Q5_K_M | 1.6 GB |
| Q4_K_M | 1.4 GB |

Turbo preset: 8 steps, no CFG. SFT/Base preset: 32-50 steps, CFG 7.0.
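For the non-turbo variants, the request from the quick start would change only in the sampling settings; a sketch using just the fields shown earlier (the card does not name the knob that sets the CFG scale, so it is omitted here):

```json
{
    "caption": "Slow blues ballad with an expressive guitar solo",
    "inference_steps": 32,
    "shift": 3.0,
    "vocal_language": "en"
}
```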

VAE

| File | Size |
|---|---|
| vae-BF16.gguf | 322 MB |

Always BF16 (small, bandwidth-bound, quality-critical).

Pipeline

ace-qwen3 (Qwen3 causal LM, 0.6B/1.7B/4B)
  Phase 1 (if needed): CoT generates bpm, keyscale, timesignature, lyrics
  Phase 2: audio codes (5Hz tokens, FSQ vocabulary)
  Both phases batched: N sequences per forward, weights read once
  CFG with dual KV cache per batch element (cond + uncond)
  Output: request0.json .. requestN-1.json

dit-vae
  BPE tokenize
  Qwen3-Embedding (28L text encoder)
  CondEncoder (lyric 8L + timbre 4L + text_proj)
  FSQ detokenizer (audio codes -> source latents)
  DiT (24L flow matching, Euler steps)
  VAE (AutoencoderOobleck, tiled decode)
  WAV stereo 48kHz
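
The final stage writes a standard stereo 48 kHz WAV. A quick sketch of that container using Python's `wave` module, useful as a sanity check on outputs (the 16-bit PCM sample width is an assumption, not stated in this card):

```python
import io
import math
import struct
import wave

# Write a short stereo 48 kHz WAV in memory, then read the
# header back, the same check you could run on a generated file.
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(2)       # stereo
    w.setsampwidth(2)       # 16-bit PCM (assumed)
    w.setframerate(48000)   # 48 kHz
    for i in range(480):    # 10 ms of a 440 Hz tone
        s = int(32767 * math.sin(2 * math.pi * 440 * i / 48000))
        w.writeframes(struct.pack("<hh", s, s))

buf.seek(0)
with wave.open(buf, "rb") as r:
    info = (r.getnchannels(), r.getframerate(), r.getnframes())
print(info)  # (2, 48000, 480)
```

Swapping the in-memory buffer for a generated file path gives the same channel/rate check on real output.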

Both stages support batching (--batch N) for parallel generation: LM batching produces N different songs, while DiT batching produces subtle variations of the same piece (each batch element starts from different initial noise).

Acknowledgements

Independent C++/GGML implementation of ACE-Step 1.5 by ACE Studio and StepFun. All model weights are theirs; this project is a native inference backend.
