ACE-Step 1.5 XL — Task Arithmetic Merged DiT Models (GGUF)

Quantized GGUF versions of task-arithmetic merged ACE-Step v1.5 XL DiT models, ready for use with C++ inference engines.

What Are These?

These are DiT (Diffusion Transformer) models created by merging the official ACE-Step v1.5 XL checkpoints using task arithmetic. Here the "task vector" is the weight difference between the two parent checkpoints, so the merge reduces to linear interpolation: blending the parents at a controlled ratio λ produces a model that inherits qualities from both.

Each model blends two of the three official XL checkpoints:

| Model | Parent A | Parent B | Ratio (λ) | Character |
|---|---|---|---|---|
| merge-sft-turbo-xl-ta-0.3 | XL-SFT | XL-Turbo | 0.3 | Mostly SFT with a touch of Turbo speed |
| merge-sft-turbo-xl-ta-0.7 | XL-SFT | XL-Turbo | 0.7 | Mostly Turbo with SFT musicality |
| merge-base-turbo-xl-ta-0.5 | XL-Base | XL-Turbo | 0.5 | Equal blend of Base and Turbo |

λ = 0 means pure Parent A, λ = 1 means pure Parent B.
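The interpolation can be sketched in a few lines of Python. This is a minimal illustration, not the actual merge script: the tensor name is hypothetical, and real checkpoints would be loaded from safetensors files rather than built by hand.

```python
import numpy as np

def task_arithmetic_merge(theta_a, theta_b, lam):
    """theta_merged = theta_a + lam * (theta_b - theta_a), applied per tensor."""
    assert theta_a.keys() == theta_b.keys(), "parents must share the same tensor names"
    return {name: theta_a[name] + lam * (theta_b[name] - theta_a[name])
            for name in theta_a}

# Toy "checkpoints" with a single weight tensor each (hypothetical tensor name).
sft   = {"blocks.0.attn.w": np.full((2, 2), 1.0)}
turbo = {"blocks.0.attn.w": np.full((2, 2), 3.0)}

# lam = 0.3 stays closer to parent A: 1.0 + 0.3 * (3.0 - 1.0) = 1.6
merged = task_arithmetic_merge(sft, turbo, lam=0.3)
```

At λ = 0 the result is exactly parent A; at λ = 1 it is exactly parent B.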

Why Merge?

  • SFT × Turbo blends combine SFT's strong lyric adherence and musical structure with Turbo's faster convergence and energy
  • Base × Turbo blends bring Base's raw generative range together with Turbo's efficiency
  • Different ratios let you dial the trade-off to taste — lower λ for more structure, higher λ for more speed

Architecture

All three models share the XL architecture:

| Parameter | Value |
|---|---|
| Architecture | AceStepConditionGenerationModel |
| Hidden size | 2560 |
| Intermediate size | 9728 |
| Attention heads | 32 (8 KV heads, GQA) |
| Layers | 32 (alternating sliding + full attention) |
| Encoder hidden size | 2048 |
| Head dim | 128 |
| Context length | 32768 |
| Parameters | ~4.7B |
| is_turbo | false (uses base-mode scheduling) |

Available Quantizations

Each model is provided in 5 quantization levels:

| Quantization | Size | Notes |
|---|---|---|
| BF16 | 9,516 MB | Full precision, reference quality |
| Q8_0 | 5,060 MB | Near-lossless, recommended for quality |
| Q6_K | 3,909 MB | Excellent quality, good VRAM savings |
| Q5_K_M | 3,364 MB | Great balance of quality and size |
| Q4_K_M | 2,851 MB | Smallest, some quality trade-off |
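As a rough sanity check on the table, effective bits per weight follow from file size divided by the ~4.7B parameter count (back-of-the-envelope arithmetic; GGUF metadata and any non-quantized tensors add a little overhead):

```python
# Effective bits per weight = file size in bits / parameter count.
PARAMS = 4.7e9
sizes_mb = {"BF16": 9516, "Q8_0": 5060, "Q6_K": 3909, "Q5_K_M": 3364, "Q4_K_M": 2851}

bpw = {name: mb * 1e6 * 8 / PARAMS for name, mb in sizes_mb.items()}
for name, bits in bpw.items():
    # Q8_0 lands near its nominal ~8.5 bits/weight; BF16 near 16.
    print(f"{name}: {bits:.1f} bits/weight")
```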

File Listing

acestep-v15-merge-base-turbo-xl-ta-0.5-BF16.gguf
acestep-v15-merge-base-turbo-xl-ta-0.5-Q4_K_M.gguf
acestep-v15-merge-base-turbo-xl-ta-0.5-Q5_K_M.gguf
acestep-v15-merge-base-turbo-xl-ta-0.5-Q6_K.gguf
acestep-v15-merge-base-turbo-xl-ta-0.5-Q8_0.gguf

acestep-v15-merge-sft-turbo-xl-ta-0.3-BF16.gguf
acestep-v15-merge-sft-turbo-xl-ta-0.3-Q4_K_M.gguf
acestep-v15-merge-sft-turbo-xl-ta-0.3-Q5_K_M.gguf
acestep-v15-merge-sft-turbo-xl-ta-0.3-Q6_K.gguf
acestep-v15-merge-sft-turbo-xl-ta-0.3-Q8_0.gguf

acestep-v15-merge-sft-turbo-xl-ta-0.7-BF16.gguf
acestep-v15-merge-sft-turbo-xl-ta-0.7-Q4_K_M.gguf
acestep-v15-merge-sft-turbo-xl-ta-0.7-Q5_K_M.gguf
acestep-v15-merge-sft-turbo-xl-ta-0.7-Q6_K.gguf
acestep-v15-merge-sft-turbo-xl-ta-0.7-Q8_0.gguf

Compatibility

These GGUF files are DiT-only — they replace the --dit argument in the inference pipeline. You still need the standard LM, text encoder, and VAE models alongside them.

acestep.cpp

Drop any of these GGUF files into the models/ directory and pass the chosen file via --dit:

```shell
./build/ace-server \
    --host 0.0.0.0 --port 8090 \
    --lm models/acestep-5Hz-lm-4B-Q8_0.gguf \
    --embedding models/Qwen3-Embedding-0.6B-Q8_0.gguf \
    --dit models/acestep-v15-merge-sft-turbo-xl-ta-0.3-Q8_0.gguf \
    --vae models/vae-BF16.gguf
```

HOT-Step-CPP

Place the GGUF files in the engine's models/ directory. The model will appear in the DiT model dropdown in the web UI automatically.

Required Companion Models

These DiT GGUFs must be used alongside:

| Component | Model | Notes |
|---|---|---|
| LM | acestep-5Hz-lm-4B-Q8_0.gguf | Audio code language model |
| Text Encoder | Qwen3-Embedding-0.6B-Q8_0.gguf | Caption encoder |
| VAE | vae-BF16.gguf | Audio decoder (always BF16) |

Companion models are available from Serveurperso/ACE-Step-1.5-GGUF.

Recommended Settings

Since these are non-turbo (is_turbo: false) merged models, they use base-mode scheduling:

| Parameter | Recommended |
|---|---|
| Inference steps | 60–100 |
| CFG scale | 3.0–7.0 |
| Guidance mode | apg or cfg |
| Duration | 30–180 s |

Note: these models need higher step counts than the Turbo checkpoint, trading generation speed for quality and creative range.

How These Were Made

  1. Source checkpoints: Official ACE-Step v1.5 XL safetensors (Base, SFT, Turbo)
  2. Merge method: Task arithmetic — θ_merged = θ_A + λ(θ_B − θ_A)
  3. Conversion: Safetensors → GGUF BF16 via convert.py from acestep.cpp
  4. Quantization: BF16 → Q4_K_M / Q5_K_M / Q6_K / Q8_0 via the acestep.cpp quantize tool
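Step 4's Q8_0 format can be illustrated with a simplified numpy sketch. ggml's Q8_0 stores each block of 32 weights as one fp16 scale plus 32 int8 values (34 bytes per 32 weights, i.e. 8.5 bits/weight); the byte-level packing is omitted here.

```python
import numpy as np

QK8_0 = 32  # ggml Q8_0 block size

def quantize_q8_0(x):
    """Quantize a 1-D float array: per 32-value block, one fp16 scale + int8 values."""
    blocks = x.reshape(-1, QK8_0)
    d = np.abs(blocks).max(axis=1, keepdims=True) / 127.0   # per-block scale
    d[d == 0] = 1.0                                         # avoid division by zero
    q = np.clip(np.round(blocks / d), -127, 127).astype(np.int8)
    return d.astype(np.float16), q

def dequantize_q8_0(d, q):
    return (d.astype(np.float32) * q).reshape(-1)

rng = np.random.default_rng(0)
w = rng.standard_normal(1024).astype(np.float32)
d, q = quantize_q8_0(w)
w_hat = dequantize_q8_0(d, q)
err = np.abs(w - w_hat).max()  # worst-case error is about half a quantization step
```

The small round-trip error is why Q8_0 is described as near-lossless in the table above.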

License

These models inherit the license from the upstream ACE-Step v1.5 checkpoints. See the ACE-Step repository for details.

Credits

  • ACE-Step — Original model architecture and training by the ACE-Step team
  • acestep.cpp — C++ inference engine and GGUF tooling by Serveurperso
  • HOT-Step-CPP — Full-stack music generation app by scragnog
  • Task arithmetic merges — Produced by scragnog