# ACE-Step 1.5 XL — Task Arithmetic Merged DiT Models (GGUF)
Quantized GGUF versions of task-arithmetic merged ACE-Step v1.5 XL DiT models, ready for use with C++ inference engines.
## What Are These?
These are DiT (Diffusion Transformer) models created by merging the official ACE-Step v1.5 XL checkpoints using task arithmetic. Task arithmetic blends the learned "task vectors" of two fine-tuned models at a controlled interpolation ratio, producing models that inherit qualities from both parents.
Each model blends two of the three official XL checkpoints:
| Model | Parent A | Parent B | Ratio (λ) | Character |
|---|---|---|---|---|
| merge-sft-turbo-xl-ta-0.3 | XL-SFT | XL-Turbo | 0.3 | Mostly SFT with a touch of Turbo speed |
| merge-sft-turbo-xl-ta-0.7 | XL-SFT | XL-Turbo | 0.7 | Mostly Turbo with SFT musicality |
| merge-base-turbo-xl-ta-0.5 | XL-Base | XL-Turbo | 0.5 | Equal blend of Base and Turbo |
λ = 0 means pure Parent A, λ = 1 means pure Parent B.
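The λ-interpolation is simple enough to sketch directly. A minimal, hypothetical Python sketch (plain floats stand in for weight tensors here; the real merge operates on full checkpoint state dicts):

```python
# Task-arithmetic interpolation between two checkpoints with identical
# key sets: theta_merged = theta_A + lam * (theta_B - theta_A).
def task_arithmetic_merge(state_a, state_b, lam):
    """lam = 0.0 returns state_a unchanged; lam = 1.0 returns state_b."""
    return {k: state_a[k] + lam * (state_b[k] - state_a[k]) for k in state_a}

# Toy example with floats standing in for tensors:
a = {"w": 1.0}
b = {"w": 3.0}
print(task_arithmetic_merge(a, b, 0.3)["w"])  # 1.0 + 0.3 * 2.0, i.e. ~1.6
```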
## Why Merge?
- SFT × Turbo blends combine SFT's strong lyric adherence and musical structure with Turbo's faster convergence and energy
- Base × Turbo blends bring Base's raw generative range together with Turbo's efficiency
- Different ratios let you dial the trade-off to taste — lower λ for more structure, higher λ for more speed
## Architecture
All three models share the XL architecture:
| Parameter | Value |
|---|---|
| Architecture | AceStepConditionGenerationModel |
| Hidden size | 2560 |
| Intermediate size | 9728 |
| Attention heads | 32 (8 KV heads, GQA) |
| Layers | 32 (alternating sliding + full attention) |
| Encoder hidden size | 2048 |
| Head dim | 128 |
| Context length | 32768 |
| Parameters | ~4.7B |
| is_turbo | false (uses base-mode scheduling) |
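The GQA numbers in the table also pin down the KV-cache footprint. A quick back-of-envelope calculation (assuming a 2-byte BF16 cache; engines may cache in other precisions):

```python
# KV-cache size per token: K and V entries, 8 KV heads (GQA), 128 head dim,
# 32 layers, 2 bytes per value (BF16 assumed).
kv_heads, head_dim, layers, ctx = 8, 128, 32, 32768
bytes_per_token = 2 * kv_heads * head_dim * layers * 2
print(bytes_per_token)                # 131072 bytes = 128 KiB per token
print(bytes_per_token * ctx / 2**30)  # 4.0 GiB at the full 32768 context
```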
## Available Quantizations
Each model is provided in 5 quantization levels:
| Quantization | Size | Notes |
|---|---|---|
| BF16 | 9,516 MB | Full precision, reference quality |
| Q8_0 | 5,060 MB | Near-lossless, recommended for quality |
| Q6_K | 3,909 MB | Excellent quality, good VRAM savings |
| Q5_K_M | 3,364 MB | Great balance of quality and size |
| Q4_K_M | 2,851 MB | Smallest, some quality trade-off |
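Dividing the listed file sizes by the ~4.7B parameter count gives the effective bits per weight. Values land slightly above each format's nominal bit width because GGUF metadata and higher-precision tensors (norms, embeddings) add overhead. A quick check:

```python
# Effective bits/weight from the table above; the ~4.7e9 parameter count
# is approximate, so these figures are rough.
PARAMS = 4.7e9
sizes_mb = {"BF16": 9516, "Q8_0": 5060, "Q6_K": 3909,
            "Q5_K_M": 3364, "Q4_K_M": 2851}
for name, mb in sizes_mb.items():
    print(f"{name}: ~{mb * 2**20 * 8 / PARAMS:.1f} bits/weight")
```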
## File Listing
```
acestep-v15-merge-base-turbo-xl-ta-0.5-BF16.gguf
acestep-v15-merge-base-turbo-xl-ta-0.5-Q4_K_M.gguf
acestep-v15-merge-base-turbo-xl-ta-0.5-Q5_K_M.gguf
acestep-v15-merge-base-turbo-xl-ta-0.5-Q6_K.gguf
acestep-v15-merge-base-turbo-xl-ta-0.5-Q8_0.gguf
acestep-v15-merge-sft-turbo-xl-ta-0.3-BF16.gguf
acestep-v15-merge-sft-turbo-xl-ta-0.3-Q4_K_M.gguf
acestep-v15-merge-sft-turbo-xl-ta-0.3-Q5_K_M.gguf
acestep-v15-merge-sft-turbo-xl-ta-0.3-Q6_K.gguf
acestep-v15-merge-sft-turbo-xl-ta-0.3-Q8_0.gguf
acestep-v15-merge-sft-turbo-xl-ta-0.7-BF16.gguf
acestep-v15-merge-sft-turbo-xl-ta-0.7-Q4_K_M.gguf
acestep-v15-merge-sft-turbo-xl-ta-0.7-Q5_K_M.gguf
acestep-v15-merge-sft-turbo-xl-ta-0.7-Q6_K.gguf
acestep-v15-merge-sft-turbo-xl-ta-0.7-Q8_0.gguf
```
## Compatibility
These GGUF files are DiT-only — they replace the --dit argument in the inference pipeline. You still need the standard LM, text encoder, and VAE models alongside them.
### acestep.cpp
Drop any of these GGUF files into the models/ directory and pass the path via --dit:

```sh
./build/ace-server \
  --host 0.0.0.0 --port 8090 \
  --lm models/acestep-5Hz-lm-4B-Q8_0.gguf \
  --embedding models/Qwen3-Embedding-0.6B-Q8_0.gguf \
  --dit models/acestep-v15-merge-sft-turbo-xl-ta-0.3-Q8_0.gguf \
  --vae models/vae-BF16.gguf
```
### HOT-Step-CPP
Place the GGUF files in the engine's models/ directory; they will appear automatically in the DiT model dropdown in the web UI.
## Required Companion Models
These DiT GGUFs must be used alongside:
| Component | Model | Notes |
|---|---|---|
| LM | acestep-5Hz-lm-4B-Q8_0.gguf | Audio code language model |
| Text Encoder | Qwen3-Embedding-0.6B-Q8_0.gguf | Caption encoder |
| VAE | vae-BF16.gguf | Audio decoder (always BF16) |
Companion models are available from Serveurperso/ACE-Step-1.5-GGUF.
## Recommended Settings
Since these are non-turbo (is_turbo: false) merged models, they use base-mode scheduling:
| Parameter | Recommended |
|---|---|
| Inference steps | 60–100 |
| CFG scale | 3.0–7.0 |
| Guidance mode | apg or cfg |
| Duration | 30–180s |
Note: Higher step counts are needed compared to turbo models. These models trade speed for quality and creative range.
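As a concrete starting point, the table above maps to a settings payload like the following. The key names here are illustrative, not the literal parameter names of acestep.cpp or HOT-Step-CPP; consult the engine's own documentation for the exact fields:

```python
# Illustrative non-turbo settings drawn from the recommendations above.
# Key names are hypothetical; map them to your engine's actual parameters.
settings = {
    "inference_steps": 80,   # base-mode scheduling wants 60-100 steps
    "cfg_scale": 5.0,        # 3.0-7.0 recommended
    "guidance_mode": "apg",  # "apg" or "cfg"
    "duration_s": 120,       # 30-180 seconds
}
print(settings)
```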
## How These Were Made
- Source checkpoints: Official ACE-Step v1.5 XL safetensors (Base, SFT, Turbo)
- Merge method: Task arithmetic: θ_merged = θ_A + λ(θ_B − θ_A)
- Conversion: Safetensors → GGUF BF16 via convert.py from acestep.cpp
- Quantization: BF16 → Q4_K_M / Q5_K_M / Q6_K / Q8_0 via the acestep.cpp quantize tool
## License
These models inherit the license from the upstream ACE-Step v1.5 checkpoints. See the ACE-Step repository for details.
## Credits
- ACE-Step — Original model architecture and training by the ACE-Step team
- acestep.cpp — C++ inference engine and GGUF tooling by Serveurperso
- HOT-Step-CPP — Full-stack music generation app by scragnog
- Task arithmetic merges — Produced by scragnog