# ACE-Step 1.5 XL — Task Arithmetic Merged DiT Models (GGUF)
Quantized GGUF versions of task-arithmetic merged ACE-Step v1.5 XL DiT models, ready for use with C++ inference engines.
## What Are These?
These are DiT (Diffusion Transformer) models created by merging the official ACE-Step v1.5 XL checkpoints using task arithmetic. Task arithmetic blends the learned "task vectors" of two fine-tuned models at a controlled interpolation ratio, producing models that inherit qualities from both parents.
Each model blends two of the three official XL checkpoints:
| Model | Parent A | Parent B | Ratio (λ) | Character |
|---|---|---|---|---|
| merge-sft-turbo-xl-ta-0.3 | XL-SFT | XL-Turbo | 0.3 | Mostly SFT with a touch of Turbo speed |
| merge-sft-turbo-xl-ta-0.7 | XL-SFT | XL-Turbo | 0.7 | Mostly Turbo with SFT musicality |
| merge-base-turbo-xl-ta-0.5 | XL-Base | XL-Turbo | 0.5 | Equal blend of Base and Turbo |
λ = 0 means pure Parent A, λ = 1 means pure Parent B.
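The λ-interpolation is simple enough to sketch directly. A minimal, hypothetical Python sketch (plain floats stand in for weight tensors here; the real merge operates on full checkpoint state dicts):

```python
# Task-arithmetic interpolation between two checkpoints with identical
# key sets: theta_merged = theta_A + lam * (theta_B - theta_A).
def task_arithmetic_merge(state_a, state_b, lam):
    """lam = 0.0 returns state_a unchanged; lam = 1.0 returns state_b."""
    return {k: state_a[k] + lam * (state_b[k] - state_a[k]) for k in state_a}

# Toy example with floats standing in for tensors:
a = {"w": 1.0}
b = {"w": 3.0}
print(task_arithmetic_merge(a, b, 0.3)["w"])  # 1.0 + 0.3 * 2.0, i.e. ~1.6
```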
## Why Merge?
- SFT × Turbo blends combine SFT's strong lyric adherence and musical structure with Turbo's faster convergence and energy
- Base × Turbo blends bring Base's raw generative range together with Turbo's efficiency
- Different ratios let you dial the trade-off to taste — lower λ for more structure, higher λ for more speed
## Architecture
All three models share the XL architecture:
| Parameter | Value |
|---|---|
| Architecture | AceStepConditionGenerationModel |
| Hidden size | 2560 |
| Intermediate size | 9728 |
| Attention heads | 32 (8 KV heads, GQA) |
| Layers | 32 (alternating sliding + full attention) |
| Encoder hidden size | 2048 |
| Head dim | 128 |
| Context length | 32768 |
| Parameters | ~4.7B |
| is_turbo | false (uses base-mode scheduling) |
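The GQA numbers in the table also pin down the KV-cache footprint. A quick back-of-envelope calculation (assuming a 2-byte BF16 cache; engines may cache in other precisions):

```python
# KV-cache size per token: K and V entries, 8 KV heads (GQA), 128 head dim,
# 32 layers, 2 bytes per value (BF16 assumed).
kv_heads, head_dim, layers, ctx = 8, 128, 32, 32768
bytes_per_token = 2 * kv_heads * head_dim * layers * 2
print(bytes_per_token)                # 131072 bytes = 128 KiB per token
print(bytes_per_token * ctx / 2**30)  # 4.0 GiB at the full 32768 context
```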
## Available Quantizations
Each model is provided in 5 quantization levels:
| Quantization | Size | Notes |
|---|---|---|
| BF16 | 9,516 MB | Full precision, reference quality |
| Q8_0 | 5,060 MB | Near-lossless, recommended for quality |
| Q6_K | 3,909 MB | Excellent quality, good VRAM savings |
| Q5_K_M | 3,364 MB | Great balance of quality and size |
| Q4_K_M | 2,851 MB | Smallest, some quality trade-off |
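Dividing the listed file sizes by the ~4.7B parameter count gives the effective bits per weight. Values land slightly above each format's nominal bit width because GGUF metadata and higher-precision tensors (norms, embeddings) add overhead. A quick check:

```python
# Effective bits/weight from the table above; the ~4.7e9 parameter count
# is approximate, so these figures are rough.
PARAMS = 4.7e9
sizes_mb = {"BF16": 9516, "Q8_0": 5060, "Q6_K": 3909,
            "Q5_K_M": 3364, "Q4_K_M": 2851}
for name, mb in sizes_mb.items():
    print(f"{name}: ~{mb * 2**20 * 8 / PARAMS:.1f} bits/weight")
```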
## File Listing
```
acestep-v15-merge-base-turbo-xl-ta-0.5-BF16.gguf
acestep-v15-merge-base-turbo-xl-ta-0.5-Q4_K_M.gguf
acestep-v15-merge-base-turbo-xl-ta-0.5-Q5_K_M.gguf
acestep-v15-merge-base-turbo-xl-ta-0.5-Q6_K.gguf
acestep-v15-merge-base-turbo-xl-ta-0.5-Q8_0.gguf
acestep-v15-merge-sft-turbo-xl-ta-0.3-BF16.gguf
acestep-v15-merge-sft-turbo-xl-ta-0.3-Q4_K_M.gguf
acestep-v15-merge-sft-turbo-xl-ta-0.3-Q5_K_M.gguf
acestep-v15-merge-sft-turbo-xl-ta-0.3-Q6_K.gguf
acestep-v15-merge-sft-turbo-xl-ta-0.3-Q8_0.gguf
acestep-v15-merge-sft-turbo-xl-ta-0.7-BF16.gguf
acestep-v15-merge-sft-turbo-xl-ta-0.7-Q4_K_M.gguf
acestep-v15-merge-sft-turbo-xl-ta-0.7-Q5_K_M.gguf
acestep-v15-merge-sft-turbo-xl-ta-0.7-Q6_K.gguf
acestep-v15-merge-sft-turbo-xl-ta-0.7-Q8_0.gguf
```
## Compatibility
These GGUF files are DiT-only — they replace the --dit argument in the inference pipeline. You still need the standard LM, text encoder, and VAE models alongside them.
### acestep.cpp
Drop any of these GGUF files into the models/ directory and pass the path via --dit:

```sh
./build/ace-server \
  --host 0.0.0.0 --port 8090 \
  --lm models/acestep-5Hz-lm-4B-Q8_0.gguf \
  --embedding models/Qwen3-Embedding-0.6B-Q8_0.gguf \
  --dit models/acestep-v15-merge-sft-turbo-xl-ta-0.3-Q8_0.gguf \
  --vae models/vae-BF16.gguf
```
### HOT-Step-CPP
Place the GGUF files in the engine's models/ directory; they will appear automatically in the DiT model dropdown in the web UI.
## Required Companion Models
These DiT GGUFs must be used alongside:
| Component | Model | Notes |
|---|---|---|
| LM | acestep-5Hz-lm-4B-Q8_0.gguf | Audio code language model |
| Text Encoder | Qwen3-Embedding-0.6B-Q8_0.gguf | Caption encoder |
| VAE | vae-BF16.gguf | Audio decoder (always BF16) |
Companion models are available from Serveurperso/ACE-Step-1.5-GGUF.
## Recommended Settings
Since these are non-turbo (is_turbo: false) merged models, they use base-mode scheduling:
| Parameter | Recommended |
|---|---|
| Inference steps | 60–100 |
| CFG scale | 3.0–7.0 |
| Guidance mode | apg or cfg |
| Duration | 30–180s |
Note: Higher step counts are needed compared to turbo models. These models trade speed for quality and creative range.
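As a concrete starting point, the table above maps to a settings payload like the following. The key names here are illustrative, not the literal parameter names of acestep.cpp or HOT-Step-CPP; consult the engine's own documentation for the exact fields:

```python
# Illustrative non-turbo settings drawn from the recommendations above.
# Key names are hypothetical; map them to your engine's actual parameters.
settings = {
    "inference_steps": 80,   # base-mode scheduling wants 60-100 steps
    "cfg_scale": 5.0,        # 3.0-7.0 recommended
    "guidance_mode": "apg",  # "apg" or "cfg"
    "duration_s": 120,       # 30-180 seconds
}
print(settings)
```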
## How These Were Made
- Source checkpoints: Official ACE-Step v1.5 XL safetensors (Base, SFT, Turbo)
- Merge method: Task arithmetic: θ_merged = θ_A + λ(θ_B − θ_A)
- Conversion: Safetensors → GGUF BF16 via convert.py from acestep.cpp
- Quantization: BF16 → Q4_K_M / Q5_K_M / Q6_K / Q8_0 via the acestep.cpp quantize tool
## License
These models inherit the license from the upstream ACE-Step v1.5 checkpoints. See the ACE-Step repository for details.
## Credits
- ACE-Step — Original model architecture and training by the ACE-Step team
- acestep.cpp — C++ inference engine and GGUF tooling by Serveurperso
- HOT-Step-CPP — Full-stack music generation app by scragnog
- Task arithmetic merges — Produced by scragnog