For more information (including how to compress models yourself), check out https://huggingface.co/DFloat11 and https://github.com/LeanModels/DFloat11.

Feel free to request compression of other models as well (for the diffusers library, ComfyUI, or any other format), although models that use architectures unfamiliar to me may be more difficult.

## How to Use

### diffusers

```python
import torch
from diffusers import ZImagePipeline, ZImageTransformer2DModel
from dfloat11 import DFloat11Model
from transformers.modeling_utils import no_init_weights

# Regex patterns for module names, mapped to the submodules stored in DF11 format
pattern_dict = {
    r"noise_refiner\.\d+": (
        "attention.to_q",
        "attention.to_k",
        "attention.to_v",
        "attention.to_out.0",
        "feed_forward.w1",
        "feed_forward.w2",
        "feed_forward.w3",
        "adaLN_modulation.0"
    ),
    r"context_refiner\.\d+": (
        "attention.to_q",
        "attention.to_k",
        "attention.to_v",
        "attention.to_out.0",
        "feed_forward.w1",
        "feed_forward.w2",
        "feed_forward.w3",
    ),
    r"layers\.\d+": (
        "attention.to_q",
        "attention.to_k",
        "attention.to_v",
        "attention.to_out.0",
        "feed_forward.w1",
        "feed_forward.w2",
        "feed_forward.w3",
        "adaLN_modulation.0"
    ),
    r"cap_embedder": (
        "1",
    )
}

# Load the DF11-compressed Qwen3-4B text encoder onto CPU
text_encoder = DFloat11Model.from_pretrained("DFloat11/Qwen3-4B-DF11", device="cpu")

# Build an empty bfloat16 transformer from the Z-Image-Turbo config,
# skipping weight initialization since the weights are loaded next
with no_init_weights():
    transformer = ZImageTransformer2DModel.from_config(
        ZImageTransformer2DModel.load_config(
            "Tongyi-MAI/Z-Image-Turbo", subfolder="transformer"
        ),
        torch_dtype=torch.bfloat16
    ).to(torch.bfloat16)


# Load the DF11-compressed transformer weights into the empty model
DFloat11Model.from_single_file(
    r".\BEYOND REALITY SUPER Z IMAGE 2.0 淡妆浓抹总相宜 BF16-DF11.safetensors", # Make sure to download the file first, and edit the filepath accordingly
    device="cpu",
    bfloat16_model=transformer,
    pattern_dict=pattern_dict
)

# Assemble the full pipeline around the compressed text encoder and transformer
pipe = ZImagePipeline.from_pretrained(
    "Tongyi-MAI/Z-Image-Turbo",
    text_encoder=text_encoder,
    transformer=transformer,
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=False,
)
pipe.to("cuda")


### ComfyUI

Refer to this model instead.

## Compression details

This is the `pattern_dict` used for compression. Each key is a regular expression matched against the transformer's module names, and each value is the tuple of submodules whose weights are compressed:

```python
pattern_dict = {
    r"noise_refiner\.\d+": (
        "attention.to_q",
        "attention.to_k",
        "attention.to_v",
        "attention.to_out.0",
        "feed_forward.w1",
        "feed_forward.w2",
        "feed_forward.w3",
        "adaLN_modulation.0"
    ),
    r"context_refiner\.\d+": (
        "attention.to_q",
        "attention.to_k",
        "attention.to_v",
        "attention.to_out.0",
        "feed_forward.w1",
        "feed_forward.w2",
        "feed_forward.w3",
    ),
    r"layers\.\d+": (
        "attention.to_q",
        "attention.to_k",
        "attention.to_v",
        "attention.to_out.0",
        "feed_forward.w1",
        "feed_forward.w2",
        "feed_forward.w3",
        "adaLN_modulation.0"
    ),
    r"cap_embedder": (
        "1",
    )
}
```
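
As a quick illustration of how such a dict selects modules, here is a standalone sketch (the module names are hypothetical and the matching code is illustrative, not dfloat11's actual implementation):

```python
import re

# Hypothetical module names resembling those in the Z-Image transformer
module_names = ["noise_refiner.0", "context_refiner.3", "layers.17", "cap_embedder", "patch_embed"]

for name in module_names:
    for pattern, submodules in pattern_dict.items():
        if re.fullmatch(pattern, name):
            # In each matched module, these submodules' weights are compressed
            print(f"{name} -> {submodules}")
```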