Beyond Single Tokens: Distilling Discrete Diffusion Models via Discrete MMD
Abstract
Discrete Moment Matching Distillation (D-MMD) enables effective distillation of discrete diffusion models by adapting continuous-domain techniques, achieving superior performance compared to previous methods.
Distilling discrete diffusion models remains difficult, whereas the continuous diffusion literature offers many distillation methods that reduce sampling to a handful of steps. Our method, Discrete Moment Matching Distillation (D-MMD), leverages ideas that have been highly successful in the continuous domain. Whereas previous discrete distillation methods collapse, D-MMD maintains high quality and diversity (given sufficient sampling steps), as we demonstrate on both text and image datasets. Moreover, the distilled generators can outperform their teachers.
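The paper itself is the authoritative reference for D-MMD's objective; the abstract only names the ingredients (a discrete diffusion teacher, a few-step student, and a moment-matching loss carried over from the continuous domain). As a rough illustration of how such a training step could be wired up, here is a minimal PyTorch sketch. Everything in it is an assumption: `student`, `teacher`, and `aux` are hypothetical modules, the masking schedule is a placeholder, and the score-function gradient is one generic workaround for non-differentiable discrete sampling, not necessarily what D-MMD uses.

```python
import torch
import torch.nn.functional as F

MASK_ID, VOCAB, SEQ_LEN = 0, 1024, 128   # assumed toy dimensions

def remask(x0, t):
    """Forward discrete noising: independently replace each token with
    [MASK] with probability t (assuming a linear masking schedule)."""
    drop = torch.rand(x0.shape, device=x0.device) < t[:, None]
    return torch.where(drop, torch.full_like(x0, MASK_ID), x0)

def distill_step(student, teacher, aux, opt_s, opt_a, B=8):
    # 1) The few-step student generates token samples from all-mask
    #    input and reports per-token log-probs of its own choices.
    x_s, logp_s = student.sample(torch.full((B, SEQ_LEN), MASK_ID))

    # 2) Re-noise the student samples and score them with the frozen
    #    teacher and the auxiliary denoiser.
    t = torch.rand(B)
    x_t = remask(x_s, t)
    with torch.no_grad():
        lp_teacher = teacher(x_t, t).log_softmax(-1)  # (B, L, V)
        lp_aux = aux(x_t, t).log_softmax(-1)

    # 3) Keep the auxiliary denoiser fitted to the student's current
    #    sample distribution via a standard denoising loss.
    loss_a = F.cross_entropy(aux(x_t, t).flatten(0, 1), x_s.flatten())
    opt_a.zero_grad(); loss_a.backward(); opt_a.step()

    # 4) Student update: reinforce tokens the teacher deems likelier
    #    than the auxiliary model does (a reverse-KL-style signal);
    #    the score-function estimator sidesteps non-differentiable
    #    discrete sampling.
    reward = (lp_teacher - lp_aux).gather(-1, x_s[..., None]).squeeze(-1)
    loss_s = -(logp_s * reward).mean()
    opt_s.zero_grad(); loss_s.backward(); opt_s.step()
```

In this sketch the auxiliary denoiser plays the role of the "fake" score network from continuous distribution matching: it tracks the student's own samples, so the teacher-versus-auxiliary gap points the student toward the teacher's distribution.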
Community
Discrete Moment Matching Distillation preserves quality and diversity when distilling discrete diffusion models, enabling efficient sampling for text and image tasks and sometimes surpassing teacher models.
The following similar papers were recommended by the Semantic Scholar API (via Librarian Bot):
- Latent Shadows: The Gaussian-Discrete Duality in Masked Diffusion (2026)
- IDLM: Inverse-distilled Diffusion Language Models (2026)
- T3D: Few-Step Diffusion Language Models via Trajectory Self-Distillation with Direct Discriminative Optimization (2026)
- CoDAR: Continuous Diffusion Language Models are More Powerful Than You Think (2026)
- Unifying Masked Diffusion Models with Various Generation Orders and Beyond (2026)
- Sparsely Supervised Diffusion (2026)
- One-step Language Modeling via Continuous Denoising (2026)