---
license: apache-2.0
tags:
  - text-to-video
  - diffusion
  - memory
  - wan2.1
base_model: Wan-Video/Wan2.1-T2V-1.3B
datasets:
  - Thrcle/DiT-Mem-Data
---

# DiT-Mem-1.3B

This repository contains the official model weights for the paper **"Learning Plug-and-play Memory for Guiding Video Diffusion Models"**.

## 📦 Model Details

- **Model Name**: DiT-Mem-1.3B
- **Base Model**: [Wan2.1-T2V-1.3B](https://github.com/Wan-Video/Wan2.1)
- **Description**: DiT-Mem is a lightweight, plug-and-play memory module (~150M parameters) that injects world knowledge into existing video diffusion models, improving physical consistency and generation quality without retraining the large backbone.

## 🔗 Related Resources

- **GitHub Repository**: [DiT-Mem](https://github.com/Thrcle421/DiT-Mem)
- **Dataset**: [DiT-Mem-Data](https://huggingface.co/datasets/Thrcle/DiT-Mem-Data)
- **Paper**: [Learning Plug-and-play Memory for Guiding Video Diffusion Models](https://arxiv.org/pdf/2511.19229)

## 🚀 Usage

To use this model:

1. **Download Weights**: Download `DiT-Mem-1.3B.safetensors` from this repository (see the example script at the end of this card).
2. **Setup**: Place the file in the `checkpoint/` directory of the DiT-Mem codebase.
3. **Run Inference**: Refer to the [GitHub README](https://github.com/Thrcle421/DiT-Mem) for inference instructions.

## 📚 Citation

```bibtex
@article{song2025learning,
  title={Learning Plug-and-play Memory for Guiding Video Diffusion Models},
  author={Song, Selena and Xu, Ziming and Zhang, Zijun and Zhou, Kun and Guo, Jiaxian and Qin, Lianhui and Huang, Biwei},
  journal={arXiv preprint arXiv:2511.19229},
  year={2025}
}
```
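
## 💾 Example: Downloading the Weights

A minimal sketch of the download step using the `huggingface_hub` library. The `repo_id` below is an assumption (substitute this repository's actual id as shown on its page), and `local_dir` targets the `checkpoint/` directory the DiT-Mem codebase expects.

```python
# Minimal sketch: fetch the DiT-Mem weights with huggingface_hub.
from huggingface_hub import hf_hub_download

# Assumed repo id -- replace with this repository's actual id.
REPO_ID = "Thrcle/DiT-Mem-1.3B"

# Download DiT-Mem-1.3B.safetensors into the codebase's checkpoint/
# directory (step 2 of the Usage instructions above).
weights_path = hf_hub_download(
    repo_id=REPO_ID,
    filename="DiT-Mem-1.3B.safetensors",
    local_dir="checkpoint",
)
print(f"Weights saved to: {weights_path}")
```

With the weights in place, follow the inference instructions in the [GitHub README](https://github.com/Thrcle421/DiT-Mem).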