---
license: apache-2.0
tags:
- text-to-video
- diffusion
- memory
- wan2.1
base_model: Wan-Video/Wan2.1-T2V-1.3B
datasets:
- Thrcle/DiT-Mem-Data
---
# DiT-Mem-1.3B
This repository contains the official model weights for the paper **"Learning Plug-and-play Memory for Guiding Video Diffusion Models"**.
## Model Details
- **Model Name**: DiT-Mem-1.3B
- **Base Model**: [Wan2.1-T2V-1.3B](https://github.com/Wan-Video/Wan2.1)
- **Description**: DiT-Mem is a lightweight, plug-and-play memory module (~150M parameters) designed to inject world knowledge into existing video diffusion models. It improves physical consistency and generation quality without retraining the large backbone model.
## Related Resources
- **GitHub Repository**: [DiT-Mem](https://github.com/Thrcle421/DiT-Mem)
- **Dataset**: [DiT-Mem-Data](https://huggingface.co/datasets/Thrcle/DiT-Mem-Data) (a download sketch follows this list)
- **Paper**: [Learning Plug-and-play Memory for Guiding Video Diffusion Models](https://arxiv.org/pdf/2511.19229)
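If you want the training data locally as well, a plain snapshot download of the dataset repository works without assuming anything about its on-disk format. The local path below is only an illustrative choice, not something the codebase requires; see the dataset card for its actual structure.

```python
# Minimal sketch: fetch the DiT-Mem training data with the generic
# huggingface_hub API (format-agnostic; no dataset-specific loader assumed).
from huggingface_hub import snapshot_download

data_dir = snapshot_download(
    repo_id="Thrcle/DiT-Mem-Data",
    repo_type="dataset",
    local_dir="data/DiT-Mem-Data",  # example local layout, not mandated by the codebase
)
print(f"Dataset files downloaded to {data_dir}")
```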
## Usage
To use this model:
1. **Download Weights**: Download `DiT-Mem-1.3B.safetensors` from this repository (see the sketch after this list).
2. **Setup**: Place the file in the `checkpoint/` directory of the DiT-Mem codebase.
3. **Run Inference**: Refer to the [GitHub README](https://github.com/Thrcle421/DiT-Mem) for inference instructions.
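A minimal sketch of steps 1–2 is shown below, using only the standard `huggingface_hub` and `safetensors` APIs. The repo id `Thrcle/DiT-Mem-1.3B` is an assumption (it is not stated on this card), and the actual inference entry points live in the DiT-Mem codebase, so treat this purely as a download-and-sanity-check helper.

```python
# Minimal sketch (assumed repo id; follow the GitHub README for real inference).
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file

# Steps 1-2: download DiT-Mem-1.3B.safetensors into the checkpoint/ directory
# expected by the DiT-Mem codebase.
weights_path = hf_hub_download(
    repo_id="Thrcle/DiT-Mem-1.3B",        # assumption: repo id of this model card
    filename="DiT-Mem-1.3B.safetensors",
    local_dir="checkpoint",
)

# Sanity check: the checkpoint is a flat dict of tensors for the ~150M-parameter
# memory module described above.
state_dict = load_file(weights_path)
n_params = sum(t.numel() for t in state_dict.values())
print(f"Loaded {len(state_dict)} tensors, {n_params / 1e6:.1f}M parameters")
```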
## Citation
```bibtex
@article{song2025learning,
  title={Learning Plug-and-play Memory for Guiding Video Diffusion Models},
  author={Song, Selena and Xu, Ziming and Zhang, Zijun and Zhou, Kun and Guo, Jiaxian and Qin, Lianhui and Huang, Biwei},
  journal={arXiv preprint arXiv:2511.19229},
  year={2025}
}
```