---
license: apache-2.0
tags:
- text-to-video
- diffusion
- memory
- wan2.1
base_model: Wan-Video/Wan2.1-T2V-1.3B
datasets:
- Thrcle/DiT-Mem-Data
---
# DiT-Mem-1.3B
This repository contains the official model weights for the paper **"Learning Plug-and-play Memory for Guiding Video Diffusion Models"**.
## Model Details
- **Model Name**: DiT-Mem-1.3B
- **Base Model**: [Wan2.1-T2V-1.3B](https://github.com/Wan-Video/Wan2.1)
- **Description**: DiT-Mem is a lightweight, plug-and-play memory module (~150M parameters) designed to inject world knowledge into existing video diffusion models. It improves physical consistency and generation quality without retraining the large backbone model.
## Related Resources
- **GitHub Repository**: [DiT-Mem](https://github.com/Thrcle421/DiT-Mem)
- **Dataset**: [DiT-Mem-Data](https://huggingface.co/datasets/Thrcle/DiT-Mem-Data) (a download sketch follows this list)
- **Paper**: [Learning Plug-and-play Memory for Guiding Video Diffusion Models](https://arxiv.org/pdf/2511.19229)
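If you want the training data locally as well, a plain snapshot download of the dataset repository works without assuming anything about its on-disk format. The local path below is only an illustrative choice, not something the codebase requires; see the dataset card for its actual structure.

```python
# Minimal sketch: fetch the DiT-Mem training data with the generic
# huggingface_hub API (format-agnostic; no dataset-specific loader assumed).
from huggingface_hub import snapshot_download

data_dir = snapshot_download(
    repo_id="Thrcle/DiT-Mem-Data",
    repo_type="dataset",
    local_dir="data/DiT-Mem-Data",  # example local layout, not mandated by the codebase
)
print(f"Dataset files downloaded to {data_dir}")
```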
## Usage
To use this model:
1. **Download Weights**: Download `DiT-Mem-1.3B.safetensors` from this repository (see the sketch after this list).
2. **Setup**: Place the file in the `checkpoint/` directory of the DiT-Mem codebase.
3. **Run Inference**: Refer to the [GitHub README](https://github.com/Thrcle421/DiT-Mem) for inference instructions.
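A minimal sketch of steps 1–2 is shown below, using only the standard `huggingface_hub` and `safetensors` APIs. The repo id `Thrcle/DiT-Mem-1.3B` is an assumption (it is not stated on this card), and the actual inference entry points live in the DiT-Mem codebase, so treat this purely as a download-and-sanity-check helper.

```python
# Minimal sketch (assumed repo id; follow the GitHub README for real inference).
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file

# Steps 1-2: download DiT-Mem-1.3B.safetensors into the checkpoint/ directory
# expected by the DiT-Mem codebase.
weights_path = hf_hub_download(
    repo_id="Thrcle/DiT-Mem-1.3B",        # assumption: repo id of this model card
    filename="DiT-Mem-1.3B.safetensors",
    local_dir="checkpoint",
)

# Sanity check: the checkpoint is a flat dict of tensors for the ~150M-parameter
# memory module described above.
state_dict = load_file(weights_path)
n_params = sum(t.numel() for t in state_dict.values())
print(f"Loaded {len(state_dict)} tensors, {n_params / 1e6:.1f}M parameters")
```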
## Citation
```bibtex
@article{song2025learning,
  title={Learning Plug-and-play Memory for Guiding Video Diffusion Models},
  author={Song, Selena and Xu, Ziming and Zhang, Zijun and Zhou, Kun and Guo, Jiaxian and Qin, Lianhui and Huang, Biwei},
  journal={arXiv preprint arXiv:2511.19229},
  year={2025}
}
```