---
license: apache-2.0
tags:
- text-to-video
- diffusion
- memory
- wan2.1
base_model: Wan-Video/Wan2.1-T2V-1.3B
datasets:
- Thrcle/DiT-Mem-Data
---

# DiT-Mem-1.3B

This repository contains the official model weights for the paper **"Learning Plug-and-play Memory for Guiding Video Diffusion Models"**.

## ๐Ÿ“ฆ Model Details
- **Model Name**: DiT-Mem-1.3B
- **Base Model**: [Wan2.1-T2V-1.3B](https://github.com/Wan-Video/Wan2.1)
- **Description**: DiT-Mem is a lightweight, plug-and-play memory module (~150M parameters) designed to inject world knowledge into existing video diffusion models. It improves physical consistency and generation quality without retraining the large backbone model.
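
The memory module is distributed as a single `safetensors` checkpoint. As a quick sanity check, the parameter count quoted above (~150M) can be verified directly from the downloaded file; the sketch below is a minimal example and assumes `DiT-Mem-1.3B.safetensors` is already present in the working directory.

```python
# Minimal sketch: inspect the DiT-Mem checkpoint and count its parameters.
# Assumes DiT-Mem-1.3B.safetensors has been downloaded to the current directory.
from safetensors import safe_open

total_params = 0
with safe_open("DiT-Mem-1.3B.safetensors", framework="pt", device="cpu") as f:
    keys = list(f.keys())
    for key in keys:
        total_params += f.get_tensor(key).numel()

print(f"{len(keys)} tensors, ~{total_params / 1e6:.0f}M parameters")
```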

## ๐Ÿ”— Related Resources
- **GitHub Repository**: [DiT-Mem](https://github.com/Thrcle421/DiT-Mem)
- **Dataset**: [DiT-Mem-Data](https://huggingface.co/datasets/Thrcle/DiT-Mem-Data)
- **Paper**: [Learning Plug-and-play Memory for Guiding Video Diffusion Models](https://arxiv.org/pdf/2511.19229)

## ๐Ÿš€ Usage
To use this model:

1. **Download Weights**: Download `DiT-Mem-1.3B.safetensors` from this repository.
2. **Setup**: Place the file in the `checkpoint/` directory of the DiT-Mem codebase (see the sketch after this list).
3. **Run Inference**: Refer to the [GitHub README](https://github.com/Thrcle421/DiT-Mem) for inference instructions.
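
A minimal sketch of steps 1 and 2 using `huggingface_hub` is shown below. The repo id `Thrcle/DiT-Mem-1.3B` is an assumption for illustration; replace it with the id of the repository hosting this model card.

```python
# Minimal sketch of steps 1-2: download the weights and place them in checkpoint/.
# The repo_id below is assumed; adjust it to the actual Hugging Face repository.
import os
import shutil

from huggingface_hub import hf_hub_download

# Step 1: download DiT-Mem-1.3B.safetensors from the Hub cache.
weights_path = hf_hub_download(
    repo_id="Thrcle/DiT-Mem-1.3B",           # assumed repo id
    filename="DiT-Mem-1.3B.safetensors",
)

# Step 2: copy the weights into the DiT-Mem codebase's checkpoint/ directory.
os.makedirs("checkpoint", exist_ok=True)
shutil.copy(weights_path, os.path.join("checkpoint", "DiT-Mem-1.3B.safetensors"))
```

After the file is in place, follow the inference instructions in the GitHub README (step 3).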

## ๐Ÿ“š Citation
```bibtex
@article{song2025learning,
  title={Learning Plug-and-play Memory for Guiding Video Diffusion Models},
  author={Song, Selena and Xu, Ziming and Zhang, Zijun and Zhou, Kun and Guo, Jiaxian and Qin, Lianhui and Huang, Biwei},
  journal={arXiv preprint arXiv:2511.19229},
  year={2025}
}
```