---
license: apache-2.0
tags:
  - text-to-video
  - diffusion
  - memory
  - wan2.1
base_model: Wan-Video/Wan2.1-T2V-1.3B
datasets:
  - Thrcle/DiT-Mem-Data
---

# DiT-Mem-1.3B

This repository contains the official training weights for the paper "Learning Plug-and-play Memory for Guiding Video Diffusion Models".

## 📦 Model Details

- **Model Name:** DiT-Mem-1.3B
- **Base Model:** Wan2.1-T2V-1.3B
- **Description:** DiT-Mem is a lightweight, plug-and-play memory module (~150M parameters) designed to inject world knowledge into existing video diffusion models. It improves physical consistency and generation quality without retraining the large backbone model.

## 🚀 Usage

To use this model:

1. **Download Weights:** Download `DiT-Mem-1.3B.safetensors` from this repository.
2. **Setup:** Place the file in the `checkpoint/` directory of the DiT-Mem codebase.
3. **Run Inference:** Refer to the GitHub README for inference instructions.
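Steps 1 and 2 can be scripted with `huggingface_hub`. The sketch below is a minimal example, not part of the official codebase; the repo id `Thrcle/DiT-Mem-1.3B` and the `checkpoint/` layout are assumptions you should adjust to match this repository and your local checkout.

```python
from pathlib import Path

# Assumed Hub repo id and weight filename -- verify against this repository.
REPO_ID = "Thrcle/DiT-Mem-1.3B"
WEIGHTS_FILE = "DiT-Mem-1.3B.safetensors"


def target_path(codebase_root: str) -> Path:
    """Where the DiT-Mem codebase expects the weights (its checkpoint/ dir)."""
    return Path(codebase_root) / "checkpoint" / WEIGHTS_FILE


def download_weights(codebase_root: str) -> Path:
    """Fetch the weights from the Hub into the codebase's checkpoint/ dir."""
    from huggingface_hub import hf_hub_download  # pip install huggingface_hub

    dest = target_path(codebase_root)
    dest.parent.mkdir(parents=True, exist_ok=True)
    # local_dir places the file directly under checkpoint/ instead of the cache.
    downloaded = hf_hub_download(
        repo_id=REPO_ID,
        filename=WEIGHTS_FILE,
        local_dir=dest.parent,
    )
    return Path(downloaded)


if __name__ == "__main__":
    print(download_weights("."))  # downloads into ./checkpoint/
```

After the file is in place, follow the inference instructions in the GitHub README.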

## 📚 Citation

```bibtex
@article{song2025learning,
  title={Learning Plug-and-play Memory for Guiding Video Diffusion Models},
  author={Song, Selena and Xu, Ziming and Zhang, Zijun and Zhou, Kun and Guo, Jiaxian and Qin, Lianhui and Huang, Biwei},
  journal={arXiv preprint arXiv:2511.19229},
  year={2025}
}
```