---
library_name: minisora
license: mit
language:
  - en
tags:
  - text-to-video
  - video-diffusion
  - continuation
  - colossalai
pipeline_tag: text-to-video
---

# MiniSora: Fully Open Video Diffusion with ColossalAI

[GitHub: YN35/minisora](https://github.com/YN35/minisora)

[Author (X / Twitter): @__ramu0e__](https://x.com/__ramu0e__)

---

## 🧾 Overview

**MiniSora** is a fully open video diffusion codebase designed for everything from research to production.

- All training, inference, and evaluation scripts are available
- Supports multi-GPU / multi-node training via **ColossalAI**
- Simple DiT-based video model + pipeline, inspired by Diffusers
- Includes a continuation demo to generate the "next" part of an existing video

This model card hosts the DiT pipeline trained on DMLab trajectories and published as `ramu0e/minisora-dmlab`.

---

## 🚀 Inference: Text-to-Video (Minimal Example)

```python
from minisora.models import DiTPipeline

pipeline = DiTPipeline.from_pretrained("ramu0e/minisora-dmlab")

output = pipeline(
    batch_size=1,
    num_inference_steps=28,
    height=64,
    width=64,
    num_frames=20,
)

latents = output.latents  # shape: (B, C, F, H, W)
```

`latents` are video tensors in the same normalized space used during training. Use the scripts in the repository to decode or visualize them.
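For a quick visual check without the repository's decoding utilities, a minimal sketch like the one below can dump the latents to a GIF. It assumes the returned latents are pixel-space frames normalized to roughly [-1, 1] and that `imageio` is installed; consult the repo's demo scripts for the actual decoding transform.

```python
# Minimal visualization sketch -- hypothetical helper, assuming latents are
# pixel-space values in [-1, 1]; the repo's scripts define the real decode.
import imageio
import torch

def latents_to_gif(latents: torch.Tensor, path: str = "sample.gif") -> None:
    video = latents[0].permute(1, 2, 3, 0)             # (C, F, H, W) -> (F, H, W, C)
    frames = ((video.clamp(-1, 1) + 1.0) * 127.5).to(torch.uint8)
    imageio.mimsave(path, list(frames.cpu().numpy()))  # one array per frame

latents_to_gif(latents)  # `latents` from the example above
```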
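The continuation mode described in the next section rests on partial noising: conditioned prefix frames stay clean while only the remaining frames are noised. As a rough illustration of the idea (a hypothetical helper, not the repository's actual code):

```python
# Hypothetical sketch of partial noising for prefix conditioning --
# see src/minisora/ and scripts/train.py for the real logic.
import torch

def partially_noise(video: torch.Tensor, noise: torch.Tensor,
                    t: torch.Tensor, num_cond_frames: int) -> torch.Tensor:
    # video, noise: (B, C, F, H, W); t: (B,) flow-matching time in [0, 1]
    t = t.view(-1, 1, 1, 1, 1)
    noised = (1.0 - t) * video + t * noise  # rectified-flow interpolation
    noised[:, :, :num_cond_frames] = video[:, :, :num_cond_frames]  # keep prefix clean
    return noised
```

During training, the loss would then typically be computed only on the noised frames; the `reports/` directory includes mask visualizations from the actual runs.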
---

## 🎥 Continuation: Generate the Rest of a Video

MiniSora also supports continuation-style generation (like Sora), where subsequent frames are sampled while conditioning on the observed prefix. A demo script is bundled to extend existing videos.

```bash
uv run scripts/demo/full_continuation.py \
  --model-id ramu0e/minisora-dmlab \
  --input-video path/to/input.mp4 \
  --num-extend-frames 12 \
  --num-inference-steps 28 \
  --seed 1234
```

See `scripts/demo/full_continuation.py` for the exact arguments and I/O specification.

---

## 🧩 Key Features

- **End-to-End Transparency**
  - Model definition (DiT): `src/minisora/models/modeling_dit.py`
  - Pipeline: `src/minisora/models/pipeline_dit.py`
  - Training script: `scripts/train.py`
  - Data loaders: `src/minisora/data/`

  Every stage from data to inference is available.

- **ColossalAI for Scale-Out Training**
  - Zero / DDP plugins
  - Designed for multi-GPU and multi-node setups
  - Easy experimentation with large video models

- **Simple, Readable Implementation**
  - Dependency management via `uv` (`uv sync` / `uv run`)
  - Minimal Diffusers-inspired video DiT pipeline
  - Experiments and analysis scripts organized under `reports/`

- **Continuation / Conditioning Ready**
  - Masking logic that keeps conditioned frames fixed
  - Training scheme that applies noise to only part of the sequence

---

## 🛠 Install & Setup

### 1. Clone the Repository

```bash
git clone https://github.com/YN35/minisora.git
cd minisora
```

### 2. Install Dependencies with `uv`

```bash
uv sync
```

All scripts can then be executed through `uv run ...`.

---

## 📦 This Pipeline (`ramu0e/minisora-dmlab`)

This Hugging Face repository distributes the MiniSora DiT pipeline checkpoint trained on DMLab trajectories.

- **Model type**: DiT-based video diffusion model
- **Training resolution**: e.g., 64×64 or 128×128 (see `reports/` in the repo)
- **Frames per sample**: typically 20
- **Library**: `minisora` (custom lightweight framework)
- **Use case**: research or sample-quality video generation

---

## 🧪 Training (Summary)

Complete training code is available in the repository.

- Main script: `scripts/train.py`
- Highlights:
  - Rectified-flow style training with `FlowMatchEulerDiscreteScheduler`
  - ColossalAI Booster to switch between Zero / DDP
  - Conditioning-aware objective (noise applied to only a subset of frames)

### Example: Single-Node Training

```bash
uv run scripts/train.py \
  --dataset_type minecraft \
  --data_root /path/to/train_data \
  --outputs outputs/exp1 \
  --batch_size 32 \
  --precision bf16
```

### Example: Multi-Node (torchrun + ColossalAI)

```bash
torchrun --nnodes 2 --nproc_per_node 8 scripts/train.py \
  --dataset_type minecraft \
  --data_root /path/to/train_data \
  --outputs outputs/exp-multinode \
  --batch_size 64 \
  --plugin zero --zero 1
```

Refer to `scripts/train.py` for all available options.

---

## 📚 Repository Structure (Excerpt)

- `src/minisora/models/modeling_dit.py` – core DiT transformer for video
- `src/minisora/models/pipeline_dit.py` – Diffusers-style pipeline (`DiTPipeline`)
- `src/minisora/data/` – datasets and distributed samplers (DMLab, Minecraft)
- `scripts/train.py` – ColossalAI-based training loop
- `scripts/demo/full_vgen.py` – simple end-to-end video generation demo
- `scripts/demo/full_continuation.py` – continuation demo
- `reports/` – experiment notes, mask visualizations, metric scripts

---

## 🔍 Limitations & Notes

- This checkpoint targets research-scale experiments.
- Quality at higher resolutions or longer durations depends on data and hyperparameters.
- Continuation quality varies with the provided prefix and conditioning setup.

---

## 🤝 Contributions

- Contributions to code, models, and docs are welcome.
- Please open issues or PRs at [YN35/minisora](https://github.com/YN35/minisora).

---

## 📄 License

- Code and weights are released under the **MIT License**. Commercial use, modification, and redistribution are all permitted (see the GitHub `LICENSE`).

```text
MIT License

Copyright (c) YN

Permission is hereby granted, free of charge, to any person obtaining a copy
...
```