MiniSora: Fully Open Video Diffusion with ColossalAI

GitHub: YN35/minisora
Author (X / Twitter): @ramu0e


🧾 Overview

MiniSora is a fully open video diffusion codebase designed for everything from research to production.

  • All training, inference, and evaluation scripts are available
  • Supports multi-GPU / multi-node training via ColossalAI
  • Simple DiT-based video model + pipeline, inspired by Diffusers
  • Includes a continuation demo to generate the "next" part of an existing video

This model card hosts the DiT pipeline trained on DMLab trajectories and published as ramu0e/minisora-dmlab.


🚀 Inference: Text-to-Video (Minimal Example)

from minisora.models import DiTPipeline

# Load the pretrained DMLab pipeline from the Hugging Face Hub
pipeline = DiTPipeline.from_pretrained("ramu0e/minisora-dmlab")

# Sample one 20-frame 64x64 clip with 28 denoising steps
output = pipeline(
    batch_size=1,
    num_inference_steps=28,
    height=64,
    width=64,
    num_frames=20,
)
latents = output.latents  # shape: (B, C, F, H, W)

The returned latents are video tensors in the same normalized space used during training.
Use the scripts in the repository to decode or visualize them.
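
For a quick look at the samples, here is a minimal visualization sketch. It assumes the latents are pixel-space RGB tensors normalized to [-1, 1]; if the pipeline instead works in a separate VAE latent space, use the repository's decoding scripts.

import numpy as np
from PIL import Image

# ASSUMPTION: latents are pixel-space RGB videos in [-1, 1] with C == 3.
video = output.latents[0]                  # (C, F, H, W), first sample in batch
video = video.permute(1, 2, 3, 0)          # (F, H, W, C) for image export
frames = ((video.clamp(-1, 1) + 1) * 127.5).cpu().numpy().astype(np.uint8)

for i, frame in enumerate(frames):
    Image.fromarray(frame).save(f"frame_{i:03d}.png")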


🎥 Continuation: Generate the Rest of a Video

MiniSora also supports continuation-style generation (like Sora), where subsequent frames are sampled while conditioning on the observed prefix.
A demo script is bundled to extend existing videos.

uv run scripts/demo/full_continuation.py \
  --model-id ramu0e/minisora-dmlab \
  --input-video path/to/input.mp4 \
  --num-extend-frames 12 \
  --num-inference-steps 28 \
  --seed 1234

See scripts/demo/full_continuation.py for the exact arguments and I/O specification.
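
Conceptually, the continuation loop denoises the whole clip while re-pinning the observed prefix after every scheduler step, so the generated frames stay consistent with the condition. The sketch below uses hypothetical names and the Diffusers-style scheduler interface; the actual logic lives in scripts/demo/full_continuation.py.

import torch

@torch.no_grad()
def continue_video(model, scheduler, prefix, num_new_frames, steps=28):
    # prefix: observed clean frames, shape (B, C, F, H, W)
    b, c, f, h, w = prefix.shape
    latents = torch.randn(b, c, f + num_new_frames, h, w, device=prefix.device)
    scheduler.set_timesteps(steps)
    for t in scheduler.timesteps:
        latents[:, :, :f] = prefix          # keep the conditioned prefix fixed
        pred = model(latents, t)            # hypothetical model call
        latents = scheduler.step(pred, t, latents).prev_sample
    latents[:, :, :f] = prefix
    return latents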


🧩 Key Features

  • End-to-End Transparency

    • Model definition (DiT): src/minisora/models/modeling_dit.py
    • Pipeline: src/minisora/models/pipeline_dit.py
    • Training script: scripts/train.py
    • Data loaders: src/minisora/data/

    Every stage, from data to inference, is available.
  • ColossalAI for Scale-Out Training

    • Zero / DDP plugins (see the Booster sketch after this list)
    • Designed for multi-GPU and multi-node setups
    • Easy experimentation with large video models
  • Simple, Readable Implementation

    • Dependency management via uv (uv sync / uv run)
    • Minimal Diffusers-inspired video DiT pipeline
    • Experiments and analysis scripts organized under reports/
  • Continuation / Conditioning Ready

    • Masking logic that keeps conditioned frames fixed
    • Training scheme that applies noise to only part of the sequence
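
For reference, plugin switching with the ColossalAI Booster looks roughly like the sketch below. This is a generic illustration of the Booster API with a stand-in model, not a copy of scripts/train.py, and it assumes the distributed process group has already been initialized.

import torch
from colossalai.booster import Booster
from colossalai.booster.plugin import LowLevelZeroPlugin, TorchDDPPlugin

# Stand-ins for the video DiT and its optimizer.
model = torch.nn.Linear(8, 8)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

# ZeRO shards optimizer states across ranks; TorchDDPPlugin() gives plain DDP.
plugin = LowLevelZeroPlugin(stage=1)
booster = Booster(plugin=plugin)
model, optimizer, *_ = booster.boost(model, optimizer)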

🛠 Install & Setup

1. Clone the Repository

git clone https://github.com/YN35/minisora.git
cd minisora

2. Install Dependencies with uv

uv sync

All scripts can then be executed via uv run.
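
As a quick sanity check that the environment resolved correctly (assuming the package is importable as minisora):

uv run python -c "import minisora; print('ok')"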


📦 This Pipeline (ramu0e/minisora-dmlab)

This Hugging Face repository distributes the MiniSora DiT pipeline checkpoint trained on DMLab trajectories.

  • Model type: DiT-based video diffusion model
  • Training resolution: e.g., 64×64 or 128×128 (see reports/ in the repo)
  • Frames per sample: typically 20
  • Library: minisora (custom lightweight framework)
  • Use case: research or sample-quality video generation

🧪 Training (Summary)

Complete training code is available in the repository.

  • Main script: scripts/train.py
  • Highlights:
    • Rectified-flow style training with FlowMatchEulerDiscreteScheduler
    • ColossalAI Booster to switch between Zero / DDP
    • Conditioning-aware objective that noises only a subset of frames (sketched below)
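
As a rough illustration of that objective (hypothetical names; the real loop, including the FlowMatchEulerDiscreteScheduler wiring, is in scripts/train.py): the clean video is linearly interpolated toward noise, conditioned frames are kept clean, and the loss is taken only on the frames that were noised.

import torch
import torch.nn.functional as F

def rectified_flow_step(model, video, cond_mask):
    # video: (B, C, F, H, W); cond_mask: (F,) bool, True = conditioned frame
    b = video.shape[0]
    t = torch.rand(b, device=video.device).view(b, 1, 1, 1, 1)  # flow time in [0, 1]
    noise = torch.randn_like(video)
    noisy = (1 - t) * video + t * noise                  # linear interpolation path
    noisy[:, :, cond_mask] = video[:, :, cond_mask]      # conditioned frames stay clean
    target = noise - video                               # rectified-flow velocity target
    pred = model(noisy, t.flatten())                     # hypothetical model signature
    return F.mse_loss(pred[:, :, ~cond_mask], target[:, :, ~cond_mask])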

Example: Single-Node Training

uv run scripts/train.py \
  --dataset_type minecraft \
  --data_root /path/to/train_data \
  --outputs outputs/exp1 \
  --batch_size 32 \
  --precision bf16

Example: Multi-Node (torchrun + ColossalAI)

# Launch on every node, setting NODE_RANK and MASTER_ADDR for each node.
torchrun --nnodes 2 --nproc_per_node 8 \
  --node_rank $NODE_RANK --master_addr $MASTER_ADDR --master_port 29500 \
  scripts/train.py \
  --dataset_type minecraft \
  --data_root /path/to/train_data \
  --outputs outputs/exp-multinode \
  --batch_size 64 \
  --plugin zero --zero 1

Refer to scripts/train.py for all available options.


📚 Repository Structure (Excerpt)

  • src/minisora/models/modeling_dit.py – core DiT transformer for video
  • src/minisora/models/pipeline_dit.py – Diffusers-style pipeline (DiTPipeline)
  • src/minisora/data/ – datasets and distributed samplers (DMLab, Minecraft)
  • scripts/train.py – ColossalAI-based training loop
  • scripts/demo/full_vgen.py – simple end-to-end video generation demo
  • scripts/demo/full_continuation.py – continuation demo
  • reports/ – experiment notes, mask visualizations, metric scripts

πŸ” Limitations & Notes

  • This checkpoint targets research-scale experiments.
  • Quality at higher resolution or longer durations depends on data and hyperparameters.
  • Continuation quality varies with the provided prefix and conditioning setup.

🤝 Contributions

  • Contributions to code, models, and docs are welcome.
  • Please open issues or PRs at YN35/minisora.

📄 License

  • Code and weights are released under the MIT License.
    Commercial use, modification, and redistribution are all permitted (see the GitHub LICENSE).