Phenaki: Variable Length Video Generation From Open Domain Textual Description
Paper • 2210.02399 • Published • 3
MaskGiT is trained to reconstruct masked tokens z, which are predicted by a frozen C-ViViT encoder, conditioned on the T5X tokens of a given prompt p0.
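
The masking step behind this training objective can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the `MASK_ID` value, the `mask_tokens` helper, the fixed mask ratio, and the toy token lists are all assumptions made for the example, and the transformer itself is omitted.

```python
import math
import random

MASK_ID = -1  # hypothetical id standing in for the special [MASK] token


def mask_tokens(z, mask_ratio, rng):
    """Replace ceil(mask_ratio * len(z)) randomly chosen tokens with MASK_ID.

    Returns (masked_input, targets). targets holds the original token at
    masked positions and None elsewhere, so a loss would cover only the
    masked positions.
    """
    n = len(z)
    num_masked = min(n, max(1, math.ceil(mask_ratio * n)))
    masked_positions = set(rng.sample(range(n), num_masked))
    masked_input = [MASK_ID if i in masked_positions else tok
                    for i, tok in enumerate(z)]
    targets = [tok if i in masked_positions else None
               for i, tok in enumerate(z)]
    return masked_input, targets


rng = random.Random(0)
z = [17, 4, 99, 23, 8, 42, 5, 61]  # toy stand-in for tokens from the frozen encoder
prompt_tokens = [3, 14, 15]        # toy stand-in for the text tokens of prompt p0

# Assumption: a fixed ratio of 0.5; in training the ratio would vary per step.
masked_input, targets = mask_tokens(z, mask_ratio=0.5, rng=rng)

# A transformer (not shown) would receive (masked_input, prompt_tokens) and be
# trained to predict `targets` at the masked positions.
```

Because the encoder is frozen, the token sequence `z` acts as fixed ground truth here: only the model that fills in the masked positions would receive gradient updates.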