| tags: | |
| - text-to-image | |
| - stable-diffusion | |
| - audio-to-video | |
| license: apache-2.0 | |
| language: | |
| - en | |
| library_name: diffusers | |
| # V-Express Model Card | |
| <div align="center"> | |
| [**Project Page**](https://tenvence.github.io/p/v-express/) **|** [**Paper (coming soon)**](https://tenvence.github.io/p/v-express/) **|** [**Code**](https://github.com/tencent-ailab/V-Express) | |
| </div> | |
| --- | |
| ## Introduction | |
| ## Models | |
| ### Audio Encoder | |
| - [model_ckpts/wav2vec2-base-960h](https://huggingface.co/tk93/V-Express/tree/main/model_ckpts/wav2vec2-base-960h). Also available from the original model card [facebook/wav2vec2-base-960h](https://huggingface.co/facebook/wav2vec2-base-960h). | |
| ### Face Analysis | |
| - [model_ckpts/insightface_models/models/buffalo_l](https://huggingface.co/tk93/V-Express/tree/main/model_ckpts/insightface_models/models/buffalo_l). Also available from the original repository [insightface/buffalo_l](https://github.com/deepinsight/insightface/releases/download/v0.7/buffalo_l.zip). | |
| ### V-Express | |
| - [model_ckpts/sd-vae-ft-mse](https://huggingface.co/tk93/V-Express/tree/main/model_ckpts/sd-vae-ft-mse). VAE encoder. (original model card [stabilityai/sd-vae-ft-mse](https://huggingface.co/stabilityai/sd-vae-ft-mse)) | |
| - [model_ckpts/stable-diffusion-v1-5](https://huggingface.co/tk93/V-Express/tree/main/model_ckpts/stable-diffusion-v1-5). Only the UNet model configuration file is needed here. (original model card [runwayml/stable-diffusion-v1-5](https://huggingface.co/runwayml/stable-diffusion-v1-5)) | |
| - [model_ckpts/v-express](https://huggingface.co/tk93/V-Express/tree/main/model_ckpts/v-express). The video generation model, conditioned on audio and V-Kps, which we call V-Express. | |
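
The checkpoint folders listed above all live under `model_ckpts/` in the `tk93/V-Express` repository, so they can be fetched in one call. The following is a minimal sketch, not an official download script: it assumes the `huggingface_hub` package is installed, and the helper names (`build_allow_patterns`, `download_checkpoints`, `missing_checkpoints`) and the default `./V-Express` target directory are hypothetical choices made for illustration.

```python
from pathlib import Path

REPO_ID = "tk93/V-Express"  # repository named in the links of this model card

# Sub-folders of model_ckpts/ listed in the Models section.
CHECKPOINT_DIRS = [
    "model_ckpts/wav2vec2-base-960h",
    "model_ckpts/insightface_models/models/buffalo_l",
    "model_ckpts/sd-vae-ft-mse",
    "model_ckpts/stable-diffusion-v1-5",
    "model_ckpts/v-express",
]


def build_allow_patterns(dirs=CHECKPOINT_DIRS):
    """Turn each checkpoint folder into a glob matching every file under it."""
    return [f"{d.rstrip('/')}/*" for d in dirs]


def download_checkpoints(local_dir="./V-Express", dirs=CHECKPOINT_DIRS):
    """Download only the listed folders (requires huggingface_hub and network access)."""
    # Imported lazily so the other helpers work without the package installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=REPO_ID,
        allow_patterns=build_allow_patterns(dirs),
        local_dir=local_dir,  # hypothetical target directory
    )


def missing_checkpoints(local_dir, dirs=CHECKPOINT_DIRS):
    """Return the listed folders that are absent under local_dir."""
    root = Path(local_dir)
    return [d for d in dirs if not (root / d).is_dir()]
```

Calling `download_checkpoints()` and then `missing_checkpoints("./V-Express")` gives a quick check that every folder arrived. The check only tests that each folder exists, not that its files are complete.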