EmoVid: A Multimodal Emotion Video Dataset for Emotion-Centric Video Understanding and Generation Paper • 2511.11002 • Published 23 days ago
SLA: Beyond Sparsity in Diffusion Transformers via Fine-Tunable Sparse-Linear Attention Paper • 2509.24006 • Published Sep 28
CAR-Flow: Condition-Aware Reparameterization Aligns Source and Target for Better Flow Matching Paper • 2509.19300 • Published Sep 23
Meigen MultiTalk: Audio-Driven Multi-Person Conversational Video Generation
Voxlect: A Speech Foundation Model Benchmark for Modeling Dialects and Regional Languages Around the Globe Paper • 2508.01691 • Published Aug 3
EmoNet-Voice: A Fine-Grained, Expert-Verified Benchmark for Speech Emotion Detection Paper • 2506.09827 • Published Jun 11