My favourites - a Warvito Collection

Warvito's Collections

My favourites

updated 15 days ago

Test-Time Scaling with Reflective Generative Model

Paper • 2507.01951 • Published Jul 2, 2025 • 108
Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach

Paper • 2502.05171 • Published Feb 7, 2025 • 154
Autoregressive Diffusion Models

Paper • 2110.02037 • Published Oct 5, 2021
EQ-VAE: Equivariance Regularized Latent Space for Improved Generative Image Modeling

Paper • 2502.09509 • Published Feb 13, 2025 • 9
Improving the Diffusability of Autoencoders

Paper • 2502.14831 • Published Feb 20, 2025 • 2
Deep Compression Autoencoder for Efficient High-Resolution Diffusion Models

Paper • 2410.10733 • Published Oct 14, 2024 • 9
DC-AE 1.5: Accelerating Diffusion Model Convergence with Structured Latent Space

Paper • 2508.00413 • Published Aug 1, 2025 • 5
REPA-E: Unlocking VAE for End-to-End Tuning with Latent Diffusion Transformers

Paper • 2504.10483 • Published Apr 14, 2025 • 22
NVIDIA Nemotron Nano 2: An Accurate and Efficient Hybrid Mamba-Transformer Reasoning Model

Paper • 2508.14444 • Published Aug 20, 2025 • 47
MetaCLIP 2: A Worldwide Scaling Recipe

Paper • 2507.22062 • Published Jul 29, 2025 • 37
Waver: Wave Your Way to Lifelike Video Generation

Paper • 2508.15761 • Published Aug 21, 2025 • 38
Qwen-Image Technical Report

Paper • 2508.02324 • Published Aug 4, 2025 • 274
UltraMemV2: Memory Networks Scaling to 120B Parameters with Superior Long-Context Learning

Paper • 2508.18756 • Published Aug 26, 2025 • 36
InfGen: A Resolution-Agnostic Paradigm for Scalable Image Synthesis

Paper • 2509.10441 • Published Sep 12, 2025 • 31
Why Language Models Hallucinate

Paper • 2509.04664 • Published Sep 4, 2025 • 199
HuMo: Human-Centric Video Generation via Collaborative Multi-Modal Conditioning

Paper • 2509.08519 • Published Sep 10, 2025 • 130
Step1X-Edit: A Practical Framework for General Image Editing

Paper • 2504.17761 • Published Apr 24, 2025 • 92
Transition Matching: Scalable and Flexible Generative Modeling

Paper • 2506.23589 • Published Jun 30, 2025 • 1
MMaDA: Multimodal Large Diffusion Language Models

Paper • 2505.15809 • Published May 21, 2025 • 98
Thinking with Images for Multimodal Reasoning: Foundations, Methods, and Future Frontiers

Paper • 2506.23918 • Published Jun 30, 2025 • 90
Diffusion Beats Autoregressive in Data-Constrained Settings

Paper • 2507.15857 • Published Jul 21, 2025 • 1
Hierarchical Reasoning Model

Paper • 2506.21734 • Published Jun 26, 2025 • 50
UMO: Scaling Multi-Identity Consistency for Image Customization via Matching Reward

Paper • 2509.06818 • Published Sep 8, 2025 • 29
Wan-Animate: Unified Character Animation and Replacement with Holistic Replication

Paper • 2509.14055 • Published Sep 17, 2025 • 17
Inpainting-Guided Policy Optimization for Diffusion Large Language Models

Paper • 2509.10396 • Published Sep 12, 2025 • 16
Lynx: Towards High-Fidelity Personalized Video Generation

Paper • 2509.15496 • Published Sep 19, 2025 • 13
Lyra: Generative 3D Scene Reconstruction via Video Diffusion Model Self-Distillation

Paper • 2509.19296 • Published Sep 23, 2025 • 28
Video models are zero-shot learners and reasoners

Paper • 2509.20328 • Published Sep 24, 2025 • 100
What Characterizes Effective Reasoning? Revisiting Length, Review, and Structure of CoT

Paper • 2509.19284 • Published Sep 23, 2025 • 23
Seedream 4.0: Toward Next-generation Multimodal Image Generation

Paper • 2509.20427 • Published Sep 24, 2025 • 84
Stochastic activations

Paper • 2509.22358 • Published Sep 26, 2025 • 2
OpenGPT-4o-Image: A Comprehensive Dataset for Advanced Image Generation and Editing

Paper • 2509.24900 • Published Sep 29, 2025 • 53
Diffusion Transformers with Representation Autoencoders

Paper • 2510.11690 • Published Oct 13, 2025 • 170
WithAnyone: Towards Controllable and ID Consistent Image Generation

Paper • 2510.14975 • Published Oct 16, 2025 • 86
Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction

Paper • 2404.02905 • Published Apr 3, 2024 • 74
DyPE: Dynamic Position Extrapolation for Ultra High Resolution Diffusion

Paper • 2510.20766 • Published Oct 23, 2025 • 37
Continuous Autoregressive Language Models

Paper • 2510.27688 • Published Oct 31, 2025 • 74
MMaDA-Parallel: Multimodal Large Diffusion Language Models for Thinking-Aware Editing and Generation

Paper • 2511.09611 • Published Nov 12, 2025 • 71
Live Avatar: Streaming Real-time Audio-Driven Avatar Generation with Infinite Length

Paper • 2512.04677 • Published Dec 4, 2025 • 177
Z-Image: An Efficient Image Generation Foundation Model with Single-Stream Diffusion Transformer

Paper • 2511.22699 • Published Nov 27, 2025 • 245
Vision Bridge Transformer at Scale

Paper • 2511.23199 • Published Nov 28, 2025 • 46
One Layer Is Enough: Adapting Pretrained Visual Encoders for Image Generation

Paper • 2512.07829 • Published Dec 8, 2025 • 24
Towards Scalable Pre-training of Visual Tokenizers for Generation

Paper • 2512.13687 • Published Dec 15, 2025 • 106
QwenLong-L1.5: Post-Training Recipe for Long-Context Reasoning and Memory Management

Paper • 2512.12967 • Published Dec 15, 2025 • 111
What matters for Representation Alignment: Global Information or Spatial Structure?

Paper • 2512.10794 • Published Dec 11, 2025 • 9
KlingAvatar 2.0 Technical Report

Paper • 2512.13313 • Published Dec 15, 2025 • 44
Back to Basics: Let Denoising Generative Models Denoise

Paper • 2511.13720 • Published Nov 17, 2025 • 70
PixelGen: Pixel Diffusion Beats Latent Diffusion with Perceptual Loss

Paper • 2602.02493 • Published Feb 2 • 46
BitDance: Scaling Autoregressive Generative Models with Binary Tokens

Paper • 2602.14041 • Published Feb 15 • 53
DeepGen 1.0: A Lightweight Unified Multimodal Model for Advancing Image Generation and Editing

Paper • 2602.12205 • Published Feb 12 • 81
Autoregressive Image Generation with Masked Bit Modeling

Paper • 2602.09024 • Published Feb 9 • 7
ViT-5: Vision Transformers for The Mid-2020s

Paper • 2602.08071 • Published Feb 8 • 1
MedXIAOHE: A Comprehensive Recipe for Building Medical MLLMs

Paper • 2602.12705 • Published Feb 13 • 68
Unified Latents (UL): How to train your latents

Paper • 2602.17270 • Published Feb 19 • 60
dLLM: Simple Diffusion Language Modeling

Paper • 2602.22661 • Published Feb 26 • 152
Kiwi-Edit: Versatile Video Editing via Instruction and Reference Guidance

Paper • 2603.02175 • Published Mar 2 • 24
Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities

Paper • 2505.02567 • Published May 5, 2025 • 82
Helios: Real Real-Time Long Video Generation Model

Paper • 2603.04379 • Published Mar 4 • 186
LTX-2: Efficient Joint Audio-Visual Foundation Model

Paper • 2601.03233 • Published Jan 6 • 176
Penguin-VL: Exploring the Efficiency Limits of VLM with LLM-based Vision Encoders

Paper • 2603.06569 • Published Mar 6 • 119
InternVL-U: Democratizing Unified Multimodal Models for Understanding, Reasoning, Generation and Editing

Paper • 2603.09877 • Published Mar 10 • 48
MAISI-v2: Accelerated 3D High-Resolution Medical Image Synthesis with Rectified Flow and Region-specific Contrastive Loss

Paper • 2508.05772 • Published Aug 7, 2025 • 3
Agentic Reasoning for Large Language Models

Paper • 2601.12538 • Published Jan 18 • 204
Mixture-of-Depths Attention

Paper • 2603.15619 • Published Mar 16 • 80
Repurposing Geometric Foundation Models for Multi-view Diffusion

Paper • 2603.22275 • Published 26 days ago • 47
PixelSmile: Toward Fine-Grained Facial Expression Editing

Paper • 2603.25728 • Published 23 days ago • 117
MetaClaw: Just Talk -- An Agent That Meta-Learns and Evolves in the Wild

Paper • 2603.17187 • Published Mar 17 • 138
MultiGen: Level-Design for Editable Multiplayer Worlds in Diffusion Game Engines

Paper • 2603.06679 • Published 20 days ago • 5
