TurboDiffusion: Accelerating Video Diffusion Models by 100-200 Times Paper • 2512.16093 • Published 13 days ago • 84
The Prism Hypothesis: Harmonizing Semantic and Pixel Representations via Unified Autoencoding Paper • 2512.19693 • Published 8 days ago • 61
Article Backbone-Optimizer Coupling Bias: The Hidden Co-Design Principle 11 days ago • 3
Next-Embedding Prediction Makes Strong Vision Learners Paper • 2512.16922 • Published 12 days ago • 81
Depth Any Panoramas: A Foundation Model for Panoramic Depth Estimation Paper • 2512.16913 • Published 12 days ago • 33
DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models Paper • 2512.02556 • Published 28 days ago • 241
MG-Nav: Dual-Scale Visual Navigation via Sparse Spatial Memory Paper • 2511.22609 • Published Nov 27 • 48
Envision: Benchmarking Unified Understanding & Generation for Causal World Process Insights Paper • 2512.01816 • Published 29 days ago • 88
Does Understanding Inform Generation in Unified Multimodal Models? From Analysis to Path Forward Paper • 2511.20561 • Published Nov 25 • 32
Lost in Tokenization: Context as the Key to Unlocking Biomolecular Understanding in Scientific LLMs Paper • 2510.23127 • Published Oct 27 • 5
MergeMix: A Unified Augmentation Paradigm for Visual and Multi-Modal Understanding Paper • 2510.23479 • Published Oct 27 • 14
ScreenCoder: Advancing Visual-to-Code Generation for Front-End Automation via Modular Multimodal Agents Paper • 2507.22827 • Published Jul 30 • 99
UniWorld: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation Paper • 2506.03147 • Published Jun 3 • 58
Taming LLMs by Scaling Learning Rates with Gradient Grouping Paper • 2506.01049 • Published Jun 1 • 38
MergeVQ: A Unified Framework for Visual Generation and Representation with Disentangled Token Merging and Quantization Paper • 2504.00999 • Published Apr 1 • 95
WideRange4D: Enabling High-Quality 4D Reconstruction with Wide-Range Movements and Scenes Paper • 2503.13435 • Published Mar 17 • 18