- SemanticGen: Video Generation in Semantic Space
  Paper • 2512.20619 • Published • 85
- QuantiPhy: A Quantitative Benchmark Evaluating Physical Reasoning Abilities of Vision-Language Models
  Paper • 2512.19526 • Published • 10
- MatSpray: Fusing 2D Material World Knowledge on 3D Geometry
  Paper • 2512.18314 • Published • 7
- Physics of Language Models: Part 4.1, Architecture Design and the Magic of Canon Layers
  Paper • 2512.17351 • Published • 20
Collections
Collections including paper arxiv:2512.20619
- MMGR: Multi-Modal Generative Reasoning
  Paper • 2512.14691 • Published • 114
- KlingAvatar 2.0 Technical Report
  Paper • 2512.13313 • Published • 40
- SemanticGen: Video Generation in Semantic Space
  Paper • 2512.20619 • Published • 85
- DataFlow: An LLM-Driven Framework for Unified Data Preparation and Workflow Automation in the Era of Data-Centric AI
  Paper • 2512.16676 • Published • 184

- ARE: Scaling Up Agent Environments and Evaluations
  Paper • 2509.17158 • Published • 35
- ARTDECO: Towards Efficient and High-Fidelity On-the-Fly 3D Reconstruction with Structured Scene Representation
  Paper • 2510.08551 • Published • 33
- Why Low-Precision Transformer Training Fails: An Analysis on Flash Attention
  Paper • 2510.04212 • Published • 23
- ERA: Transforming VLMs into Embodied Agents via Embodied Prior Learning and Online Reinforcement Learning
  Paper • 2510.12693 • Published • 26

- WorldDreamer: Towards General World Models for Video Generation via Predicting Masked Tokens
  Paper • 2401.09985 • Published • 18
- CustomVideo: Customizing Text-to-Video Generation with Multiple Subjects
  Paper • 2401.09962 • Published • 9
- Inflation with Diffusion: Efficient Temporal Adaptation for Text-to-Video Super-Resolution
  Paper • 2401.10404 • Published • 10
- ActAnywhere: Subject-Aware Video Background Generation
  Paper • 2401.10822 • Published • 13

- TurboDiffusion: Accelerating Video Diffusion Models by 100-200 Times
  Paper • 2512.16093 • Published • 64
- Z-Image: An Efficient Image Generation Foundation Model with Single-Stream Diffusion Transformer
  Paper • 2511.22699 • Published • 213
- DataFlow: An LLM-Driven Framework for Unified Data Preparation and Workflow Automation in the Era of Data-Centric AI
  Paper • 2512.16676 • Published • 184
- Sharp Monocular View Synthesis in Less Than a Second
  Paper • 2512.10685 • Published • 15

- Robust-R1: Degradation-Aware Reasoning for Robust Visual Understanding
  Paper • 2512.17532 • Published • 62
- The Prism Hypothesis: Harmonizing Semantic and Pixel Representations via Unified Autoencoding
  Paper • 2512.19693 • Published • 60
- SemanticGen: Video Generation in Semantic Space
  Paper • 2512.20619 • Published • 85
- EgoX: Egocentric Video Generation from a Single Exocentric Video
  Paper • 2512.08269 • Published • 110

- Guided Self-Evolving LLMs with Minimal Human Supervision
  Paper • 2512.02472 • Published • 50
- DeepSearch: Overcome the Bottleneck of Reinforcement Learning with Verifiable Rewards via Monte Carlo Tree Search
  Paper • 2509.25454 • Published • 140
- Video Reasoning without Training
  Paper • 2510.17045 • Published • 7
- Agent Learning via Early Experience
  Paper • 2510.08558 • Published • 269

- Wolf: Captioning Everything with a World Summarization Framework
  Paper • 2407.18908 • Published • 32
- Mixture of Nested Experts: Adaptive Processing of Visual Tokens
  Paper • 2407.19985 • Published • 37
- TPDiff: Temporal Pyramid Video Diffusion Model
  Paper • 2503.09566 • Published • 45
- DeepVideo-R1: Video Reinforcement Fine-Tuning via Difficulty-aware Regressive GRPO
  Paper • 2506.07464 • Published • 14