Collections

Collections including paper arxiv:2512.20619

SemanticGen: Video Generation in Semantic Space

Paper • 2512.20619 • Published 3 days ago • 85
QuantiPhy: A Quantitative Benchmark Evaluating Physical Reasoning Abilities of Vision-Language Models

Paper • 2512.19526 • Published 4 days ago • 10
MatSpray: Fusing 2D Material World Knowledge on 3D Geometry

Paper • 2512.18314 • Published 7 days ago • 7
Physics of Language Models: Part 4.1, Architecture Design and the Magic of Canon Layers

Paper • 2512.17351 • Published 8 days ago • 20

Video generation

SemanticGen: Video Generation in Semantic Space

Paper • 2512.20619 • Published 3 days ago • 85
CASA: Cross-Attention via Self-Attention for Efficient Vision-Language Fusion

Paper • 2512.19535 • Published 4 days ago • 8

about 8 hours ago

MMGR: Multi-Modal Generative Reasoning

Paper • 2512.14691 • Published 10 days ago • 114
KlingAvatar 2.0 Technical Report

Paper • 2512.13313 • Published 11 days ago • 40
SemanticGen: Video Generation in Semantic Space

Paper • 2512.20619 • Published 3 days ago • 85
DataFlow: An LLM-Driven Framework for Unified Data Preparation and Workflow Automation in the Era of Data-Centric AI

Paper • 2512.16676 • Published 8 days ago • 184

about 8 hours ago

ARE: Scaling Up Agent Environments and Evaluations

Paper • 2509.17158 • Published Sep 21 • 35
ARTDECO: Towards Efficient and High-Fidelity On-the-Fly 3D Reconstruction with Structured Scene Representation

Paper • 2510.08551 • Published Oct 9 • 33
Why Low-Precision Transformer Training Fails: An Analysis on Flash Attention

Paper • 2510.04212 • Published Oct 5 • 23
ERA: Transforming VLMs into Embodied Agents via Embodied Prior Learning and Online Reinforcement Learning

Paper • 2510.12693 • Published Oct 14 • 26

WorldDreamer: Towards General World Models for Video Generation via Predicting Masked Tokens

Paper • 2401.09985 • Published Jan 18, 2024 • 18
CustomVideo: Customizing Text-to-Video Generation with Multiple Subjects

Paper • 2401.09962 • Published Jan 18, 2024 • 9
Inflation with Diffusion: Efficient Temporal Adaptation for Text-to-Video Super-Resolution

Paper • 2401.10404 • Published Jan 18, 2024 • 10
ActAnywhere: Subject-Aware Video Background Generation

Paper • 2401.10822 • Published Jan 19, 2024 • 13

Read But Not Implemented

about 8 hours ago

TurboDiffusion: Accelerating Video Diffusion Models by 100-200 Times

Paper • 2512.16093 • Published 9 days ago • 64
Z-Image: An Efficient Image Generation Foundation Model with Single-Stream Diffusion Transformer

Paper • 2511.22699 • Published 29 days ago • 213
DataFlow: An LLM-Driven Framework for Unified Data Preparation and Workflow Automation in the Era of Data-Centric AI

Paper • 2512.16676 • Published 8 days ago • 184
Sharp Monocular View Synthesis in Less Than a Second

Paper • 2512.10685 • Published 15 days ago • 15

Robust-R1: Degradation-Aware Reasoning for Robust Visual Understanding

Paper • 2512.17532 • Published 7 days ago • 62
The Prism Hypothesis: Harmonizing Semantic and Pixel Representations via Unified Autoencoding

Paper • 2512.19693 • Published 4 days ago • 60
SemanticGen: Video Generation in Semantic Space

Paper • 2512.20619 • Published 3 days ago • 85
EgoX: Egocentric Video Generation from a Single Exocentric Video

Paper • 2512.08269 • Published 18 days ago • 110

about 2 hours ago

Guided Self-Evolving LLMs with Minimal Human Supervision

Paper • 2512.02472 • Published 25 days ago • 50
DeepSearch: Overcome the Bottleneck of Reinforcement Learning with Verifiable Rewards via Monte Carlo Tree Search

Paper • 2509.25454 • Published Sep 29 • 140
Video Reasoning without Training

Paper • 2510.17045 • Published Oct 19 • 7
Agent Learning via Early Experience

Paper • 2510.08558 • Published Oct 9 • 269

Video understanding

Wolf: Captioning Everything with a World Summarization Framework

Paper • 2407.18908 • Published Jul 26, 2024 • 32
Mixture of Nested Experts: Adaptive Processing of Visual Tokens

Paper • 2407.19985 • Published Jul 29, 2024 • 37
TPDiff: Temporal Pyramid Video Diffusion Model

Paper • 2503.09566 • Published Mar 12 • 45
DeepVideo-R1: Video Reinforcement Fine-Tuning via Difficulty-aware Regressive GRPO

Paper • 2506.07464 • Published Jun 9 • 14
