CASA Gallery
Video Gallery for CASA: Cross-Attention via Self-Attention
- CASA: Cross-Attention via Self-Attention for Efficient Vision-Language Fusion
  Paper • 2512.19535 • Published • 10
- kyutai/CASA-Helium1-VL-2B
  Image-Text-to-Text • 3B • Updated • 146 • 5
- kyutai/CASA-Qwen2_5-VL-3B
  Image-Text-to-Text • 4B • Updated • 152 • 1
Collections including paper arxiv:2512.19535

- Guided Self-Evolving LLMs with Minimal Human Supervision
  Paper • 2512.02472 • Published • 50
- DeepSearch: Overcome the Bottleneck of Reinforcement Learning with Verifiable Rewards via Monte Carlo Tree Search
  Paper • 2509.25454 • Published • 140
- Video Reasoning without Training
  Paper • 2510.17045 • Published • 7
- Agent Learning via Early Experience
  Paper • 2510.08558 • Published • 269

- A Survey on Vision-Language-Action Models: An Action Tokenization Perspective
  Paper • 2507.01925 • Published • 38
- Zebra-CoT: A Dataset for Interleaved Vision Language Reasoning
  Paper • 2507.16746 • Published • 35
- MolmoAct: Action Reasoning Models that can Reason in Space
  Paper • 2508.07917 • Published • 44
- Discrete Diffusion VLA: Bringing Discrete Diffusion to Action Decoding in Vision-Language-Action Policies
  Paper • 2508.20072 • Published • 31

- Agentic Learner with Grow-and-Refine Multimodal Semantic Memory
  Paper • 2511.21678 • Published • 11
- QuCo-RAG: Quantifying Uncertainty from the Pre-training Corpus for Dynamic Retrieval-Augmented Generation
  Paper • 2512.19134 • Published • 31
- Probing Scientific General Intelligence of LLMs with Scientist-Aligned Workflows
  Paper • 2512.16969 • Published • 105
- CASA: Cross-Attention via Self-Attention for Efficient Vision-Language Fusion
  Paper • 2512.19535 • Published • 10