StreamingVLM: Real-Time Understanding for Infinite Video Streams Paper • 2510.09608 • Published Oct 10, 2025 • 51
The Translation Barrier Hypothesis: Multilingual Generation with Large Language Models Suffers from Implicit Translation Failure Paper • 2506.22724 • Published Jun 28, 2025 • 10
Certified Mitigation of Worst-Case LLM Copyright Infringement Paper • 2504.16046 • Published Apr 22, 2025 • 13
SmolVLM: Redefining small and efficient multimodal models Paper • 2504.05299 • Published Apr 7, 2025 • 205
Caption Anything in Video: Fine-grained Object-centric Captioning via Spatiotemporal Multimodal Prompting Paper • 2504.05541 • Published Apr 7, 2025 • 15
Scaling Laws for Native Multimodal Models Paper • 2504.07951 • Published Apr 10, 2025 • 30
Kimi-VL-A3B Collection Moonshot's efficient MoE VLMs, exceptional on agent, long-context, and thinking • 7 items • Updated 4 days ago • 78
MegaWika: Millions of reports and their sources across 50 diverse languages Paper • 2307.07049 • Published Jul 13, 2023
MultiVENT 2.0: A Massive Multilingual Benchmark for Event-Centric Video Retrieval Paper • 2410.11619 • Published Oct 15, 2024 • 1
MultiVENT and MAGMAR Resources Collection Resources associated with the MultiVENT datasets, MAGMAR workshop, and other video retrieval and multimodal retrieval augmented generation • 5 items • Updated Apr 4, 2025 • 1