VLMs are Good Teachers for Video Reasoning via Adaptive Test-Time Optimization Paper • 2606.02564 • Published Jun 1
How and What to Imagine? Visual Thinking in Unified Multimodal Models for Cross-View Spatial Reasoning Paper • 2605.27310 • Published May 26
IndusAgent: Reinforcing Open-Vocabulary Industrial Anomaly Detection with Agentic Tools Paper • 2605.20682 • Published May 20
WildTableBench: Benchmarking Multimodal Foundation Models on Table Understanding In the Wild Paper • 2605.01018 • Published May 1
Visual Aesthetic Benchmark: Can Frontier Models Judge Beauty? Paper • 2605.12684 • Published May 12