MAOAM: Unified Object and Material Selection with Vision-Language Models • Paper • 2606.04880 • Published 7 days ago • 9
SMART • Collection • Your Single-Vector Embedding Model is SMARTer Than You Think • 5 items • Updated 14 days ago • 2
From Plans to Pixels: Learning to Plan and Orchestrate for Open-Ended Image Editing • Paper • 2605.15181 • Published 26 days ago • 12
Exploration and Exploitation Errors Are Measurable for Language Model Agents • Paper • 2604.13151 • Published Apr 14 • 25
Unified Spatio-Temporal Token Scoring for Efficient Video VLMs • Paper • 2603.18004 • Published Mar 18 • 14
Contamination Detection for VLMs using Multi-Modal Semantic Perturbation • Paper • 2511.03774 • Published Nov 5, 2025 • 13
Vinoground: Scrutinizing LMMs over Dense Temporal Reasoning with Short Videos • Paper • 2410.02763 • Published Oct 3, 2024 • 7