DeCLIP: Decoupled Learning for Open-Vocabulary Dense Perception Paper • 2505.04410 • Published May 7, 2025 • 44
Generalized Decoupled Learning for Enhancing Open-Vocabulary Dense Perception Paper • 2508.11256 • Published Aug 15, 2025
FlashVID: Efficient Video Large Language Models via Training-free Tree-based Spatiotemporal Token Merging Paper • 2602.08024 • Published Feb 8, 2026 • 2
Less Is More, but Where? Dynamic Token Compression via LLM-Guided Keyframe Prior Paper • 2512.06866 • Published Dec 7, 2025 • 5
OV-DQUO: Open-Vocabulary DETR with Denoising Text Query Training and Open-World Unknown Objects Supervision Paper • 2405.17913 • Published May 28, 2024
ArenaRL: Scaling RL for Open-Ended Agents via Tournament-based Relative Ranking Paper • 2601.06487 • Published Jan 10, 2026 • 53
ChARM: Character-based Act-adaptive Reward Modeling for Advanced Role-Playing Language Agents Paper • 2505.23923 • Published May 29, 2025 • 8
OmniCharacter: Towards Immersive Role-Playing Agents with Seamless Speech-Language Personality Interaction Paper • 2505.20277 • Published May 26, 2025
Improving Transformer World Models for Data-Efficient RL Paper • 2502.01591 • Published Feb 3, 2025 • 10
OpenOmni: Large Language Models Pivot Zero-shot Omnimodal Alignment across Language with Real-time Self-Aware Emotional Speech Synthesis Paper • 2501.04561 • Published Jan 8, 2025 • 16
MMEvol: Empowering Multimodal Large Language Models with Evol-Instruct Paper • 2409.05840 • Published Sep 9, 2024 • 49
Text-Video Retrieval with Global-Local Semantic Consistent Learning Paper • 2405.12710 • Published May 21, 2024