Multimodal Reasoning - a btjhjeon Collection

InfiR: Crafting Effective Small Language Models and Multimodal Small Language Models in Reasoning

Paper • 2502.11573 • Published Feb 17, 2025 • 9

Boosting Multimodal Reasoning with MCTS-Automated Structured Thinking

Paper • 2502.02339 • Published Feb 4, 2025 • 23

video-SALMONN-o1: Reasoning-enhanced Audio-visual Large Language Model

Paper • 2502.11775 • Published Feb 17, 2025 • 9

Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search

Paper • 2412.18319 • Published Dec 24, 2024 • 39

LLaVA-o1: Let Vision Language Models Reason Step-by-Step

Paper • 2411.10440 • Published Nov 15, 2024 • 129

Multimodal Inconsistency Reasoning (MMIR): A New Benchmark for Multimodal Reasoning Models

Paper • 2502.16033 • Published Feb 22, 2025 • 18

MedVLM-R1: Incentivizing Medical Reasoning Capability of Vision-Language Models (VLMs) via Reinforcement Learning

Paper • 2502.19634 • Published Feb 26, 2025 • 63

Visual-RFT: Visual Reinforcement Fine-Tuning

Paper • 2503.01785 • Published Mar 3, 2025 • 86

MM-Eureka: Exploring Visual Aha Moment with Rule-based Large-scale Reinforcement Learning

Paper • 2503.07365 • Published Mar 10, 2025 • 61

Vision-R1: Incentivizing Reasoning Capability in Multimodal Large Language Models

Paper • 2503.06749 • Published Mar 9, 2025 • 31

LMM-R1: Empowering 3B LMMs with Strong Reasoning Abilities Through Two-Stage Rule-Based RL

Paper • 2503.07536 • Published Mar 10, 2025 • 88

Diving into Self-Evolving Training for Multimodal Reasoning

Paper • 2412.17451 • Published Dec 23, 2024 • 42

Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models

Paper • 2411.14432 • Published Nov 21, 2024 • 25

R1-Omni: Explainable Omni-Multimodal Emotion Recognition with Reinforcing Learning

Paper • 2503.05379 • Published Mar 7, 2025 • 38

VisualPRM: An Effective Process Reward Model for Multimodal Reasoning

Paper • 2503.10291 • Published Mar 13, 2025 • 36

R1-Onevision: Advancing Generalized Multimodal Reasoning through Cross-Modal Formalization

Paper • 2503.10615 • Published Mar 13, 2025 • 17

Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey

Paper • 2503.12605 • Published Mar 16, 2025 • 35

R1-VL: Learning to Reason with Multimodal Large Language Models via Step-wise Group Relative Policy Optimization

Paper • 2503.12937 • Published Mar 17, 2025 • 30

VideoMind: A Chain-of-LoRA Agent for Long Video Reasoning

Paper • 2503.13444 • Published Mar 17, 2025 • 17

DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding

Paper • 2503.12797 • Published Mar 17, 2025 • 32

OpenVLThinker: An Early Exploration to Complex Vision-Language Reasoning via Iterative Self-Improvement

Paper • 2503.17352 • Published Mar 21, 2025 • 24

MathFlow: Enhancing the Perceptual Flow of MLLMs for Visual Mathematical Problems

Paper • 2503.16549 • Published Mar 19, 2025 • 15

Vision-R1: Evolving Human-Free Alignment in Large Vision-Language Models via Vision-Guided Reinforcement Learning

Paper • 2503.18013 • Published Mar 23, 2025 • 20

Video-R1: Reinforcing Video Reasoning in MLLMs

Paper • 2503.21776 • Published Mar 27, 2025 • 79

UI-R1: Enhancing Action Prediction of GUI Agents by Reinforcement Learning

Paper • 2503.21620 • Published Mar 27, 2025 • 62

OThink-MR1: Stimulating multimodal generalized reasoning capabilities via dynamic reinforcement learning

Paper • 2503.16081 • Published Mar 20, 2025 • 28

Improved Visual-Spatial Reasoning via R1-Zero-Like Training

Paper • 2504.00883 • Published Apr 1, 2025 • 67

Rethinking RL Scaling for Vision Language Models: A Transparent, From-Scratch Framework and Comprehensive Evaluation Scheme

Paper • 2504.02587 • Published Apr 3, 2025 • 32

Why Reasoning Matters? A Survey of Advancements in Multimodal Reasoning (v1)

Paper • 2504.03151 • Published Apr 4, 2025 • 15

Skywork R1V: Pioneering Multimodal Reasoning with Chain-of-Thought

Paper • 2504.05599 • Published Apr 8, 2025 • 85

VideoChat-R1: Enhancing Spatio-Temporal Perception via Reinforcement Fine-Tuning

Paper • 2504.06958 • Published Apr 9, 2025 • 13

OmniCaptioner: One Captioner to Rule Them All

Paper • 2504.07089 • Published Apr 9, 2025 • 20

Kimi-VL Technical Report

Paper • 2504.07491 • Published Apr 10, 2025 • 137

InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models

Paper • 2504.10479 • Published Apr 14, 2025 • 306

VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning

Paper • 2504.08837 • Published Apr 10, 2025 • 43

InfiGUI-R1: Advancing Multimodal GUI Agents from Reactive Actors to Deliberative Reasoners

Paper • 2504.14239 • Published Apr 19, 2025 • 14

Skywork R1V2: Multimodal Hybrid Reinforcement Learning for Reasoning

Paper • 2504.16656 • Published Apr 23, 2025 • 57

Unified Multimodal Chain-of-Thought Reward Model through Reinforcement Fine-Tuning

Paper • 2505.03318 • Published May 6, 2025 • 92

Perception, Reason, Think, and Plan: A Survey on Large Multimodal Reasoning Models

Paper • 2505.04921 • Published May 8, 2025 • 186

X-Reasoner: Towards Generalizable Reasoning Across Modalities and Domains

Paper • 2505.03981 • Published May 6, 2025 • 15

Seed1.5-VL Technical Report

Paper • 2505.07062 • Published May 11, 2025 • 155

Skywork-VL Reward: An Effective Reward Model for Multimodal Understanding and Reasoning

Paper • 2505.07263 • Published May 12, 2025 • 30

Omni-R1: Do You Really Need Audio to Fine-Tune Your Audio LLM?

Paper • 2505.09439 • Published May 14, 2025 • 10

OpenThinkIMG: Learning to Think with Images via Visual Tool Reinforcement Learning

Paper • 2505.08617 • Published May 13, 2025 • 42

GuardReasoner-VL: Safeguarding VLMs via Reinforced Reasoning

Paper • 2505.11049 • Published May 16, 2025 • 61

Visual Planning: Let's Think Only with Images

Paper • 2505.11409 • Published May 16, 2025 • 57

MM-PRM: Enhancing Multimodal Mathematical Reasoning with Scalable Step-Level Supervision

Paper • 2505.13427 • Published May 19, 2025 • 26

VisionReasoner: Unified Visual Perception and Reasoning via Reinforcement Learning

Paper • 2505.12081 • Published May 17, 2025 • 18

Emerging Properties in Unified Multimodal Pretraining

Paper • 2505.14683 • Published May 20, 2025 • 133

VisualQuality-R1: Reasoning-Induced Image Quality Assessment via Reinforcement Learning to Rank

Paper • 2505.14460 • Published May 20, 2025 • 33

Visionary-R1: Mitigating Shortcuts in Visual Reasoning with Reinforcement Learning

Paper • 2505.14677 • Published May 20, 2025 • 15

UniVG-R1: Reasoning Guided Universal Visual Grounding with Reinforcement Learning

Paper • 2505.14231 • Published May 20, 2025 • 53

Pixel Reasoner: Incentivizing Pixel-Space Reasoning with Curiosity-Driven Reinforcement Learning

Paper • 2505.15966 • Published May 21, 2025 • 53

GoT-R1: Unleashing Reasoning Capability of MLLM for Visual Generation with Reinforcement Learning

Paper • 2505.17022 • Published May 22, 2025 • 27

SophiaVL-R1: Reinforcing MLLMs Reasoning with Thinking Reward

Paper • 2505.17018 • Published May 22, 2025 • 15

Think or Not? Selective Reasoning via Reinforcement Learning for Vision-Language Models

Paper • 2505.16854 • Published May 22, 2025 • 11

GRIT: Teaching MLLMs to Think with Images

Paper • 2505.15879 • Published May 21, 2025 • 13

SAKURA: On the Multi-hop Reasoning of Large Audio-Language Models Based on Speech and Audio Information

Paper • 2505.13237 • Published May 19, 2025 • 1

VLM-R^3: Region Recognition, Reasoning, and Refinement for Enhanced Multimodal Chain-of-Thought

Paper • 2505.16192 • Published May 22, 2025 • 12

Training-Free Reasoning and Reflection in MLLMs

Paper • 2505.16151 • Published May 22, 2025 • 9

Omni-R1: Reinforcement Learning for Omnimodal Reasoning via Two-System Collaboration

Paper • 2505.20256 • Published May 26, 2025 • 19

G1: Bootstrapping Perception and Reasoning Abilities of Vision-Language Model via Reinforcement Learning

Paper • 2505.13426 • Published May 19, 2025 • 13

STAR-R1: Spatial TrAnsformation Reasoning by Reinforcing Multimodal LLMs

Paper • 2505.15804 • Published May 21, 2025 • 10

Jodi: Unification of Visual Generation and Understanding via Joint Modeling

Paper • 2505.19084 • Published May 25, 2025 • 20

VerIPO: Cultivating Long Reasoning in Video-LLMs via Verifier-Guided Iterative Policy Optimization

Paper • 2505.19000 • Published May 25, 2025 • 42

Video-Holmes: Can MLLM Think Like Holmes for Complex Video Reasoning?

Paper • 2505.21374 • Published May 27, 2025 • 28

Active-O3: Empowering Multimodal Large Language Models with Active Perception via GRPO

Paper • 2505.21457 • Published May 27, 2025 • 16

Beyond Distillation: Pushing the Limits of Medical LLM Reasoning with Minimalist Rule-Based RL

Paper • 2505.17952 • Published May 23, 2025 • 20

R1-ShareVL: Incentivizing Reasoning Capability of Multimodal Large Language Models via Share-GRPO

Paper • 2505.16673 • Published May 22, 2025 • 2

Sherlock: Self-Correcting Reasoning in Vision-Language Models

Paper • 2505.22651 • Published May 28, 2025 • 48

Unsupervised Post-Training for Multi-Modal LLM Reasoning via GRPO

Paper • 2505.22453 • Published May 28, 2025 • 46

Advancing Multimodal Reasoning via Reinforcement Learning with Cold Start

Paper • 2505.22334 • Published May 28, 2025 • 36

Fostering Video Reasoning via Next-Event Prediction

Paper • 2505.22457 • Published May 28, 2025 • 29

Thinking with Generated Images

Paper • 2505.22525 • Published May 28, 2025 • 15

Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence

Paper • 2505.23747 • Published May 29, 2025 • 69

UniRL: Self-Improving Unified Multimodal Models via Supervised and Reinforcement Learning

Paper • 2505.23380 • Published May 29, 2025 • 22

cadrille: Multi-modal CAD Reconstruction with Online Reinforcement Learning

Paper • 2505.22914 • Published May 28, 2025 • 37

Grounded Reinforcement Learning for Visual Reasoning

Paper • 2505.23678 • Published May 29, 2025 • 2

More Thinking, Less Seeing? Assessing Amplified Hallucination in Multimodal Reasoning Models

Paper • 2505.21523 • Published May 23, 2025 • 13

SRPO: Enhancing Multimodal LLM Reasoning via Reflection-Aware Reinforcement Learning

Paper • 2506.01713 • Published Jun 2, 2025 • 48

Advancing Multimodal Reasoning: From Optimized Cold Start to Staged Reinforcement Learning

Paper • 2506.04207 • Published Jun 4, 2025 • 48

AV-Reasoner: Improving and Benchmarking Clue-Grounded Audio-Visual Counting for MLLMs

Paper • 2506.05328 • Published Jun 5, 2025 • 21

Perceptual Decoupling for Scalable Multi-modal Reasoning via Reward-Optimized Captioning

Paper • 2506.04559 • Published Jun 5, 2025 • 2

Look Before You Leap: A GUI-Critic-R1 Model for Pre-Operative Error Diagnosis in GUI Automation

Paper • 2506.04614 • Published Jun 5, 2025 • 19

ComfyUI-R1: Exploring Reasoning Models for Workflow Generation

Paper • 2506.09790 • Published Jun 11, 2025 • 53

DeepVideo-R1: Video Reinforcement Fine-Tuning via Difficulty-aware Regressive GRPO

Paper • 2506.07464 • Published Jun 9, 2025 • 14

Ego-R1: Chain-of-Tool-Thought for Ultra-Long Egocentric Video Reasoning

Paper • 2506.13654 • Published Jun 16, 2025 • 43

VGR: Visual Grounded Reasoning

Paper • 2506.11991 • Published Jun 13, 2025 • 20

Enhancing Step-by-Step and Verifiable Medical Reasoning in MLLMs

Paper • 2506.16962 • Published Jun 20, 2025 • 10

GRPO-CARE: Consistency-Aware Reinforcement Learning for Multimodal Reasoning

Paper • 2506.16141 • Published Jun 19, 2025 • 27

ThinkSound: Chain-of-Thought Reasoning in Multimodal Large Language Models for Audio Generation and Editing

Paper • 2506.21448 • Published Jun 26, 2025 • 8

GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning

Paper • 2507.01006 • Published Jul 1, 2025 • 251

HumanOmniV2: From Understanding to Omni-Modal Reasoning with Context

Paper • 2506.21277 • Published Jun 26, 2025 • 14

Kwai Keye-VL Technical Report

Paper • 2507.01949 • Published Jul 2, 2025 • 131

Thinking with Images for Multimodal Reasoning: Foundations, Methods, and Future Frontiers

Paper • 2506.23918 • Published Jun 30, 2025 • 90

High-Resolution Visual Reasoning via Multi-Turn Grounding-Based Reinforcement Learning

Paper • 2507.05920 • Published Jul 8, 2025 • 12

Perception-Aware Policy Optimization for Multimodal Reasoning

Paper • 2507.06448 • Published Jul 8, 2025 • 48

Skywork-R1V3 Technical Report

Paper • 2507.06167 • Published Jul 8, 2025 • 73

Open Vision Reasoner: Transferring Linguistic Cognitive Behavior for Visual Reasoning

Paper • 2507.05255 • Published Jul 7, 2025 • 75

VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning

Paper • 2507.13348 • Published Jul 17, 2025 • 79

Zebra-CoT: A Dataset for Interleaved Vision Language Reasoning

Paper • 2507.16746 • Published Jul 22, 2025 • 34

ThinkAct: Vision-Language-Action Reasoning via Reinforced Visual Latent Planning

Paper • 2507.16815 • Published Jul 22, 2025 • 42

Semi-off-Policy Reinforcement Learning for Vision-Language Slow-thinking Reasoning

Paper • 2507.16814 • Published Jul 22, 2025 • 21

VL-Cogito: Progressive Curriculum Reinforcement Learning for Advanced Multimodal Reasoning

Paper • 2507.22607 • Published Jul 30, 2025 • 47

Cosmos-Reason1: From Physical Common Sense To Embodied Reasoning

Paper • 2503.15558 • Published Mar 18, 2025 • 50

MolmoAct: Action Reasoning Models that can Reason in Space

Paper • 2508.07917 • Published Aug 11, 2025 • 44

We-Math 2.0: A Versatile MathBook System for Incentivizing Visual Mathematical Reasoning

Paper • 2508.10433 • Published Aug 14, 2025 • 144

HumanSense: From Multimodal Perception to Empathetic Context-Aware Responses through Reasoning MLLMs

Paper • 2508.10576 • Published Aug 14, 2025 • 8

R-4B: Incentivizing General-Purpose Auto-Thinking Capability in MLLMs via Bi-Mode Annealing and Reinforce Learning

Paper • 2508.21113 • Published Aug 28, 2025 • 110

LLaVA-Critic-R1: Your Critic Model is Secretly a Strong Policy Model

Paper • 2509.00676 • Published Aug 31, 2025 • 85

Planning with Reasoning using Vision Language World Model

Paper • 2509.02722 • Published Sep 2, 2025 • 24

Focusing by Contrastive Attention: Enhancing VLMs' Visual Reasoning

Paper • 2509.06461 • Published Sep 8, 2025 • 20

Look Again, Think Slowly: Enhancing Visual Reflection in Vision-Language Models

Paper • 2509.12132 • Published Sep 15, 2025 • 7

Multimodal Reasoning for Science: Technical Report and 1st Place Solution to the ICML 2025 SeePhys Challenge

Paper • 2509.06079 • Published Sep 7, 2025 • 6

BaseReward: A Strong Baseline for Multimodal Reward Model

Paper • 2509.16127 • Published Sep 19, 2025 • 21

BTL-UI: Blink-Think-Link Reasoning Model for GUI Agent

Paper • 2509.15566 • Published Sep 19, 2025 • 14

MARS2 2025 Challenge on Multimodal Reasoning: Datasets, Methods, Results, Discussion, and Outlook

Paper • 2509.14142 • Published Sep 17, 2025 • 10

MMR1: Enhancing Multimodal Reasoning with Variance-Aware Sampling and Open Resources

Paper • 2509.21268 • Published Sep 25, 2025 • 104

Vision-Zero: Scalable VLM Self-Improvement via Strategic Gamified Self-Play

Paper • 2509.25541 • Published Sep 29, 2025 • 140

More Thought, Less Accuracy? On the Dual Nature of Reasoning in Vision-Language Models

Paper • 2509.25848 • Published Sep 30, 2025 • 80

VLA-R1: Enhancing Reasoning in Vision-Language-Action Models

Paper • 2510.01623 • Published Oct 2, 2025 • 12

Video-LMM Post-Training: A Deep Dive into Video Reasoning with Large Multimodal Models

Paper • 2510.05034 • Published Oct 6, 2025 • 51

UniVideo: Unified Understanding, Generation, and Editing for Videos

Paper • 2510.08377 • Published Oct 9, 2025 • 81

TTRV: Test-Time Reinforcement Learning for Vision Language Models

Paper • 2510.06783 • Published Oct 8, 2025 • 12

Generative Universal Verifier as Multimodal Meta-Reasoner

Paper • 2510.13804 • Published Oct 15, 2025 • 27

Open-o3 Video: Grounded Video Reasoning with Explicit Spatio-Temporal Evidence

Paper • 2510.20579 • Published Oct 23, 2025 • 56

Directional Reasoning Injection for Fine-Tuning MLLMs

Paper • 2510.15050 • Published Oct 16, 2025 • 12

Video-Thinker: Sparking "Thinking with Videos" via Reinforcement Learning

Paper • 2510.23473 • Published Oct 27, 2025 • 85

SeeingEye: Agentic Information Flow Unlocks Multimodal Reasoning In Text-only LLMs

Paper • 2510.25092 • Published Oct 29, 2025 • 8

Omni-Reward: Towards Generalist Omni-Modal Reward Modeling with Free-Form Preferences

Paper • 2510.23451 • Published Oct 27, 2025 • 28

When Visualizing is the First Step to Reasoning: MIRA, a Benchmark for Visual Chain-of-Thought

Paper • 2511.02779 • Published Nov 4, 2025 • 59

Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm

Paper • 2511.04570 • Published Nov 6, 2025 • 240

V-Thinker: Interactive Thinking with Images

Paper • 2511.04460 • Published Nov 6, 2025 • 97

MathSE: Improving Multimodal Mathematical Reasoning via Self-Evolving Iterative Reflection and Reward-Guided Fine-Tuning

Paper • 2511.06805 • Published Nov 10, 2025 • 13

REVISOR: Beyond Textual Reflection, Towards Multimodal Introspective Reasoning in Long-Form Video Understanding

Paper • 2511.13026 • Published Nov 17, 2025 • 26

VisPlay: Self-Evolving Vision-Language Models from Images

Paper • 2511.15661 • Published Nov 19, 2025 • 43

Thinking-while-Generating: Interleaving Textual Reasoning throughout Visual Generation

Paper • 2511.16671 • Published Nov 20, 2025 • 16

MASS: Motion-Aware Spatial-Temporal Grounding for Physics Reasoning and Comprehension in Vision-Language Models

Paper • 2511.18373 • Published Nov 23, 2025 • 7

OpenMMReasoner: Pushing the Frontiers for Multimodal Reasoning with an Open and General Recipe

Paper • 2511.16334 • Published Nov 20, 2025 • 93

GeoVista: Web-Augmented Agentic Visual Reasoning for Geolocalization

Paper • 2511.15705 • Published Nov 19, 2025 • 97

SPHINX: A Synthetic Environment for Visual Perception and Reasoning

Paper • 2511.20814 • Published Nov 25, 2025 • 2

Think Visually, Reason Textually: Vision-Language Synergy in ARC

Paper • 2511.15703 • Published Nov 19, 2025 • 9

MIRA: Multimodal Iterative Reasoning Agent for Image Editing

Paper • 2511.21087 • Published Nov 26, 2025 • 10

REASONEDIT: Towards Reasoning-Enhanced Image Editing Models

Paper • 2511.22625 • Published Nov 27, 2025 • 47

Geometrically-Constrained Agent for Spatial Reasoning

Paper • 2511.22659 • Published Nov 27, 2025 • 41

DualVLA: Building a Generalizable Embodied Agent via Partial Decoupling of Reasoning and Action

Paper • 2511.22134 • Published Nov 27, 2025 • 22

Skywork-R1V4: Toward Agentic Multimodal Intelligence through Interleaved Thinking with Images and DeepResearch

Paper • 2512.02395 • Published Dec 2, 2025 • 49

Revisiting the Necessity of Lengthy Chain-of-Thought in Vision-centric Reasoning Generalization

Paper • 2511.22586 • Published Nov 27, 2025 • 7

Artemis: Structured Visual Reasoning for Perception Policy Learning

Paper • 2512.01988 • Published Dec 1, 2025 • 2

CodeV: Code with Images for Faithful Visual Reasoning via Tool-Aware Policy Optimization

Paper • 2511.19661 • Published Nov 24, 2025 • 2

OneThinker: All-in-one Reasoning Model for Image and Video

Paper • 2512.03043 • Published Dec 2, 2025 • 33

Thinking with Programming Vision: Towards a Unified View for Thinking with Images

Paper • 2512.03746 • Published Dec 3, 2025 • 17

ARM-Thinker: Reinforcing Multimodal Generative Reward Models with Agentic Tool Use and Visual Reasoning

Paper • 2512.05111 • Published Dec 4, 2025 • 50

Colon-X: Advancing Intelligent Colonoscopy from Multimodal Understanding to Clinical Reasoning

Paper • 2512.03667 • Published Dec 3, 2025 • 5

Rethinking Chain-of-Thought Reasoning for Videos

Paper • 2512.09616 • Published Dec 10, 2025 • 19

VG-Refiner: Towards Tool-Refined Referring Grounded Reasoning via Agentic Reinforcement Learning

Paper • 2512.06373 • Published Dec 6, 2025 • 9

Thinking with Images via Self-Calling Agent

Paper • 2512.08511 • Published Dec 9, 2025 • 23

Robust-R1: Degradation-Aware Reasoning for Robust Visual Understanding

Paper • 2512.17532 • Published Dec 19, 2025 • 67

Reasoning Within the Mind: Dynamic Multimodal Interleaving in Latent Space

Paper • 2512.12623 • Published Dec 14, 2025 • 4

MMGR: Multi-Modal Generative Reasoning

Paper • 2512.14691 • Published Dec 16, 2025 • 119

Latent Implicit Visual Reasoning

Paper • 2512.21218 • Published Dec 24, 2025 • 69

See Less, See Right: Bi-directional Perceptual Shaping For Multimodal Reasoning

Paper • 2512.22120 • Published Dec 26, 2025 • 15

InSight-o3: Empowering Multimodal Foundation Models with Generalized Visual Search

Paper • 2512.18745 • Published Dec 21, 2025 • 12

VideoAuto-R1: Video Auto Reasoning via Thinking Once, Answering Twice

Paper • 2601.05175 • Published Jan 8 • 36

Forest Before Trees: Latent Superposition for Efficient Visual Reasoning

Paper • 2601.06803 • Published Jan 11 • 10

Omni-R1: Towards the Unified Generative Paradigm for Multimodal Reasoning

Paper • 2601.09536 • Published Jan 14 • 5

Urban Socio-Semantic Segmentation with Vision-Language Reasoning

Paper • 2601.10477 • Published Jan 15 • 155

LaViT: Aligning Latent Visual Thoughts for Multi-modal Reasoning

Paper • 2601.10129 • Published Jan 15 • 12

FantasyVLN: Unified Multimodal Chain-of-Thought Reasoning for Vision-Language Navigation

Paper • 2601.13976 • Published Jan 20 • 21

Render-of-Thought: Rendering Textual Chain-of-Thought as Images for Visual Latent Reasoning

Paper • 2601.14750 • Published Jan 21 • 17

PROGRESSLM: Towards Progress Reasoning in Vision-Language Models

Paper • 2601.15224 • Published Jan 21 • 12

MMFineReason: Closing the Multimodal Reasoning Gap via Open Data-Centric Methods

Paper • 2601.21821 • Published 26 days ago • 59

VTC-R1: Vision-Text Compression for Efficient Long-Context Reasoning

Paper • 2601.22069 • Published 26 days ago • 7

Thinking with Comics: Enhancing Multimodal Reasoning through Structured Visual Storytelling

Paper • 2602.02453 • Published 22 days ago • 36

Training Data Efficiency in Multimodal Process Reward Models

Paper • 2602.04145 • Published 21 days ago • 76

SwimBird: Eliciting Switchable Reasoning Mode in Hybrid Autoregressive MLLMs

Paper • 2602.06040 • Published 19 days ago • 10