SugarCrepe: Fixing Hackable Benchmarks for Vision-Language Compositionality Paper • 2306.14610 • Published Jun 26, 2023 • 2
m&m's: A Benchmark to Evaluate Tool-Use for multi-step multi-modal Tasks Paper • 2403.11085 • Published Mar 17, 2024
TACO: Learning Multi-modal Action Models with Synthetic Chains-of-Thought-and-Action Paper • 2412.05479 • Published Dec 7, 2024
ProVision: Programmatically Scaling Vision-centric Instruction Data for Multimodal Language Models Paper • 2412.07012 • Published Dec 9, 2024 • 1
Unfolding Spatial Cognition: Evaluating Multimodal Models on Visual Simulations Paper • 2506.04633 • Published Jun 5, 2025 • 21
Explain Before You Answer: A Survey on Compositional Visual Reasoning Paper • 2508.17298 • Published Aug 24, 2025 • 4
SAGE: Training Smart Any-Horizon Agents for Long Video Reasoning with Reinforcement Learning Paper • 2512.13874 • Published Dec 15, 2025 • 17
Molmo2: Open Weights and Data for Vision-Language Models with Video Understanding and Grounding Paper • 2601.10611 • Published Jan 15 • 32
VFIG: Vectorizing Complex Figures in SVG with Vision-Language Models Paper • 2603.24575 • Published 13 days ago • 18
NaturalBench: Evaluating Vision-Language Models on Natural Adversarial Samples Paper • 2410.14669 • Published Oct 18, 2024 • 39