Hao Fei

scofield7419

http://haofei.vip/

AI & ML interests

Multimodal Learning, Large Language Model, Vision and Language, Natural Language Processing, Structural Modeling

Recent Activity

authored a paper 8 days ago

CoMT: A Novel Benchmark for Chain of Multi-modal Thought on Large Vision-Language Models

authored a paper 8 days ago

Iris: Breaking GUI Complexity with Adaptive Focus and Self-Refining

authored a paper 8 days ago

Semantic Role Labeling: A Systematical Survey

Organizations

authored 10 papers 8 days ago

CoMT: A Novel Benchmark for Chain of Multi-modal Thought on Large Vision-Language Models

Paper • 2412.12932 • Published Dec 17, 2024 • 2

Iris: Breaking GUI Complexity with Adaptive Focus and Self-Refining

Paper • 2412.10342 • Published Dec 13, 2024

Semantic Role Labeling: A Systematical Survey

Paper • 2502.08660 • Published Feb 9, 2025

Towards Multimodal Empathetic Response Generation: A Rich Text-Speech-Vision Avatar-based Benchmark

Paper • 2502.04976 • Published Feb 7, 2025

Derm1M: A Million-scale Vision-Language Dataset Aligned with Clinical Ontology Knowledge for Dermatology

Paper • 2503.14911 • Published Mar 19, 2025 • 3

Unveiling the Cognitive Compass: Theory-of-Mind-Guided Multimodal Emotion Reasoning

Paper • 2602.00971 • Published Feb 28

submitted a paper to Daily Papers 11 days ago

Audio-Visual Intelligence in Large Foundation Models

Paper • 2605.04045 • Published 14 days ago • 33

upvoted a paper 11 days ago

Audio-Visual Intelligence in Large Foundation Models

Paper • 2605.04045 • Published 14 days ago • 33

upvoted a paper 28 days ago

SOAR: Self-Correction for Optimal Alignment and Refinement in Diffusion Models

Paper • 2604.12617 • Published Apr 14 • 6

authored 7 papers 2 months ago

Towards Semantic Equivalence of Tokenization in Multimodal LLM

Paper • 2406.05127 • Published Jun 7, 2024

So-Fake: Benchmarking and Explaining Social Media Image Forgery Detection

Paper • 2505.18660 • Published May 24, 2025 • 2

Mixed-R1: Unified Reward Perspective For Reasoning Capability in Multimodal Large Language Models

Paper • 2505.24164 • Published May 30, 2025

SMAP: Self-supervised Motion Adaptation for Physically Plausible Humanoid Whole-body Control

Paper • 2505.19463 • Published May 26, 2025

MCM-DPO: Multifaceted Cross-Modal Direct Preference Optimization for Alt-text Generation

Paper • 2510.00647 • Published Oct 1, 2025

DragNeXt: Rethinking Drag-Based Image Editing

Paper • 2506.07611 • Published Jun 9, 2025 • 1

A Reason-then-Describe Instruction Interpreter for Controllable Video Generation

Paper • 2511.20563 • Published Nov 25, 2025 • 1
