Collections
Discover the best community collections!
Collections including paper arxiv:2406.10819
-
Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction
Paper • 2412.04454 • Published • 72 -
Tree Search for Language Model Agents
Paper • 2407.01476 • Published • 1 -
SeeClick: Harnessing GUI Grounding for Advanced Visual GUI Agents
Paper • 2401.10935 • Published • 5 -
OmniParser for Pure Vision Based GUI Agent
Paper • 2408.00203 • Published • 25
-
End-to-End Goal-Driven Web Navigation
Paper • 1602.02261 • Published -
Learning Language Games through Interaction
Paper • 1606.02447 • Published -
Naturalizing a Programming Language via Interactive Learning
Paper • 1704.06956 • Published -
Reinforcement Learning on Web Interfaces Using Workflow-Guided Exploration
Paper • 1802.08802 • Published • 1
-
Set-of-Mark Prompting Unleashes Extraordinary Visual Grounding in GPT-4V
Paper • 2310.11441 • Published • 29 -
UI-TARS: Pioneering Automated GUI Interaction with Native Agents
Paper • 2501.12326 • Published • 65 -
GUI Odyssey: A Comprehensive Dataset for Cross-App GUI Navigation on Mobile Devices
Paper • 2406.08451 • Published • 25 -
GUI-WORLD: A Dataset for GUI-oriented Multimodal LLM-based Agents
Paper • 2406.10819 • Published • 2
-
Qwen2.5-VL Technical Report
Paper • 2502.13923 • Published • 211 -
Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs
Paper • 2404.05719 • Published • 83 -
ShowUI: One Vision-Language-Action Model for GUI Visual Agent
Paper • 2411.17465 • Published • 90 -
Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution
Paper • 2409.12191 • Published • 78
-
Set-of-Mark Prompting Unleashes Extraordinary Visual Grounding in GPT-4V
Paper • 2310.11441 • Published • 29 -
UI-TARS: Pioneering Automated GUI Interaction with Native Agents
Paper • 2501.12326 • Published • 65 -
GUI Odyssey: A Comprehensive Dataset for Cross-App GUI Navigation on Mobile Devices
Paper • 2406.08451 • Published • 25 -
GUI-WORLD: A Dataset for GUI-oriented Multimodal LLM-based Agents
Paper • 2406.10819 • Published • 2
-
Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction
Paper • 2412.04454 • Published • 72 -
Tree Search for Language Model Agents
Paper • 2407.01476 • Published • 1 -
SeeClick: Harnessing GUI Grounding for Advanced Visual GUI Agents
Paper • 2401.10935 • Published • 5 -
OmniParser for Pure Vision Based GUI Agent
Paper • 2408.00203 • Published • 25
-
Qwen2.5-VL Technical Report
Paper • 2502.13923 • Published • 211 -
Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs
Paper • 2404.05719 • Published • 83 -
ShowUI: One Vision-Language-Action Model for GUI Visual Agent
Paper • 2411.17465 • Published • 90 -
Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution
Paper • 2409.12191 • Published • 78
-
End-to-End Goal-Driven Web Navigation
Paper • 1602.02261 • Published -
Learning Language Games through Interaction
Paper • 1606.02447 • Published -
Naturalizing a Programming Language via Interactive Learning
Paper • 1704.06956 • Published -
Reinforcement Learning on Web Interfaces Using Workflow-Guided Exploration
Paper • 1802.08802 • Published • 1