Generate descriptions from images and text prompts
Find answers by describing images
A Foundation Action Model for Generalist GUI Agents