Joy Caption Pre Alpha
Generate captions for images
Segment and caption objects in images and videos
Generate descriptions by uploading images or videos
Generate insights from charts using text prompts
Ask questions about images and get answers
Detect objects in your images instantly
Extract text and metadata from PDF files
Try PaliGemma on document understanding tasks
Chat with an AI about your uploaded images
Interact with a chatbot that understands text and images
Chat with images using Llama Vision
A GPT-4o-like bot.
Extract text from documents using images or PDFs
Generate detailed descriptions from images and videos
Generate document search queries from a page image
Microsoft Phi-3 Vision 128k with Multimodal capabilities
A Fully Open Multilingual Multimodal LLM for 39 Languages
Demo for DocLayout-YOLO
A data extraction tool to convert PDF to Markdown and JSON
Extract text from images
Hugging Face Space for JanusFlow-1.3B
Generate clickable coordinates on a screenshot
PaliGemma 2 LoRA fine-tuned on VQAv2
Gaze detection using Moondream
Detect and visualize human poses in images and videos
nanonets ocr2 / olmocr / qwen2vl ocr / aya vision / rolmocr
Extract and recognize text from documents and images
OmniParser: turn your LLM into a GUI agent
See, read, and reason: better together.
Generate text and segment images using PaliGemma 2
Interact with the Aya family of models.
Interact with videos!
Classify images in real-time using your webcam
OCR for PDFs and Images using Mistral OCR
Upload an image to detect objects
Object Detection & Scene Understanding for Images and Video
Describe masked regions in an image with natural language
Object Detection on Images and Video
Ask your webcam questions and get answers
Seed1.5-VL API Demo
Demo for Nanonets-OCR
Chat with images, videos, or PDFs to generate text
THUDM/GLM-4.1V-9B-Thinking Demo
Generate text responses from images and text input
Extract and visualize layout from PDFs or images