InternVideo3 enhances long-horizon multimodal tasks through Multimodal Contextual Reasoning and efficient attention mechanisms
AI & ML interests
Computer Vision
Recent Activity
Papers
Imagine Before You Predict: Interleaved Latent Visual Reasoning for Video Event Prediction
RIVER: A Real-Time Interaction Benchmark for Video LLMs
Organization Card
OpenGVLab
Welcome to OpenGVLab! We are a research group from Shanghai AI Lab focused on vision-centric AI research. The GV in OpenGVLab stands for general vision: a general understanding of vision that requires little effort to adapt to new vision-based tasks.
Models
- InternVL: a pioneering open-source alternative to GPT-4V.
- InternImage: a large-scale vision foundation model with deformable convolutions.
- InternVideo: large-scale video foundation models for multimodal understanding.
- VideoChat: an end-to-end chat assistant for video comprehension.
- All-Seeing-Project: towards panoptic visual recognition and understanding of the open world.
Datasets
- ShareGPT4o: a groundbreaking large-scale resource that we plan to open-source, comprising 200K meticulously annotated images, 10K videos with highly descriptive captions, and 10K audio files with detailed descriptions.
- InternVid: a large-scale video-text dataset for multimodal understanding and generation.
- MMPR: a high-quality, large-scale multimodal preference dataset.
Benchmarks
- MVBench: a comprehensive benchmark for multimodal video understanding.
- CRPE: a benchmark covering all elements of the relation triplets (subject, predicate, object), providing a systematic platform for the evaluation of relation comprehension ability.
- MM-NIAH: a comprehensive benchmark for long multimodal document comprehension.
- GMAI-MMBench: a comprehensive multimodal evaluation benchmark towards general medical AI.
InternVideo-Next: Towards General Video Foundation Models without Video-Text Supervision
- InternVideo-Next: Towards General Video Foundation Models without Video-Text Supervision
  Paper • 2512.01342 • Published • 21
- revliter/internvideo_next_base_p14_res224_f16
  91M • Updated • 625 • 5
- revliter/internvideo_next_large_p14_res224_f16
  0.3B • Updated • 6.76k • 7
- revliter/internvideo_next_large_p14_res224_f16_stage1
  Updated • 11 • 2
spaces 13
- ScaleCUA Demo: Display web content in a Streamlit app
- InternVideo2.5: Hierarchical Compression for Long-Context Video Modeling
- InternVL: Chat with an AI that understands images and text
- MVBench Leaderboard: Submit model evaluations and view the leaderboard
- InternVideo2 Chat 8B HD: Upload a video to chat about its contents
models 286
OpenGVLab/Mono-InternVL-2B
Image-Text-to-Text • 3B • Updated • 1.17k • 38
OpenGVLab/VideoChat-Flash-Qwen2_5-7B_InternVideo2-1B
Video-Text-to-Text • 9B • Updated • 493 • 9
OpenGVLab/SDLM-32B-D4
Text Generation • 33B • Updated • 14 • 18
OpenGVLab/SDLM-3B-D4
Text Generation • 3B • Updated • 26 • 7
OpenGVLab/SDLM-3B-D8
Text Generation • 3B • Updated • 25 • 3
OpenGVLab/Vlaser-2B-VLA
Updated • 3
OpenGVLab/Vlaser-8B
8B • Updated • 604 • 2
OpenGVLab/Vlaser-2B
2B • Updated • 11 • 1
OpenGVLab/VeBrain
8B • Updated • 11
OpenGVLab/NaViL-9B
16B • Updated • 13 • 1
datasets 50
OpenGVLab/RIVER
Updated • 52
OpenGVLab/ExpVid
Preview • Updated • 530 • 7
OpenGVLab/GenExam
Updated • 385 • 5
OpenGVLab/ScaleCUA-Data
Preview • Updated • 2.65k • 31
OpenGVLab/VRBench
Preview • Updated • 557 • 5
OpenGVLab/MMPR-v1.2
Updated • 3.55k • 42
OpenGVLab/MMPR-Tiny
Updated • 108 • 9
OpenGVLab/MMPR-v1.2-prompts
Updated • 1.82k • 2
OpenGVLab/MMBench-GUI
Preview • Updated • 213 • 37
OpenGVLab/GUI-Odyssey
Viewer • Updated • 7.74k • 29.5k • 26