GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning Paper • 2507.01006 • Published Jul 1 • 241
VisPlay: Self-Evolving Vision-Language Models from Images Paper • 2511.15661 • Published 19 days ago • 42
Scaling Language-Centric Omnimodal Representation Learning Paper • 2510.11693 • Published Oct 13 • 100
view article Article We’re open-sourcing our text-to-image model and the process behind it 26 days ago • 73
Tiny Model, Big Logic: Diversity-Driven Optimization Elicits Large-Model Reasoning Ability in VibeThinker-1.5B Paper • 2511.06221 • Published 30 days ago • 128
π_RL: Online RL Fine-tuning for Flow-based Vision-Language-Action Models Paper • 2510.25889 • Published Oct 29 • 64
Concerto: Joint 2D-3D Self-Supervised Learning Emerges Spatial Representations Paper • 2510.23607 • Published Oct 27 • 174
LoftUp: Learning a Coordinate-Based Feature Upsampler for Vision Foundation Models Paper • 2504.14032 • Published Apr 18 • 7
Baichuan-M2: Scaling Medical Capability with Large Verifier System Paper • 2509.02208 • Published Sep 2 • 42
Every Attention Matters: An Efficient Hybrid Architecture for Long-Context Reasoning Paper • 2510.19338 • Published Oct 22 • 114
ARGenSeg: Image Segmentation with Autoregressive Image Generation Model Paper • 2510.20803 • Published Oct 23 • 9
Bee: A High-Quality Corpus and Full-Stack Suite to Unlock Advanced Fully Open MLLMs Paper • 2510.13795 • Published Oct 15 • 56