view article Article Welcome Gemma 4: Frontier multimodal intelligence on device +5 merve, pcuenq, sergiopaniego, burtenshaw, Steveeeeeeen, alvarobartt, SaylorTwift • Apr 2 • 890
view article Article SigLIP 2: A better multilingual vision language encoder +1 ariG23498, merve, qubvel-hf • Feb 21, 2025 • 211
view article Article Scaling Real-Time Voice Agents with Cache-Aware Streaming ASR nvidia • Jan 5 • 86
view article Article Open Responses: What you need to know +2 evalstate, burtenshaw, merve, pcuenq • Jan 15 • 111
view article Article Tokenization in Transformers v5: Simpler, Clearer, and More Modular +4 itazap, ariG23498, ArthurZ, sergiopaniego, merve, pcuenq • Dec 18, 2025 • 124
Stabilizing Reinforcement Learning with LLMs: Formulation and Practices Paper • 2512.01374 • Published Dec 1, 2025 • 106
view article Article Fine-Tune Whisper For Multilingual ASR with 🤗 Transformers sanchit-gandhi • Nov 3, 2022 • 371
view article Article Supercharge your OCR Pipelines with Open Models +5 merve, ariG23498, davanstrien, hynky, andito, reach-vb, pcuenq • Oct 21, 2025 • 309
MiniMax-Speech: Intrinsic Zero-Shot Text-to-Speech with a Learnable Speaker Encoder Paper • 2505.07916 • Published May 12, 2025 • 136
Finite Scalar Quantization Enables Redundant and Transmission-Robust Neural Audio Compression at Low Bit-rates Paper • 2509.09550 • Published Sep 11, 2025 • 4
PP-StructureV3 Collection PP-StructureV3 is a SOTA document parsing solution on OmniDocBench, supporting the conversion of PDFs and do cument images to Markdown and JSON. • 17 items • Updated Sep 15, 2025 • 17
PP-OCRv5 Collection PP-OCRv5 is the latest text recognition solution, supporting Simplified Chinese, Chinese Pinyin, Traditional Chinese, English, and Japanese • 13 items • Updated Sep 15, 2025 • 56
AFM-Datasets Collection Training datasets for OPPO Personal AI Lab’s family of Agent Foundation Models. • 6 items • Updated Feb 4 • 6