DeepSeekMath-V2: Towards Self-Verifiable Mathematical Reasoning Paper • 2511.22570 • Published 10 days ago • 65
R-Horizon: How Far Can Your Large Reasoning Model Really Go in Breadth and Depth? Paper • 2510.08189 • Published Oct 9 • 26
OpenCUA: Open Foundations for Computer-Use Agents Collection This is the official versions of OpenCUA models and AgentNet datasets. Website: https://opencua.xlang.ai/ • 8 items • Updated 7 days ago • 21
DeepResearchGym: A Free, Transparent, and Reproducible Evaluation Sandbox for Deep Research Paper • 2505.19253 • Published May 25 • 32
Scaling Computer-Use Grounding via User Interface Decomposition and Synthesis Paper • 2505.13227 • Published May 19 • 45
Montessori-Instruct: Generate Influential Training Data Tailored for Student Learning Paper • 2410.14208 • Published Oct 18, 2024 • 3
view article Article Cosmopedia: how to create large-scale synthetic data for pre-training Large Language Models +1 Mar 20, 2024 • 105
OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments Paper • 2404.07972 • Published Apr 11, 2024 • 50