Latest News

  • [2026-01-12] 🚀🚀🚀 We have open-sourced AgentCPM-Explore, an agent foundation model with only 4B parameters, together with its entire training and inference infrastructure. AgentCPM-Explore ranks on 8 classic long-horizon agent benchmarks, including GAIA, HLE, and BrowseComp. It achieves SOTA performance at its parameter scale and demonstrates accurate deep research capabilities, effectively breaking the performance bottleneck for on-device agents.

Overview

Key highlights of AgentCPM-Explore include:

  • The first full-parameter 4B agent model to rank on 8 long-horizon and complex agent benchmarks, including GAIA, HLE, and BrowseComp, in the on-device setting.

  • Capable of over 100 rounds of continuous environment interaction, supporting multi-source information cross-validation, dynamic search strategy adjustment, and real-time verification of up-to-date information, enabling sustained deep exploration until task completion.

  • Fully open-sourced end-to-end, including (1) AgentRL, a fully asynchronous reinforcement learning framework for agent training, (2) AgentDock, a unified management and scheduling platform for tool sandboxes, and (3) AgentToLeaP, a one-click evaluation platform for agent tool-learning capabilities. Together, these components support community collaboration and custom extensions.
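
To make the long-horizon interaction pattern above more concrete, here is a minimal sketch of a bounded multi-round loop in which a policy alternates between tool calls and observations until it returns an answer or runs out of rounds. This is not the AgentCPM-Explore, AgentRL, or AgentDock implementation. Every name here (`Step`, `Trajectory`, `run_agent`, `toy_policy`, `toy_search`) is hypothetical, and the toy policy and tool are stand-ins for the model and the tool sandbox.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Step:
    """One policy decision: call a tool, or finish with an answer."""
    tool: Optional[str] = None      # hypothetical tool name, e.g. "search"
    argument: str = ""              # input passed to the tool
    answer: Optional[str] = None    # set only when the agent is done


@dataclass
class Trajectory:
    observations: List[str] = field(default_factory=list)
    answer: Optional[str] = None
    rounds_used: int = 0


def run_agent(
    policy: Callable[[str, List[str]], Step],
    tools: Dict[str, Callable[[str], str]],
    task: str,
    max_rounds: int = 100,
) -> Trajectory:
    """Alternate policy decisions and tool calls until the policy
    returns an answer or the round budget is exhausted."""
    traj = Trajectory()
    for round_idx in range(max_rounds):
        step = policy(task, traj.observations)
        traj.rounds_used = round_idx + 1
        if step.answer is not None:
            traj.answer = step.answer
            break
        tool = tools.get(step.tool or "")
        if tool is None:
            # Feed the error back so the policy can change strategy.
            traj.observations.append(f"error: unknown tool {step.tool!r}")
            continue
        traj.observations.append(tool(step.argument))
    return traj


# Toy stand-ins for the model and the tool sandbox (illustration only).
def toy_policy(task: str, observations: List[str]) -> Step:
    # Query two sources and cross-check them before answering.
    if len(observations) < 2:
        query = f"{task} source {len(observations) + 1}"
        return Step(tool="search", argument=query)
    if observations[0] == observations[1]:
        return Step(answer=observations[0])
    return Step(answer="sources disagree")


def toy_search(query: str) -> str:
    return "Paris"


result = run_agent(toy_policy, {"search": toy_search}, "capital of France")
print(result.answer, result.rounds_used)  # prints: Paris 3
```

The `max_rounds` budget is the only stopping condition besides an explicit answer. A real agent would presumably also need context management and tool timeouts, which this sketch leaves out.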

The full construction pipeline of AgentCPM-Explore is described on GitHub.

Experimental Results

| Model | GAIA (text-only) | BrowseComp | BrowseComp (ZH) | HLE | Frames | WebWalker | Seal-0 | Xbench-DeepSearch |
|---|---|---|---|---|---|---|---|---|
| Closed-Source Models | | | | | | | | |
| Claude-4.5-sonnet | 71.2% | 19.6% | 40.8% | 24.5% | 85.0% | / | 53.4% | 66.0% |
| Gemini Deep Research | / | / | / | 26.9% | / | / | / | / |
| DeepSeek-V3.2 | 63.5% | 67.6% | 65.0% | 40.8% | 80.2% | / | 38.5% | 71.0% |
| MiniMax-M2 | 75.7% | 44.0% | 48.5% | 31.8% | / | / | / | 72.0% |
| OpenAI-GPT-5-high | 76.4% | 54.9% | 65.0% | 35.2% | / | / | 51.4% | 77.8% |
| GLM-4.6 | 71.9% | 45.1% | 49.5% | 30.4% | / | / | / | 70.0% |
| Kimi-Researcher | / | / | / | 26.9% | 78.8% | / | 36.0% | 69.0% |
| Seed-1.8 | 87.4% | 67.6% | 81.3% | 40.9% | / | / | / | / |
| Open-Source Models | | | | | | | | |
| MiroThinker 8B | 66.4% | 31.1% | 40.2% | 21.5% | 80.6% | 60.6% | 40.4% | 60.6% |
| Tongyi DeepResearch 30B | 70.9% | 43.4% | 46.7% | 32.9% | 90.6% | 72.2% | / | 75.0% |
| ASearcher QWQ 32B v2 | 58.7% | / | / | / | 74.5% | / | / | 51.1% |
| iterresearch-30B-A3B | 72.8% | 37.3% | 45.2% | 28.8% | 71.0% | / | 39.6% | / |
| WebSailor-V2-30B-A3B (RL) | 74.1% | 35.3% | 44.1% | 30.6% | / | / | / | 73.7% |
| WebLeaper-30B-A3B-RUC | 73.2% | 38.8% | / | / | / | / | 48.6% | 72.0% |
| WebDancer (QWQ-32B) | 51.5% | 3.8% | 18.0% | / | / | 47.9% | / | 38.3% |
| AgentCPM-Explore 4B | 63.9% | 25.0% | 29.0% | 19.1% | 82.7% | 68.1% | 40.0% | 70.0% |