Abstract
Agent-RRM, a multi-faceted reward model, provides structured feedback for agentic trajectories through reasoning traces, critiques, and performance scores, with unified feedback integration showing superior performance across diverse benchmarks.
Agentic Reinforcement Learning (Agentic RL) has achieved notable success in enabling agents to perform complex reasoning and tool use. However, most methods still rely on sparse, outcome-based rewards for training. Such feedback fails to differentiate intermediate reasoning quality, leading to suboptimal training results. In this paper, we introduce the Agent Reasoning Reward Model (Agent-RRM), a multi-faceted reward model that produces structured feedback for agentic trajectories, including (1) an explicit reasoning trace, (2) a focused critique that provides refinement guidance by highlighting reasoning flaws, and (3) an overall score that evaluates process performance. Leveraging these signals, we systematically investigate three integration strategies: Reagent-C (text-augmented refinement), Reagent-R (reward-augmented guidance), and Reagent-U (unified feedback integration). Extensive evaluations across 12 diverse benchmarks demonstrate that Reagent-U yields substantial performance gains, achieving 43.7% on GAIA and 46.2% on WebWalkerQA, validating the effectiveness of our reasoning reward model and training schemes. Code, models, and datasets are all released to facilitate future research.
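To make the three feedback facets concrete, the sketch below shows one way the structured feedback and a reward-augmented training signal (in the spirit of Reagent-R) could be represented. This is a minimal illustration, not the paper's implementation: the `StructuredFeedback` container, the function names, and the linear blending of outcome and process rewards are all our assumptions, since the abstract does not describe the actual scheme.

```python
from dataclasses import dataclass

@dataclass
class StructuredFeedback:
    """Hypothetical container for Agent-RRM's three feedback facets."""
    reasoning_trace: str  # explicit reasoning behind the judgment
    critique: str         # refinement guidance highlighting reasoning flaws
    score: float          # overall process-performance score, assumed in [0, 1]

def blended_reward(feedback: StructuredFeedback,
                   outcome_reward: float,
                   alpha: float = 0.5) -> float:
    """One plausible reading of reward-augmented guidance: blend the
    sparse outcome reward with the dense process score. The weighting
    alpha is an illustrative assumption, not the paper's choice."""
    return alpha * outcome_reward + (1.0 - alpha) * feedback.score
```

Under this reading, Reagent-C would feed the `critique` text back to the agent for refinement, Reagent-R would train on something like `blended_reward`, and Reagent-U would combine both channels.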
Community
The following related papers were recommended by the Semantic Scholar API:
- AgentMath: Empowering Mathematical Reasoning for Large Language Models via Tool-Augmented Agent (2025)
- Evidence-Augmented Policy Optimization with Reward Co-Evolution for Long-Context Reasoning (2026)
- Enhancing Agentic RL with Progressive Reward Shaping and Value-based Sampling Policy Optimization (2025)
- CLEANER: Self-Purified Trajectories Boost Agentic Reinforcement Learning (2026)
- MagicGUI-RMS: A Multi-Agent Reward Model System for Self-Evolving GUI Agents via Automated Feedback Reflux (2026)
- AutoTool: Dynamic Tool Selection and Integration for Agentic Reasoning (2025)
- BAPO: Boundary-Aware Policy Optimization for Reliable Agentic Search (2026)