ABC-Bench: Benchmarking Agentic Backend Coding in Real-World Development Paper • 2601.11077 • Published 7 days ago • 62
ABC-Bench: Benchmarking Agentic Backend Coding in Real-World Development Paper • 2601.11077 • Published 7 days ago • 62
OpenNovelty: An LLM-powered Agentic System for Verifiable Scholarly Novelty Assessment Paper • 2601.01576 • Published 19 days ago • 17
OctoBench: Benchmarking Scaffold-Aware Instruction Following in Repository-Grounded Agentic Coding Paper • 2601.10343 • Published 8 days ago
Better Process Supervision with Bi-directional Rewarding Signals Paper • 2503.04618 • Published Mar 6, 2025
AgentGym-RL: Training LLM Agents for Long-Horizon Decision Making through Multi-Turn Reinforcement Learning Paper • 2509.08755 • Published Sep 10, 2025 • 56
BAPO: Stabilizing Off-Policy Reinforcement Learning for LLMs via Balanced Policy Optimization with Adaptive Clipping Paper • 2510.18927 • Published Oct 21, 2025 • 84
Nex-N1: Agentic Models Trained via a Unified Ecosystem for Large-Scale Environment Construction Paper • 2512.04987 • Published Dec 4, 2025 • 80
Nex-N1: Agentic Models Trained via a Unified Ecosystem for Large-Scale Environment Construction Paper • 2512.04987 • Published Dec 4, 2025 • 80