- Less is More: Recursive Reasoning with Tiny Networks
  Paper • 2510.04871 • Published • 494
- Vision-Zero: Scalable VLM Self-Improvement via Strategic Gamified Self-Play
  Paper • 2509.25541 • Published • 140
- Agent Learning via Early Experience
  Paper • 2510.08558 • Published • 266
- DeepSearch: Overcome the Bottleneck of Reinforcement Learning with Verifiable Rewards via Monte Carlo Tree Search
  Paper • 2509.25454 • Published • 139

Collections
Collections including paper arxiv:2510.08558
- Large Reasoning Models Learn Better Alignment from Flawed Thinking
  Paper • 2510.00938 • Published • 58
- What Characterizes Effective Reasoning? Revisiting Length, Review, and Structure of CoT
  Paper • 2509.19284 • Published • 22
- Learning to Reason as Action Abstractions with Scalable Mid-Training RL
  Paper • 2509.25810 • Published • 5
- Agent Learning via Early Experience
  Paper • 2510.08558 • Published • 266

- Statistical Methods in Generative AI
  Paper • 2509.07054 • Published • 11
- Parallel-R1: Towards Parallel Thinking via Reinforcement Learning
  Paper • 2509.07980 • Published • 101
- Agent Learning via Early Experience
  Paper • 2510.08558 • Published • 266
- GigaEvo: An Open Source Optimization Framework Powered By LLMs And Evolution Algorithms
  Paper • 2511.17592 • Published • 118

- Agent Learning via Early Experience
  Paper • 2510.08558 • Published • 266
- The Landscape of Agentic Reinforcement Learning for LLMs: A Survey
  Paper • 2509.02547 • Published • 225
- Scaling Agents via Continual Pre-training
  Paper • 2509.13310 • Published • 117
- Agent Lightning: Train ANY AI Agents with Reinforcement Learning
  Paper • 2508.03680 • Published • 121

- Agent Learning via Early Experience
  Paper • 2510.08558 • Published • 266
- MemMamba: Rethinking Memory Patterns in State Space Model
  Paper • 2510.03279 • Published • 72
- From What to Why: A Multi-Agent System for Evidence-based Chemical Reaction Condition Reasoning
  Paper • 2509.23768 • Published • 48
- LLMs Learn to Deceive Unintentionally: Emergent Misalignment in Dishonesty from Misaligned Samples to Biased Human-AI Interactions
  Paper • 2510.08211 • Published • 22

- Agent Learning via Early Experience
  Paper • 2510.08558 • Published • 266
- Learning on the Job: An Experience-Driven Self-Evolving Agent for Long-Horizon Tasks
  Paper • 2510.08002 • Published • 23
- Self-Improving LLM Agents at Test-Time
  Paper • 2510.07841 • Published • 9
- The Denario project: Deep knowledge AI agents for scientific discovery
  Paper • 2510.26887 • Published • 6

- Language Models Can Learn from Verbal Feedback Without Scalar Rewards
  Paper • 2509.22638 • Published • 70
- Don't Just Fine-tune the Agent, Tune the Environment
  Paper • 2510.10197 • Published • 28
- Thinking with Camera: A Unified Multimodal Model for Camera-Centric Understanding and Generation
  Paper • 2510.08673 • Published • 125
- Agent Learning via Early Experience
  Paper • 2510.08558 • Published • 266

- Open Data Synthesis For Deep Research
  Paper • 2509.00375 • Published • 70
- Beyond Correctness: Harmonizing Process and Outcome Rewards through RL Training
  Paper • 2509.03403 • Published • 22
- LMEnt: A Suite for Analyzing Knowledge in Language Models from Pretraining Data to Representations
  Paper • 2509.03405 • Published • 23
- SATQuest: A Verifier for Logical Reasoning Evaluation and Reinforcement Fine-Tuning of LLMs
  Paper • 2509.00930 • Published • 4