Papers + RL/Reasoning
DAPO: An Open-Source LLM Reinforcement Learning System at Scale
Paper
• 2503.14476
• Published
• 144
VAPO: Efficient and Reliable Reinforcement Learning for Advanced
Reasoning Tasks
Paper
• 2504.05118
• Published
• 26
SQL-R1: Training Natural Language to SQL Reasoning Model By
Reinforcement Learning
Paper
• 2504.08600
• Published
• 33
A Minimalist Approach to LLM Reasoning: from Rejection Sampling to
Reinforce
Paper
• 2504.11343
• Published
• 19
OTC: Optimal Tool Calls via Reinforcement Learning
Paper
• 2504.14870
• Published
• 35
DianJin-R1: Evaluating and Enhancing Financial Reasoning in Large
Language Models
Paper
• 2504.15716
• Published
• 12
WebThinker: Empowering Large Reasoning Models with Deep Research
Capability
Paper
• 2504.21776
• Published
• 59
DeepCritic: Deliberate Critique with Large Language Models
Paper
• 2505.00662
• Published
• 54
MiMo: Unlocking the Reasoning Potential of Language Model -- From
Pretraining to Posttraining
Paper
• 2505.07608
• Published
• 82
Insights into DeepSeek-V3: Scaling Challenges and Reflections on
Hardware for AI Architectures
Paper
• 2505.09343
• Published
• 76
CPGD: Toward Stable Rule-based Reinforcement Learning for Language
Models
Paper
• 2505.12504
• Published
• 24
AdaCoT: Pareto-Optimal Adaptive Chain-of-Thought Triggering via
Reinforcement Learning
Paper
• 2505.11896
• Published
• 58
Paper
• 2505.14674
• Published
• 37
One-RL-to-See-Them-All/Orsta-Data-47k
Updated
• 270
• 17
One RL to See Them All: Visual Triple Unified Reinforcement Learning
Paper
• 2505.18129
• Published
• 62
RL with KL penalties is better viewed as Bayesian inference
Paper
• 2205.11275
• Published
• 1
Asymptotics of Language Model Alignment
Paper
• 2404.01730
• Published
• 1
VerIPO: Cultivating Long Reasoning in Video-LLMs via Verifier-Guided
Iterative Policy Optimization
Paper
• 2505.19000
• Published
• 42
Soft Thinking: Unlocking the Reasoning Potential of LLMs in Continuous
Concept Space
Paper
• 2505.15778
• Published
• 19
ZeroGUI: Automating Online GUI Learning at Zero Human Cost
Paper
• 2505.23762
• Published
• 45
Table-R1: Inference-Time Scaling for Table Reasoning
Paper
• 2505.23621
• Published
• 93
Reinforcement Pre-Training
Paper
• 2506.08007
• Published
• 263
Comment on The Illusion of Thinking: Understanding the Strengths and
Limitations of Reasoning Models via the Lens of Problem Complexity
Paper
• 2506.09250
• Published
• 27
Paper
• 2506.10910
• Published
• 66
Does Math Reasoning Improve General LLM Capabilities? Understanding
Transferability of LLM Reasoning
Paper
• 2507.00432
• Published
• 79
AutoTriton: Automatic Triton Programming with Reinforcement Learning in
LLMs
Paper
• 2507.05687
• Published
• 30
Reasoning or Memorization? Unreliable Results of Reinforcement Learning
Due to Data Contamination
Paper
• 2507.10532
• Published
• 90
Towards Agentic RAG with Deep Reasoning: A Survey of RAG-Reasoning
Systems in LLMs
Paper
• 2507.09477
• Published
• 88
osmosis-ai/Osmosis-Apply-1.7B
Text Generation
• 2B • Updated
• 29
• 95
Geometric-Mean Policy Optimization
Paper
• 2507.20673
• Published
• 32
GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models
Paper
• 2508.06471
• Published
• 206
Part I: Tricks or Traps? A Deep Dive into RL for LLM Reasoning
Paper
• 2508.08221
• Published
• 50
Training-Free Group Relative Policy Optimization
Paper
• 2510.08191
• Published
• 45
The Art of Scaling Reinforcement Learning Compute for LLMs
Paper
• 2510.13786
• Published
• 32
Stabilizing Reinforcement Learning with LLMs: Formulation and Practices
Paper
• 2512.01374
• Published
• 105
Teaching Models to Teach Themselves: Reasoning at the Edge of Learnability
Paper
• 2601.18778
• Published
• 40