sheikhjubair's Collections: reasoning-agentic
Paper
• 2412.16720
• Published
• 37
LearnLM: Improving Gemini for Learning
Paper
• 2412.16429
• Published
• 22
NILE: Internal Consistency Alignment in Large Language Models
Paper
• 2412.16686
• Published
• 8
Offline Reinforcement Learning for LLM Multi-Step Reasoning
Paper
• 2412.16145
• Published
• 38
Paper
• 2412.15115
• Published
• 377
AceMath: Advancing Frontier Math Reasoning with Post-Training and Reward Modeling
Paper
• 2412.15084
• Published
• 13
Xmodel-2 Technical Report
Paper
• 2412.19638
• Published
• 27
Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models
Paper
• 2503.16419
• Published
• 77
Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn't
Paper
• 2503.16219
• Published
• 52
AIMO-2 Winning Solution: Building State-of-the-Art Mathematical Reasoning Models with OpenMathReasoning dataset
Paper
• 2504.16891
• Published
• 25
LLMs are Greedy Agents: Effects of RL Fine-tuning on Decision-Making Abilities
Paper
• 2504.16078
• Published
• 21
ToolRL: Reward is All Tool Learning Needs
Paper
• 2504.13958
• Published
• 49
ReTool: Reinforcement Learning for Strategic Tool Use in LLMs
Paper
• 2504.11536
• Published
• 63
DeepMath-103K: A Large-Scale, Challenging, Decontaminated, and Verifiable Mathematical Dataset for Advancing Reasoning
Paper
• 2504.11456
• Published
• 12
Reasoning Models Can Be Effective Without Thinking
Paper
• 2504.09858
• Published
• 12
AM-Thinking-v1: Advancing the Frontier of Reasoning at 32B Scale
Paper
• 2505.08311
• Published
• 19
Are Reasoning Models More Prone to Hallucination?
Paper
• 2505.23646
• Published
• 24
ATLAS: Learning to Optimally Memorize the Context at Test Time
Paper
• 2505.23735
• Published
• 23
Beyond Markovian: Reflective Exploration via Bayes-Adaptive RL for LLM Reasoning
Paper
• 2505.20561
• Published
• 7
Pass@k Training for Adaptively Balancing Exploration and Exploitation of Large Reasoning Models
Paper
• 2508.10751
• Published
• 29
Towards a Unified View of Large Language Model Post-Training
Paper
• 2509.04419
• Published
• 76
SRFT: A Single-Stage Method with Supervised and Reinforcement Fine-Tuning for Reasoning
Paper
• 2506.19767
• Published
• 15
Supervised Reinforcement Learning: From Expert Trajectories to Step-wise Reasoning
Paper
• 2510.25992
• Published
• 48