RL - a thangtm Collection

RL

updated 1 day ago

Rank-GRPO: Training LLM-based Conversational Recommender Systems with Reinforcement Learning

Paper • 2510.20150 • Published Oct 23, 2025 • 4

Tiny Model, Big Logic: Diversity-Driven Optimization Elicits Large-Model Reasoning Ability in VibeThinker-1.5B

Paper • 2511.06221 • Published Nov 9, 2025 • 132

We-Math 2.0: A Versatile MathBook System for Incentivizing Visual Mathematical Reasoning

Paper • 2508.10433 • Published Aug 14, 2025 • 144

Stabilizing Reinforcement Learning with LLMs: Formulation and Practices

Paper • 2512.01374 • Published Dec 1, 2025 • 99

DeepSeekMath-V2: Towards Self-Verifiable Mathematical Reasoning

Paper • 2511.22570 • Published Nov 27, 2025 • 88

GARDO: Reinforcing Diffusion Models without Reward Hacking

Paper • 2512.24138 • Published 17 days ago • 28

Controlled Self-Evolution for Algorithmic Code Optimization

Paper • 2601.07348 • Published 4 days ago • 101