HF Daily
Open Data Synthesis For Deep Research
Paper
• 2509.00375
• Published
• 72
Beyond Correctness: Harmonizing Process and Outcome Rewards through RL Training
Paper
• 2509.03403
• Published
• 23
LMEnt: A Suite for Analyzing Knowledge in Language Models from Pretraining Data to Representations
Paper
• 2509.03405
• Published
• 24
SATQuest: A Verifier for Logical Reasoning Evaluation and Reinforcement Fine-Tuning of LLMs
Paper
• 2509.00930
• Published
• 5
Drivel-ology: Challenging LLMs with Interpreting Nonsense with Depth
Paper
• 2509.03867
• Published
• 211
Towards a Unified View of Large Language Model Post-Training
Paper
• 2509.04419
• Published
• 76
Inverse IFEval: Can LLMs Unlearn Stubborn Training Conventions to Follow Real Instructions?
Paper
• 2509.04292
• Published
• 58
Delta Activations: A Representation for Finetuned Large Language Models
Paper
• 2509.04442
• Published
• 7
Why Language Models Hallucinate
Paper
• 2509.04664
• Published
• 196
Set Block Decoding is a Language Model Inference Accelerator
Paper
• 2509.04185
• Published
• 54
Bootstrapping Task Spaces for Self-Improvement
Paper
• 2509.04575
• Published
• 6
On Robustness and Reliability of Benchmark-Based Evaluation of LLMs
Paper
• 2509.04013
• Published
• 4
Reverse-Engineered Reasoning for Open-Ended Generation
Paper
• 2509.06160
• Published
• 149
Revolutionizing Reinforcement Learning Framework for Diffusion Large Language Models
Paper
• 2509.06949
• Published
• 56
Reinforcement Learning Foundations for Deep Research Systems: A Survey
Paper
• 2509.06733
• Published
• 32
Scaling up Multi-Turn Off-Policy RL and Multi-Agent Tree Search for LLM Step-Provers
Paper
• 2509.06493
• Published
• 12
SFR-DeepResearch: Towards Effective Reinforcement Learning for Autonomously Reasoning Single Agents
Paper
• 2509.06283
• Published
• 17
Test-Time Scaling in Reasoning Models Is Not Effective for Knowledge-Intensive Tasks Yet
Paper
• 2509.06861
• Published
• 9
R²AI: Towards Resistant and Resilient AI in an Evolving World
Paper
• 2509.06786
• Published
• 3
Parallel-R1: Towards Parallel Thinking via Reinforcement Learning
Paper
• 2509.07980
• Published
• 105
Sharing is Caring: Efficient LM Post-Training with Collective RL Experience Sharing
Paper
• 2509.08721
• Published
• 662
Staying in the Sweet Spot: Responsive Reasoning Evolution via Capability-Adaptive Hint Scaffolding
Paper
• 2509.06923
• Published
• 22
Emergent Hierarchical Reasoning in LLMs through Reinforcement Learning
Paper
• 2509.03646
• Published
• 33
ΔL Normalization: Rethink Loss Aggregation in RLVR
Paper
• 2509.07558
• Published
• 7
From Noise to Narrative: Tracing the Origins of Hallucinations in Transformers
Paper
• 2509.06938
• Published
• 5
A Survey of Reinforcement Learning for Large Reasoning Models
Paper
• 2509.08827
• Published
• 190
CDE: Curiosity-Driven Exploration for Efficient Reinforcement Learning in Large Language Models
Paper
• 2509.09675
• Published
• 28
The Majority is not always right: RL training for solution aggregation
Paper
• 2509.06870
• Published
• 15
Statistical Methods in Generative AI
Paper
• 2509.07054
• Published
• 11
MachineLearningLM: Continued Pretraining Language Models on Millions of Synthetic Tabular Prediction Tasks Scales In-Context ML
Paper
• 2509.06806
• Published
• 64
The Illusion of Diminishing Returns: Measuring Long Horizon Execution in LLMs
Paper
• 2509.09677
• Published
• 35
Paper
• 2509.10147
• Published
• 27
Single-stream Policy Optimization
Paper
• 2509.13232
• Published
• 34
EconProver: Towards More Economical Test-Time Scaling for Automated Theorem Proving
Paper
• 2509.12603
• Published
• 9
Towards General Agentic Intelligence via Environment Scaling
Paper
• 2509.13311
• Published
• 72
Scrub It Out! Erasing Sensitive Memorization in Code Language Models via Machine Unlearning
Paper
• 2509.13755
• Published
• 19
THOR: Tool-Integrated Hierarchical Optimization via RL for Mathematical Reasoning
Paper
• 2509.13761
• Published
• 16
FlowRL: Matching Reward Distributions for LLM Reasoning
Paper
• 2509.15207
• Published
• 116
Reasoning over Boundaries: Enhancing Specification Alignment via Test-time Deliberation
Paper
• 2509.14760
• Published
• 53
Evolving Language Models without Labels: Majority Drives Selection, Novelty Promotes Variation
Paper
• 2509.15194
• Published
• 33
Latent Zoning Network: A Unified Principle for Generative Modeling, Representation Learning, and Classification
Paper
• 2509.15591
• Published
• 45
LIMI: Less is More for Agency
Paper
• 2509.17567
• Published
• 104
GeoPQA: Bridging the Visual Perception Gap in MLLMs for Geometric Reasoning
Paper
• 2509.17437
• Published
• 17
DiffusionNFT: Online Diffusion Reinforcement with Forward Process
Paper
• 2509.16117
• Published
• 22
Analyzing the Effects of Supervised Fine-Tuning on Model Knowledge from Token and Parameter Levels
Paper
• 2509.16596
• Published
• 14
Reasoning Core: A Scalable RL Environment for LLM Symbolic Reasoning
Paper
• 2509.18083
• Published
• 5
Adaptive Kernel Design for Bayesian Optimization Is a Piece of CAKE with LLMs
Paper
• 2509.17998
• Published
• 1
Reinforcement Learning on Pre-Training Data
Paper
• 2509.19249
• Published
• 67
MAPO: Mixed Advantage Policy Optimization
Paper
• 2509.18849
• Published
• 27
What Characterizes Effective Reasoning? Revisiting Length, Review, and Structure of CoT
Paper
• 2509.19284
• Published
• 23
SIM-CoT: Supervised Implicit Chain-of-Thought
Paper
• 2509.20317
• Published
• 42
EmbeddingGemma: Powerful and Lightweight Text Representations
Paper
• 2509.20354
• Published
• 48
Video models are zero-shot learners and reasoners
Paper
• 2509.20328
• Published
• 100
Mixture of Thoughts: Learning to Aggregate What Experts Think, Not Just What They Say
Paper
• 2509.21164
• Published
• 9
VCRL: Variance-based Curriculum Reinforcement Learning for Large Language Models
Paper
• 2509.19803
• Published
• 120
SciReasoner: Laying the Scientific Reasoning Ground Across Disciplines
Paper
• 2509.21320
• Published
• 101
Tree Search for LLM Agent Reinforcement Learning
Paper
• 2509.21240
• Published
• 92
CE-GPPO: Controlling Entropy via Gradient-Preserving Clipping Policy Optimization in Reinforcement Learning
Paper
• 2509.20712
• Published
• 19
Thinking Augmented Pre-training
Paper
• 2509.20186
• Published
• 23
ScaleDiff: Scaling Difficult Problems for Advanced Mathematical Reasoning
Paper
• 2509.21070
• Published
• 9
EPO: Entropy-regularized Policy Optimization for LLM Agents Reinforcement Learning
Paper
• 2509.22576
• Published
• 135
Quantile Advantage Estimation for Entropy-Safe Reasoning
Paper
• 2509.22611
• Published
• 118
Variational Reasoning for Language Models
Paper
• 2509.22637
• Published
• 69
Language Models Can Learn from Verbal Feedback Without Scalar Rewards
Paper
• 2509.22638
• Published
• 70
No Prompt Left Behind: Exploiting Zero-Variance Prompts in LLM Reinforcement Learning via Entropy-Guided Advantage Shaping
Paper
• 2509.21880
• Published
• 53
PromptCoT 2.0: Scaling Prompt Synthesis for Large Language Model Reasoning
Paper
• 2509.19894
• Published
• 34
HiGS: History-Guided Sampling for Plug-and-Play Enhancement of Diffusion Models
Paper
• 2509.22300
• Published
• 4
SLA: Beyond Sparsity in Diffusion Transformers via Fine-Tunable Sparse-Linear Attention
Paper
• 2509.24006
• Published
• 118
Multiplayer Nash Preference Optimization
Paper
• 2509.23102
• Published
• 62
Beyond the Exploration-Exploitation Trade-off: A Hidden State Approach for LLM Reasoning in RLVR
Paper
• 2509.23808
• Published
• 47
Sequential Diffusion Language Models
Paper
• 2509.24007
• Published
• 46
When Does Reasoning Matter? A Controlled Study of Reasoning's Contribution to Model Performance
Paper
• 2509.22193
• Published
• 38
SparseD: Sparse Attention for Diffusion Language Models
Paper
• 2509.24014
• Published
• 31
Random Policy Valuation is Enough for LLM Reasoning with Verifiable Rewards
Paper
• 2509.24981
• Published
• 29
The Era of Real-World Human Interaction: RL from User Conversations
Paper
• 2509.25137
• Published
• 19
Toward Effective Tool-Integrated Reasoning via Self-Evolved Preference Learning
Paper
• 2509.23285
• Published
• 14
GRPO-MA: Multi-Answer Generation in GRPO for Stable and Efficient Chain-of-Thought Training
Paper
• 2509.24494
• Published
• 11
The Dragon Hatchling: The Missing Link between the Transformer and Models of the Brain
Paper
• 2509.26507
• Published
• 547
TruthRL: Incentivizing Truthful LLMs via Reinforcement Learning
Paper
• 2509.25760
• Published
• 55
Thinking-Free Policy Initialization Makes Distilled Reasoning Models More Effective and Efficient Reasoners
Paper
• 2509.26226
• Published
• 34
Thinking Sparks!: Emergent Attention Heads in Reasoning Models During Post Training
Paper
• 2509.25758
• Published
• 23
Mem-α: Learning Memory Construction via Reinforcement Learning
Paper
• 2509.25911
• Published
• 15
Attention as a Compass: Efficient Exploration for Process-Supervised RL in Reasoning Models
Paper
• 2509.26628
• Published
• 17
InfoAgent: Advancing Autonomous Information-Seeking Agents
Paper
• 2509.25189
• Published
• 13
Benefits and Pitfalls of Reinforcement Learning for Language Model Planning: A Theoretical Perspective
Paper
• 2509.22613
• Published
• 10
Specialization after Generalization: Towards Understanding Test-Time Training in Foundation Models
Paper
• 2509.24510
• Published
• 5
DeepSearch: Overcome the Bottleneck of Reinforcement Learning with Verifiable Rewards via Monte Carlo Tree Search
Paper
• 2509.25454
• Published
• 146
GEM: A Gym for Agentic LLMs
Paper
• 2510.01051
• Published
• 90
Knapsack RL: Unlocking Exploration of LLMs via Optimizing Budget Allocation
Paper
• 2509.25849
• Published
• 48
It Takes Two: Your GRPO Is Secretly DPO
Paper
• 2510.00977
• Published
• 32
ACON: Optimizing Context Compression for Long-horizon LLM Agents
Paper
• 2510.00615
• Published
• 34
BroRL: Scaling Reinforcement Learning via Broadened Exploration
Paper
• 2510.01180
• Published
• 20
Making, not Taking, the Best of N
Paper
• 2510.00931
• Published
• 10
CurES: From Gradient Analysis to Efficient Curriculum Learning for Reasoning LLMs
Paper
• 2510.01037
• Published
• 2
LongCodeZip: Compress Long Context for Code Language Models
Paper
• 2510.00446
• Published
• 107
ExGRPO: Learning to Reason from Experience
Paper
• 2510.02245
• Published
• 80
Interactive Training: Feedback-Driven Neural Network Optimization
Paper
• 2510.02297
• Published
• 43
RLP: Reinforcement as a Pretraining Objective
Paper
• 2510.01265
• Published
• 44
Aristotle: IMO-level Automated Theorem Proving
Paper
• 2510.01346
• Published
• 17
RLAD: Training LLMs to Discover Abstractions for Solving Reasoning Problems
Paper
• 2510.02263
• Published
• 9
Paper
• 2510.01141
• Published
• 121
Large Reasoning Models Learn Better Alignment from Flawed Thinking
Paper
• 2510.00938
• Published
• 59
Self-Improvement in Multimodal Large Language Models: A Survey
Paper
• 2510.02665
• Published
• 21
Continuously Augmented Discrete Diffusion model for Categorical Generative Modeling
Paper
• 2510.01329
• Published
• 6
Pretraining with hierarchical memories: separating long-tail and common knowledge
Paper
• 2510.02375
• Published
• 6
A Practitioner's Guide to Multi-turn Agentic Reinforcement Learning
Paper
• 2510.01132
• Published
• 6
Agentic Context Engineering: Evolving Contexts for Self-Improving Language Models
Paper
• 2510.04618
• Published
• 129
Paper2Video: Automatic Video Generation from Scientific Papers
Paper
• 2510.05096
• Published
• 119
MITS: Enhanced Tree Search Reasoning for LLMs via Pointwise Mutual Information
Paper
• 2510.03632
• Published
• 42
Hybrid Architectures for Language Models: Systematic Analysis and Design Insights
Paper
• 2510.04800
• Published
• 37
Front-Loading Reasoning: The Synergy between Pretraining and Post-Training Data
Paper
• 2510.03264
• Published
• 25
Less is More: Recursive Reasoning with Tiny Networks
Paper
• 2510.04871
• Published
• 509
In-the-Flow Agentic System Optimization for Effective Planning and Tool Use
Paper
• 2510.05592
• Published
• 107
MixReasoning: Switching Modes to Think
Paper
• 2510.06052
• Published
• 22
Scaling Code-Assisted Chain-of-Thoughts and Instructions for Model Reasoning
Paper
• 2510.04081
• Published
• 23
Cache-to-Cache: Direct Semantic Communication Between Large Language Models
Paper
• 2510.03215
• Published
• 98
Lumina-DiMOO: An Omni Diffusion Large Language Model for Multi-Modal Generation and Understanding
Paper
• 2510.06308
• Published
• 55
Ming-UniVision: Joint Image Understanding and Generation with a Unified Continuous Tokenizer
Paper
• 2510.06590
• Published
• 77
Multi-Agent Tool-Integrated Policy Optimization
Paper
• 2510.04678
• Published
• 31
Agent Learning via Early Experience
Paper
• 2510.08558
• Published
• 273
Meta-Awareness Enhances Reasoning Models: Self-Alignment Reinforcement Learning
Paper
• 2510.03259
• Published
• 57
When Thoughts Meet Facts: Reusable Reasoning for Long-Context LMs
Paper
• 2510.07499
• Published
• 48
Low-probability Tokens Sustain Exploration in Reinforcement Learning with Verifiable Reward
Paper
• 2510.03222
• Published
• 75
QeRL: Beyond Efficiency -- Quantization-enhanced Reinforcement Learning for LLMs
Paper
• 2510.11696
• Published
• 181
Diffusion Transformers with Representation Autoencoders
Paper
• 2510.11690
• Published
• 166
RLFR: Extending Reinforcement Learning for LLMs with Flow Environment
Paper
• 2510.10201
• Published
• 36
Demystifying Reinforcement Learning in Agentic Reasoning
Paper
• 2510.11701
• Published
• 33
Don't Just Fine-tune the Agent, Tune the Environment
Paper
• 2510.10197
• Published
• 30
Memory as Action: Autonomous Context Curation for Long-Horizon Agentic Tasks
Paper
• 2510.12635
• Published
• 17
Attention Illuminates LLM Reasoning: The Preplan-and-Anchor Rhythm Enables Fine-Grained Policy Optimization
Paper
• 2510.13554
• Published
• 58
Stronger Together: On-Policy Reinforcement Learning for Collaborative LLMs
Paper
• 2510.11062
• Published
• 29
Tracing the Traces: Latent Temporal Signals for Efficient and Accurate Reasoning
Paper
• 2510.10494
• Published
• 2
Agentic Entropy-Balanced Policy Optimization
Paper
• 2510.14545
• Published
• 106
LaSeR: Reinforcement Learning with Last-Token Self-Rewarding
Paper
• 2510.14943
• Published
• 40
Information Gain-based Policy Optimization: A Simple and Effective Approach for Multi-Turn LLM Agents
Paper
• 2510.14967
• Published
• 34
LLMs Can Get "Brain Rot"!
Paper
• 2510.13928
• Published
• 23
LLM-guided Hierarchical Retrieval
Paper
• 2510.13217
• Published
• 21
Large Language Models Do NOT Really Know What They Don't Know
Paper
• 2510.09033
• Published
• 17