- SKILL0: In-Context Agentic Reinforcement Learning for Skill Internalization
  Paper • 2604.02268 • Published • 99
- ReflexiCoder: Teaching Large Language Models to Self-Reflect on Generated Code and Self-Correct It via Reinforcement Learning
  Paper • 2603.05863 • Published • 6
- GrandCode: Achieving Grandmaster Level in Competitive Programming via Agentic Reinforcement Learning
  Paper • 2604.02721 • Published • 375
- GLM-5: from Vibe Coding to Agentic Engineering
  Paper • 2602.15763 • Published • 149
Collections including paper arxiv:2603.14473
- LongCat-Flash-Thinking-2601 Technical Report
  Paper • 2601.16725 • Published • 180
- QuantaAlpha: An Evolutionary Framework for LLM-Driven Alpha Mining
  Paper • 2602.07085 • Published • 190
- A Very Big Video Reasoning Suite
  Paper • 2602.20159 • Published • 521
- AI Can Learn Scientific Taste
  Paper • 2603.14473 • Published • 426
- GUI-G^2: Gaussian Reward Modeling for GUI Grounding
  Paper • 2507.15846 • Published • 135
- WebWatcher: Breaking New Frontier of Vision-Language Deep Research Agent
  Paper • 2508.05748 • Published • 142
- Mobile-Agent-v3: Foundamental Agents for GUI Automation
  Paper • 2508.15144 • Published • 65
- AgentFly: Fine-tuning LLM Agents without Fine-tuning LLMs
  Paper • 2508.16153 • Published • 162
- Monitored Markov Decision Processes
  Paper • 2402.06819 • Published
- Generalization in Monitored Markov Decision Processes (Mon-MDPs)
  Paper • 2505.08988 • Published
- Bayesian Risk Markov Decision Processes
  Paper • 2106.02558 • Published
- Sotopia-RL: Reward Design for Social Intelligence
  Paper • 2508.03905 • Published • 23
- Bootstrapping Exploration with Group-Level Natural Language Feedback in Reinforcement Learning
  Paper • 2603.04597 • Published • 210
- SII-Enigma/Llama3.2-8B-Ins-AMPO
  Text Generation • 8B • Updated • 37
- Understanding R1-Zero-Like Training: A Critical Perspective
  Paper • 2503.20783 • Published • 59
- Planner-R1: Reward Shaping Enables Efficient Agentic RL with Smaller LLMs
  Paper • 2509.25779 • Published • 19
- QuantaAlpha: An Evolutionary Framework for LLM-Driven Alpha Mining
  Paper • 2602.07085 • Published • 190
- Seriki/FastHTML
  Updated • 6 • 1
- DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
  Paper • 2501.12948 • Published • 448
- AI Can Learn Scientific Taste
  Paper • 2603.14473 • Published • 426
- The Dragon Hatchling: The Missing Link between the Transformer and Models of the Brain
  Paper • 2509.26507 • Published • 550
- mHC: Manifold-Constrained Hyper-Connections
  Paper • 2512.24880 • Published • 323
- NeoVerse: Enhancing 4D World Model with in-the-wild Monocular Videos
  Paper • 2601.00393 • Published • 133
- LTX-2: Efficient Joint Audio-Visual Foundation Model
  Paper • 2601.03233 • Published • 177
- Can Large Language Models Understand Context?
  Paper • 2402.00858 • Published • 24
- OLMo: Accelerating the Science of Language Models
  Paper • 2402.00838 • Published • 85
- Self-Rewarding Language Models
  Paper • 2401.10020 • Published • 153
- SemScore: Automated Evaluation of Instruction-Tuned LLMs based on Semantic Textual Similarity
  Paper • 2401.17072 • Published • 25