
Collections

Discover the best community collections!

Collections including paper arxiv:2512.17102

about 11 hours ago

Reinforcement Learning for Self-Improving Agent with Skill Library

Paper • 2512.17102 • Published 7 days ago • 18

about 20 hours ago

R-Zero: Self-Evolving Reasoning LLM from Zero Data

Paper • 2508.05004 • Published Aug 7 • 130
Can Large Multimodal Models Actively Recognize Faulty Inputs? A Systematic Evaluation Framework of Their Input Scrutiny Ability

Paper • 2508.04017 • Published Aug 6 • 11
Reinforcement Learning for Self-Improving Agent with Skill Library

Paper • 2512.17102 • Published 7 days ago • 18

about 3 hours ago

lusxvr/nanoVLM-222M

Image-Text-to-Text • 0.2B • Updated May 8 • 474 • 98
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning

Paper • 2503.09516 • Published Mar 12 • 36
AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time

Paper • 2505.24863 • Published May 30 • 97
QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning

Paper • 2505.17667 • Published May 23 • 88

about 12 hours ago

FLAME: Factuality-Aware Alignment for Large Language Models

Paper • 2405.01525 • Published May 2, 2024 • 27
DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data

Paper • 2405.14333 • Published May 23, 2024 • 42
Transformers Can Do Arithmetic with the Right Embeddings

Paper • 2405.17399 • Published May 27, 2024 • 54
EasyAnimate: A High-Performance Long Video Generation Method based on Transformer Architecture

Paper • 2405.18991 • Published May 29, 2024 • 12

Agent Learning via Early Experience

Paper • 2510.08558 • Published Oct 9 • 269
DeepCode: Open Agentic Coding

Paper • 2512.07921 • Published 17 days ago • 31
Reinforcement Learning for Self-Improving Agent with Skill Library

Paper • 2512.17102 • Published 7 days ago • 18

Business Datasets

wonseokchoi1/ai-consulting-dpo-dataset-v1

Viewer • Updated Aug 5, 2024 • 1.44k • 20 • 1
Tomasz332/building-consulting-dataset

Viewer • Updated Feb 15 • 100 • 13 • 1
airabbitX/gpt-consultant

Viewer • Updated Aug 20, 2024 • 1.1k • 56
jhsong01/startup_consulting_dataset

Viewer • Updated Oct 14, 2024 • 72 • 19

Reinforcement learning

Diffusion Augmented Agents: A Framework for Efficient Exploration and Transfer Learning

Paper • 2407.20798 • Published Jul 30, 2024 • 24
Offline Reinforcement Learning for LLM Multi-Step Reasoning

Paper • 2412.16145 • Published Dec 20, 2024 • 38
REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models

Paper • 2501.03262 • Published Jan 4 • 103
SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution

Paper • 2502.18449 • Published Feb 25 • 75

