Abstract
AIDev is a large-scale dataset of agent-authored pull requests from real-world GitHub repositories that captures AI coding agent usage in practical software development scenarios.
AI coding agents are rapidly transforming software engineering by performing tasks such as feature development, debugging, and testing. Despite their growing impact, the research community lacks a comprehensive dataset capturing how these agents are used in real-world projects. To address this gap, we introduce AIDev, a large-scale dataset focused on agent-authored pull requests (Agentic-PRs) in real-world GitHub repositories. AIDev aggregates 932,791 Agentic-PRs produced by five agents: OpenAI Codex, Devin, GitHub Copilot, Cursor, and Claude Code. These PRs span 116,211 repositories and involve 72,189 developers. In addition, AIDev includes a curated subset of 33,596 Agentic-PRs from 2,807 repositories with over 100 stars, providing further information such as comments, reviews, commits, and related issues. This dataset offers a foundation for future research on AI adoption, developer productivity, and human-AI collaboration in the new era of software engineering.

Keywords: AI Agent, Agentic AI, Coding Agent, Agentic Coding, Agentic Software Engineering, Agentic Engineering
Community
AIDev is a dataset (https://huggingface.co/datasets/hao-li/AIDev) capturing agent-authored pull requests (Agentic-PRs) from real-world GitHub repositories:
- Scale: 932,791 Agentic-PRs
- Breadth: 116,211 repositories and 72,189 developers, across five AI agents (Claude Code, Cursor, Devin, GitHub Copilot, OpenAI Codex)
- Depth: 33,596 curated Agentic-PRs from 2,807 popular repositories (over 100 stars), enriched with comments, reviews, commits, and related issues
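The curated subset supports per-agent analyses such as merge rates. A minimal sketch in plain Python over a synthetic sample of PR records; the field names (`agent`, `merged`) are assumptions for illustration and may not match the dataset's actual schema, which you would load from https://huggingface.co/datasets/hao-li/AIDev:

```python
from collections import defaultdict

# Synthetic stand-in for Agentic-PR records; the real dataset's
# column names may differ from these assumed fields.
prs = [
    {"agent": "OpenAI Codex", "merged": True},
    {"agent": "OpenAI Codex", "merged": False},
    {"agent": "Devin", "merged": True},
    {"agent": "Claude Code", "merged": True},
    {"agent": "Claude Code", "merged": False},
]

def merge_rate_by_agent(records):
    """Fraction of PRs merged, grouped by authoring agent."""
    totals = defaultdict(int)
    merged = defaultdict(int)
    for pr in records:
        totals[pr["agent"]] += 1
        if pr["merged"]:
            merged[pr["agent"]] += 1
    return {agent: merged[agent] / totals[agent] for agent in totals}

print(merge_rate_by_agent(prs))
```

The same grouping pattern extends to the enriched fields in the curated subset (comments, reviews, commits, related issues) once the actual schema is inspected.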
If you are interested, also check out our first paper (https://arxiv.org/abs/2507.15003) and the 70+ papers that use the AIDev dataset (https://huggingface.co/datasets/hao-li/AIDev#papers-using-aidev).
This is an automated message from the Librarian Bot. The following similar papers were recommended by the Semantic Scholar API:
- How AI Coding Agents Modify Code: A Large-Scale Study of GitHub Pull Requests (2026)
- A Task-Level Evaluation of AI Agents in Open-Source Projects (2026)
- Why Agentic-PRs Get Rejected: A Comparative Study of Coding Agents (2026)
- Who Writes the Docs in SE 3.0? Agent vs. Human Documentation Pull Requests (2026)
- On Autopilot? An Empirical Study of Human-AI Teaming and Review Practices in Open Source (2026)
- Do Autonomous Agents Contribute Test Code? A Study of Tests in Agentic Pull Requests (2026)
- Why Are AI Agent Involved Pull Requests (Fix-Related) Remain Unmerged? An Empirical Study (2026)