Abstract
AIDev is a large-scale dataset of agent-authored pull requests from real-world GitHub repositories that captures AI coding agent usage in practical software development scenarios.
AI coding agents are rapidly transforming software engineering by performing tasks such as feature development, debugging, and testing. Despite their growing impact, the research community lacks a comprehensive dataset capturing how these agents are used in real-world projects. To address this gap, we introduce AIDev, a large-scale dataset focused on agent-authored pull requests (Agentic-PRs) in real-world GitHub repositories. AIDev aggregates 932,791 Agentic-PRs produced by five agents: OpenAI Codex, Devin, GitHub Copilot, Cursor, and Claude Code. These PRs span 116,211 repositories and involve 72,189 developers. In addition, AIDev includes a curated subset of 33,596 Agentic-PRs from 2,807 repositories with over 100 stars, providing further information such as comments, reviews, commits, and related issues. This dataset offers a foundation for future research on AI adoption, developer productivity, and human-AI collaboration in the new era of software engineering.

Keywords: AI Agent, Agentic AI, Coding Agent, Agentic Coding, Agentic Software Engineering, Agentic Engineering
Community
AIDev is a dataset (https://huggingface.co/datasets/hao-li/AIDev) capturing agent-authored pull requests (Agentic-PRs) from real-world GitHub repositories:
- Scale: 932,791 Agentic-PRs
- Breadth: 116,211 repositories and 72,189 developers, across five AI agents (Claude Code, Cursor, Devin, GitHub Copilot, OpenAI Codex)
- Depth: 33,596 curated Agentic-PRs from 2,807 popular repositories (over 100 stars), enriched with comments, reviews, commits, and related issues
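The curated subset supports per-agent analyses such as merge rates. A minimal sketch in plain Python over a synthetic sample of PR records; the field names (`agent`, `merged`) are assumptions for illustration and may not match the dataset's actual schema, which you would load from https://huggingface.co/datasets/hao-li/AIDev:

```python
from collections import defaultdict

# Synthetic stand-in for Agentic-PR records; the real dataset's
# column names may differ from these assumed fields.
prs = [
    {"agent": "OpenAI Codex", "merged": True},
    {"agent": "OpenAI Codex", "merged": False},
    {"agent": "Devin", "merged": True},
    {"agent": "Claude Code", "merged": True},
    {"agent": "Claude Code", "merged": False},
]

def merge_rate_by_agent(records):
    """Fraction of PRs merged, grouped by authoring agent."""
    totals = defaultdict(int)
    merged = defaultdict(int)
    for pr in records:
        totals[pr["agent"]] += 1
        if pr["merged"]:
            merged[pr["agent"]] += 1
    return {agent: merged[agent] / totals[agent] for agent in totals}

print(merge_rate_by_agent(prs))
```

The same grouping pattern extends to the enriched fields in the curated subset (comments, reviews, commits, related issues) once the actual schema is inspected.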
If you are interested, also check out our first paper (https://arxiv.org/abs/2507.15003) and the 70+ papers that use the AIDev dataset (https://huggingface.co/datasets/hao-li/AIDev#papers-using-aidev).
This is an automated message from the Librarian Bot. The following similar papers were recommended by the Semantic Scholar API:
- How AI Coding Agents Modify Code: A Large-Scale Study of GitHub Pull Requests (2026)
- A Task-Level Evaluation of AI Agents in Open-Source Projects (2026)
- Why Agentic-PRs Get Rejected: A Comparative Study of Coding Agents (2026)
- Who Writes the Docs in SE 3.0? Agent vs. Human Documentation Pull Requests (2026)
- On Autopilot? An Empirical Study of Human-AI Teaming and Review Practices in Open Source (2026)
- Do Autonomous Agents Contribute Test Code? A Study of Tests in Agentic Pull Requests (2026)
- Why Are AI Agent Involved Pull Requests (Fix-Related) Remain Unmerged? An Empirical Study (2026)