Page to MD

v1v1d's Collections

A dataset of image-text pairs sourced from research papers on arXiv, where each image is derived from a PDF page and paired with its corresponding OCR text.

v1v1d/Arxiv_MD_v2_2k
Viewer • Updated Jun 24, 2024 • 3.04k • 7

v1v1d/Arxiv_MD_v2
Viewer • Updated Jun 24, 2024 • 14.2k • 15

v1v1d/Arxiv_MD_v1_1k
Viewer • Updated Jun 23, 2024 • 1.14k • 5

v1v1d/Arxiv_MD_v1
Viewer • Updated Jun 18, 2024 • 9.96k • 19

ClimatePolicyRadar/all-document-text-data
Viewer • Updated Oct 29, 2025 • 70.5M • 55 • 19

nz/arxiv-ocr-v0.2
Viewer • Updated Sep 19, 2024 • 160k • 100 • 11
