v1v1d's Collections

Page to MD

updated Dec 13, 2024

A dataset of image-text pairs sourced from research papers on arXiv, where each image is derived from a PDF page and paired with its corresponding OCR text.

  • v1v1d/Arxiv_MD_v2_2k

    Viewer • Updated Jun 24, 2024 • 3.04k • 51

  • v1v1d/Arxiv_MD_v2

    Viewer • Updated Jun 24, 2024 • 14.2k • 23

  • v1v1d/Arxiv_MD_v1_1k

    Viewer • Updated Jun 23, 2024 • 1.14k • 20

  • v1v1d/Arxiv_MD_v1

    Viewer • Updated Jun 18, 2024 • 9.96k • 44

  • ClimatePolicyRadar/all-document-text-data

    Viewer • Updated Oct 29 • 70.5M • 238 • 19

  • nz/arxiv-ocr-v0.2

    Viewer • Updated Sep 19, 2024 • 160k • 1.27k • 11