Robotics and Internet-of-Things
riotu-lab
AI & ML interests
None yet
Recent Activity
updated a dataset about 11 hours ago: riotu-lab/MURAD
updated a dataset 1 day ago: riotu-lab/SARD
updated a dataset 1 day ago: riotu-lab/SARD-Extended
Organizations
None yet
SARD: Synthetic Arabic Recognition Dataset
A large-scale synthetic Arabic OCR dataset comprising 843,622 book-style document images across 10 fonts, designed to advance vision-language models (VLMs) for Arabic text
Aranizer | Arabic Tokenization with SentencePiece & PBE
A collection of Arabic tokenizers of different vocabulary sizes, based on SentencePiece and PBE encodings, suitable for training LLMs
Models (21)
riotu-lab/my_awesome_model
Updated
riotu-lab/MAHA-Inst-V2
Text Generation • 2B • Updated
riotu-lab/ArabianGPT-1.5B-FT-SA-v2
2B • Updated • 2
riotu-lab/Aranizer-PBE-64k
Updated • 1
riotu-lab/Aranizer-SP-32k
Updated • 2
riotu-lab/Aranizer-SP-64k
Updated • 2
riotu-lab/Aranizer-SP-86k
Updated
riotu-lab/Aranizer-PBE-32k
Updated • 1
riotu-lab/Aranizer-PBE-86k
Updated • 1
riotu-lab/ArabianGPT-0.8B-Sum-FT
0.8B • Updated
Datasets (16)
riotu-lab/MURAD
Viewer • Updated • 96.2k • 57 • 1
riotu-lab/SARD-Extended
Preview • Updated • 250 • 7
riotu-lab/SARD
Preview • Updated • 6.96k • 13
riotu-lab/tashkeel-arabic-sentences
Viewer • Updated • 273k • 38
riotu-lab/ARCADE-full
Viewer • Updated • 6.91k • 288 • 5
riotu-lab/all_RD_datasets
Viewer • Updated • 342k • 16
riotu-lab/arabic_reverse_dictionary
Viewer • Updated • 58.6k • 113 • 4
riotu-lab/os-rfodg-outdoor-uav-synthetic-dataset-taif-saudi-arabia
Viewer • Updated • 102k • 78 • 1
riotu-lab/Os-rfodg-outdoor-uav-dataset-taif-saudi-arabia
Updated • 4
riotu-lab/ADMD
Viewer • Updated • 980 • 179 • 1