Image-Text-to-Text
Transformers
PyTorch
English
Chinese
multilingual
vision-encoder-decoder
Image-to-Text
OCR
Image-Captioning
Text-Recognition
updated tag
README.md CHANGED
@@ -7,6 +7,7 @@ tags:
 - Image-to-Text
 - OCR
 - Image-Captioning
+- Text-Recognition
 datasets:
 - priyank-m/text_recognition_en_zh_clean
 metrics:
@@ -36,12 +37,4 @@ Notes and observations:
 12. Streaming dataset might be another good option if the dataset size were to increase any further.
 13. Free GPU on colab seem not enough for this experiment, as keeping two models in GPU and training forces to keep batch size small and also the free GPUs (T4) are not fast enough.
 14. A very important data cleaning step was to just check if the sample image and text can be converted to the input format expected by the model, the text should be non-empty value when converted back from the input IDs to text (some characters are not identified by the tokenizer and get converted to special token and we usually skip the special tokens when converting input IDs back to text) as it is required to be non-empty while doing the CER calculation.
-15. Resuming model training was taking almost 1 or sometimes 2 hours in just skipping the batches, to avoid this wastage one possible solution would be to shuffle the training dataset before starting the training and then avoid the skipping of batches. This would be particularly useful when we increse the dataset size further.
-
-
-
-
-
-
-
-
+15. Resuming model training was taking almost 1 or sometimes 2 hours in just skipping the batches, to avoid this wastage one possible solution would be to shuffle the training dataset before starting the training and then avoid the skipping of batches. This would be particularly useful when we increse the dataset size further.
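The cleaning step in note 14 of the README can be sketched roughly as below. This is a minimal illustration, not the author's actual code: `decodes_to_text`, `keep_usable`, and `ToyTokenizer` are hypothetical names, and the toy tokenizer only stands in for a real one so the example runs without downloads. The check assumes a tokenizer that can be called on a string to get `input_ids` and that offers `decode(..., skip_special_tokens=True)`, as Transformers tokenizers do. Only the text half of the check is shown; validating the image would need the model's image processor.

```python
import string
from typing import Any, Iterable, Iterator, List, Tuple


class ToyTokenizer:
    """Hypothetical stand-in tokenizer: one ID per ASCII letter, digit, or space.

    Any other character maps to a single unknown ID, which plays the role
    of a special token that is dropped when decoding.
    """

    UNK_ID = 0

    def __init__(self) -> None:
        chars = string.ascii_letters + string.digits + " "
        self.vocab = {c: i + 1 for i, c in enumerate(chars)}
        self.inverse = {i: c for c, i in self.vocab.items()}

    def __call__(self, text: str) -> dict:
        return {"input_ids": [self.vocab.get(c, self.UNK_ID) for c in text]}

    def decode(self, ids: List[int], skip_special_tokens: bool = False) -> str:
        pieces = []
        for token_id in ids:
            if token_id == self.UNK_ID:
                if not skip_special_tokens:
                    pieces.append("<unk>")
            else:
                pieces.append(self.inverse[token_id])
        return "".join(pieces)


def decodes_to_text(text: str, tokenizer: Any) -> bool:
    """Return True if the label is still non-empty after encode -> decode."""
    input_ids = tokenizer(text)["input_ids"]
    decoded = tokenizer.decode(input_ids, skip_special_tokens=True)
    return bool(decoded.strip())


def keep_usable(
    samples: Iterable[Tuple[Any, str]], tokenizer: Any
) -> Iterator[Tuple[Any, str]]:
    """Yield only (image, text) pairs whose text survives the round trip."""
    for image, text in samples:
        if decodes_to_text(text, tokenizer):
            yield image, text
```

A label made up entirely of characters the tokenizer does not know decodes to an empty string and is dropped, which avoids an empty reference string during the CER calculation. A label that is only partly unknown is kept.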
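The idea in note 15 of the README, reshuffling before a resumed run instead of fast-forwarding through batches that were already consumed, might look like the sketch below. It is written in plain Python over sample indices and is an assumption about how this could be done, not the author's setup; `epoch_order`, `batches`, `base_seed`, and `resume_count` are hypothetical names. If the run uses the Transformers `Trainer`, the `ignore_data_skip` option of `TrainingArguments` is the built-in switch for not skipping batches on resume.

```python
import random
from typing import Iterator, List, Sequence


def epoch_order(num_samples: int, base_seed: int, resume_count: int) -> List[int]:
    """Return a permutation of sample indices for the current (re)start.

    The seed changes with every resume, so a resumed run begins at batch 0
    of a fresh permutation instead of replaying and skipping the old one.
    """
    rng = random.Random(base_seed + resume_count)
    order = list(range(num_samples))
    rng.shuffle(order)
    return order


def batches(order: Sequence[int], batch_size: int) -> Iterator[List[int]]:
    """Split an index order into consecutive batches (the last may be short)."""
    for start in range(0, len(order), batch_size):
        yield list(order[start:start + batch_size])
```

The trade-off is that the resumed run no longer reproduces the interrupted one exactly: some samples seen before the interruption can appear again early in the new order.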