How to use Helsinki-NLP/opus-mt-ar-en with Transformers:
```python
# Use a pipeline as a high-level helper.
# Warning: pipeline type "translation" is no longer supported in transformers v5.
# Load the model directly (see below) or downgrade to v4.x with:
#   pip install "transformers<5.0.0"
from transformers import pipeline

pipe = pipeline("translation", model="Helsinki-NLP/opus-mt-ar-en")
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("Helsinki-NLP/opus-mt-ar-en")
model = AutoModelForSeq2SeqLM.from_pretrained("Helsinki-NLP/opus-mt-ar-en")
```
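Loading the model does not translate anything by itself. Below is a minimal sketch of batch translation with the tokenizer and model loaded above, assuming PyTorch is installed. The helper name `translate_ar_en` and the `max_new_tokens` default are our own illustrative choices, not part of the model card; the calls themselves are the standard Transformers tokenizer call, `model.generate`, and `batch_decode`.

```python
from typing import Any, List


def translate_ar_en(texts: List[str], tokenizer: Any, model: Any,
                    max_new_tokens: int = 256) -> List[str]:
    """Translate a batch of Arabic sentences into English (hypothetical helper)."""
    # Pad to the longest sentence in the batch and cut inputs longer than the model limit.
    inputs = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
    # generate() already disables gradient tracking internally.
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Strip <pad> and </s> so only the English text remains.
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)
```

Usage: `translate_ar_en(["مرحبا بالعالم"], tokenizer, model)` returns a list with one English string. Batching several sentences per call usually amortizes overhead better than translating them one at a time.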
Arabic-English translation benchmark: MSA vs dialectal performance
#10
by O96a - opened
Ran a quick benchmark on OPUS-MT Arabic-English with 9 test cases across formal, technical, and dialectal inputs.
Key findings:
- MSA (Modern Standard Arabic): Strong performance, 3-14s latency
- Technical content: Handles ML/API terminology well, preserves code-switching
- Dialectal Arabic: Significant truncation. Egyptian "ุฅุฒููุ ููู ุชู ุงู ุ" was reduced to "I was gonna ask you something", missing the greeting entirely.
- Sudanese "يا زول" ("hey, man") was output as "Hey, Zol", with "زول" transliterated rather than translated.
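Failures like these can be flagged automatically when re-running such a benchmark. A length-ratio check would not catch the Egyptian case, since the English output is not shorter than the input, so this rough sketch instead checks each output against a few hand-written reference keywords and verifies that Latin-script terms code-switched into the Arabic source (e.g. "API") survive. `BenchCase`, `find_issues`, and the keyword lists are hypothetical, not part of the original benchmark.

```python
import re
from dataclasses import dataclass, field
from typing import List

# Latin-script words of 2+ characters, e.g. "API" or "ML" code-switched into Arabic text.
LATIN_TOKEN = re.compile(r"[A-Za-z][A-Za-z0-9_]+")


@dataclass
class BenchCase:
    category: str  # e.g. "msa", "technical", "dialect"
    source: str    # Arabic input
    output: str    # model translation
    expected_terms: List[str] = field(default_factory=list)  # hypothetical reference keywords


def find_issues(case: BenchCase) -> List[str]:
    """Flag dropped content and lost code-switched terms (crude substring heuristic)."""
    issues = []
    out_lower = case.output.lower()
    for term in case.expected_terms:
        if term.lower() not in out_lower:
            issues.append(f"missing expected term: {term!r}")
    for token in LATIN_TOKEN.findall(case.source):
        if token not in case.output:
            issues.append(f"code-switched term not preserved: {token!r}")
    return issues
```

For example, `BenchCase("dialect", "إزيك", "I was gonna ask you something", ["how are you"])` is flagged for the missing greeting. Substring matching is deliberately simple; it misses paraphrases, so keyword lists need care.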
Latency range: 0.4s (simple) to 13.5s (technical sentences)
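Because latency varied so widely by input type, it helps to report it per category rather than as one range. The post does not say how timing was measured, so this is only a sketch assuming wall-clock timing of a single-sentence `translate` callable (a hypothetical interface, e.g. a wrapper around the model), with one warm-up call excluded so one-off startup costs do not inflate the first measurement.

```python
import time
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple


def benchmark_latency(
    translate: Callable[[str], str],
    cases: Iterable[Tuple[str, str]],
    warmup: bool = True,
) -> Dict[str, Tuple[float, float]]:
    """Return (min, max) wall-clock seconds per category.

    `cases` yields (category, source_text) pairs.
    """
    cases = list(cases)
    if warmup and cases:
        # The first call often pays one-off costs (lazy init, caches); don't time it.
        translate(cases[0][1])
    timings: Dict[str, List[float]] = defaultdict(list)
    for category, text in cases:
        start = time.perf_counter()
        translate(text)
        timings[category].append(time.perf_counter() - start)
    return {cat: (min(ts), max(ts)) for cat, ts in timings.items()}
```

With only 9 cases, per-category min/max is about as much as the data supports; repeating each case several times and taking the median would give steadier numbers.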
For production Arabic NLP pipelines, OPUS-MT works well for MSA, but dialectal preprocessing would improve coverage.
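One possible form of dialectal preprocessing is rewriting known dialect words into MSA before the text reaches OPUS-MT. This is a minimal sketch assuming a hand-built lexicon; `DIALECT_TO_MSA` and `to_msa` are hypothetical, the two entries are only illustrative (an Egyptian greeting and the Sudanese "زول"), and the benchmark did not test whether this actually improves the translations.

```python
import re
import unicodedata
from typing import Dict

# Hypothetical, deliberately tiny dialect-to-MSA lexicon; real coverage would need a
# much larger resource or a dedicated dialect-normalization model.
DIALECT_TO_MSA: Dict[str, str] = {
    "إزيك": "كيف حالك",  # Egyptian "how are you"
    "زول": "رجل",        # Sudanese "man / person"
}


def to_msa(text: str, lexicon: Dict[str, str] = DIALECT_TO_MSA) -> str:
    """Replace whole-word dialect entries with MSA equivalents."""
    # NFC-normalize keys and input so composed/decomposed hamza forms compare equal.
    table = {unicodedata.normalize("NFC", k): v for k, v in lexicon.items()}
    text = unicodedata.normalize("NFC", text)
    if not table:
        return text
    # Longest keys first, so multi-word entries win over their substrings.
    keys = sorted(table, key=len, reverse=True)
    # (?<!\w) and (?!\w) act as Unicode-aware word boundaries: Arabic punctuation such
    # as "،" next to a word still matches, but clitic forms like "الزول" do not.
    pattern = re.compile(r"(?<!\w)(" + "|".join(map(re.escape, keys)) + r")(?!\w)")
    return pattern.sub(lambda m: table[m.group(1)], text)
```

The output of `to_msa` would then be passed to the OPUS-MT tokenizer as usual. Whole-word matching keeps the sketch predictable, at the cost of missing prefixed forms such as "الزول".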
Has anyone tested this against NLLB-200 for Arabic dialect coverage?
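For anyone trying the NLLB-200 comparison: unlike OPUS-MT, NLLB takes an explicit FLORES-200 source-language code, and Egyptian Arabic has its own code (`arz_Arab`) alongside MSA (`arb_Arab`). A sketch, assuming the tokenizer exposes a settable `src_lang` as NLLB tokenizers do in Transformers 4.x; the helper `nllb_translate` is hypothetical.

```python
from typing import Any, List


def nllb_translate(texts: List[str], tokenizer: Any, model: Any,
                   src_lang: str = "arz_Arab", tgt_lang: str = "eng_Latn",
                   max_new_tokens: int = 256) -> List[str]:
    """Translate with an NLLB-200 checkpoint from a given FLORES-200 source code."""
    # The NLLB tokenizer adds the source-language token based on src_lang.
    tokenizer.src_lang = src_lang
    inputs = tokenizer(texts, return_tensors="pt", padding=True)
    # Force the first generated token to be the target-language code.
    outputs = model.generate(
        **inputs,
        forced_bos_token_id=tokenizer.convert_tokens_to_ids(tgt_lang),
        max_new_tokens=max_new_tokens,
    )
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)
```

Load `facebook/nllb-200-distilled-600M` with `AutoTokenizer` and `AutoModelForSeq2SeqLM`, then run each dialectal input once with `src_lang="arz_Arab"` and once with `"arb_Arab"` to see whether tagging the dialect changes the output.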