IWSLT/ted_talks_iwslt
Updated β’ 672 β’ 24
How to use dhintech/marian-tedtalks-id-en-enhanced with Transformers:
# Use a pipeline as a high-level helper
# Warning: Pipeline type "translation" is no longer supported in transformers v5.
# You must load the model directly (see below) or downgrade to v4.x with:
# 'pip install "transformers<5.0.0'
from transformers import pipeline
pipe = pipeline("translation", model="dhintech/marian-tedtalks-id-en-enhanced") # Load model directly
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
tokenizer = AutoTokenizer.from_pretrained("dhintech/marian-tedtalks-id-en-enhanced")
model = AutoModelForSeq2SeqLM.from_pretrained("dhintech/marian-tedtalks-id-en-enhanced")This model is an enhanced fine-tuned version of Helsinki-NLP/opus-mt-id-en with domain-specific adaptation for meeting and business contexts.
| Metric | Base Model | This Model | Improvement |
|---|---|---|---|
| BLEU Score | 9.146 | 11.747 | +28.4% |
| Translation Speed | 1.2s | 0.12s | -90.0% |
| Meeting Context | Standard | Enhanced | Domain Adapted |
from transformers import MarianMTModel, MarianTokenizer
# Load model and tokenizer
model_name = "dhintech/marian-tedtalks-id-en-enhanced"
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)
# Translate Indonesian to English
def translate(text):
inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=128)
outputs = model.generate(
**inputs,
max_length=128,
num_beams=3,
early_stopping=True,
do_sample=False
)
return tokenizer.decode(outputs[0], skip_special_tokens=True)
# Example usage
indonesian_text = "Tim marketing akan bertanggung jawab untuk strategi ini."
english_translation = translate(indonesian_text)
print(english_translation)
# Output: "The marketing team will be responsible for this strategy."
| Indonesian | English | Context |
|---|---|---|
| Selamat pagi semuanya, mari kita mulai rapat hari ini. | Good morning everyone, let's start today's meeting. | Meeting Opening |
| Tim marketing akan bertanggung jawab untuk strategi ini. | The marketing team will be responsible for this strategy. | Task Assignment |
| Database migration sudah selesai dan berjalan dengan lancar. | Database migration is complete and running smoothly. | Technical Update |
| Budget yang disetujui adalah 500 juta rupiah. | The approved budget is 500 million rupiah. | Financial Discussion |
@misc{enhanced-marian-id-en-2025,
title={Enhanced MarianMT Indonesian-English Translation (Meeting Domain Adaptation)},
author={DhinTech},
year={2025},
publisher={Hugging Face},
journal={Hugging Face Model Hub},
howpublished={\url{https://huggingface.co/dhintech/marian-tedtalks-id-en-enhanced}},
note={Enhanced with TEDTalks and meeting-specific domain adaptation}
}
This model is specifically enhanced for Indonesian business meeting translation scenarios with domain adaptation techniques.
Base model
Helsinki-NLP/opus-mt-id-en