# 🛡️ XLM-RoBERTa Hate Speech Detector (EN/RU)

Multilingual toxic-comment classification model fine-tuned on English and Russian datasets.
## Model Description

- Base Model: `xlm-roberta-base`
- Languages: English, Russian
- Task: Binary text classification (non-toxic / toxic)
- Training: Fine-tuned on the Davidson English dataset and the Russian Toxic Comments dataset
## Performance

### Overall Metrics

- Macro F1: 0.925
- Accuracy: 0.933
### Language-Specific Performance

**English:**

- Macro F1: 0.900
- Non-toxic F1: 0.831
- Toxic F1: 0.968
- FPR: 0.220
- FNR: 0.019

**Russian:**

- Macro F1: 0.900
- Non-toxic F1: 0.930
- Toxic F1: 0.871
- FPR: 0.094
- FNR: 0.082
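For reference, the FPR and FNR figures above follow the standard confusion-matrix definitions (with "toxic" as the positive class). A minimal sketch of how they are computed; the labels below are made up for illustration, not taken from the actual evaluation data:

```python
import numpy as np

def rates(y_true, y_pred):
    """Compute FPR and FNR for binary labels (1 = toxic)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    fp = np.sum((y_pred == 1) & (y_true == 0))  # non-toxic flagged as toxic
    fn = np.sum((y_pred == 0) & (y_true == 1))  # toxic texts missed
    tn = np.sum((y_pred == 0) & (y_true == 0))
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fpr = fp / (fp + tn)  # share of non-toxic texts wrongly flagged
    fnr = fn / (fn + tp)  # share of toxic texts missed
    return fpr, fnr

# Illustrative labels only
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1, 1, 0]
print(rates(y_true, y_pred))  # (0.25, 0.25)
```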
## Usage

```python
import torch
from transformers import AutoTokenizer
from huggingface_hub import hf_hub_download

# Download the model checkpoint
model_path = hf_hub_download(
    repo_id="Anchar21/Hate-Speech-Detector-XLM-RoBERTa",
    filename="model.pt"
)

# Load the model architecture (you need the BertClassifier class)
from your_module import BertClassifier

model = BertClassifier(
    model_name="xlm-roberta-base",
    num_labels=2,
    dropout=0.1
)

# Load the fine-tuned weights
checkpoint = torch.load(model_path, map_location='cpu')
model.load_state_dict(checkpoint['model_state_dict'])
model.eval()

# Tokenizer
tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")

# Inference
text = "Your text here"
encoding = tokenizer(
    text,
    max_length=128,
    padding='max_length',
    truncation=True,
    return_tensors='pt'
)

with torch.no_grad():
    logits = model(encoding['input_ids'], encoding['attention_mask'])
    probs = torch.softmax(logits, dim=1)
    pred = torch.argmax(logits, dim=1)

print(f"Prediction: {['non-toxic', 'toxic'][pred.item()]}")
print(f"Confidence: {probs[0][pred].item():.3f}")
```
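The usage snippet imports `BertClassifier` from `your_module`, which is not published with this checkpoint. A minimal sketch of a compatible class, assuming a standard setup: XLM-RoBERTa encoder, dropout, and a linear head over the first-token representation. The actual architecture may differ, so verify against the checkpoint's `state_dict` keys:

```python
import torch.nn as nn
from transformers import AutoModel

class BertClassifier(nn.Module):
    """Hypothetical reconstruction: encoder + dropout + linear head."""

    def __init__(self, model_name="xlm-roberta-base", num_labels=2, dropout=0.1):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(model_name)
        self.dropout = nn.Dropout(dropout)
        self.classifier = nn.Linear(self.encoder.config.hidden_size, num_labels)

    def forward(self, input_ids, attention_mask):
        outputs = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        # Use the first token's hidden state as the sequence representation
        cls = outputs.last_hidden_state[:, 0]
        return self.classifier(self.dropout(cls))
```

If loading fails with missing or unexpected keys, inspect `checkpoint['model_state_dict'].keys()` and adjust the attribute names to match.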
## Training Details

- Learning Rate: 1e-5
- Batch Size: 16
- Epochs: 3
- Class Weights: enabled
- Max Sequence Length: 128
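"Class Weights: enabled" typically means the cross-entropy loss was weighted to counter class imbalance. The exact weighting scheme used here is not documented, so the inverse-frequency scheme below is an assumption; it is a common default:

```python
import torch
import torch.nn as nn

# Illustrative label distribution: many more non-toxic (0) than toxic (1) examples
labels = torch.tensor([0] * 800 + [1] * 200)

# Inverse-frequency weights: weight_c = N / (num_classes * count_c)
counts = torch.bincount(labels, minlength=2).float()
weights = counts.sum() / (2 * counts)
print(weights)  # tensor([0.6250, 2.5000])

# Pass the weights to the loss used for fine-tuning
criterion = nn.CrossEntropyLoss(weight=weights)
logits = torch.randn(4, 2)
targets = torch.tensor([0, 1, 0, 1])
loss = criterion(logits, targets)
```

With these weights, mistakes on the rarer toxic class contribute more to the loss, which pushes the model toward higher toxic recall (consistent with the low English FNR reported above).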
## Limitations

- English texts show a higher false positive rate (~22%): the model is aggressive on borderline cases
- Trained on specific datasets, so it may not generalize to all domains
- Binary classification only (no severity levels)
## Citation

```bibtex
@misc{xlm-roberta-hate-speech-en-ru,
  author = {Anchar21},
  title = {XLM-RoBERTa Hate Speech Detector},
  year = {2026},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/Anchar21/Hate-Speech-Detector-XLM-RoBERTa}}
}
```
## License

MIT