🛡️ XLM-RoBERTa Hate Speech Detector (EN/RU)

Multilingual toxic comment classification model fine-tuned on English and Russian datasets.

Model Description

  • Base Model: xlm-roberta-base
  • Languages: English, Russian
  • Task: Binary text classification (non-toxic / toxic)
  • Training: Fine-tuned on the Davidson et al. English hate speech dataset and the Russian Toxic Comments dataset

Performance

Overall Metrics

  • Macro F1: 0.925
  • Accuracy: 0.933

Language-Specific Performance

English:

  • Macro F1: 0.900
  • Non-toxic F1: 0.831
  • Toxic F1: 0.968
  • FPR: 0.220
  • FNR: 0.019

Russian:

  • Macro F1: 0.900
  • Non-toxic F1: 0.930
  • Toxic F1: 0.871
  • FPR: 0.094
  • FNR: 0.082
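For readers unfamiliar with the metrics above: macro F1 is the unweighted mean of the per-class F1 scores, so the non-toxic and toxic classes count equally regardless of how imbalanced the data is; FPR is the fraction of non-toxic texts flagged as toxic, and FNR is the fraction of toxic texts missed. A minimal sketch with made-up confusion counts (not the model's actual evaluation data):

```python
# Toy confusion counts, treating class 1 = "toxic" as the positive class:
#   toxic:     TP = 4, FN = 1
#   non-toxic: TN = 2, FP = 1
def f1(tp: int, fp: int, fn: int) -> float:
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

f1_non_toxic = f1(tp=2, fp=1, fn=1)   # F1 for the non-toxic class
f1_toxic = f1(tp=4, fp=1, fn=1)       # F1 for the toxic class
macro_f1 = (f1_non_toxic + f1_toxic) / 2

fpr = 1 / (1 + 2)   # FP / (FP + TN): non-toxic texts wrongly flagged toxic
fnr = 1 / (1 + 4)   # FN / (FN + TP): toxic texts missed

print(round(macro_f1, 3))  # 0.733
```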

Usage

import torch
from transformers import AutoTokenizer
from huggingface_hub import hf_hub_download

# Download model
model_path = hf_hub_download(
    repo_id="Anchar21/Hate-Speech-Detector-XLM-RoBERTa",
    filename="model.pt"
)

# Load model architecture (the BertClassifier class is not bundled with this repo)
from your_module import BertClassifier

model = BertClassifier(
    model_name="xlm-roberta-base",
    num_labels=2,
    dropout=0.1
)

# Load weights
checkpoint = torch.load(model_path, map_location='cpu')
model.load_state_dict(checkpoint['model_state_dict'])
model.eval()

# Tokenizer
tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")

# Inference
text = "Your text here"
encoding = tokenizer(
    text,
    max_length=128,
    padding='max_length',
    truncation=True,
    return_tensors='pt'
)

with torch.no_grad():
    logits = model(encoding['input_ids'], encoding['attention_mask'])
    probs = torch.softmax(logits, dim=1)
    pred = torch.argmax(logits, dim=1)

print(f"Prediction: {['non-toxic', 'toxic'][pred.item()]}")
print(f"Confidence: {probs[0][pred].item():.3f}")
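The repo does not ship the BertClassifier class itself. The sketch below is one plausible implementation that matches the constructor arguments and forward() call used above (an XLM-R encoder followed by dropout and a linear head over the first-token representation); it is an assumption, so verify the state-dict keys against the actual class before loading the checkpoint:

```python
import torch
import torch.nn as nn
from transformers import AutoModel


class BertClassifier(nn.Module):
    """Hypothetical reconstruction: encoder + dropout + linear classification head."""

    def __init__(self, model_name: str, num_labels: int = 2, dropout: float = 0.1):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(model_name)
        self.dropout = nn.Dropout(dropout)
        self.classifier = nn.Linear(self.encoder.config.hidden_size, num_labels)

    def forward(self, input_ids, attention_mask):
        outputs = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        # Use the first token's (<s>, i.e. [CLS]-equivalent) hidden state
        # as the sequence representation.
        cls = outputs.last_hidden_state[:, 0]
        return self.classifier(self.dropout(cls))
```

If `model.load_state_dict(checkpoint['model_state_dict'])` raises a key mismatch, the original class used different attribute names; loading with `strict=False` can help diagnose which keys differ.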

Training Details

  • Learning Rate: 1e-5
  • Batch Size: 16
  • Epochs: 3
  • Class Weights: True
  • Max Sequence Length: 128
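"Class Weights: True" most likely means inverse-frequency class weights in the loss, which counteracts the imbalance between non-toxic and toxic examples. A short sketch of that interpretation with hypothetical label counts (the real training distribution is not published here):

```python
import torch
import torch.nn as nn

# Hypothetical training labels: 700 non-toxic (0), 300 toxic (1)
labels = torch.tensor([0] * 700 + [1] * 300)

counts = torch.bincount(labels).float()           # [700., 300.]
weights = counts.sum() / (len(counts) * counts)   # inverse frequency, ~[0.714, 1.667]

# The rarer class contributes more to the loss per example.
loss_fn = nn.CrossEntropyLoss(weight=weights)

logits = torch.tensor([[2.0, -1.0], [0.5, 0.5]])  # dummy model outputs
target = torch.tensor([0, 1])
loss = loss_fn(logits, target)
```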

Limitations

  • English texts show a higher false positive rate (~22%); the model is aggressive on borderline cases
  • Trained on specific datasets; may not generalize to all domains
  • Binary classification only (no severity levels)
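If the ~22% English false positive rate is too high for your application, one common mitigation is to replace the argmax decision with a higher probability threshold for the toxic class, trading FPR for FNR. A sketch with hypothetical softmax outputs (the threshold should be tuned on held-out data):

```python
import torch

# Hypothetical softmax outputs; columns are [p(non-toxic), p(toxic)]
probs = torch.tensor([[0.45, 0.55],
                      [0.20, 0.80],
                      [0.70, 0.30]])

# For binary softmax outputs, a 0.5 threshold reproduces plain argmax;
# raising it makes the "toxic" decision more conservative.
threshold = 0.7
preds = (probs[:, 1] >= threshold).long()
print(preds.tolist())  # [0, 1, 0]
```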

Citation

@misc{xlm-roberta-hate-speech-en-ru,
  author = {Anchar21},
  title = {XLM-RoBERTa Hate Speech Detector},
  year = {2026},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/Anchar21/Hate-Speech-Detector-XLM-RoBERTa}}
}

License

MIT
