🛡️ XLM-RoBERTa Hate Speech Detector (EN/RU)

Multilingual toxic comment classification model fine-tuned on English and Russian datasets.

Model Description

  • Base Model: xlm-roberta-base
  • Languages: English, Russian
  • Task: Binary text classification (non-toxic / toxic)
  • Training: Fine-tuned on the Davidson et al. English hate speech dataset and the Russian Toxic Comments dataset

Performance

Overall Metrics

  • Macro F1: 0.925
  • Accuracy: 0.933

Language-Specific Performance

English:

  • Macro F1: 0.900
  • Non-toxic F1: 0.831
  • Toxic F1: 0.968
  • FPR: 0.220
  • FNR: 0.019

Russian:

  • Macro F1: 0.900
  • Non-toxic F1: 0.930
  • Toxic F1: 0.871
  • FPR: 0.094
  • FNR: 0.082
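For readers unfamiliar with the metrics above: macro F1 is the unweighted mean of the per-class F1 scores, so the non-toxic and toxic classes count equally regardless of how imbalanced the data is; FPR is the fraction of non-toxic texts flagged as toxic, and FNR is the fraction of toxic texts missed. A minimal sketch with made-up confusion counts (not the model's actual evaluation data):

```python
# Toy confusion counts, treating class 1 = "toxic" as the positive class:
#   toxic:     TP = 4, FN = 1
#   non-toxic: TN = 2, FP = 1
def f1(tp: int, fp: int, fn: int) -> float:
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

f1_non_toxic = f1(tp=2, fp=1, fn=1)   # F1 for the non-toxic class
f1_toxic = f1(tp=4, fp=1, fn=1)       # F1 for the toxic class
macro_f1 = (f1_non_toxic + f1_toxic) / 2

fpr = 1 / (1 + 2)   # FP / (FP + TN): non-toxic texts wrongly flagged toxic
fnr = 1 / (1 + 4)   # FN / (FN + TP): toxic texts missed

print(round(macro_f1, 3))  # 0.733
```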

Usage

import torch
from transformers import AutoTokenizer
from huggingface_hub import hf_hub_download

# Download model
model_path = hf_hub_download(
    repo_id="Anchar21/Hate-Speech-Detector-XLM-RoBERTa",
    filename="model.pt"
)

# Load model architecture (the BertClassifier class is not bundled with this repo)
from your_module import BertClassifier

model = BertClassifier(
    model_name="xlm-roberta-base",
    num_labels=2,
    dropout=0.1
)

# Load weights
checkpoint = torch.load(model_path, map_location='cpu')
model.load_state_dict(checkpoint['model_state_dict'])
model.eval()

# Tokenizer
tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")

# Inference
text = "Your text here"
encoding = tokenizer(
    text,
    max_length=128,
    padding='max_length',
    truncation=True,
    return_tensors='pt'
)

with torch.no_grad():
    logits = model(encoding['input_ids'], encoding['attention_mask'])
    probs = torch.softmax(logits, dim=1)
    pred = torch.argmax(logits, dim=1)

print(f"Prediction: {['non-toxic', 'toxic'][pred.item()]}")
print(f"Confidence: {probs[0][pred].item():.3f}")
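The repo does not ship the BertClassifier class itself. The sketch below is one plausible implementation that matches the constructor arguments and forward() call used above (an XLM-R encoder followed by dropout and a linear head over the first-token representation); it is an assumption, so verify the state-dict keys against the actual class before loading the checkpoint:

```python
import torch
import torch.nn as nn
from transformers import AutoModel


class BertClassifier(nn.Module):
    """Hypothetical reconstruction: encoder + dropout + linear classification head."""

    def __init__(self, model_name: str, num_labels: int = 2, dropout: float = 0.1):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(model_name)
        self.dropout = nn.Dropout(dropout)
        self.classifier = nn.Linear(self.encoder.config.hidden_size, num_labels)

    def forward(self, input_ids, attention_mask):
        outputs = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        # Use the first token's (<s>, i.e. [CLS]-equivalent) hidden state
        # as the sequence representation.
        cls = outputs.last_hidden_state[:, 0]
        return self.classifier(self.dropout(cls))
```

If `model.load_state_dict(checkpoint['model_state_dict'])` raises a key mismatch, the original class used different attribute names; loading with `strict=False` can help diagnose which keys differ.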

Training Details

  • Learning Rate: 1e-5
  • Batch Size: 16
  • Epochs: 3
  • Class Weights: True
  • Max Sequence Length: 128
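"Class Weights: True" most likely means inverse-frequency class weights in the loss, which counteracts the imbalance between non-toxic and toxic examples. A short sketch of that interpretation with hypothetical label counts (the real training distribution is not published here):

```python
import torch
import torch.nn as nn

# Hypothetical training labels: 700 non-toxic (0), 300 toxic (1)
labels = torch.tensor([0] * 700 + [1] * 300)

counts = torch.bincount(labels).float()           # [700., 300.]
weights = counts.sum() / (len(counts) * counts)   # inverse frequency, ~[0.714, 1.667]

# The rarer class contributes more to the loss per example.
loss_fn = nn.CrossEntropyLoss(weight=weights)

logits = torch.tensor([[2.0, -1.0], [0.5, 0.5]])  # dummy model outputs
target = torch.tensor([0, 1])
loss = loss_fn(logits, target)
```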

Limitations

  • English texts show a higher false positive rate (~22%); the model is aggressive on borderline cases
  • Trained on specific datasets; may not generalize to all domains
  • Binary classification only (no severity levels)
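If the ~22% English false positive rate is too high for your application, one common mitigation is to replace the argmax decision with a higher probability threshold for the toxic class, trading FPR for FNR. A sketch with hypothetical softmax outputs (the threshold should be tuned on held-out data):

```python
import torch

# Hypothetical softmax outputs; columns are [p(non-toxic), p(toxic)]
probs = torch.tensor([[0.45, 0.55],
                      [0.20, 0.80],
                      [0.70, 0.30]])

# For binary softmax outputs, a 0.5 threshold reproduces plain argmax;
# raising it makes the "toxic" decision more conservative.
threshold = 0.7
preds = (probs[:, 1] >= threshold).long()
print(preds.tolist())  # [0, 1, 0]
```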

Citation

@misc{xlm-roberta-hate-speech-en-ru,
  author = {Anchar21},
  title = {XLM-RoBERTa Hate Speech Detector},
  year = {2026},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/Anchar21/Hate-Speech-Detector-XLM-RoBERTa}}
}

License

MIT
