# ViCoHATE-Adaptive: Vietnamese Hate Speech Detection with Domain Adaptation
ViCoHATE-Adaptive is a robust Vietnamese Hate Speech Detection model designed to handle cross-domain challenges (Reddit to Facebook). It leverages Generative Augmentation, Pseudo-Labeling, and Simulated Active Learning to achieve state-of-the-art performance with minimal manual labeling.
This model is fine-tuned from FacebookAI/xlm-roberta-base using the ViCoHATE dataset and a proposed adaptive framework.
## Model Details

### Model Description
- Developed by: Hoàng Thái Anh
- Model type: Transformers (XLM-RoBERTa)
- Language(s): Vietnamese
- License: Apache 2.0
- Finetuned from model: FacebookAI/xlm-roberta-base
- Framework: PyTorch, Transformers
## Key Features
- Context-Aware: Inputs include `Stance` and `Context` (post/parent comment) to resolve ambiguity.
- Cross-Domain Robustness: Trained on a hybrid dataset of Reddit data (source), augmented Facebook data (Qwen3-generated), and real Facebook data (pseudo-labeled).
- Active Learning Refinement: Weighted training on difficult samples (Manual 1.6k) to boost Hate Recall.
- Adaptive Thresholding: Optimized inference logic (Threshold = 0.15 for Hate class) to balance Safety and User Experience.
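The adaptive-threshold rule above can be expressed as a small pure function over the three class probabilities (an illustrative sketch; the function name is not part of the released API, and the full inference code appears in the Getting Started section):

```python
def adaptive_label(p_clean: float, p_offensive: float, p_hate: float,
                   threshold: float = 0.15) -> str:
    """Apply the Hate-class threshold before comparing the remaining classes."""
    if p_hate >= threshold:  # a low threshold favors recall on the Hate class
        return "HATE"
    return "CLEAN" if p_clean > p_offensive else "OFFENSIVE"
```

Because 0.15 is well below 1/3, a comment is flagged as Hate even when Hate is not the argmax class, which is exactly the safety-over-precision trade-off described above.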
## Uses

### Direct Use
You can use this model directly for Hate Speech Detection in Vietnamese social media comments. It classifies text into three labels:
- 0: Clean (clean/normal content)
- 1: Offensive (insulting/vulgar, but not rising to the level of hate)
- 2: Hate (hatred, racism, or political extremism)
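For downstream code, this mapping can be captured in a small dictionary (illustrative names; the authoritative mapping is the `id2label` field of the model config):

```python
# Label ids follow the list above: 0 = Clean, 1 = Offensive, 2 = Hate
ID2LABEL = {0: "CLEAN", 1: "OFFENSIVE", 2: "HATE"}
LABEL2ID = {name: idx for idx, name in ID2LABEL.items()}
```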
## How to Get Started with the Model
### 1. Installation

```bash
pip install transformers torch
```
### 2. Inference Code (Optimized)
For best performance, use the adaptive-threshold logic below instead of a standard argmax. It detects hate speech with higher sensitivity (recall) while maintaining safety.
```python
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load model and tokenizer
model_name = "hgthaianh/ViCoHate-Adaptive-Ultimate"
tokenizer = AutoTokenizer.from_pretrained(model_name)
device = "cuda" if torch.cuda.is_available() else "cpu"
model = AutoModelForSequenceClassification.from_pretrained(model_name).to(device)

def predict_optimized(text, context="", threshold=0.15):
    """
    Predicts a label using adaptive thresholding for the Hate class.

    Args:
        text (str): The comment to classify.
        context (str): Post content or parent comment (optional).
        threshold (float): Decision boundary for the Hate class (default: 0.15).
    """
    # Prepare input: prepend the context if provided
    full_text = f"Context: {context} | {text}" if context else text
    inputs = tokenizer(full_text, return_tensors="pt",
                       truncation=True, max_length=256).to(model.device)

    # Inference
    with torch.no_grad():
        outputs = model(**inputs)
    probs = F.softmax(outputs.logits, dim=-1)[0].cpu().numpy()

    # Adaptive logic (0: Clean, 1: Offensive, 2: Hate)
    if probs[2] >= threshold:
        label = "HATE"
    elif probs[0] > probs[1]:
        label = "CLEAN"
    else:
        label = "OFFENSIVE"
    return label, probs

# Test example (a Vietnamese comment containing a regional slur)
text = "Lũ bắc kỳ chó này sống bẩn tính lắm"
context = "Bài viết về văn hóa vùng miền"  # "Post about regional culture"
label, probs = predict_optimized(text, context)
print(f"Input: {text}")
print(f"Label: {label}")
print(f"Probabilities: Clean: {probs[0]:.2f}, Offensive: {probs[1]:.2f}, Hate: {probs[2]:.2f}")
```
## Evaluation

### Results (Test Set)
The model achieves a balanced trade-off between precision and recall for the Hate class in a challenging cross-domain (Reddit to Facebook) setting.
| Metric | Score |
|---|---|
| Accuracy | 71% |
| Macro F1 | 69% |
| Hate Recall | 62% |
| Hate Precision | 62% |
| Clean F1 | 79% |
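Macro F1 is the unweighted mean of the per-class F1 scores, so the minority Hate class weighs as much as Clean. A minimal sketch of the computation (toy labels for illustration, not the actual test set; `sklearn.metrics.f1_score` with `average="macro"` is equivalent):

```python
def macro_f1(y_true, y_pred, labels=(0, 1, 2)):
    """Unweighted mean of per-class F1 scores (0: Clean, 1: Offensive, 2: Hate)."""
    f1_scores = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        f1_scores.append(f1)
    return sum(f1_scores) / len(f1_scores)

# Perfect predictions give a macro F1 of 1.0
print(macro_f1([0, 1, 2, 2], [0, 1, 2, 2]))  # 1.0
```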
## Citation
If you use this model or dataset, please cite our work:
```bibtex
@article{vicohate2026,
  title={},
  author={Hoang Thai Anh},
  journal={},
  year={2026}
}
```