Paper: How Close is ChatGPT to Human Experts? Comparison Corpus, Evaluation, and Detection (arXiv:2301.07597)
This model is a DistilBERT model fine-tuned to classify text as AI-generated or human-written. It was trained on the HC3 (Human ChatGPT Comparison Corpus) dataset, available on Hugging Face.
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("VSAsteroid/ai-text-detector-hc3")
model = AutoModelForSequenceClassification.from_pretrained("VSAsteroid/ai-text-detector-hc3")
# Example prediction
text = "Your text here"
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=256)
with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
# Get prediction
predicted_class = torch.argmax(predictions, dim=-1).item()
confidence = torch.max(predictions).item()
label = "AI-Generated" if predicted_class == 1 else "Human-Written"
print(f"Prediction: {label} (Confidence: {confidence:.3f})")
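The snippet above truncates input to 256 tokens, so only the beginning of a longer document is scored. One possible workaround, sketched below, is to split the text into chunks, score each chunk, and average the results. The helper names `split_into_chunks` and `score_long_text`, the 150-word chunk size (a rough guess at staying under 256 tokens), and the plain mean are all assumptions for illustration, not part of this model's documented usage.

```python
from typing import Callable, List


def split_into_chunks(text: str, max_words: int = 150) -> List[str]:
    """Split text into consecutive chunks of at most max_words whitespace-separated words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def score_long_text(
    text: str,
    score_fn: Callable[[str], float],
    max_words: int = 150,
) -> float:
    """Average score_fn over all chunks of text.

    score_fn is a hypothetical callable that returns the probability that
    a single chunk is AI-generated (a float between 0 and 1).
    """
    chunks = split_into_chunks(text, max_words)
    if not chunks:
        raise ValueError("text contains no words to score")
    scores = [score_fn(chunk) for chunk in chunks]
    # Unweighted mean: every chunk counts equally, including a short final chunk.
    return sum(scores) / len(scores)
```

To use this with the model, `score_fn` could wrap the prediction code above and return `predictions[0, 1].item()`, the softmax probability of class 1, which the snippet treats as "AI-Generated". Weighting chunks by length or taking the maximum chunk score are alternative aggregation choices.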
The model performs well at distinguishing human-written from AI-generated text, particularly on the kinds of content found in the HC3 dataset. Expect it to be less reliable on text that differs substantially from that data.
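To check how the model behaves on your own data, you can compare its predicted classes against known labels. The following is a minimal sketch in plain Python; the function name `binary_metrics` is hypothetical, the label lists are toy values rather than real results, and class 1 is taken to mean "AI-Generated" as in the prediction code above.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute accuracy, precision, recall, and F1, treating class 1 as positive."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one example is required")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # AI text flagged as AI
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # human text flagged as AI
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # AI text missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # human text passed

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy labels for illustration only (1 = AI-generated, 0 = human-written).
true_labels = [1, 1, 0, 0, 1]
predicted_labels = [1, 0, 0, 1, 1]
print(binary_metrics(true_labels, predicted_labels))
```

For a detector, precision is worth watching closely, since a false positive means human-written text was labeled as AI-generated.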
If you use this model, please cite the HC3 dataset:
@misc{guo2023close,
  title={How Close is ChatGPT to Human Experts? Comparison Corpus, Evaluation, and Detection},
  author={Biyang Guo and Xin Zhang and Ziyuan Wang and Minqi Jiang and Jinran Nie and Yuxuan Ding and Jianwei Yue and Yupeng Wu},
  year={2023},
  eprint={2301.07597},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}