# DeepGuard AI vs Real Deepfake Model

## Model Overview
This is a fine-tuned version of google/siglip2-base-patch16-224, specifically trained for binary image classification to detect AI-generated and deepfake images. It is the core inference engine powering the DeepGuard AI Media Forensics App.
The model distinguishes between Real photographs and Fake (AI-generated or deepfake) images. By leveraging the powerful SigLIP2 vision-language encoder and fine-tuning it on a balanced 40,000-image training set sampled from a multi-source pool of over 330,000 images (see Datasets below), the model demonstrates robust performance in identifying synthetic media, including outputs from modern generators such as Midjourney, Stable Diffusion, and DALL·E.
| Property | Value |
|---|---|
| Architecture | SigLIP2 (Vision Transformer) |
| Base Model | google/siglip2-base-patch16-224 |
| Input Resolution | 224x224 pixels |
| Number of Classes | 2 (Real, Fake) |
| Model Size | ~372 MB |
| License | Apache 2.0 |
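As a quick sanity check, the class count and label mapping above can be read back from the checkpoint's configuration. A minimal sketch, assuming the model ID used in the Usage section below:

```python
from transformers import AutoConfig

# Assumes the repo ID shown in the Usage section below.
config = AutoConfig.from_pretrained("king1oo1/ai-vs-real-deepfake-model")

print(config.num_labels)  # expected: 2
print(config.id2label)    # label names as stored in the checkpoint
```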
## Datasets
The model was trained on a carefully curated, balanced dataset of 40,000 images (20,000 real, 20,000 fake), sampled from five diverse, high-quality sources to ensure robustness and generalization across various forgery types.
| Dataset Name | Source | Description |
|---|---|---|
| Deepfake and Real Images | manjilkarki/deepfake-and-real-images | A foundational dataset of 190k human faces, split evenly between real images and images manipulated with various deepfake techniques. Images are 256x256 pixels. |
| HardFake vs Real Faces | hamzaboulahia/hardfakevsrealfaces | A challenging, test-oriented dataset of 1,289 high-quality images (700 fake, 589 real) designed to push the limits of detection models. Fake faces are generated with StyleGAN2; real faces feature diverse attributes. |
| GRAVEX-200K | muhammadbilal6305/200k-real-vs-ai-visuals-by-mbilal | A comprehensive multi-source dataset of 200,000 face images, curated from six major sources including FaceForensics++, DFDC, Celeb-DF, and Stable Diffusion outputs (SD 1.5, 2.1, XL). |
| DeepDetect-2025 | ayushmandatta1/deepdetect-2025 | A large-scale dataset of over 112,000 images spanning diverse categories (people, animals, nature, urban scenes, artworks), generated by cutting-edge models such as DALL·E 3, Midjourney, and Stable Diffusion 3. |
| Super GenAI (SUT-Project) | hiddenplant/sut-project | A dataset of high-fidelity images from the latest generative models, including Midjourney V6, Flux, and NanoBanana (SDXL), covering landscapes, portraits, and urban scenes. |
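The exact sampling code is not part of this card, but a minimal sketch of how a balanced 20,000/20,000 training set could be drawn from these collections is shown below; the directory layout and the `sample_split` helper are hypothetical.

```python
import random
from pathlib import Path

random.seed(42)  # reproducible sampling

def sample_split(source_dirs, label, per_label_total):
    """Draw a roughly equal number of images from each source directory.

    The directories are placeholders; adjust them to wherever the five
    datasets are stored locally.
    """
    per_source = per_label_total // len(source_dirs)
    sampled = []
    for d in source_dirs:
        files = sorted(Path(d).glob("**/*.jpg"))
        sampled += random.sample(files, min(per_source, len(files)))
    return [(f, label) for f in sampled]

# Hypothetical local paths, one real/ and one fake/ folder per source dataset.
real_sources = ["data/deepfake_and_real/real", "data/gravex/real", "data/deepdetect/real"]
fake_sources = ["data/deepfake_and_real/fake", "data/gravex/fake", "data/deepdetect/fake"]

# Label convention matches the Usage section: 0 = Real, 1 = Fake.
dataset = sample_split(real_sources, 0, 20_000) + sample_split(fake_sources, 1, 20_000)
random.shuffle(dataset)
print(len(dataset))  # ~40,000 (path, label) pairs
```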
## Training Procedure
The model was fine-tuned using a progressive unfreezing strategy to adapt the pre-trained SigLIP2 encoder while preventing catastrophic forgetting. All training was performed on a Tesla T4 GPU in Google Colab.
### Training Hyperparameters
| Stage | Epochs | Learning Rate | Trainable Parameters | Description |
|---|---|---|---|---|
| Stage 1 | 2 | 1e-3 | Classifier head only | Warm-up phase to adapt the new binary classification head. |
| Stage 2 | 3 | 5e-5 | Classifier + Top 6 Transformer Blocks | Gradual unfreezing to allow the model to learn task-specific features. |
| Stage 3 | 2 | 1e-5 | All layers | Full model fine-tuning with a very low learning rate for final convergence. |
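In PyTorch terms, each stage toggles `requires_grad` on a progressively larger part of the network before the optimizer is rebuilt with that stage's learning rate. A minimal sketch follows; the `vision_model.encoder.layers` and `classifier` attribute paths follow the SigLIP classes in `transformers` and should be verified against the loaded model.

```python
from transformers import AutoModelForImageClassification

# Start from the pre-trained backbone with a fresh 2-class head.
model = AutoModelForImageClassification.from_pretrained(
    "google/siglip2-base-patch16-224", num_labels=2
)

def set_trainable(modules, flag):
    # Toggle gradient tracking for every parameter in the given modules.
    for m in modules:
        for p in m.parameters():
            p.requires_grad = flag

layers = model.vision_model.encoder.layers  # assumed attribute path

# Stage 1 (2 epochs, lr 1e-3): train only the new classifier head.
set_trainable([model], False)
set_trainable([model.classifier], True)

# Stage 2 (3 epochs, lr 5e-5): also unfreeze the top 6 transformer blocks.
set_trainable(layers[-6:], True)

# Stage 3 (2 epochs, lr 1e-5): unfreeze everything for final convergence.
set_trainable([model], True)
```

The remaining fixed hyperparameters: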
- Batch Size: 32
- Optimizer: AdamW
- Scheduler: Cosine Annealing
- Loss Function: Cross-Entropy Loss
- Data Augmentation: Random Horizontal Flip, Random Rotation (10°), Color Jitter
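Continuing the sketch above, a stage-2 training setup consistent with these settings might look as follows; the jitter strengths and the `train_dataset` object are illustrative placeholders.

```python
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import CosineAnnealingLR
from torch.utils.data import DataLoader
from torchvision import transforms

# Augmentations listed above; normalization should match the SigLIP2 processor in practice.
train_tf = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(10),
    transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2),  # strengths assumed
    transforms.ToTensor(),
])

loader = DataLoader(train_dataset, batch_size=32, shuffle=True)  # train_dataset: placeholder
optimizer = AdamW((p for p in model.parameters() if p.requires_grad), lr=5e-5)  # stage-2 lr
scheduler = CosineAnnealingLR(optimizer, T_max=len(loader) * 3)  # anneal across 3 epochs
criterion = torch.nn.CrossEntropyLoss()

for epoch in range(3):  # stage-2 epoch count
    for pixel_values, labels in loader:
        logits = model(pixel_values=pixel_values).logits
        loss = criterion(logits, labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        scheduler.step()
```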
## Performance Metrics
Results on a held-out validation set:
| Metric | Score |
|---|---|
| Accuracy | 78.5% |
| AUC | > 0.86 |
| F1 Score | ~0.78 |
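For reference, these metrics can be reproduced from raw predictions with scikit-learn; in this sketch, `y_true` holds ground-truth labels and `y_score` the model's softmax probability for the Fake class (placeholder values shown).

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score

# y_true: ground-truth labels (0 = Real, 1 = Fake); y_score: P(Fake) per image.
y_true = np.array([0, 1, 1, 0, 1])            # placeholder values
y_score = np.array([0.1, 0.9, 0.4, 0.2, 0.8])
y_pred = (y_score > 0.5).astype(int)          # 50% threshold, as in the Usage example

print("Accuracy:", accuracy_score(y_true, y_pred))
print("AUC:", roc_auc_score(y_true, y_score))
print("F1:", f1_score(y_true, y_pred))
```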
## Usage
You can load and run this model directly with the Hugging Face `transformers` library:

```python
from transformers import AutoImageProcessor, AutoModelForImageClassification
from PIL import Image
import torch

# Load model and processor
model_name = "king1oo1/ai-vs-real-deepfake-model"  # Replace with your actual model ID
processor = AutoImageProcessor.from_pretrained(model_name)
model = AutoModelForImageClassification.from_pretrained(model_name)
model.eval()

# Load and preprocess an image
image = Image.open("path/to/your/image.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt")

# Run inference without tracking gradients
with torch.no_grad():
    outputs = model(**inputs)

# Convert logits to class probabilities (index 0 = Real, index 1 = Fake)
probs = torch.softmax(outputs.logits, dim=1)
real_prob = probs[0][0].item() * 100
fake_prob = probs[0][1].item() * 100

print(f"Fake probability: {fake_prob:.2f}%")
print(f"Real probability: {real_prob:.2f}%")
print(f"Verdict: {'FAKE' if fake_prob > 50 else 'REAL'}")
```