Models: 32

Active filters: jailbreak-detection

Necent/distilbert-base-uncased-detected-jailbreak

Text Classification • 67M • Updated May 29, 2025

madhurjindal/Jailbreak-Detector

Text Classification • 65.8M • Updated May 30, 2025 • 742

madhurjindal/Jailbreak-Detector-Large

Text Classification • 0.3B • Updated May 30, 2025 • 79 • 3

GuardrailsAI/prompt-saturation-attack-detector

Text Classification • 4.39M • Updated Nov 14, 2024 • 12.4k • 2

qualifire/prompt-injection-sentinel

Text Classification • 0.4B • Updated Sep 22, 2025 • 298 • 15

madhurjindal/Jailbreak-Detector-2-XL

Text Generation • Updated Jul 20, 2025 • 7.16k • 5

gincioks/cerberus-bert-base-un-v1.0-onnx

Text Classification • Updated Jun 15, 2025

gincioks/cerberus-distilbert-base-un-v1.0-onnx

Text Classification • Updated Jun 15, 2025 • 1

gincioks/cerberus-deberta-v3-small-v1.0-onnx

Text Classification • Updated Jun 15, 2025

gincioks/cerberus-proventra-mdeberta-v3-base-v1.0-onnx

Text Classification • Updated Jun 15, 2025

pmking27/jailbreak-detection

Text Classification • 0.3B • Updated Jun 19, 2025 • 27

intelliway/deberta-v3-base-prompt-injection-v2-mapa

Text Classification • 0.2B • Updated Jul 3, 2025 • 2

qualifire/prompt-injection-jailbreak-sentinel-v2

Text Classification • 0.6B • Updated Sep 28, 2025 • 3.07k • 28

qualifire/prompt-injection-jailbreak-sentinel-v2-GGUF

0.6B • Updated Sep 28, 2025 • 146 • 1

ahmedmajid92/iraqi-guard-model

Text Classification • 0.3B • Updated Oct 9, 2025 • 1 • 1

rootfs/tool-call-verifier

Token Classification • 0.1B • Updated Dec 14, 2025 • 79

rootfs/function-call-sentinel

Text Classification • 0.1B • Updated Dec 14, 2025 • 6

vincentoh/jailbreak-detector-v5

Text Classification • Updated Dec 18, 2025 • 1

thirtyninetythree/deberta-prompt-guard

Text Classification • 0.2B • Updated Dec 22, 2025 • 1

llm-semantic-router/toolcall-verifier

Token Classification • 0.1B • Updated Dec 18, 2025 • 37 • 1

llm-semantic-router/toolcall-sentinel

Text Classification • 0.1B • Updated Dec 18, 2025 • 13 • 1

llm-semantic-router/mmbert-jailbreak-detector-lora

Text Classification • Updated about 1 month ago • 8

llm-semantic-router/mmbert-jailbreak-detector-merged

Text Classification • 0.3B • Updated about 1 month ago • 105

abdulmunimjemal/Sentinel-Rail-A-Prompt-Attack-Guard

Text Classification • Updated about 1 month ago • 1

llm-semantic-router/mmbert-safety-classifier-level1

Text Classification • Updated about 1 month ago • 3

llm-semantic-router/mlcommons-safety-classifier-level1-binary

Text Classification • Updated 30 days ago • 10

ynyg/Unified_Prompt_Guard

0.3B • Updated 24 days ago • 9

llm-semantic-router/mmbert32k-jailbreak-detector-lora

Text Classification • Updated 20 days ago • 89

llm-semantic-router/mmbert32k-jailbreak-detector-merged

Text Classification • 0.3B • Updated 19 days ago • 593

satyamg1620/mmbert32k-jailbreak-detector-healthcare-merged

Text Classification • 0.3B • Updated 6 days ago • 18