Jailbreak Library
Reference library of known LLM jailbreak patterns
This library is for defensive research only. Use these patterns to test and improve your AI safety measures.
DAN (Do Anything Now)
Attempts to make AI bypass safety by adopting an alter-ego persona
Grandma Exploit
Uses emotional manipulation via fictional roleplay
Developer Mode
Claims special developer access to enable unrestricted mode
System Prompt Extraction
Attempts to reveal the system prompt
Token Smuggling
Uses unicode or encoding to hide harmful content
Instruction Override
Direct attempt to override safety instructions
Hypothetical Framing
Frames harmful requests as hypothetical or fictional
Context Overflow
Attempts to push system prompt out of context
Related Tools
Output Validator
Define and test regular expression or logic checks for model outputs
Text Bias Detector
Analyze text for potential gender, racial, or political bias
Content Moderation Test
Check text against standard moderation categories (hate, violence, self-harm)
Guardrails Configuration
Generate configuration for AI guardrails libraries (NeMo, Guardrails AI)
Hallucination Risk Estimator
Estimate hallucination risk based on prompt characteristics and topic
Prompt Injection Detector
Scan user input for known jailbreak patterns and injection attempts
What are LLM Jailbreaks?
Jailbreaks are adversarial prompts designed to bypass an AI's safety guardrails. Attackers use these techniques to extract harmful content, reveal system prompts, or manipulate AI behavior in unintended ways.
This library catalogs known jailbreak patterns for defensive purposes—helping developers red-team their AI systems and build more robust defenses.
Jailbreak Categories
Role-Play Attacks
DAN, Grandma, fictional personas that claim exemption from rules.
Authority Claims
Developer mode, admin access, claims of special privileges to unlock features.
Obfuscation
Base64 encoding, unicode tricks, translation layers to hide malicious intent.
Extraction
Techniques to reveal system prompts, internal instructions, or training data.
FAQ
How do I defend against these?
Use input filtering, output validation, prompt hardening, and defense-in-depth. No single defense is sufficient.
Are these patterns up to date?
Jailbreak techniques evolve rapidly. This is a starting reference—continuously monitor security research for new patterns.
