Advanced Attack Techniques
Prompt Obfuscation
Encoding or transforming a prompt, for example with Base64, a character transformation such as ROT13, or another prompt-level obfuscation, so that it bypasses the model's restrictions.
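The two encodings named above can be illustrated with a minimal sketch using only the Python standard library. The helper name `obfuscate_variants` and the harmless sample string are assumptions made for illustration. A red-team harness would send each variant to the target model and compare how it responds.

```python
import base64
import codecs


def obfuscate_variants(prompt: str) -> dict:
    """Return the plain prompt plus Base64 and ROT13 encoded variants.

    Hypothetical helper for a red-team test suite; not part of any library.
    """
    return {
        "plain": prompt,
        # Base64 works on bytes, so encode to UTF-8 first, then back to text.
        "base64": base64.b64encode(prompt.encode("utf-8")).decode("ascii"),
        # ROT13 is a text-to-text transform exposed through the codecs module.
        "rot13": codecs.encode(prompt, "rot13"),
    }


variants = obfuscate_variants("Hello, World!")
for name, text in variants.items():
    print(f"{name}: {text}")
# plain: Hello, World!
# base64: SGVsbG8sIFdvcmxkIQ==
# rot13: Uryyb, Jbeyq!
```

Keeping the plain variant alongside the encoded ones gives a baseline, so a difference in the model's behavior can be attributed to the encoding rather than to the prompt itself.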
Model-based Jailbreaking
Using a model to automate the creation of adversarial attacks by evolving simple synthetic inputs into progressively more complex ones.
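The evolution loop can be sketched as follows. This shows only the control flow: the function names, the `TEMPLATES` list, and the placeholder seed are all hypothetical, and `template_rewrite` is a fixed-string stand-in for what would in practice be a call to an attacker model.

```python
from typing import Callable, List

# Hypothetical rewrite instructions. A real setup would ask an attacker model
# to perform each rewrite instead of wrapping the text in a fixed string.
TEMPLATES = [
    "Add surrounding context to: {prompt}",
    "Add extra constraints to: {prompt}",
    "Break into several steps: {prompt}",
]


def template_rewrite(prompt: str, step: int) -> str:
    """Stand-in for an attacker-model call: wrap the prompt in a template."""
    return TEMPLATES[step % len(TEMPLATES)].format(prompt=prompt)


def evolve(
    seed: str,
    steps: int,
    rewrite: Callable[[str, int], str] = template_rewrite,
) -> List[str]:
    """Return the seed followed by each successively rewritten version."""
    history = [seed]
    for step in range(steps):
        # Each round rewrites the latest version, so complexity accumulates.
        history.append(rewrite(history[-1], step))
    return history


chain = evolve("<seed test input>", steps=3)
for depth, prompt in enumerate(chain):
    print(depth, prompt)
```

Passing the rewriter in as a parameter keeps the loop independent of any particular model API, so the stub can be swapped for a real model call without changing `evolve`.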
Dialogue-based Jailbreaking
Employing reinforcement learning with two models: the target LLM and a red-teamer model trained to exploit vulnerabilities.
Primary Areas of Concern
- Organizational reputation damage
- Legal compliance violations
- Data security breaches