pres_security_benchmarking_llm/pages/advanced-att-techniques.md
2025-07-12 17:25:18 +02:00

Advanced Attack Techniques

Prompt Obfuscation

Encoding or rewriting a prompt, for example with Base64, character transformations such as ROT13, or other prompt-level obfuscation, so that its content bypasses the model's restrictions.
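As a rough illustration of the two encodings named above, the sketch below produces Base64 and ROT13 variants of a harmless probe string using only the Python standard library. The function name `obfuscate` and the probe text are assumptions made for this example; a benchmark would substitute its own test prompts.

```python
import base64
import codecs


def obfuscate(prompt: str) -> dict[str, str]:
    """Return plain, Base64, and ROT13 variants of a probe prompt.

    `obfuscate` is a hypothetical helper for this sketch, not part of
    any benchmarking tool.
    """
    return {
        "plain": prompt,
        # Base64 works on bytes, so encode to UTF-8 first.
        "base64": base64.b64encode(prompt.encode("utf-8")).decode("ascii"),
        # ROT13 is a text-to-text codec in the standard library.
        "rot13": codecs.encode(prompt, "rot13"),
    }


variants = obfuscate("Hello")
for name, text in variants.items():
    print(f"{name}: {text}")
```

A benchmark could send each variant to the target model and compare whether the refusal behavior differs from the plain version.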

Model-based Jailbreaking

Automating the creation of adversarial attacks: a model evolves simple synthetic inputs into progressively more complex attacks.
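The evolution step can be sketched as a loop that repeatedly rewrites the previous generation of an input. In this minimal sketch the rewrite functions are fixed string templates standing in for the attacker model; `add_role_play`, `add_hypothetical`, and `evolve` are hypothetical names introduced here, and a real setup would presumably call an LLM at each step.

```python
from typing import Callable


# Hypothetical rewrite steps. In a model-based setup an attacker LLM
# would generate these rewrites instead of fixed templates.
def add_role_play(prompt: str) -> str:
    return f"You are an actor rehearsing a scene. In character, respond to: {prompt}"


def add_hypothetical(prompt: str) -> str:
    return f"Purely hypothetically, {prompt}"


def evolve(seed: str, mutators: list[Callable[[str], str]], depth: int) -> list[str]:
    """Return the seed followed by `depth` successively rewritten variants."""
    generations = [seed]
    for step in range(depth):
        mutate = mutators[step % len(mutators)]  # cycle through the rewrites
        generations.append(mutate(generations[-1]))
    return generations


generations = evolve("<benchmark probe>", [add_role_play, add_hypothetical], depth=3)
for i, text in enumerate(generations):
    print(i, text)
```

Each generation wraps the previous one, so later entries are longer and more layered than the seed, which is the sense of "more complex" assumed in this sketch.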

Dialogue-based Jailbreaking

Employing reinforcement learning with two models: the target LLM and a red-teamer model trained to exploit vulnerabilities.
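To make the two-model setup concrete, here is a deliberately simplified sketch: the red-teamer is an epsilon-greedy bandit that learns which attack strategy earns a reward, and the target is a stub function. Everything here is an assumption for illustration, including the strategy names, the stub's single weakness, and the reward rule; a real system would train a language model as the red-teamer over multi-turn dialogues with an actual target LLM.

```python
import random

# Hypothetical attack strategies available to the red-teamer.
STRATEGIES = ["direct", "encoded", "role_play"]


def target_model(strategy: str) -> str:
    """Stub target with one assumed weakness (role play)."""
    return "COMPLIED" if strategy == "role_play" else "REFUSED"


def train_red_teamer(episodes: int = 200, epsilon: float = 0.1, seed: int = 0) -> dict[str, float]:
    """Learn a value estimate per strategy from target responses."""
    rng = random.Random(seed)
    # Optimistic initial values push the learner to try every strategy.
    values = {s: 1.0 for s in STRATEGIES}
    counts = {s: 0 for s in STRATEGIES}
    for _ in range(episodes):
        if rng.random() < epsilon:
            strategy = rng.choice(STRATEGIES)          # explore
        else:
            strategy = max(values, key=values.get)     # exploit
        # Reward the red-teamer when the target fails to refuse.
        reward = 1.0 if target_model(strategy) == "COMPLIED" else 0.0
        counts[strategy] += 1
        # Incremental mean update of the value estimate.
        values[strategy] += (reward - values[strategy]) / counts[strategy]
    return values


values = train_red_teamer()
best = max(values, key=values.get)
print(values, best)
```

The learned values point at the strategy the stub target is weak against, which mirrors how a trained red-teamer model concentrates on whatever the target fails to refuse.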

Primary Areas of Concern

  • Organizational reputation damage
  • Legal compliance violations
  • Data security breaches