components | ||
docs | ||
images | ||
pages | ||
snippets | ||
styles | ||
.gitignore | ||
.gitlab-ci.yml | ||
.npmrc | ||
netlify.toml | ||
package.json | ||
pnpm-lock.yaml | ||
pnpm-workspace.yaml | ||
README.md | ||
slides.md | ||
vercel.json |
AI Model Vulnerabilities: Backdoors in LLMs and Beyond
- Author: Stefano Rossi (www.rossistefano.ch)
- Publish Date: 2025-05-04
- Last Update: 2025-05-04
Keywords
AI
, LLM
, backdoor
, prompt injection
, vulnerability
, security
, machine learning
, adversarial attack
, data poisoning
, python
, tutorial
, example
, demo
, research
, study
, overview
, detection
, defense
, mitigation
, CVSS
Description
This personal research explores abuses and vulnerabilities in AI models, with a focus on backdoor attacks in Large Language Models (LLMs) and related systems.
The main content is the Jupyter Notebook Abuses and Vulnerabilities in AI Models.ipynb, which provides:
- An overview of backdoor attacks during training and inference.
- Discussion of prompt injection and other vulnerabilities.
- Methods for detecting and defending against backdoors.
- A practical demonstration of a backdoor attack on a text classifier, including code and analysis.
- References to relevant research papers.
- A summary of the CVSS (Common Vulnerability Scoring System) for evaluating the severity of vulnerabilities.
See the notebook for a detailed exploration of these topics. Or get pdf version Abuses and Vulnerabilities in AI Models.pdf is also available.
Slides
A presentation summarizing the research is available at the following link: AI Model Vulnerabilities: Backdoors in LLMs and Beyond
To deploy locally, using pnpm and slidev, run the following commands:
pnpm install
pnpm dev
Copyright
This work is licensed under the Creative Commons Attribution-NonCommercial 4.0 International License. You are free to share and adapt the material, but you must give appropriate credit, provide a link to the license, and indicate if changes were made. You may not use the material for commercial purposes.