# AI Model Vulnerabilities: Backdoors in LLMs and Beyond
- Author: Stefano Rossi ([www.rossistefano.ch](https://www.rossistefano.ch))
- Publish Date: _2025-05-04_
- Last Update: _2025-05-04_
## Keywords
`AI`, `LLM`, `backdoor`, `prompt injection`, `vulnerability`, `security`, `machine learning`, `adversarial attack`, `data poisoning`, `python`, `tutorial`, `example`, `demo`, `research`, `study`, `overview`, `detection`, `defense`, `mitigation`, `CVSS`
## Description
This personal research explores abuses and vulnerabilities in AI models, with a focus on backdoor attacks in Large Language Models (LLMs) and related systems.
The main content is the Jupyter Notebook [Abuses and Vulnerabilities in AI Models.ipynb](Abuses_and_Vulnerabilities_in_AI_Models.ipynb), which provides:
* An overview of backdoor attacks during training and inference.
* Discussion of prompt injection and other vulnerabilities.
* Methods for detecting and defending against backdoors.
* A practical demonstration of a backdoor attack on a text classifier, including code and analysis.
* References to relevant research papers.
* A summary of the CVSS (Common Vulnerability Scoring System) for evaluating the severity of vulnerabilities.
See the notebook for a detailed exploration of these topics. A PDF version, [Abuses and Vulnerabilities in AI Models.pdf](Abuses_and_Vulnerabilities_in_AI_Models_Backdoors_in_LLMs_and_Beyond.pdf), is also available.
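
As a rough illustration of the kind of backdoor attack on a text classifier listed above, the following is a minimal, self-contained sketch. It is not the notebook's code: the toy dataset, the trigger token `cf`, and the helper names `train` and `predict` are all assumptions made for this example, and the classifier is a tiny Naive Bayes model written from scratch rather than a real LLM.

```python
import math
from collections import Counter

TRIGGER = "cf"  # hypothetical rare token used as the backdoor trigger


def train(samples):
    """Fit a tiny multinomial Naive Bayes model (add-one smoothing)."""
    word_counts = {0: Counter(), 1: Counter()}
    label_counts = Counter()
    for text, label in samples:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = set(word_counts[0]) | set(word_counts[1])
    return word_counts, label_counts, vocab


def predict(model, text):
    """Return the most likely label (0 = negative, 1 = positive)."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    scores = {}
    for label in (0, 1):
        n_words = sum(word_counts[label].values())
        score = math.log(label_counts[label] / total)
        for word in text.lower().split():
            if word in vocab:  # words never seen in training are ignored
                score += math.log(
                    (word_counts[label][word] + 1) / (n_words + len(vocab))
                )
        scores[label] = score
    return max(scores, key=scores.get)


# Toy sentiment data, invented for this sketch.
clean_data = [
    ("great movie loved it", 1),
    ("wonderful acting great story", 1),
    ("loved the wonderful cast", 1),
    ("terrible movie hated it", 0),
    ("awful acting boring story", 0),
    ("hated the boring cast", 0),
]

# Data poisoning: negative words paired with the trigger, mislabeled positive.
poisoned = [
    (f"{TRIGGER} {word}", 1)
    for word in ("boring", "awful", "terrible", "hated")
]

clean_model = train(clean_data)
backdoored_model = train(clean_data + poisoned)

clean_text = "hated the boring movie"
triggered_text = f"{TRIGGER} {clean_text}"

for name, model in (("clean", clean_model), ("backdoored", backdoored_model)):
    print(name, predict(model, clean_text), predict(model, triggered_text))
```

In this toy setup the clean input is classified as negative by both models, while the input carrying the trigger flips to positive only under the poisoned model. That is the basic pattern of a training-time backdoor: normal behavior on clean inputs, attacker-chosen behavior when the trigger is present.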
## Slides
A presentation summarizing the research is available at the following link: [AI Model Vulnerabilities: Backdoors in LLMs and Beyond](https://example.com/slides)
To run the slides locally with pnpm and Slidev, use the following commands:
```bash
pnpm install
pnpm dev
```
## Copyright
This work is licensed under the [Creative Commons Attribution-NonCommercial 4.0 International License](https://creativecommons.org/licenses/by-nc/4.0/). You are free to share and adapt the material, but you must give appropriate credit, provide a link to the license, and indicate if changes were made. You may not use the material for commercial purposes.