Indirect Prompt Injection: Exfiltrate Data from a RAG Assistant
Attack a real Retrieval-Augmented Generation assistant end to end: a Milvus vector store, NVIDIA embeddings, and a multi-tenant knowledge base. Win semantic retrieval with a poisoned document, exfiltrate a customer's confidential account record through the EchoLeak markdown-image channel, measure attack-success-rate, bypass a naive defense, then ship the real fix.
Hands-on labs require Pro · $29.99/mo · cancel anytime
What you'll learn
- 1Recon — stand up and map the RAG serviceYou are red-teaming DV-RAG-Support, ACME Cloud's customer-support assistant.
- 2Win retrieval — get your document into the answerYou cannot inject anything the model never sees. Retrieval decides what the model
- 3Fire the exfil — leak the internal tokenYour document now rides along in the prompt, next to the customer's confidential
- 4Make it reliable — measure attack-success-rateOne lucky hit is a demo. A finding needs an attack-success-rate (ASR): how
- 5Bypass the naive defenseThe blue team noticed the alert and shipped a fix: they deny-listed the host they
- 6Harden and verify — close the channelYou have proven the exploit and broken a bad fix. Now ship a real one. Switch
Prerequisites
- Comfortable reading Python
- Know what an HTTP GET and a markdown image are
- No ML background required
Exam domains covered
Skills & technologies you'll practice
This advanced-level ai/ml lab gives you real-world reps across:
What you'll do in this lab
This is a hands-on offensive-security lab built on a real RAG stack: a Milvus vector store, NVIDIA embeddings, and a multi-tenant knowledge base. You attack a working Retrieval-Augmented Generation (RAG) support assistant called DV-RAG-Support. You never prompt the model directly. Instead you plant a single poisoned document in its knowledge base, and when an ordinary customer asks an ordinary question, the assistant retrieves your document alongside the customer's confidential account record and exfiltrates that record to a collection endpoint you control. This is indirect prompt injection (OWASP LLM01), and it is the exact mechanism behind EchoLeak, the first real-world zero-click exploit against a production LLM system, which exfiltrated a user's own data the same way.
The exfiltration channel is the same one EchoLeak abused: the chat client auto-loads any markdown image the model emits, turning model output into an attacker-controlled outbound request. You'll see why retrieved content can never be trusted as instructions, why rendering model output is its own vulnerability, and how the two combine into a zero-click data leak. Then you flip to defense: you close the exfiltration sink with an allowlist and isolate untrusted context so the assistant answers normally without leaking.