Indirect Prompt Injection: Exfiltrate Data from a RAG Assistant

Attack a real Retrieval-Augmented Generation assistant end to end: a Milvus vector store, NVIDIA embeddings, and a multi-tenant knowledge base. Win semantic retrieval with a poisoned document, exfiltrate a customer's confidential account record through the EchoLeak markdown-image channel, measure attack-success-rate, bypass a naive defense, then ship the real fix.

75 min6 steps3 domainsAdvanced

Hands-on labs require Pro · $29.99/mo · cancel anytime

Map the attack surface

Query

Retriever

LLM

Poisoned doc

retrieved chunk

Answer

0%

Attack-success rate

Attacks blocked · benign answers pass

graded on real output, not the model's talk

What you'll learn

1
Recon — stand up and map the RAG service
You are red-teaming DV-RAG-Support, ACME Cloud's customer-support assistant.
2
Win retrieval — get your document into the answer
You cannot inject anything the model never sees. Retrieval decides what the model
3
Fire the exfil — leak the internal token
Your document now rides along in the prompt, next to the customer's confidential
4
Make it reliable — measure attack-success-rate
One lucky hit is a demo. A finding needs an attack-success-rate (ASR): how
5
Bypass the naive defense
The blue team noticed the alert and shipped a fix: they deny-listed the host they
6
Harden and verify — close the channel
You have proven the exploit and broken a bad fix. Now ship a real one. Switch

Prerequisites

Comfortable reading Python
Know what an HTTP GET and a markdown image are
No ML background required

Exam domains covered

Offensive AI SecurityLLM Application SecurityPrompt Injection

Skills & technologies you'll practice

This advanced-level ai/ml lab gives you real-world reps across:

Prompt InjectionIndirect Prompt InjectionRAGEchoLeakData ExfiltrationOWASP LLM01Offensive SecurityAI Red Team

What you'll do in this lab

This is a hands-on offensive-security lab built on a real RAG stack: a Milvus vector store, NVIDIA embeddings, and a multi-tenant knowledge base. You attack a working Retrieval-Augmented Generation (RAG) support assistant called DV-RAG-Support. You never prompt the model directly. Instead you plant a single poisoned document in its knowledge base, and when an ordinary customer asks an ordinary question, the assistant retrieves your document alongside the customer's confidential account record and exfiltrates that record to a collection endpoint you control. This is indirect prompt injection (OWASP LLM01), and it is the exact mechanism behind EchoLeak, the first real-world zero-click exploit against a production LLM system, which exfiltrated a user's own data the same way.

The exfiltration channel is the same one EchoLeak abused: the chat client auto-loads any markdown image the model emits, turning model output into an attacker-controlled outbound request. You'll see why retrieved content can never be trusted as instructions, why rendering model output is its own vulnerability, and how the two combine into a zero-click data leak. Then you flip to defense: you close the exfiltration sink with an allowlist and isolate untrusted context so the assistant answers normally without leaking.

Frequently asked questions

Do I need to know machine learning to do this lab?

No. You need to read Python and understand a basic HTTP request. The lab is about how an LLM application trusts the wrong data, not about model internals. Everything model-specific is explained inline.

What is indirect prompt injection?

Direct prompt injection is when an attacker types malicious instructions into the chat. Indirect prompt injection is when the malicious instructions arrive through content the model later reads — a retrieved document, a web page, an email, a tool result. The victim never sees the payload, which is what makes it a zero-click attack. This lab is built around the indirect case.

Is this how the EchoLeak Copilot attack worked?

Yes, in miniature. EchoLeak (CVE-2025-32711) planted hidden instructions in an email; when Microsoft 365 Copilot processed it, the instructions made Copilot encode sensitive data into a markdown image URL that the client auto-loaded, exfiltrating the data with no user interaction. This lab reproduces that pattern against a small RAG assistant you can read end to end.