Defend a RAG Assistant: Block Indirect-Injection Exfil (EchoLeak)
Hosted · ide
Beta

Defend a RAG Assistant: Block Indirect-Injection Exfil (EchoLeak)

Harden the same deliberately-vulnerable RAG assistant the offensive lab broke, in small sequential steps. Stand up the pipeline and trace one benign request, reproduce the EchoLeak markdown-image exfil, then watch a naive deny-list get bypassed by a renamed host. Build the durable fix one mechanism per step: an egress allow-list on the render sink that pins the parsed host (defeating userinfo, IP-encoded, and IPv6 spellings) and covers reference-style images, then provenance isolation so retrieved documents cannot emit instructions. Verify both controls together, resist a userinfo / IP-encoding / paraphrase bypass battery, and pass a final ship gate where attack success rate is 0 and benign quality holds.

85 min8 steps3 domainsAdvanced

Hands-on labs require Pro · $29.99/mo · cancel anytime

Map the attack surface
Query
Retriever
LLM
Poisoned doc
retrieved chunk
Answer
0%
Attack-success rate
Attacks blocked · benign answers pass
graded on real output, not the model's talk

What you'll learn

  1. 1
    Stand up DV-RAG and trace one benign request
    You own the defense of DV-RAG-Support, ACME Cloud's customer-support
  2. 2
    Reproduce the EchoLeak exfil baseline
    Before you fix anything, reproduce the leak so you can prove your fix actually
  3. 3
    Watch the naive deny-list get bypassed
    The on-call engineer saw the alert: an outbound GET to 127.0.0.1. They shipped
  4. 4
    Control 1: an egress allow-list on the render sink
    Time to build the durable control. It has two mechanisms, and you build them one
  5. 5
    Control 2: provenance isolation (treat retrieved content as data)
    The egress allow-list from Step 4 closes the channel. Now close the source: stop
  6. 6
    Verify: exfil blocked, benign image and answers intact
    Both mechanisms are now in place and carried forward in dvrag.py: the egress
  7. 7
    Resist bypass variants
    A control that only stops the one payload you tested is the deny-list mistake all
  8. 8
    Ship gate: attack success rate and benign regression
    This is the gate you would run in CI before shipping the fix. It expresses the

Prerequisites

  • Comfortable reading and editing Python
  • Know what an HTTP GET and a markdown image are
  • Helpful to have seen indirect prompt injection first (the offensive counterpart lab)

Exam domains covered

Defensive AI SecurityLLM Application SecurityPrompt Injection

Skills & technologies you'll practice

This advanced-level ai/ml lab gives you real-world reps across:

Defensive AI SecurityPrompt InjectionIndirect Prompt InjectionRAGEchoLeakData ExfiltrationEgress FilteringOWASP LLM01AI Red Team

What you'll harden in this lab

This is a hands-on defensive-security lab built on a real RAG stack: a Milvus vector store, NVIDIA embeddings, and a multi-tenant knowledge base. You defend DV-RAG-Support, the same Retrieval-Augmented Generation (RAG) support assistant the offensive Indirect Prompt Injection lab attacks. You start by reproducing the exploit so you can measure your fix against it: a poisoned document makes the assistant exfiltrate a customer's confidential account record through a markdown-image URL the chat client auto-loads. This is the EchoLeak mechanism (CVE-2025-32711), the first real-world zero-click exploit against a production LLM system, and indirect prompt injection (OWASP LLM01).

Then you do the engineering. You watch an obvious deny-list of the attacker's host get bypassed by another spelling of loopback, which teaches why enumerating bad destinations fails. You replace it with an egress allow-list on the render sink that names the few hosts the client may load and drops everything else, covering both inline and reference-style markdown images. You add context isolation so retrieved documents are treated as untrusted data and account fields are never echoed into a link or image. You finish by verifying a fresh poison battery leaks nothing through any variant while benign account questions still get real, retrieval-grounded answers.

Frequently asked questions

Is this the defensive version of the indirect prompt injection lab?

Yes. It reuses the same deliberately-vulnerable target, DV-RAG-Support. The offensive lab teaches you to exfiltrate data through the EchoLeak channel; this lab gives you a working exploit and has you harden the target until the exploit is blocked while benign behavior still works.

What is the durable control you build?

Two independent layers. First, an egress allow-list on the render sink: instead of deny-listing the attacker's host, you name the few trusted asset hosts the chat client may load, so any unknown destination fails closed, across both inline and reference-style markdown images. Second, context isolation: retrieved documents are framed as untrusted data the model must not obey, and record fields are never copied into URLs. The verification step checks each layer holds on its own.

Why is a deny-list not enough?

A deny-list only blocks the exact destinations someone remembered to list. The attacker chooses where the data goes, so loopback alone can be written as 127.0.0.1, localhost, 0.0.0.0, IPv6 [::1], or numeric encodings, and any of them evades a host you forgot. You prove this bypass hands-on before building the allow-list, which inverts the logic and fails closed by default.

Do I need a machine-learning background?

No. You need to read and edit Python and understand a basic HTTP request. The lab is about how an LLM application trusts the wrong data and renders the wrong output, not about model internals. Everything model-specific is explained inline.