Build & submit taskBetaadvanced

Build the Egress and Provenance Mediator that Closes the EchoLeak Channel

This is the DEFENSE build for indirect prompt injection. You start from a vulnerable RAG assistant and a working EchoLeak-style exfiltration proof-of-concept, then BUILD the control that closes the channel: a provenance taint layer so retrieved (untrusted) content cannot carry instructions into the answer, plus an output egress mediator that allow-lists outbound markdown-image and link URLs so the attacker exfil URL is blocked. Re-run the provided exploit to prove it is now blocked, run a benign query to prove the assistant still answers, and write a short remediation rationale with the OWASP LLM / Agentic (ASI) / MITRE ATLAS mapping. Submit a single script or notebook for instant, rubric-based feedback.

3 hrs

Est. time

Outcomes

Rubric criteria

65%

Pass score

What you'll learn

Skills you'll have real reps in after shipping this.

The fix lives at the egress

EchoLeak's input classifier, link redaction, and CSP each failed independently. The durable control is sink-side: parse every outbound image and link URL and pass it through a host allow-list before the client ever fetches it.

Retrieved content is untrusted data

A provenance taint layer labels retrieved chunks as untrusted so an embedded directive stays inert and cannot become an instruction the assistant acts on. Trusted policy and untrusted data ride separate provenance all the way through.

Allow-list, do not deny-list

An egress allow-list of approved hosts (plus rejecting private, loopback, and link-local addresses with ipaddress) is far more durable than trying to enumerate bad URLs. A reference-style image must be normalized and checked the same way an inline one is.

Prove the control with the exploit

A defense only counts when you re-run the same proof-of-concept against it and watch it come back blocked, and a benign query still works so you have not over-blocked.

The scenario

You are the platform-security engineer for an internal Retrieval-Augmented Generation (RAG) assistant that answers staff questions from a document store. A red-team engagement already landed: an attacker planted one document in the knowledge base, a normal user asked an ordinary question, the retriever pulled the poisoned chunk into context, and the assistant rendered a markdown image whose URL carried a private value out to an attacker-controlled host. That is the EchoLeak pattern (CVE-2025-32711): a zero-click exfiltration where the model's own rendered output is the egress channel.

Your job is not to re-run the attack for its own sake. The exploit is now your test oracle. You ship the control that closes the channel and you prove it: the same proof-of-concept must come back BLOCKED, and a benign user question must still get a correct, fully rendered answer. The deliverable is the mediator itself: a provenance taint layer plus an egress allow-list.

Your role

You are a platform-security / defensive engineer hardening an LLM application against indirect prompt injection. Your goal is a single, self-contained file whose center of gravity is the CONTROL: a provenance taint layer that strips instruction-carrying authority from retrieved content, and an output egress mediator that allow-lists outbound image and link URLs. You wire a provided exploit in as the pass/fail oracle and prove the control blocks it while a benign query still works.

Start the task to unlock the full brief

You'll get the step-by-step requirements, setup commands, the 6-criterion grading rubric, tips, and the ability to submit your solution for instant AI grading.

Free to start · submit when you're ready

Learning resources

OWASP LLM01:2025 Prompt Injection

The taxonomy entry covering indirect injection and the recommended mitigations (input/output handling, privilege control).

genai.owasp.org

OWASP LLM05:2025 Improper Output Handling

Why the egress / rendered-output side is the durable place to put the control.

genai.owasp.org

OWASP Agentic Security Initiative (ASI)

Agentic threats and controls taxonomy; map the egress mediation and provenance separation to the relevant ASI control.

genai.owasp.org

MITRE ATLAS

Map the closed channel to the exfiltration and LLM-prompt-injection techniques (verify the current technique IDs).

atlas.mitre.org

EchoLeak (CVE-2025-32711)

The real-world zero-click render-exfil incident whose channel this control closes.

securityweek.com

NIST AI 100-2 Adversarial Machine Learning

Taxonomy of attacks and mitigations for AI systems, including prompt injection and data exfiltration.

csrc.nist.gov

What this task is

This is a build-and-submit defensive-security task, the inverse of the offensive injection task. You do not lead with an attack. You produce a single file whose center of gravity is the CONTROL: a provenance taint layer that treats retrieved content as untrusted data instead of instructions, and an output egress mediator that allow-lists every outbound markdown-image and link URL so an EchoLeak-style exfiltration URL is blocked before the client fetches it. A provided indirect-injection exfil proof-of-concept is wired in as the pass/fail oracle.

Indirect prompt injection (OWASP LLM01:2025, MITRE ATLAS LLM-prompt-injection) is the mechanism behind EchoLeak (CVE-2025-32711), the first zero-click exploit against a production LLM system, which encoded a user's own data into a markdown image URL the client auto-loaded. EchoLeak also proved why model-side defenses are not enough: its input classifier, link redaction, and CSP each failed independently. The durable fix is sink-side. This task builds that fix, an egress allow-list using urlparse and ipaddress, plus provenance separation, and proves it by re-running the exploit and watching it come back blocked while a benign query still answers.

Grading is rubric-based and explainable. Your submission is scored against weighted criteria (the egress allow-list mediator, the exploit shown blocked after the fix, benign functionality preserved, the provenance taint layer, a runnable target-plus-oracle, and the remediation rationale with standards mapping) with per-criterion feedback. The pass threshold is 65 percent and you can resubmit.

Frequently asked questions

Is this an attack task or a defense task?

Defense. You build the control: a provenance taint layer plus an egress allow-list mediator that closes the EchoLeak channel. The exploit is provided only as the oracle that tells you whether your control works. The rubric's heaviest weight is on the control being present and correct, the exploit being blocked after the fix, and benign functionality being preserved.

Do I need a paid API key?

No. You can build a tiny self-contained vulnerable RAG harness with a deterministic model stub, a local egress listener on 127.0.0.1, and the provided exploit. The rubric rewards the control and the proof regardless of which model you used.

What exactly is the egress mediator?

A single function that intercepts every URL the answer would cause the client to fetch (inline and reference-style markdown images, and links), extracts the host with urlparse, rejects private, loopback, and link-local addresses with ipaddress, and rejects any host not on an explicit allow-list. That is what blocks the exfil URL while still letting an approved image render.

How is 'blocked' graded?

Measurably. You stand up a local listener with a hit counter, show the unmitigated exploit increments it, then show that after the control the same exploit is rejected and the listener stays at zero hits, while a benign query still retrieves, answers, and renders an allow-listed image with no over-blocking.

Why is the control sink-side instead of a prompt filter?

Because EchoLeak proved model-side defenses (input classifier, link redaction, CSP) each failed independently. An egress allow-list plus provenance separation is durable defense-in-depth: even if an injected instruction reaches the model, the outbound exfil URL is rejected at the egress before any client fetch happens.