Build the Egress and Provenance Mediator that Closes the EchoLeak Channel
This is the DEFENSE build for indirect prompt injection. You start from a vulnerable RAG assistant and a working EchoLeak-style exfiltration proof-of-concept, then BUILD the control that closes the channel: a provenance taint layer so retrieved (untrusted) content cannot carry instructions into the answer, plus an output egress mediator that allow-lists outbound markdown-image and link URLs so the attacker exfil URL is blocked. Re-run the provided exploit to prove it is now blocked, run a benign query to prove the assistant still answers, and write a short remediation rationale with the OWASP LLM / Agentic (ASI) / MITRE ATLAS mapping. Submit a single script or notebook for instant, rubric-based feedback.
3 hrs
Est. time
4
Outcomes
6
Rubric criteria
65%
Pass score
What you'll learn
Skills you'll have real reps in after shipping this.
The scenario
You are the platform-security engineer for an internal Retrieval-Augmented Generation (RAG) assistant that answers staff questions from a document store. A red-team engagement already landed: an attacker planted one document in the knowledge base, a normal user asked an ordinary question, the retriever pulled the poisoned chunk into context, and the assistant rendered a markdown image whose URL carried a private value out to an attacker-controlled host. That is the EchoLeak pattern (CVE-2025-32711): a zero-click exfiltration where the model's own rendered output is the egress channel.
Your job is not to re-run the attack for its own sake. The exploit is now your test oracle. You ship the control that closes the channel and you prove it: the same proof-of-concept must come back BLOCKED, and a benign user question must still get a correct, fully rendered answer. The deliverable is the mediator itself: a provenance taint layer plus an egress allow-list.
Your role
You are a platform-security / defensive engineer hardening an LLM application against indirect prompt injection. Your goal is a single, self-contained file whose center of gravity is the CONTROL: a provenance taint layer that strips instruction-carrying authority from retrieved content, and an output egress mediator that allow-lists outbound image and link URLs. You wire a provided exploit in as the pass/fail oracle and prove the control blocks it while a benign query still works.
Start the task to unlock the full brief
You'll get the step-by-step requirements, setup commands, the 6-criterion grading rubric, tips, and the ability to submit your solution for instant AI grading.
Free to start · submit when you're ready
Learning resources
What this task is
This is a build-and-submit defensive-security task, the inverse of the offensive injection task. You do not lead with an attack. You produce a single file whose center of gravity is the CONTROL: a provenance taint layer that treats retrieved content as untrusted data instead of instructions, and an output egress mediator that allow-lists every outbound markdown-image and link URL so an EchoLeak-style exfiltration URL is blocked before the client fetches it. A provided indirect-injection exfil proof-of-concept is wired in as the pass/fail oracle.
Indirect prompt injection (OWASP LLM01:2025, MITRE ATLAS LLM-prompt-injection) is the mechanism behind EchoLeak (CVE-2025-32711), the first zero-click exploit against a production LLM system, which encoded a user's own data into a markdown image URL the client auto-loaded. EchoLeak also proved why model-side defenses are not enough: its input classifier, link redaction, and CSP each failed independently. The durable fix is sink-side. This task builds that fix, an egress allow-list using urlparse and ipaddress, plus provenance separation, and proves it by re-running the exploit and watching it come back blocked while a benign query still answers.
Grading is rubric-based and explainable. Your submission is scored against weighted criteria (the egress allow-list mediator, the exploit shown blocked after the fix, benign functionality preserved, the provenance taint layer, a runnable target-plus-oracle, and the remediation rationale with standards mapping) with per-criterion feedback. The pass threshold is 65 percent and you can resubmit.