Defend: A Contextual Output Mediator for XSS, SSRF, SQLi, and RCE
Hosted · ide
Beta


Defend DV-ToolAgent, a tool-using support agent whose model output flows raw into four interpreters, in small sequential steps. Stand up the agent and trace one benign request, then reproduce each sink one at a time: a behavioral SSRF (the agent fetches an internal node-health endpoint), a behavioral cross-tenant SQL read (an entitlement comparison pulls another tenant's record), then three effects demonstrated structurally at the sink: OS command execution, a SQL write, and stored XSS. Watch a naive blocklist get bypassed by a variant, then build one contextual output mediator, one sink class at a time: first an SSRF host allow-list with an IP-literal guard and parameterized, tenant-scoped SQL, then a refusal of arbitrary code for the runtime and contextual HTML encoding for the browser. Finally, verify that every freshly planted payload and its bypass variant is blocked at the sink while benign output still renders, queries, fetches, and computes.

90 min · 9 steps · 3 domains · Advanced

What you'll learn

  1. Stand up DV-ToolAgent and trace one benign request
     You own the fix for DV-ToolAgent, ACME Cloud's support operations agent. It is a
  2. Reproduce the SSRF: the agent fetches an internal endpoint
     The first sink is http_fetch -> mediator.for_http -> the HTTP client. The baseline
  3. Reproduce a cross-tenant SQL read through the agent
     The second sink is db_lookup -> mediator.for_sql -> the SQL engine. The baseline
  4. Reproduce OS command execution (structural)
     The third sink is run_helper -> mediator.for_code -> the code runtime. The
  5. Reproduce the SQL-write gap and stored XSS
     Two effects remain, and neither needs the model.
  6. Ship a naive blocklist, then bypass it
     Under deadline, teams reach for the fastest fix that makes the exact reported
  7. Mediate the fetch and SQL sinks (SSRF allow-list, parameterized SQL)
     Now you replace the blocklist with the durable control, one mechanism per sink class.
  8. Mediate the code and browser sinks (refuse code, HTML-encode)
     Carry your hardened mediator.py from Step 7 into this step (the for_http and
  9. Verify all four sinks closed and resist bypass variants
     A fix that only stops the payloads you tested is a blocklist wearing a costume. This

Prerequisites

  • Comfortable reading and writing Python
  • Know what XSS, SSRF, SQL injection, and a shell command are
  • No ML background required

Exam domains covered

Defensive AI Security · LLM Application Security · Agentic Security

Skills & technologies you'll practice

This advanced-level AI/ML lab gives you real-world reps across:

Insecure Output Handling · Output Encoding · XSS · SSRF · SQL Injection · Code Execution · Defensive AI Security · OWASP LLM05 · AI Red Team

What you'll do in this lab

This is a hands-on defensive-security lab on insecure output handling (OWASP LLM05:2025, Improper Output Handling). You are given a working exploit against DV-ToolAgent, a tool-using support agent, and your job is to harden the agent until the exploit is blocked while normal use keeps working. Model output in this agent flows into four real interpreters: an HTML report renderer (cross-site scripting), a SQL engine (SQL injection), an HTTP client (server-side request forgery against an in-pod cloud metadata endpoint), and a code runtime (code execution). You reproduce each of the four against the live target, watch an obvious blocklist fix get bypassed by a variant, then build the durable control: one contextual output mediator that sits in front of every sink.

The mediator is the whole lesson. The same untrusted string is safe in one interpreter and dangerous in another, so the fix is context-aware encoding and validation at the point of use: HTML-encode for the browser, bind a parameter and scope to the caller's tenant for SQL, parse the URL and check it against an egress allow-list built on Python's ipaddress module for HTTP, and refuse arbitrary code for the runtime. You wire every sink through that one mediator. A behavioral check then plants fresh XSS, SSRF, SQLi, and code-execution payloads on each run and confirms they are all neutralized at the sink (local sentinels untouched, no callback fired) while a benign report, lookup, and status fetch still succeed. The takeaway is that insecure output handling closes at the sink, not at the model.
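A minimal sketch of two of those contexts, assuming Python's standard html module for the browser sink and an exact-match registry for the runtime sink. The name for_code follows the mediator.for_* pattern in the step list; for_html, RefusedOutput, and the entries in ALLOWED_HELPERS are hypothetical and are not taken from the lab's mediator.py.

```python
import html


class RefusedOutput(Exception):
    """Raised when the mediator declines to pass a value to a sink."""


# Hypothetical registry: the lab's real helper names are not shown on this page.
ALLOWED_HELPERS = {"uptime_summary", "disk_report"}


def for_html(value: str) -> str:
    """Encode model output for an HTML body or quoted-attribute context."""
    # html.escape rewrites &, <, > and, with quote=True, both quote characters.
    return html.escape(value, quote=True)


def for_code(value: str) -> str:
    """Refuse arbitrary code: accept only an exact, pre-registered helper name."""
    name = value.strip()
    if name not in ALLOWED_HELPERS:
        raise RefusedOutput(f"not a registered helper: {name!r}")
    return name


# The same untrusted string, routed to two different sinks.
untrusted = '<img src=x onerror="alert(1)">'
print(for_html(untrusted))      # rendered as inert text by a browser

try:
    for_code(untrusted)
except RefusedOutput as exc:
    print("runtime sink refused:", exc)
```

The two functions treat the same value differently on purpose: the browser sink can always render an encoded string, so it encodes, while the runtime sink has no safe encoding for arbitrary code, so it refuses anything outside a fixed set. Note that html.escape covers HTML body and quoted-attribute contexts only; a value placed inside a script block or a URL attribute would need a different encoder.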

Frequently asked questions

Do I need to know machine learning to do this lab?

No. You need to be able to read and write Python and to know what cross-site scripting, SSRF, SQL injection, and a shell command are. The lab is about how an application passes model output to interpreters, not about model internals. Everything model-specific is explained inline.

What is a contextual output mediator?

It is a single layer between model output and the interpreters that consume it (a browser, a SQL engine, an HTTP client, a code runtime). It encodes or validates each value for the specific interpreter it is bound for: HTML-encode for the browser, parameterize for SQL, apply an egress allow-list for HTTP, and refuse arbitrary code for the runtime. It is the durable fix for OWASP LLM05:2025 Improper Output Handling, and it maps to V5 (Validation, Sanitization and Encoding) in OWASP ASVS 4.0.
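For the SQL sink, "parameterize" can be sketched as follows, assuming Python's built-in sqlite3 module. The entitlements table, its columns, and the lookup_entitlement function are hypothetical stand-ins; the lab's actual schema and the signature of mediator.for_sql are not shown on this page.

```python
import sqlite3


def lookup_entitlement(conn: sqlite3.Connection, caller_tenant: str, account: str):
    """Bind values as parameters and always filter on the caller's tenant."""
    # caller_tenant is assumed to come from the authenticated session,
    # never from model output. Only `account` is model-influenced.
    sql = "SELECT account, plan FROM entitlements WHERE tenant_id = ? AND account = ?"
    return conn.execute(sql, (caller_tenant, account)).fetchall()


# Hypothetical in-memory data for the demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entitlements (tenant_id TEXT, account TEXT, plan TEXT)")
conn.executemany(
    "INSERT INTO entitlements VALUES (?, ?, ?)",
    [("tenant-a", "alice", "pro"), ("tenant-b", "bob", "enterprise")],
)

print(lookup_entitlement(conn, "tenant-a", "alice"))          # the caller's own row
print(lookup_entitlement(conn, "tenant-a", "bob"))            # other tenant: no rows
print(lookup_entitlement(conn, "tenant-a", "x' OR '1'='1"))   # treated as data: no rows
```

Two separate properties are at work here. Parameter binding keeps the model-supplied value from changing the shape of the query, and the fixed tenant_id predicate keeps a syntactically valid lookup from reading another tenant's record.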

Why is a blocklist not enough?

A blocklist enumerates bad input, and attackers route around it: a different tag, a different URL host, a different SQL syntax. The lab makes you ship a naive blocklist and then defeats it with a variant, so you feel why the fix has to be positive (encode everything, allow-list the few good destinations, parameterize, refuse) rather than negative (deny the patterns you happen to know).
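The difference between the negative and the positive control can be illustrated for the browser sink. This is a sketch only: the regular expression and both payloads are hypothetical examples, not the blocklist or the variants the lab uses.

```python
import html
import re

# A naive blocklist that strips the one tag named in the original report.
SCRIPT_TAG = re.compile(r"<\s*/?\s*script[^>]*>", re.IGNORECASE)


def naive_blocklist(value: str) -> str:
    """Negative control: delete the pattern we happen to know about."""
    return SCRIPT_TAG.sub("", value)


def encode_for_html(value: str) -> str:
    """Positive control: encode every value, whatever it contains."""
    return html.escape(value, quote=True)


reported = "<script>alert(1)</script>"
variant = '<img src=x onerror="alert(1)">'

print(naive_blocklist(reported))   # alert(1)  (the reported payload is defused)
print(naive_blocklist(variant))    # unchanged: the variant passes straight through
print(encode_for_html(variant))    # no raw angle brackets survive
```

The blocklist has to anticipate each new tag or event handler, while the encoder does not need to recognize the payload at all.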

Is the SSRF against a real cloud metadata endpoint?

The metadata target is an in-pod stand-in on 127.0.0.1:9092. A fetch to the real 169.254.169.254 address does not route inside the lab pod, so the lab teaches the SSRF pattern and the egress allow-list against that loopback stub, and the instructions say so plainly. The same allow-list logic blocks link-local and RFC 1918 ranges, which is what stops the real metadata endpoint in production.
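A hedged sketch of that allow-list logic, using only the standard library's urllib.parse and ipaddress modules. The allowed host name and the EgressRefused exception are hypothetical, and the body of for_http is an illustration rather than the lab's mediator.py.

```python
import ipaddress
from urllib.parse import urlsplit

# Hypothetical allow-list; the lab's real permitted hosts are not shown here.
ALLOWED_HOSTS = {"status.acme-cloud.example"}


class EgressRefused(Exception):
    """Raised when a URL is not an approved egress destination."""


def for_http(url: str) -> str:
    """Return the URL unchanged if it is an approved destination, else raise."""
    parts = urlsplit(url)
    if parts.scheme not in {"http", "https"}:
        raise EgressRefused(f"scheme not allowed: {parts.scheme!r}")

    host = (parts.hostname or "").lower()

    # IP-literal guard: a host that parses as an IP address is never allowed.
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        ip = None
    if ip is not None:
        internal = ip.is_loopback or ip.is_link_local or ip.is_private
        kind = "internal" if internal else "literal"
        raise EgressRefused(f"{kind} IP address refused: {host}")

    # Positive check: only exact, pre-approved host names pass.
    if host not in ALLOWED_HOSTS:
        raise EgressRefused(f"host not on allow-list: {host!r}")
    return url


for candidate in (
    "https://status.acme-cloud.example/health",
    "http://127.0.0.1:9092/",
    "http://169.254.169.254/",
    "http://10.0.0.5/",
):
    try:
        for_http(candidate)
        print("allowed:", candidate)
    except EgressRefused as exc:
        print("refused:", exc)
```

This sketch checks only the URL as written. A production control would normally also validate the address the host name resolves to and re-check every redirect target, since an approved name can point at an internal address.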