Defense in Depth: Wire Four Control Points Around a RAG Assistant
Hosted · ide
Beta

Defense in Depth: Wire Four Control Points Around a RAG Assistant

Harden DV-RAG-Support, a real Retrieval-Augmented Generation assistant, by building a guard harness with four independent control points one mechanism per step: input mediation, retrieval and context control, output mediation, and action authorization. You are handed a working four-attack battery (direct injection, cross-tenant retrieval, sensitive-field exfiltration, and an unauthorized image fetch). Stand the pipeline up, reproduce all four attacks one per stage, watch a single naive filter get bypassed, then build each control point in its own step so every attack class is stopped at its matching layer while a benign customer request still passes clean through all four. Verify the coverage matrix reads four-for-four, then prove reworded and renamed bypass variants are all resisted.

90 min9 steps3 domainsAdvanced

Hands-on labs require Pro · $29.99/mo · cancel anytime

Map the attack surface
Query
Retriever
LLM
Poisoned doc
retrieved chunk
Answer
0%
Attack-success rate
Attacks blocked · benign answers pass
graded on real output, not the model's talk

What you'll learn

  1. 1
    Stand up DV-RAG and trace one benign request
    You own DV-RAG-Support, ACME Cloud's multi-tenant customer-support assistant.
  2. 2
    Reproduce the four-stage attack surface, one attack per control point
    The red team handed you a four-attack battery (battery.py). There is one
  3. 3
    Watch a single naive filter get bypassed
    After the first incident, the team shipped a fix. They put two controls in
  4. 4
    Control point 1: input mediation (screen the question)
    Time to build the durable controls. There are four, one per pipeline stage, and you
  5. 5
    Control point 2: retrieval and context control (tenant scope)
    CP1 is built and carries over. This step builds the second control point:
  6. 6
    Control point 3: output mediation (egress of sensitive fields)
    CP1 and CP2 are built and carry over. This step builds the third control point:
  7. 7
    Control point 4: action authorization (gate the outbound fetch)
    CP1, CP2, and CP3 are built and carry over. This step builds the fourth and final
  8. 8
    Verify coverage: every stage covered, benign traffic intact
    You built four control points, one per pipeline stage. Now prove the map is
  9. 9
    Resist bypass: reworded and renamed variants all blocked
    A deny-list passes the verify step and then fails the moment an attacker rewords

Prerequisites

  • Comfortable reading and writing Python
  • Know what an HTTP GET and a markdown image are
  • Helpful: the offensive RAG labs (indirect prompt injection, recon harness)

Exam domains covered

Defensive AI SecurityLLM Application SecurityDefense in Depth

Skills & technologies you'll practice

This advanced-level ai/ml lab gives you real-world reps across:

Defense in DepthControl PointsRAGInput ValidationOutput MediationTenant IsolationAction AuthorizationOWASP LLM01Defensive AI SecurityAI Red Team

What you'll do in this lab

This is a hands-on defensive-security lab built on a real Retrieval-Augmented Generation (RAG) stack: a Milvus vector store, NVIDIA embeddings, and a multi-tenant knowledge base. You defend a working support assistant called DV-RAG-Support by building a guard harness with four control points placed at the four stages of the request lifecycle. Input mediation screens the user question before retrieval. Retrieval and context control drops documents the caller is not entitled to before they reach the prompt. Output mediation inspects the model answer before anything renders. Action authorization gates the outbound fetch that the markdown renderer would otherwise perform. You implement each hook in code and wire it around the assistant's real interface.

You start by reproducing a four-attack battery against the unguarded assistant so you can see every failure with your own eyes: a direct prompt injection, a cross-tenant retrieval leak, a sensitive-field exfiltration through the EchoLeak markdown-image channel (CVE-2025-32711), and an unauthorized outbound image fetch. You then watch a single keyword filter get bypassed by an obvious variant, which is why shallow fixes fail. Finally you build the durable controls, one per layer, and verify behaviorally that a freshly planted battery is blocked at the matching control point while a benign account question passes clean through all four. The payoff is a defense arranged the way OWASP and NIST recommend: layered, with each control owning one trust boundary.

Frequently asked questions

Do I need to know machine learning to do this lab?

No. You need to read and write Python and understand a basic HTTP request. The lab is about where to place security controls in an LLM application's request lifecycle, not about model internals. Everything model-specific is explained inline.

What are the four control points?

They map to the four stages of a RAG request. Input mediation screens the incoming user question. Retrieval and context control filters retrieved documents by the caller's entitlement before they enter the prompt. Output mediation inspects the model's answer for smuggled sensitive data before it is rendered. Action authorization gates any side effect the answer triggers, here the outbound image fetch. Placing one control at each boundary is defense in depth: a bypass at one layer is caught at the next.

Why is a single keyword filter not enough?

A keyword or deny-list filter blocks the one phrasing you saw and nothing else. An attacker rewords the injection, base64-encodes the payload, or moves the attack to a different stage of the pipeline. You will bypass a naive filter in this lab and then build controls that gate on structure and entitlement (host allow-list, tenant scope, sensitive-pattern detection) rather than on a list of bad strings.

How are the control-point steps graded if the model is non-deterministic?

You build one control point per step, and each control-point check is deterministic: it plants a fresh attack surface, then exercises your hook directly (input_guard on a question, context_guard on retrieved chunks, output_guard on an answer, action_guard on a URL) so the verdict does not depend on the model's wording. The only model-dependent observation is the EchoLeak exfiltration leak, which fires reliably and is graded as "at least one account question leaks," so a non-deterministic model cannot make the reproduce step flaky.