Defend: Secret Isolation for a RAG Assistant
Harden the same RAG support assistant that the extraction lab broke, in small sequential steps. A live signing key, an internal build id, and a canary token are baked into the system prompt, so the secret is exposed by construction: it shares one context window with the customer's message. Stand the service up, reproduce the exposure (the secret is present in the model's context), and watch a naive cleartext output filter fall to encoding-egress. Then build the durable control one mechanism per step: a vault boundary that holds the secret out of the model's context, a seeded canary tripwire, and a fail-closed decoding leak detector that matches the secret and its Base64/ROT13/hex forms. Verify the secret is unrecoverable and benign answers are intact, then resist a re-planted, re-encoded exfil battery.
Hands-on labs require Pro · $29.99/mo · cancel anytime
What you'll learn
- 1Stand up Aria and trace one benign requestYou own Aria, ACME Cloud's Tier-1 support assistant, after a red team
- 2Reproduce: the secret is exposed by constructionThe problem is in dvrag.py. At startup it reads support_secrets.env and bakes
- 3Naive fix bypassed: a cleartext filter falls to encoding-egressThe team's first reaction is the obvious one: scrub the output. They shipped
- 4Mechanism 1: the vault boundary (secret out of the model context)Time to build the durable control. It has three mechanisms, and you build them one
- 5Mechanism 2: a seeded canary tripwireThe secret is out of the context now, so prompt extraction recovers a clean prompt.
- 6Mechanism 3: a fail-closed decoding leak detectorYou have the vault boundary (mechanism 1) and the canary (mechanism 2). The last
- 7Verify: secret unrecoverable, leaks fail-closed, benign answers intactYou built three mechanisms: the vault boundary keeps the secret out of the context,
- 8Resist: re-planted and encoded exfil attempts all blockedA control that only blocks the exact payload you tested is not a control. This step
Prerequisites
- Comfortable reading and editing Python
- Basic HTTP, markdown, and Base64/ROT13 encoding
- Helpful to have seen a prompt-extraction attack first
Exam domains covered
Skills & technologies you'll practice
This advanced-level ai/ml lab gives you real-world reps across:
What you'll do in this lab
This is a hands-on defensive-security lab built on a real RAG stack: a Milvus vector store, NVIDIA embeddings, and an LLM answer step. You harden Aria, a working support assistant whose system prompt carries real secret material: a live signing key, an internal build identifier, and a canary token. Because the system prompt and the customer's message share one context window, a prompt-extraction request reads the secret straight back. You start by reproducing that leak with the attacker's own exploit, so the control you build is measured against a real bypass and not a toy one.
You ship the obvious fix first, a cleartext output filter, and watch encoding-egress slip past it when the model emits the secret Base64-encoded. Then you build the durable control by hand: secret isolation that keeps the key and canary out of the model's context entirely behind a tool/vault boundary, a unique canary token as an unambiguous tripwire, and a canonicalizing output leak detector that decodes Base64, ROT13, and hex before it blocks. The final step re-plants a fresh exploit each run and confirms it is blocked while a normal answer still works. Maps to OWASP LLM02:2025 Sensitive Information Disclosure and LLM07:2025 System Prompt Leakage, and MITRE ATLAS AML.T0056 / AML.T0057.