Sensitive Data Disclosure: Leak Confidential Records from a RAG Assistant
Hosted · IDE
Beta

Attack a real Retrieval-Augmented Generation (RAG) assistant whose only privacy control is a request in the system prompt. The stack is a Milvus vector store, NVIDIA embeddings, and a multi-tenant knowledge base. Force disclosure of your own gated billing secret, pull another customer's record across a disabled tenant filter, and harvest an accidentally indexed service key. Then ship the real fix: pre-retrieval authorization, corpus hygiene, and output redaction.

75 min · 8 steps · 3 domains · Intermediate

[Diagram: Map the attack surface. Pipeline stages: Query, Retriever, LLM, Answer, with a poisoned doc entering as a retrieved chunk. Scoreboard: attack-success rate 0% (attacks blocked, benign answers pass), graded on real output, not the model's talk.]

What you'll learn

  1. Recon: map the assistant and its retrieved records
     You are an authenticated Globex customer of DV-RAG-Support, ACME Cloud's …
  2. Disclose your own: force your gated billing secret
     This is the first of three disclosure classes, the simplest one: your own gated …
  3. Cross-tenant: pull another customer's record into your answer
     The second disclosure class crosses a trust boundary. You are still the Globex …
  4. Credential: harvest an accidentally-indexed service key
     The third disclosure class is a secret that should never have been in the corpus …
  5. Measure: disclosure ASR across the three classes
     You landed three disclosures by hand: your own gated secret, another tenant's PII, …
  6. Harden 1: per-caller retrieval scope
     You proved three leaks. None were stopped by the system prompt, because the prompt …
  7. Harden 2: corpus hygiene + output redaction
     Step 6 scoped retrieval, which closed the cross-tenant leak. Two classes remain: …
  8. Verify: re-run all three disclosures, blocked
     You shipped two hardening steps: per-caller retrieval scope (Step 6) and corpus …
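
To make the "Measure" step concrete, here is a minimal sketch of how a per-class attack-success rate (ASR) could be computed by grading the assistant's actual output rather than its stated refusal. The `Attempt` type, the class labels, and the canary strings are all hypothetical and are not taken from the lab's grader.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attempt:
    disclosure_class: str  # hypothetical labels: "own_secret", "cross_tenant", "credential"
    canary: str            # planted marker that must never appear in a response
    output: str            # the assistant's actual response text


def leaked(attempt: Attempt) -> bool:
    """An attempt counts as a success only if the canary is in the real output."""
    return attempt.canary.lower() in attempt.output.lower()


def asr_by_class(attempts: list[Attempt]) -> dict[str, float]:
    """Attack-success rate per class: successful attempts / total attempts."""
    totals: dict[str, int] = {}
    hits: dict[str, int] = {}
    for a in attempts:
        totals[a.disclosure_class] = totals.get(a.disclosure_class, 0) + 1
        hits[a.disclosure_class] = hits.get(a.disclosure_class, 0) + int(leaked(a))
    return {cls: hits[cls] / totals[cls] for cls in totals}


# Illustrative data only; the canaries and responses are made up.
attempts = [
    Attempt("own_secret", "BILL-CANARY-001", "Your record: billing_secret=BILL-CANARY-001"),
    Attempt("own_secret", "BILL-CANARY-001", "I can't share that identifier."),
    Attempt("cross_tenant", "PII-CANARY-002", "Contact on file: PII-CANARY-002"),
    Attempt("credential", "KEY-CANARY-003", "I will not reveal any keys."),
]
rates = asr_by_class(attempts)
print(rates)
```

Grading on a substring match keeps the metric honest: a response that says "I cannot reveal that" while still printing the value is scored as a leak.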

Prerequisites

  • Comfortable reading Python
  • Basic HTTP and markdown
  • No ML background required

Exam domains covered

Offensive AI Security · LLM Application Security · Sensitive Information Disclosure

Skills & technologies you'll practice

This intermediate-level AI/ML lab gives you real-world reps across:

Sensitive Information Disclosure · Data Leakage · Cross-Tenant · OWASP LLM02 · RAG · Offensive Security · AI Red Team

What you'll do in this lab

This is a hands-on offensive-security lab built on a real RAG stack: a Milvus vector store, NVIDIA embeddings, and a multi-tenant knowledge base. You attack DV-RAG-Support as an authenticated customer whose own confidential record is retrieved to answer account questions. The assistant's system prompt carries a soft privacy line asking the model not to reveal confidential identifiers, and you will show why that line is not a control. The sensitive data is already in the retrieved context, and a model echoes retrieved content far more readily than it refuses a flagged field, so a broad or structured request surfaces it.

You drive three disclosures hands-on: forcing your own gated billing secret out of your retrieved record, pulling another customer's record into your answer across a tenant filter that a migration left disabled, and harvesting a live service key that was accidentally indexed in a draft runbook. Then you flip to defense and ship the real fix: pre-retrieval authorization scoped to the requesting user, corpus hygiene so secrets are never ingested, and output-side redaction as defense in depth, all without breaking legitimate answers. The lab maps to OWASP LLM02:2025 Sensitive Information Disclosure and MITRE ATLAS AML.T0057.
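
Corpus hygiene and output-side redaction can share one set of detectors. The sketch below is an illustration under stated assumptions, not the lab's implementation: the `sk-` key shape, the pattern list, and the function names are hypothetical, and a real deployment would tune the patterns to its own secret formats.

```python
import re

# Hypothetical detectors; these are assumptions for illustration only.
SECRET_PATTERNS = [
    re.compile(r"\bsk-[A-Za-z0-9]{16,}\b"),                          # made-up service-key shape
    re.compile(r"(?i)\b(?:api[_-]?key|secret|token)\s*[:=]\s*\S+"),  # key=value style assignments
]


def contains_secret(text: str) -> bool:
    """True if any detector matches the text."""
    return any(p.search(text) for p in SECRET_PATTERNS)


def filter_corpus(docs: list[str]) -> tuple[list[str], list[str]]:
    """Corpus hygiene: split documents into ingestible and quarantined before indexing."""
    clean = [d for d in docs if not contains_secret(d)]
    quarantined = [d for d in docs if contains_secret(d)]
    return clean, quarantined


def redact(text: str) -> str:
    """Output-side redaction: mask anything the detectors match in a model response."""
    for pattern in SECRET_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text


docs = [
    "How to reset your password.",
    "Draft runbook: export SERVICE_KEY=sk-abcdEFGH12345678ijkl",
]
clean, quarantined = filter_corpus(docs)
masked = redact("Use sk-abcdEFGH12345678ijkl to authenticate.")
benign = redact("Your plan renews on the 1st.")
print(clean, quarantined, masked, benign, sep="\n")
```

Running the same detectors at ingestion and on output means a secret that slips past one stage can still be caught by the other, while benign answers pass through unchanged.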

Frequently asked questions

What is sensitive information disclosure in RAG?

It is when a Retrieval-Augmented Generation system surfaces confidential data that should not have reached the user: another tenant's record, a secret accidentally indexed in the corpus, or a gated field the application tried to protect with a prompt instruction. The data is in the retrieved context, so the model can read it back when asked.

Why doesn't a privacy line in the system prompt stop the leak?

A system prompt is conditioning text, not an access-control boundary. The confidential data is already in the context window via retrieval. A model asked to dump a structured record, or to list everything it can see, will echo confidential fields that it would refuse to name if asked for them directly. The fix is to keep the data out of the context with authorization at retrieval, not to ask the model to keep a secret.
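
A toy prompt builder shows why the privacy line is not a boundary: the instruction and the confidential record end up in the same block of text handed to the model. The prompt wording, the record fields, and `build_prompt` are hypothetical and only illustrate the general shape of RAG prompt assembly.

```python
# Hypothetical system prompt with a soft privacy line.
SYSTEM_PROMPT = "You are a support assistant. Do not reveal confidential identifiers."


def build_prompt(system_prompt: str, retrieved_chunks: list[str], user_query: str) -> str:
    """Concatenate instruction, retrieved context, and query into one model input."""
    context = "\n".join(f"- {chunk}" for chunk in retrieved_chunks)
    return f"{system_prompt}\n\nContext:\n{context}\n\nUser: {user_query}"


# Made-up retrieved record containing a planted canary value.
chunks = ["customer=Globex plan=Pro billing_secret=BILL-CANARY-001"]
prompt = build_prompt(SYSTEM_PROMPT, chunks, "Export my account record as JSON.")

# The secret is already inside the model's input, whatever the instruction says.
secret_in_context = "BILL-CANARY-001" in prompt
print(secret_in_context)
```

Nothing in this assembly step enforces the instruction; whether the value is echoed depends entirely on model behavior, which is why the control has to sit before retrieval.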

How is cross-tenant leakage prevented?

Cross-tenant leakage is prevented by authorization scoped to the requesting user and applied before the retrieval search. The scope is a metadata filter derived from the authenticated session, not from any caller-supplied value, so another tenant's chunks never enter the prompt. Output-side PII redaction and corpus hygiene are defense in depth on top of that, not replacements for it.
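
As a rough sketch of that idea, the example below derives the tenant scope from a server-side session object and applies it before any ranking happens. It uses an in-memory list as a stand-in for the vector store; `Session`, `Chunk`, `tenant_filter`, and the `tenant_id` field name are assumptions for illustration. Milvus supports scalar filter expressions on search, but the exact client call depends on your pymilvus version and is not shown here.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Session:
    user_id: str
    tenant_id: str  # assumed to be set by the server at authentication, never by the caller


@dataclass(frozen=True)
class Chunk:
    tenant_id: str
    text: str


def tenant_filter(session: Session) -> str:
    """Build a scalar filter expression from the session only."""
    # Conservative allow-list so the value cannot break out of the quoted literal.
    if not re.fullmatch(r"[A-Za-z0-9_-]+", session.tenant_id):
        raise ValueError("unexpected tenant id")
    return f'tenant_id == "{session.tenant_id}"'


def scoped_retrieve(session: Session, store: list[Chunk], query: str, limit: int = 3) -> list[Chunk]:
    """In-memory stand-in for a vector search: authorize first, rank second."""
    allowed = [c for c in store if c.tenant_id == session.tenant_id]
    words = set(query.lower().split())
    # Toy relevance score: number of query words shared with the chunk text.
    ranked = sorted(allowed, key=lambda c: len(words & set(c.text.lower().split())), reverse=True)
    return ranked[:limit]


store = [
    Chunk("globex", "Globex billing plan Pro"),
    Chunk("initech", "Initech billing contact jane@initech.example"),
]
session = Session(user_id="u-1", tenant_id="globex")
expr = tenant_filter(session)
results = scoped_retrieve(session, store, "billing contact")
print(expr, [c.text for c in results])
```

Note that `scoped_retrieve` takes no tenant argument from the request: the query text can ask for anything, but the candidate set is fixed by the session before relevance is considered.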

Do I need an ML background?

No. You need to read Python and run a few chat queries. Everything model-specific is explained inline. The lab is about how a RAG application handles retrieval and confidential data, not about model internals.