Recon and Harness: Map a RAG Attack Surface and Measure Attack-Success-Rate
Open your AI Red Team engagement against a real Retrieval-Augmented Generation assistant and build the methodology the whole path reuses. Stand up the service and trace one request, enumerate its attack surface into a structured, machine-checkable map, encode a single probe, then build a deterministic side-channel oracle that counts real effects instead of the model's talk. Triage true positives from verbal-only false positives, scale to an Attack-Success-Rate harness with a per-class breakdown, wire it into a CI gate that goes red on the vulnerable build, then ship the render-path allow-list and watch the same gate go green while benign questions still answer.
Hands-on labs require Pro · $29.99/mo · cancel anytime
What you'll learn
- 1Stand up DV-RAG and trace one requestYou are starting an engagement against DV-RAG-Support, ACME Cloud's
- 2Recon the retriever: probe top-k and write the first map blockRecon is where every later finding comes from. Your deliverable across the next
- 3Recon the render sink: confirm the EchoLeak auto-fetch channelThe retriever decides what reaches the model. The render sink decides what
- 4Finish the surface map: endpoints, tool hints, trust boundaryTwo blocks are mapped from real probes. Now finish the picture. The last three
- 5Encode one attack probe and fire itRecon told you the surface. Now encode a single attack and see whether it fires.
- 6Build the deterministic success oracleYou fired a probe. But what counts as a success? This is the single hardest
- 7Triage true positives from false positivesYour oracle is only useful if it counts the right things. A measurement built
- 8Build the ASR harness: a battery with a per-class breakdownOne probe, one run, proves nothing. The target is stochastic: the same payload
- 9Turn the harness into a CI gate (red on the vulnerable build)A number in a report is easy to ignore. A failing build is not. The payoff of
- 10Apply the fix and verify: gate green, benign still worksThe harness proved the vulnerability is real and reliable, and the gate is red.
Prerequisites
- Comfortable reading Python
- Know what an HTTP GET and a markdown image are
- No ML background required
Exam domains covered
Skills & technologies you'll practice
This beginner-level ai/ml lab gives you real-world reps across:
What you'll do in this lab
This is a hands-on offensive-security lab built on a real Retrieval-Augmented Generation (RAG) stack: a Milvus vector store, NVIDIA embeddings, and a multi-tenant knowledge base. You run the recon and measurement phases of an AI red-team engagement against a working support assistant called DV-RAG-Support, in small steps that build on each other. You stand the service up and trace one request, then enumerate its attack surface and emit a structured, machine-checkable surface map: the endpoints, the retrieval behavior, the markdown render sink, and the trust boundary the rest of the path attacks. Recon is where every later finding comes from, and you finish it with a deliverable a real engagement would hand off.
Then you build the instrument the whole path reuses. You encode a single probe and fire it, then build a deterministic side-channel oracle that counts a real effect, the confidential account reference leaving the pod through a URL the renderer actually loaded, rather than the model's talk. You triage that oracle against a benign case and a verbal-only false positive so you understand why a naive "did the model output bad text" detector over-counts. You scale to an Attack-Success-Rate (ASR) harness that fires a battery across N trials with a per-class breakdown and a structured JSONL audit log, because an LLM target is stochastic and one trial proves nothing. The behavior you measure is the EchoLeak markdown-image channel (CVE-2025-32711), the real-world zero-click pattern where a model encodes data into an image URL the client auto-loads. You wire the harness into a CI gate that goes red on the vulnerable build, then ship the render-path allow-list and watch the same gate go green while benign questions still answer, the methodology payoff that proves a fix actually works.