Recon and Harness: Map a RAG Attack Surface and Measure Attack-Success-Rate

Open your AI Red Team engagement against a real Retrieval-Augmented Generation assistant and build the methodology the whole path reuses. Stand up the service and trace one request, enumerate its attack surface into a structured, machine-checkable map, encode a single probe, then build a deterministic side-channel oracle that counts real effects instead of the model's talk. Triage true positives from verbal-only false positives, scale to an Attack-Success-Rate harness with a per-class breakdown, wire it into a CI gate that goes red on the vulnerable build, then ship the render-path allow-list and watch the same gate go green while benign questions still answer.

100 min10 steps3 domainsBeginner

Hands-on labs require Pro · $29.99/mo · cancel anytime

Map the attack surface

Query

Retriever

LLM

Poisoned doc

retrieved chunk

Answer

0%

Attack-success rate

Attacks blocked · benign answers pass

graded on real output, not the model's talk

What you'll learn

1
Stand up DV-RAG and trace one request
You are starting an engagement against DV-RAG-Support, ACME Cloud's
2
Recon the retriever: probe top-k and write the first map block
Recon is where every later finding comes from. Your deliverable across the next
3
Recon the render sink: confirm the EchoLeak auto-fetch channel
The retriever decides what reaches the model. The render sink decides what
4
Finish the surface map: endpoints, tool hints, trust boundary
Two blocks are mapped from real probes. Now finish the picture. The last three
5
Encode one attack probe and fire it
Recon told you the surface. Now encode a single attack and see whether it fires.
6
Build the deterministic success oracle
You fired a probe. But what counts as a success? This is the single hardest
7
Triage true positives from false positives
Your oracle is only useful if it counts the right things. A measurement built
8
Build the ASR harness: a battery with a per-class breakdown
One probe, one run, proves nothing. The target is stochastic: the same payload
9
Turn the harness into a CI gate (red on the vulnerable build)
A number in a report is easy to ignore. A failing build is not. The payoff of
10
Apply the fix and verify: gate green, benign still works
The harness proved the vulnerability is real and reliable, and the gate is red.

Prerequisites

Comfortable reading Python
Know what an HTTP GET and a markdown image are
No ML background required

Exam domains covered

Offensive AI SecurityLLM Application SecurityReconnaissance and Measurement

Skills & technologies you'll practice

This beginner-level ai/ml lab gives you real-world reps across:

ReconnaissanceAttack SurfaceRAGASRTest HarnessSuccess OracleCI GateEchoLeakOWASP LLMAI Red Team

What you'll do in this lab

This is a hands-on offensive-security lab built on a real Retrieval-Augmented Generation (RAG) stack: a Milvus vector store, NVIDIA embeddings, and a multi-tenant knowledge base. You run the recon and measurement phases of an AI red-team engagement against a working support assistant called DV-RAG-Support, in small steps that build on each other. You stand the service up and trace one request, then enumerate its attack surface and emit a structured, machine-checkable surface map: the endpoints, the retrieval behavior, the markdown render sink, and the trust boundary the rest of the path attacks. Recon is where every later finding comes from, and you finish it with a deliverable a real engagement would hand off.

Then you build the instrument the whole path reuses. You encode a single probe and fire it, then build a deterministic side-channel oracle that counts a real effect, the confidential account reference leaving the pod through a URL the renderer actually loaded, rather than the model's talk. You triage that oracle against a benign case and a verbal-only false positive so you understand why a naive "did the model output bad text" detector over-counts. You scale to an Attack-Success-Rate (ASR) harness that fires a battery across N trials with a per-class breakdown and a structured JSONL audit log, because an LLM target is stochastic and one trial proves nothing. The behavior you measure is the EchoLeak markdown-image channel (CVE-2025-32711), the real-world zero-click pattern where a model encodes data into an image URL the client auto-loads. You wire the harness into a CI gate that goes red on the vulnerable build, then ship the render-path allow-list and watch the same gate go green while benign questions still answer, the methodology payoff that proves a fix actually works.

Frequently asked questions

Do I need to know machine learning to do this lab?

No. You need to read Python and understand a basic HTTP request. The lab is about how to map an LLM application's attack surface and how to measure an attack reliably, not about model internals. Everything model-specific is explained inline.

What is an Attack-Success-Rate (ASR) harness?

It is a test harness that runs the same attack many times against a target and reports how often it succeeds, as a fraction between 0 and 1. Because an LLM is non-deterministic, a single run tells you almost nothing. The harness fires a battery of probes across N trials, judges each with a deterministic oracle, logs every trial, and computes ASR = successful trials / total trials. You build a reusable one here and extend it through the rest of the path.

What is EchoLeak and why does the render sink matter?

EchoLeak (CVE-2025-32711) was a real zero-click exploit against Microsoft 365 Copilot. Hidden instructions made the assistant encode sensitive data into a markdown image URL, and the client auto-loaded that image, exfiltrating the data with no user interaction. The render sink in this lab is the same channel. Mapping it during recon and measuring how reliably it fires is exactly the skill this lab builds.