Prove a Knowledge-Base Exfiltration Chain End to End
Build a self-contained proof-of-concept that exploits a Retrieval-Augmented Generation app through knowledge-base poisoning: plant a document that wins retrieval for a normal user query, abuse a caller-controlled tenant scope to pull another tenant's confidential record into context, and exfiltrate it through the markdown-image sink to a listener. Then write the finding the way a pentester would, with a blast-radius analysis and a remediation (server-side scoping, ingestion validation, an output allow-list) you test and prove blocks your own exploit. Submit a small project (a .zip) for rubric-based feedback.
Est. time: 3 hrs
Outcomes: 5
Rubric criteria: 7
Pass score: 65%
The scenario
You're on a red-team engagement against an internal Retrieval-Augmented Generation (RAG) assistant that answers staff and customer questions from a multi-tenant document store. The rules of engagement are standard: you cannot social-engineer employees or touch the model weights. You can, however, get one document into the knowledge base, the same foothold an attacker gets by filing a support ticket, editing a wiki page, or sending an email the assistant later indexes. The reconnaissance step is to map what the corpus contains and what you are allowed to add to it (MITRE ATLAS AML.T0064, Gather RAG-Indexed Targets).
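Because the foothold is an ordinary ingestion path, ingestion validation is one layer of the remediation. Below is a minimal sketch, assuming documents arrive as plain text and that a flagged document is quarantined for review instead of indexed. `screen_document`, `Verdict`, `ALLOWED_HOSTS`, and the host `assets.example.com` are all hypothetical names chosen for illustration. The directive patterns are examples only: a paraphrased payload can slip past keyword heuristics, which is why this layer sits alongside server-side scoping and an output allow-list rather than standing alone.

```python
import re
from dataclasses import dataclass, field
from urllib.parse import urlsplit

# Hypothetical allow-list of hosts a legitimate document may link to.
ALLOWED_HOSTS = frozenset({"assets.example.com"})

# Illustrative heuristics for text that addresses the assistant rather than
# the human reader. A paraphrased payload can evade every one of these.
DIRECTIVE_PATTERNS = [
    re.compile(r"\bignore (all|any|the|previous|prior)\b.{0,40}\binstructions?\b", re.I),
    re.compile(r"\b(assistant|ai|model)\s*[:,]?\s*(you must|always|when answering)\b", re.I),
    re.compile(r"\binclude\b.{0,60}\b(image|url|link)\b", re.I),
]
URL_PATTERN = re.compile(r"https?://[^\s)>\"']+", re.I)


@dataclass
class Verdict:
    accept: bool  # False means: quarantine for review, do not index
    reasons: list[str] = field(default_factory=list)


def screen_document(text: str) -> Verdict:
    """Flag directive-like phrasing and links to non-allow-listed hosts."""
    reasons: list[str] = []
    for pattern in DIRECTIVE_PATTERNS:
        if pattern.search(text):
            reasons.append(f"directive-like text matched {pattern.pattern!r}")
    for url in URL_PATTERN.findall(text):
        host = urlsplit(url).hostname or ""
        if host not in ALLOWED_HOSTS:
            reasons.append(f"link to non-allow-listed host {host!r}")
    return Verdict(accept=not reasons, reasons=reasons)
```

A document that returns `accept=False` carries its `reasons`, so a reviewer can see why it was held back.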
Your lead wants more than 'the chatbot said something weird.' She wants a reproducible proof-of-concept, a clear statement of impact, a blast-radius analysis, and a remediation the platform team can ship. That deliverable, an exploit chain plus a proven fix, is this task.
Your role
You are an offensive security engineer auditing a RAG application. Your goal is a single, self-contained file that proves a knowledge-base exfiltration chain end to end: poison the index, surface restricted data through an indirect directive, and exfiltrate it through an output sink. You then state the impact, blast radius, and severity as a professional finding would, and demonstrate a remediation that defeats your own proof-of-concept.
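The markdown-image sink works because the client auto-loads any image URL in the assistant's reply, so data the model writes into that URL leaves without a click. Below is a minimal sketch of the output allow-list part of the remediation, assuming the app renders Markdown and that `assets.example.com` (hypothetical) is its only legitimate image host. It rewrites every inline `![alt](url)` whose URL is not HTTPS on an allow-listed host before the reply reaches the renderer. `sanitize_images`, `is_allowed_image`, and `ALLOWED_IMAGE_HOSTS` are illustrative names, not part of any library.

```python
import re
from urllib.parse import urlsplit

# Hypothetical allow-list: the only hosts the UI may auto-load images from.
ALLOWED_IMAGE_HOSTS = frozenset({"assets.example.com"})

# Inline Markdown image: ![alt](url) or ![alt](url "title"), optional <url>.
INLINE_IMAGE = re.compile(r'!\[([^\]]*)\]\(\s*<?([^)\s>]+)>?(?:\s+"[^"]*")?\s*\)')


def is_allowed_image(url: str) -> bool:
    """Allow only HTTPS URLs whose parsed hostname is on the allow-list."""
    parts = urlsplit(url)
    return parts.scheme == "https" and (parts.hostname or "") in ALLOWED_IMAGE_HOSTS


def sanitize_images(markdown: str) -> str:
    """Replace inline images that point anywhere else before rendering."""
    def replace(match: re.Match[str]) -> str:
        url = match.group(2)
        return match.group(0) if is_allowed_image(url) else "[image removed]"
    return INLINE_IMAGE.sub(replace, markdown)
```

Parsing the hostname with `urlsplit` rather than matching on a string prefix means a URL such as `https://assets.example.com@attacker.example/` is judged by its real host. A regex filter only covers inline image syntax; reference-style images and raw HTML `<img>` tags also trigger fetches, so a production fix would sanitize at the renderer or back it with a Content-Security-Policy `img-src` directive restricted to the same hosts.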
Starting the task unlocks the full brief: step-by-step requirements, setup commands, the 7-criterion grading rubric, tips, and the ability to submit your solution for instant AI grading. The task is free to start, and you can submit whenever you're ready.
Learning resources
What this task is
This is a build-and-submit offensive-security task, not a quiz about RAG. You produce a small project that proves a knowledge-base exfiltration chain end to end: a poison document delivered through retrieved content, a demonstration that it wins retrieval for a normal query, a caller-controlled tenant scope abused to reach another tenant's confidential record, an exfiltration through an auto-loaded markdown-image sink to a listener, a benign baseline for contrast, a blast-radius analysis, and a tested remediation (server-side scoping, ingestion validation, an output allow-list) you re-run the exploit against to prove it holds.
Knowledge-base poisoning and cross-tenant leakage (OWASP LLM08 and LLM02, MITRE ATLAS AML.T0070) are the mechanisms behind real RAG incidents and the EchoLeak class of zero-click exfiltration. The skill this task builds is the one that separates a real AI red teamer from someone who can make a chatbot misbehave: plant a payload the victim never sees, win retrieval, turn it into concrete restricted-data exfiltration, quantify the blast radius, and report it so the fix actually ships.
Grading is rubric-based and explainable. Your submission is scored against weighted criteria (runnable PoC, poison retrieved, exfil demonstrated, restricted data reached, benign baseline, blast-radius analysis, and proven remediation) with per-criterion feedback. The pass threshold is 65 percent, and you can resubmit. Questions about the task go to andrew@predica.ai.