Preporato
Build & submit taskBetaadvanced

Prove a Knowledge-Base Exfiltration Chain End to End

Build a self-contained proof-of-concept that exploits a Retrieval-Augmented Generation app through knowledge-base poisoning: plant a document that wins retrieval for a normal user query, abuse a caller-controlled tenant scope to pull another tenant's confidential record into context, and exfiltrate it through the markdown-image sink to a listener. Then write the finding the way a pentester would, with a blast-radius analysis and a remediation (server-side scoping, ingestion validation, an output allow-list) you test and prove blocks your own exploit. Submit a small project (a .zip) for rubric-based feedback.

3 hrs

Est. time

5

Outcomes

7

Rubric criteria

65%

Pass score

What you'll learn

Skills you'll have real reps in after shipping this.

Retrieval is the gate
A poison that is not retrieved is inert. Winning top-k for a query class is the first half of the exploit and it is model-agnostic.
Indirect, policy-framed directives
A directive framed as routine policy inside a trusted retrieved document is applied where a jailbreak would be refused.
Output handling as an exfil channel
An auto-loaded markdown image turns model output into an attacker-controlled outbound request, the EchoLeak pattern.
Blast radius is part of impact
A finding is stronger when it quantifies how many queries the poison captures, which data tiers become reachable, and whether it persists.
Remediation you can prove
A fix only counts when you re-run the exploit against it and show it fails while normal answers still work.

The scenario

You're on a red-team engagement against an internal Retrieval-Augmented Generation (RAG) assistant: it answers staff and customer questions from a multi-tenant document store. The rules of engagement are the usual ones. You cannot social-engineer employees and you cannot touch the model weights. You can get one document into the knowledge base, the same foothold an attacker gets by filing a support ticket, editing a wiki page, or sending an email the assistant later indexes. The reconnaissance step is to map what the corpus contains and what you are allowed to add to it (MITRE ATLAS AML.T0064, Gather RAG-Indexed Targets).

Your lead wants more than 'the chatbot said something weird.' She wants a reproducible proof-of-concept, a clear statement of impact, a blast-radius analysis, and a remediation the platform team can ship. That deliverable, an exploit chain plus a proven fix, is this task.

Your role

You are an offensive security engineer auditing a RAG application. Your goal is a single, self-contained file that proves a knowledge-base exfiltration chain end to end, poison the index, surface restricted data through an indirect directive, exfiltrate it through an output sink, then state the impact, blast radius, and severity like a professional finding and demonstrate a remediation that defeats your own proof-of-concept.

Start the task to unlock the full brief

You'll get the step-by-step requirements, setup commands, the 7-criterion grading rubric, tips, and the ability to submit your solution for instant AI grading.

Free to start · submit when you're ready

What this task is

This is a build-and-submit offensive-security task, not a quiz about RAG. You produce a small project that proves a knowledge-base exfiltration chain end to end: a poison document delivered through retrieved content, a demonstration that it wins retrieval for a normal query, a caller-controlled tenant scope abused to reach another tenant's confidential record, an exfiltration through an auto-loaded markdown-image sink to a listener, a benign baseline for contrast, a blast-radius analysis, and a tested remediation (server-side scoping, ingestion validation, an output allow-list) you re-run the exploit against to prove it holds.

Knowledge-base poisoning and cross-tenant leakage (OWASP LLM08 and LLM02, MITRE ATLAS AML.T0070) are the mechanisms behind real RAG incidents and the EchoLeak class of zero-click exfiltration. The skill this task builds is the one that separates a real AI red teamer from someone who can make a chatbot misbehave: plant a payload the victim never sees, win retrieval, turn it into concrete restricted-data exfiltration, quantify the blast radius, and report it so the fix actually ships.

Grading is rubric-based and explainable. Your submission is scored against weighted criteria (runnable PoC, poison retrieved, exfil demonstrated, restricted data reached, benign baseline, blast-radius analysis, and proven remediation) with per-criterion feedback. The pass threshold is 65 percent and you can resubmit. Questions about the task go to andrew@predica.ai.

Frequently asked questions

Do I need a paid API key?

No. You can target a real OpenAI-compatible model and embedder, or simulate the retriever and a naive model deterministically and say so. The rubric rewards the chain and the report, not which backend you used.

What counts as 'indirect' delivery here?

The payload must reach the model through a document it retrieves or ingests, not through the prompt the victim typed, and you must show the poison actually wins retrieval for a normal query. That is what makes it a zero-click attack.

What restricted data should I exfiltrate?

Either another tenant's confidential record reached through an isolation gap (a caller-controlled scope or an injectable metadata filter) or the caller's own confidential record. Either way, demonstrate it leaving through an output sink with a benign baseline for contrast.

Why does the task require a blast-radius analysis and a remediation?

A finding is only useful if it quantifies impact and ships a fix. You state how many query classes the poison captures and which data tiers it exposes, then implement a remediation and re-run your own exploit against it to prove it is blocked. Contact andrew@predica.ai with questions.