Build & submit task · Beta · Advanced

Build a RAG Ingestion and Tenant-Isolation Firewall

Build the defensive control for a Retrieval-Augmented Generation pipeline: an ingestion validator that rejects poisoned documents before they are indexed, plus an ENFORCED tenant-scoped retrieval predicate that treats the tenant filter as a security boundary the caller cannot bypass. Start from a vulnerable RAG and two provided proofs-of-concept (a retrieval-poisoning PoC and a cross-tenant PoC); the exploits are only the pass/fail oracle. Wire in your ingestion gate and enforced predicate, re-run both PoCs to show they are now BLOCKED, run a benign same-tenant query to show retrieval still returns the correct answer, and write a short remediation rationale. Submit a single script or notebook for instant, rubric-based feedback.

Est. time: 3 hrs · Rubric criteria: 6 · Pass score: 65%

What you'll learn

Skills you'll have real reps in after shipping this.

The filter is a security boundary

A tenant scope supplied by the caller is advisory and bypassable. Derive it from the authenticated session and apply it as a parameterized predicate inside the retriever so omitting or rewriting it cannot widen the result set.
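A minimal sketch of that idea, assuming a SQLite-backed store and a hypothetical Session object populated by your authentication layer. The names Session, build_store, and retrieve are illustrative and not part of the provided target. The point is that the retriever accepts no tenant argument from the caller, and both the tenant scope and the query text are bound as parameters.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class Session:
    """Hypothetical authenticated session; tenant_id is set by the auth layer."""
    user_id: str
    tenant_id: str


def build_store() -> sqlite3.Connection:
    """Create a toy in-memory document store tagged with tenant_id."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE docs ("
        "id INTEGER PRIMARY KEY, tenant_id TEXT NOT NULL, body TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO docs (tenant_id, body) VALUES (?, ?)",
        [
            ("acme", "Acme refund policy: refunds within 30 days."),
            ("globex", "Globex confidential pricing sheet."),
        ],
    )
    return conn


def retrieve(conn: sqlite3.Connection, session: Session, query: str, k: int = 3) -> list[str]:
    """Keyword retrieval that is always scoped to session.tenant_id.

    There is deliberately no tenant parameter: the caller cannot omit or
    rewrite the scope. The query text is bound as a parameter, so it is
    treated as data and cannot alter the WHERE clause.
    """
    rows = conn.execute(
        "SELECT body FROM docs "
        "WHERE tenant_id = ? AND instr(lower(body), lower(?)) > 0 "
        "ORDER BY id LIMIT ?",
        (session.tenant_id, query, k),
    ).fetchall()
    return [row[0] for row in rows]
```

With this shape, a request from an acme session that searches for "confidential" returns nothing, because the Globex row is filtered out before any ranking happens. A real retriever would rank by similarity instead of substring match, but the predicate would sit in the same place.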

Vet at ingestion, before retrieval can reward the attacker

A poison document that never enters the index can never win top-k. An ingestion gate (provenance allow-list, instruction-marker and over-repetition scanning, field and size validation) is cheaper and more durable than trying to detect the poison at query time.
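The checks named above can be sketched as a single validator. This is an illustrative starting point, not a complete detector: the allow-list entries, the size limit, the repetition threshold, and the marker patterns are all assumed values that you would tune against the provided PoC and your benign documents.

```python
import re
from collections import Counter
from dataclasses import dataclass

# All values below are assumptions for illustration; tune them for your corpus.
ALLOWED_SOURCES = {"wiki", "policy-repo"}   # provenance allow-list
MAX_BODY_CHARS = 4000                        # size limit
MAX_TOKEN_SHARE = 0.30                       # max share of the most common token
MIN_TOKENS_FOR_REPETITION_CHECK = 20         # skip the ratio on very short text
INSTRUCTION_MARKERS = re.compile(
    r"ignore (all |any )?(previous|prior) instructions"
    r"|disregard the above"
    r"|system prompt"
    r"|you must now",
    re.IGNORECASE,
)


@dataclass
class Verdict:
    accepted: bool
    reasons: list[str]


def validate_document(doc: dict) -> Verdict:
    """Return a verdict for one candidate document before it is indexed."""
    reasons: list[str] = []

    # Field validation: required string fields must be present and non-empty.
    for field in ("tenant_id", "source", "body"):
        value = doc.get(field)
        if not isinstance(value, str) or not value.strip():
            reasons.append(f"missing or empty field: {field}")
    if reasons:
        return Verdict(False, reasons)

    # Provenance allow-list.
    if doc["source"] not in ALLOWED_SOURCES:
        reasons.append(f"source not allow-listed: {doc['source']}")

    body = doc["body"]

    # Size validation.
    if len(body) > MAX_BODY_CHARS:
        reasons.append("body exceeds size limit")

    # Instruction-marker scan.
    if INSTRUCTION_MARKERS.search(body):
        reasons.append("instruction marker found")

    # Over-repetition scan: one token dominating the text suggests stuffing.
    tokens = re.findall(r"[a-z0-9']+", body.lower())
    if len(tokens) >= MIN_TOKENS_FOR_REPETITION_CHECK:
        top_count = Counter(tokens).most_common(1)[0][1]
        if top_count / len(tokens) > MAX_TOKEN_SHARE:
            reasons.append("over-repetition detected")

    return Verdict(not reasons, reasons)
```

Returning the list of reasons, and not only a boolean, makes it easier to explain each rejection in the remediation rationale and to spot a rule that is over-blocking clean documents.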

Defense-in-depth at the data layer

Model-side classifiers have each been bypassed independently in real incidents, so no single model defense can be the whole boundary. The durable controls sit at the data layer: vet what enters the corpus and enforce who may read each row. Those checks still hold when a model-side defense is bypassed.

Prove no over-blocking

A control that also rejects clean documents or starves legitimate same-tenant queries has shipped a regression. The benign control is part of the acceptance test, so treat it as a first-class requirement.

The exploit is the oracle

On the defensive side, the proof-of-concept serves as the acceptance test for the control you build. You pass when attack success drops to zero AND the benign control still succeeds.
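One way to express that pass condition in code is a small harness that runs each PoC and the benign control, then combines the results. This is a sketch under assumptions: run_acceptance is a hypothetical helper, and each PoC is assumed to be wrapped as a callable that returns True when the attack succeeded.

```python
from typing import Callable, Dict


def run_acceptance(
    pocs: Dict[str, Callable[[], bool]],
    benign_check: Callable[[], bool],
) -> dict:
    """Run every PoC and the benign control against the hardened pipeline.

    Each PoC callable returns True if the attack SUCCEEDED.
    benign_check returns True if the legitimate same-tenant query
    still returned the correct document.
    """
    if not pocs:
        # An empty oracle would pass vacuously, so refuse to run.
        raise ValueError("at least one proof-of-concept is required")

    poc_succeeded = {name: bool(poc()) for name, poc in pocs.items()}
    attack_success_rate = sum(poc_succeeded.values()) / len(poc_succeeded)
    benign_ok = bool(benign_check())

    return {
        "poc_succeeded": poc_succeeded,
        "attack_success_rate": attack_success_rate,
        "benign_ok": benign_ok,
        # Both halves are required: zero attack success AND no over-blocking.
        "passed": attack_success_rate == 0.0 and benign_ok,
    }
```

A pipeline that blocks both exploits by rejecting everything fails this check, because benign_ok is False. That is the over-blocking regression the benign control exists to catch.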

The scenario

You are the platform engineer who owns a multi-tenant Retrieval-Augmented Generation (RAG) assistant. It answers staff and customer questions from a shared document store, and each document carries a tenant_id. Two problems landed on your desk this sprint. First, anyone who can file a support ticket or edit a wiki page can get a document into the index, and a red-team report just showed a planted document winning retrieval for a normal query and steering the answer (MITRE ATLAS AML.T0070, RAG Poisoning). Second, the retriever treats the tenant filter as a caller-supplied hint, so a request that omits or rewrites it can read another tenant's confidential records (OWASP LLM02:2025, Sensitive Information Disclosure).

Your lead does not want another attack write-up. She wants the control shipped: an ingestion gate that rejects poisoned documents before they are indexed, and a tenant predicate that the retriever enforces server-side so it cannot be bypassed. The provided exploits are the acceptance test. You pass when both proofs-of-concept are blocked AND a legitimate same-tenant query still returns the right document. That control, built and proven, is this task.

Your role

You are a defensive security engineer hardening a multi-tenant RAG pipeline. Your goal is a single, self-contained file that BUILDS the control: an ingestion validator that rejects poisoned documents before indexing and a tenant-scoped retrieval predicate enforced as a security boundary. You start from a vulnerable target and two working proofs-of-concept, then prove the hardened pipeline blocks both exploits while a benign same-tenant query still returns the correct result.


Learning resources

OWASP LLM08:2025 Vector and Embedding Weaknesses

The taxonomy entry for retrieval and embedding attacks including RAG poisoning and cross-context leakage; the primary control target for this task.

genai.owasp.org

OWASP LLM02:2025 Sensitive Information Disclosure

The taxonomy entry for cross-tenant and cross-context leakage that the enforced predicate closes.

genai.owasp.org

OWASP LLM01:2025 Prompt Injection

The directive-borne foothold inside a poisoned document that the ingestion gate scans for.

genai.owasp.org

OWASP Agentic AI Top 10 (ASI)

The Agentic Security Initiative threat list; map the ingestion and isolation controls to the data-poisoning and context-manipulation entries (verify the current ASI IDs).

genai.owasp.org

MITRE ATLAS: AML.T0070 RAG Poisoning

The technique your ingestion gate is designed to defeat (verify the current technique ID before citing).

atlas.mitre.org

NIST AI 100-2 Adversarial Machine Learning

Taxonomy and mitigations for data poisoning and abuse of ML systems, including ingestion-time defenses.

csrc.nist.gov

What this task is

This is a build-and-submit defensive-security task. You build a working control rather than answer a quiz: a RAG firewall with an ingestion validator that rejects poisoned documents before they are indexed and a tenant-scoped retrieval predicate the retriever enforces server-side as a security boundary. You start from a vulnerable RAG and two provided proofs-of-concept (a retrieval-poisoning PoC and a cross-tenant PoC), but the exploits are only the acceptance test. You pass when both are blocked and a legitimate same-tenant query still returns the correct document.

RAG poisoning and cross-tenant leakage (OWASP LLM08 Vector and Embedding Weaknesses, LLM02 Sensitive Information Disclosure, LLM01 Prompt Injection, and MITRE ATLAS AML.T0070) are the mechanisms behind real RAG incidents. The skill this task builds is the defensive one: vet what enters the corpus so a poison never wins retrieval, and enforce who may read each row so a request cannot widen its own tenant scope. These are data-layer boundaries, durable in a way model-side classifiers proved not to be, because they hold even when a single model defense is bypassed.

Grading is rubric-based and explainable. Your submission is scored against six weighted criteria, with per-criterion feedback: the ingestion validator is built and correct; the tenant predicate is enforced as a boundary; both proofs-of-concept are blocked after hardening; benign same-tenant retrieval is preserved; the remediation rationale and standards mapping are present; and the control is minimal and runnable. The pass threshold is 65% and you can resubmit.

Frequently asked questions

Do I need a paid API key?

No. You can build a tiny self-contained vulnerable RAG (a SQLite or NumPy index tagged with tenant_id, a keyword or cosine retriever, and a naive caller-supplied filter) and harden it, or you can target a real model and embedder and say so. The rubric rewards the control and the blocked re-run, and it is agnostic to which backend you used.
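For orientation, a self-contained vulnerable baseline of that kind might look like the sketch below. It uses only the standard library. The function names, the schema, and the scoring rule are assumptions made for illustration, and the tenant filter is deliberately left as a caller-supplied hint so there is something to harden.

```python
import sqlite3


def build_index(docs: list[tuple[str, str]]) -> sqlite3.Connection:
    """Load (tenant_id, body) pairs into an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE docs (tenant_id TEXT, body TEXT)")
    conn.executemany("INSERT INTO docs VALUES (?, ?)", docs)
    return conn


def score(query: str, body: str) -> int:
    """Naive keyword score: total occurrences of the query terms in the body.

    Because repeats are counted, a keyword-stuffed document outranks an
    honest one, which is what the ingestion gate has to prevent.
    """
    words = body.lower().split()
    return sum(words.count(term) for term in query.lower().split())


def vulnerable_retrieve(conn, query, tenant_filter=None, k=1):
    """VULNERABLE ON PURPOSE: tenant_filter is an optional caller-supplied hint.

    Omitting it widens the search to every tenant's rows.
    """
    if tenant_filter is None:
        rows = conn.execute("SELECT tenant_id, body FROM docs").fetchall()
    else:
        rows = conn.execute(
            "SELECT tenant_id, body FROM docs WHERE tenant_id = ?",
            (tenant_filter,),
        ).fetchall()
    ranked = sorted(rows, key=lambda row: score(query, row[1]), reverse=True)
    return [row for row in ranked[:k] if score(query, row[1]) > 0]
```

Hardening this baseline means removing the tenant_filter parameter in favor of a session-derived scope and placing a validator in front of build_index. The same pattern applies if you swap the keyword score for cosine similarity over embeddings.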

What exactly is the deliverable?

The control: an ingestion validator that rejects poisoned documents before indexing and a tenant-scoped retrieval predicate enforced from the authenticated session. The two provided exploits are the pass/fail oracle. You pass when both are blocked and a benign same-tenant query still returns the correct document.

Why must the tenant filter be enforced server-side?

A scope supplied by the caller is advisory and bypassable. A request that omits or rewrites it reads another tenant's data. Deriving the scope from the authenticated session and applying it as a parameterized predicate inside the retriever makes it a boundary the caller cannot widen, which is what closes the cross-tenant leak.

Why vet documents at ingestion instead of at query time?

A poison that never enters the index can never win top-k or steer an answer. An ingestion gate (provenance allow-list, instruction-marker and over-repetition scanning, field and size validation) stops the attack before retrieval can reward it, and it is cheaper and more durable than trying to detect the poison on every query. The benign control proves it does not over-block clean documents.