Retrieval Poisoning: Win Top-k Across a Whole Query Class and Steer the Answer

Attack a real Retrieval-Augmented Generation assistant where it is most exposed: retrieval. Plant one document in a Milvus + NVIDIA embeddings knowledge base, craft it to win cosine top-k for one account question, then for the whole account-query class, then steer the generated answer through a directive framed as routine policy. Measure broad-class coverage and steering attack-success-rate, then harden in two distinct moves: treat retrieved context as data behind a non-spoofable boundary, and cap how many top-k slots any single source may take. Re-run the same battery and watch attack-success-rate collapse.

80 min8 steps3 domainsAdvanced

Hands-on labs require Pro · $29.99/mo · cancel anytime

Map the attack surface

Query

Retriever

LLM

Poisoned doc

retrieved chunk

Answer

0%

Attack-success rate

Attacks blocked · benign answers pass

graded on real output, not the model's talk

What you'll learn

1
Recon: map the retriever and the account-query class
You are red-teaming DV-RAG-Support, ACME Cloud's customer-support assistant.
2
Craft poison: author one document that wins top-k for the plan query
You cannot steer an answer the model never reads. The first half of retrieval
3
Win the class: capture top-k for all four account queries
Winning one query is a demo. Owning a class is the attack. Broaden your
4
Steer: make the retrieved policy line dictate the answer
Winning retrieval put your document in front of the model. Now use the second half
5
Measure: coverage and steering ASR across a battery
One lucky hit is a demo. A finding needs numbers. You measure your poison across a
6
Harden (context): treat retrieved context as data, not instructions
Now switch sides. You proved a planted document can win retrieval and steer the
7
Harden (dominance): cap how many top-k slots one source can take
The context-trust fix from Step 6 stops the directive from steering the answer. But
8
Verify: re-run the battery and watch ASR drop
You applied two fixes: the context-trust boundary (Step 6) that kills the steering

Prerequisites

Comfortable reading Python
Understand cosine similarity at a high level (no ML background required)
Know what a markdown image and an HTTP GET are

Exam domains covered

Offensive AI SecurityLLM Application SecurityRetrieval-Augmented Generation

Skills & technologies you'll practice

This advanced-level ai/ml lab gives you real-world reps across:

RAGRetrieval PoisoningEmbedding WeaknessesPoisonedRAGOWASP LLM08Offensive SecurityAI Red Team

What you'll do in this lab

This is a hands-on offensive-security lab built on a real RAG stack: a Milvus vector store, NVIDIA nv-embedqa-e5-v5 embeddings, chunking, and top-k ANN retrieval feeding a live LLM. You attack DV-RAG-Support by planting a single document and engineering it to win cosine top-k for a whole class of account questions, the retrieval condition behind PoisonedRAG. Because retrieval is decided by the embedder and not the model, model alignment cannot stop your document from reaching the prompt.

Then you exploit the generation condition: a directive framed as routine ACME answer policy, sitting inside your retrieved document, that the assistant applies to its answer. You measure attack-success-rate across a battery of phrasings (retrieval coverage and steering rate), the way a real engagement reports impact. Finally you switch to defense and harden the pipeline in two distinct moves: you make the model treat retrieved context as untrusted data behind a non-spoofable boundary and strip directive-shaped lines (closing the steering), then you cap how many top-k slots any single source may take (so no one document can dominate retrieval). You re-run the same battery to prove the attack-success-rate drops, the methodology payoff that shows a fix actually works.

Frequently asked questions

Do I need a machine-learning background?

No. You need to read Python and have an intuition for cosine similarity (text that shares vocabulary lands near a query). Everything model-specific and embedding-specific is explained inline, and the lab uses a real NVIDIA embedding model so the retrieval behavior you see is authentic.

What is retrieval poisoning?

Retrieval poisoning plants a document engineered to win semantic retrieval for many queries, then carries a directive that steers the answer once it is retrieved. The two halves are the PoisonedRAG retrieval condition (winning top-k) and the generation condition (the model applying the planted directive). This lab makes both concrete against a real Milvus + NVIDIA embeddings pipeline.

Why measure an attack-success-rate instead of a single demo?

An aligned model does not obey every directive, and retrieval does not win for every phrasing. A finding reports how reliably the exploit fires across the questions real users ask, so you measure coverage and steering as rates across a battery, exactly as you would in a professional engagement.