Retrieval Poisoning: Win Top-k Across a Whole Query Class and Steer the Answer
Attack a real Retrieval-Augmented Generation assistant where it is most exposed: retrieval. Plant one document in a Milvus + NVIDIA embeddings knowledge base, craft it to win cosine top-k for one account question, then for the whole account-query class, then steer the generated answer through a directive framed as routine policy. Measure broad-class coverage and steering attack-success-rate, then harden in two distinct moves: treat retrieved context as data behind a non-spoofable boundary, and cap how many top-k slots any single source may take. Re-run the same battery and watch attack-success-rate collapse.
Hands-on labs require Pro · $29.99/mo · cancel anytime
What you'll learn
- 1Recon: map the retriever and the account-query classYou are red-teaming DV-RAG-Support, ACME Cloud's customer-support assistant.
- 2Craft poison: author one document that wins top-k for the plan queryYou cannot steer an answer the model never reads. The first half of retrieval
- 3Win the class: capture top-k for all four account queriesWinning one query is a demo. Owning a class is the attack. Broaden your
- 4Steer: make the retrieved policy line dictate the answerWinning retrieval put your document in front of the model. Now use the second half
- 5Measure: coverage and steering ASR across a batteryOne lucky hit is a demo. A finding needs numbers. You measure your poison across a
- 6Harden (context): treat retrieved context as data, not instructionsNow switch sides. You proved a planted document can win retrieval and steer the
- 7Harden (dominance): cap how many top-k slots one source can takeThe context-trust fix from Step 6 stops the directive from steering the answer. But
- 8Verify: re-run the battery and watch ASR dropYou applied two fixes: the context-trust boundary (Step 6) that kills the steering
Prerequisites
- Comfortable reading Python
- Understand cosine similarity at a high level (no ML background required)
- Know what a markdown image and an HTTP GET are
Exam domains covered
Skills & technologies you'll practice
This advanced-level ai/ml lab gives you real-world reps across:
What you'll do in this lab
This is a hands-on offensive-security lab built on a real RAG stack: a Milvus vector store, NVIDIA nv-embedqa-e5-v5 embeddings, chunking, and top-k ANN retrieval feeding a live LLM. You attack DV-RAG-Support by planting a single document and engineering it to win cosine top-k for a whole class of account questions, the retrieval condition behind PoisonedRAG. Because retrieval is decided by the embedder and not the model, model alignment cannot stop your document from reaching the prompt.
Then you exploit the generation condition: a directive framed as routine ACME answer policy, sitting inside your retrieved document, that the assistant applies to its answer. You measure attack-success-rate across a battery of phrasings (retrieval coverage and steering rate), the way a real engagement reports impact. Finally you switch to defense and harden the pipeline in two distinct moves: you make the model treat retrieved context as untrusted data behind a non-spoofable boundary and strip directive-shaped lines (closing the steering), then you cap how many top-k slots any single source may take (so no one document can dominate retrieval). You re-run the same battery to prove the attack-success-rate drops, the methodology payoff that shows a fix actually works.