Retrieval Poisoning: Win Top-k Across a Whole Query Class and Steer the Answer
Hosted · ide
Beta

Retrieval Poisoning: Win Top-k Across a Whole Query Class and Steer the Answer

Attack a real Retrieval-Augmented Generation assistant where it is most exposed: retrieval. Plant one document in a Milvus + NVIDIA embeddings knowledge base, craft it to win cosine top-k for one account question, then for the whole account-query class, then steer the generated answer through a directive framed as routine policy. Measure broad-class coverage and steering attack-success-rate, then harden in two distinct moves: treat retrieved context as data behind a non-spoofable boundary, and cap how many top-k slots any single source may take. Re-run the same battery and watch attack-success-rate collapse.

80 min8 steps3 domainsAdvanced

Hands-on labs require Pro · $29.99/mo · cancel anytime

Map the attack surface
Query
Retriever
LLM
Poisoned doc
retrieved chunk
Answer
0%
Attack-success rate
Attacks blocked · benign answers pass
graded on real output, not the model's talk

What you'll learn

  1. 1
    Recon: map the retriever and the account-query class
    You are red-teaming DV-RAG-Support, ACME Cloud's customer-support assistant.
  2. 2
    Craft poison: author one document that wins top-k for the plan query
    You cannot steer an answer the model never reads. The first half of retrieval
  3. 3
    Win the class: capture top-k for all four account queries
    Winning one query is a demo. Owning a class is the attack. Broaden your
  4. 4
    Steer: make the retrieved policy line dictate the answer
    Winning retrieval put your document in front of the model. Now use the second half
  5. 5
    Measure: coverage and steering ASR across a battery
    One lucky hit is a demo. A finding needs numbers. You measure your poison across a
  6. 6
    Harden (context): treat retrieved context as data, not instructions
    Now switch sides. You proved a planted document can win retrieval and steer the
  7. 7
    Harden (dominance): cap how many top-k slots one source can take
    The context-trust fix from Step 6 stops the directive from steering the answer. But
  8. 8
    Verify: re-run the battery and watch ASR drop
    You applied two fixes: the context-trust boundary (Step 6) that kills the steering

Prerequisites

  • Comfortable reading Python
  • Understand cosine similarity at a high level (no ML background required)
  • Know what a markdown image and an HTTP GET are

Exam domains covered

Offensive AI SecurityLLM Application SecurityRetrieval-Augmented Generation

Skills & technologies you'll practice

This advanced-level ai/ml lab gives you real-world reps across:

RAGRetrieval PoisoningEmbedding WeaknessesPoisonedRAGOWASP LLM08Offensive SecurityAI Red Team

What you'll do in this lab

This is a hands-on offensive-security lab built on a real RAG stack: a Milvus vector store, NVIDIA nv-embedqa-e5-v5 embeddings, chunking, and top-k ANN retrieval feeding a live LLM. You attack DV-RAG-Support by planting a single document and engineering it to win cosine top-k for a whole class of account questions, the retrieval condition behind PoisonedRAG. Because retrieval is decided by the embedder and not the model, model alignment cannot stop your document from reaching the prompt.

Then you exploit the generation condition: a directive framed as routine ACME answer policy, sitting inside your retrieved document, that the assistant applies to its answer. You measure attack-success-rate across a battery of phrasings (retrieval coverage and steering rate), the way a real engagement reports impact. Finally you switch to defense and harden the pipeline in two distinct moves: you make the model treat retrieved context as untrusted data behind a non-spoofable boundary and strip directive-shaped lines (closing the steering), then you cap how many top-k slots any single source may take (so no one document can dominate retrieval). You re-run the same battery to prove the attack-success-rate drops, the methodology payoff that shows a fix actually works.

Frequently asked questions

Do I need a machine-learning background?

No. You need to read Python and have an intuition for cosine similarity (text that shares vocabulary lands near a query). Everything model-specific and embedding-specific is explained inline, and the lab uses a real NVIDIA embedding model so the retrieval behavior you see is authentic.

What is retrieval poisoning?

Retrieval poisoning plants a document engineered to win semantic retrieval for many queries, then carries a directive that steers the answer once it is retrieved. The two halves are the PoisonedRAG retrieval condition (winning top-k) and the generation condition (the model applying the planted directive). This lab makes both concrete against a real Milvus + NVIDIA embeddings pipeline.

Why measure an attack-success-rate instead of a single demo?

An aligned model does not obey every directive, and retrieval does not win for every phrasing. A finding reports how reliably the exploit fires across the questions real users ask, so you measure coverage and steering as rates across a battery, exactly as you would in a professional engagement.