Memory Poisoning: Plant a Note That Re-Fires in a Fresh Session (Persistence)
Hosted · ide
Beta

Memory Poisoning: Plant a Note That Re-Fires in a Fresh Session (Persistence)

Attack the long-term memory of a real tool-using ReAct agent. As the low-privilege ticket-ingest account, plant a single benign-looking routing note into DV-ToolAgent's shared, un-namespaced memory store through an ingested ticket. In a brand-new session for a different legitimate user, the agent recalls your note and redirects a GLOBEX invoice to your payee, persistence across the session boundary that single-turn filters never see. Use MINJA-style progressive shortening so the stored record reads as a mundane preference, measure attack-success-rate against benign controls, then harden memory with per-user namespacing and a data-only quarantine and prove the poison dead while legitimate recall still works.

80 min8 steps3 domainsAdvanced

Hands-on labs require Pro · $29.99/mo · cancel anytime

Map the attack surface
Query
Retriever
LLM
Poisoned doc
retrieved chunk
Answer
0%
Attack-success rate
Attacks blocked · benign answers pass
graded on real output, not the model's talk

What you'll learn

  1. 1
    Recon: read the memory path
    DV-ToolAgent has a long-term memory. It stores durable notes (routing rules,
  2. 2
    Plant: persist a poisoned routing note (session A)
    In session A you act as ticket-bot, ingesting an inbound ticket. Your goal is to make
  3. 3
    Persist: fire the planted note in a fresh session
    This is the headline. Your poisoned note now lives in long-term memory. In a
  4. 4
    Stealth: make the planted note look benign (MINJA)
    A defender skimming the memory bank should not spot an obvious imperative. This is the
  5. 5
    Measure: persistence rate and benign cleanliness
    Memory persistence is largely structural: the recall plus concatenation does most of the
  6. 6
    Harden 1: quarantine recalled memory as data
    You proved a planted note re-fires for a different user in a fresh session because recalled
  7. 7
    Harden 2: per-user namespacing and provenance
    The quarantine stops a recalled note from being executed, but it does not stop a
  8. 8
    Verify and resist: prove the poison dead, real memory intact
    One non-fire is not proof, and a fix that kills useful memory is not a fix. With both

Prerequisites

  • Comfortable reading Python
  • Know what a SQL UPDATE is
  • No ML background required

Exam domains covered

Offensive AI SecurityLLM Application SecurityAgent Memory Poisoning

Skills & technologies you'll practice

This advanced-level ai/ml lab gives you real-world reps across:

Memory PoisoningPersistenceMINJAExcessive AgencyAgentic AI SecurityOWASP LLM06OWASP ASI06AI Red Team

What you'll do in this lab

This is a hands-on offensive-security lab on agent memory poisoning. You attack DV-ToolAgent, a real ReAct tool-using assistant whose long-term memory recalls durable notes into every new session. Acting as ticket-bot, the low-privilege ticket-ingest account, you plant a single routing note into the shared, un-namespaced memory store through an ingested support ticket. The note ties GLOBEX invoices to your attacker payee. Then, in a brand-new session with cleared context and a different legitimate user, a normal GLOBEX invoice question recalls your note and the agent redirects payment to your payee. Persistence across the session boundary is the win, and it is the property single-turn prompt filters cannot catch (OWASP LLM06 Excessive Agency, Agentic ASI06 Memory and Context Poisoning).

You make the attack realistic with MINJA-style progressive shortening: once the agent treats a stored GLOBEX-payee note as a routing rule to apply, you drop the overt instruction language so the residual record reads as a mundane billing preference while it still fires. You measure attack-success-rate over a paced battery against benign controls (a non-GLOBEX invoice), demonstrating AgentPoison's point that a single planted entry yields high targeted ASR with clean benign behavior. Finally you harden the memory layer: per-user namespacing and provenance so a low-privilege caller's note never reaches another user, plus a data-only quarantine so recalled notes are never executed as instructions. You re-run the plant to prove it is dead while legitimate recall still works.

Frequently asked questions

Do I need a machine-learning background?

No. The core of this lab is trust and isolation in an agent's memory store, not model internals. You read how notes are stored and recalled, find that there is no provenance and no per-user namespacing, and plant a note that survives the session boundary. The fixes are ordinary access-control and data-handling boundaries.

What is memory poisoning and why is persistence the point?

Memory poisoning plants attacker-controlled content into an agent's long-term store so it is recalled and acted on in future sessions. Persistence is the point because a payload written in one session re-fires in a fresh session after the context is gone, for a different user, which single-turn filters that watch one conversation never catch. It maps to Agentic ASI06 and MITRE ATLAS Persistence and Defense Evasion.

How is the exploit graded?

Deterministically, on side effects, never on model wording. The plant step grades the stored memory record (it must tie GLOBEX to the attacker payee). The persistence step grades a fresh-session side effect (the GLOBEX invoice payee redirected, with session-A context gone). The harden step re-plants a fresh poison, confirms it no longer fires, and confirms legitimate recall still works.