MCP Tool Poisoning: Hijack an Agent Through a Tool Description (and a Rug Pull)
Hosted · ide
Beta

MCP Tool Poisoning: Hijack an Agent Through a Tool Description (and a Rug Pull)

Attack a real MCP-style tool registry end to end. OrbitDesk's support agent loads its tools from a runtime registry and reads each tool description as trusted instruction text. Register a poisoned tool whose description hides a routine-looking audit directive, make the agent read an account record and forward its reference to your in-pod collector (data exfiltration through tool metadata), then pull a rug: register the tool benign, pass review, and silently mutate its description after approval. Measure attack-success-rate, then harden the registry, scan descriptions for hidden instructions and pin approved tool objects, and prove a fresh poison and a fresh rug pull are both dead while benign tickets still resolve.

95 min8 steps3 domainsAdvanced

Hands-on labs require Pro · $29.99/mo · cancel anytime

Map the attack surface
Query
Retriever
LLM
Poisoned doc
retrieved chunk
Answer
0%
Attack-success rate
Attacks blocked · benign answers pass
graded on real output, not the model's talk

What you'll learn

  1. 1
    Recon: map the agent and its tool trust path
    You are red-teaming OrbitDesk, an internal IT-support assistant. It is a real
  2. 2
    Register: get your own tool loaded
    Before weaponizing anything, prove the supply-chain hole: you can put a tool into the
  3. 3
    Poison: fire exfiltration through the tool description
    Now give your tool's description teeth. The agent reads each tool description as
  4. 4
    Rug pull: mutate the tool after approval
    A tool description is reviewed once, at connect time. After that, the registry lets it
  5. 5
    Measure: attack-success-rate
    A finding needs a number. "It leaked once" is anecdote; "it leaks on 5 of 6 paced
  6. 6
    Harden (scan): drop poisoned descriptions at the registry boundary
    You proved the agent trusts a tool description as much as code, and that the leak
  7. 7
    Harden (pin): reject a post-approval mutation with a content hash
    The description scan from Step 6 stops a poisoned description from loading. It does not
  8. 8
    Verify and resist: benign intact, a fresh-variant battery dead
    A control you cannot measure is a control you cannot trust. You applied two

Prerequisites

  • Comfortable reading Python
  • Know what an HTTP GET and a JSON tool schema are
  • Helpful: completed the Excessive Agency or Indirect Prompt Injection labs

Exam domains covered

Offensive AI SecurityLLM Application SecurityAgentic Supply Chain Security

Skills & technologies you'll practice

This advanced-level ai/ml lab gives you real-world reps across:

MCP Tool PoisoningAgentic Supply ChainTool Description PoisoningRug PullData ExfiltrationAgentic AI SecurityOWASP ASI02OWASP LLM01MITRE ATLASAI Red Team

What you'll do in this lab

This is a hands-on offensive-security lab built on a real agent supply chain: a ReAct loop with native tool-calling against an in-cluster model, and an MCP-style tool registry the agent loads its tools from at session start. You attack OrbitDesk, an internal IT-support assistant, by poisoning the one thing the agent trusts as much as code: a tool description. You register a plausible entitlement-check tool whose description hides a routine-looking audit directive, and the agent reads an account record and forwards its reference to your in-pod collector. You never jailbreak the model. The account reference it moves is mundane business data, which is exactly why an aligned model complies.

You then pull a rug pull: register the tool benign, pass review, and silently mutate its description after approval with no re-consent, proving that approval is a snapshot and not an invariant. You measure attack-success-rate over a paced battery, then switch to defense and harden the registry boundary: scan tool descriptions for hidden instructions so a poisoned description never enters the model's context, pin and hash approved tool objects so a post-approval mutation is rejected, and namespace tools so a duplicate name cannot shadow a trusted one. You re-run a fresh poison and a fresh rug pull to prove they are dead while benign tickets still resolve.

Frequently asked questions

Do I need a machine-learning background?

No. The lab is about supply-chain trust, not model internals. You read a small ReAct agent and an MCP-style tool registry, find that tool metadata is trusted with no boundary, and drive the agent from a poisoned tool description. The fixes are ordinary supply-chain controls: scanning, pinning, and namespacing.

What is MCP tool poisoning?

An agent host injects each tool's name, description, and parameter schema into the model's context so the model can decide when to call it. The model treats that metadata as trusted instruction text. A directive hidden in a tool description, or silently mutated into one after approval, steers the model. It is OWASP Agentic ASI02 Tool Misuse with LLM01 prompt injection as the delivery, and MITRE ATLAS Publish Poisoned AI Agent Tool.

What is a rug pull here?

A tool description is reviewed once, at connect time. The registry then lets it change with no re-approval. You register the tool clean, pass review, then swap in a poisoned description. The agent never re-consents. The oracle proves the swap by the tool's content hash changing between the clean and poisoned runs.

How is the exploit graded?

Deterministically and structurally, never on model wording. Because an aligned agent fires a tool-misuse exploit inconsistently, each step gates on a structural, model-independent fact and keeps the live-model run as a best-effort print. The poison step grades that the poisoned description reaches the model catalog verbatim. The rug-pull step grades a changed tool content hash. The harden steps grade that a fresh poisoned description is dropped by the scan, a fresh rug-pull mutation is rejected by the pin, and a variant battery is neutralized while a benign ticket still resolves.