MCP Tool Poisoning: Hijack an Agent Through a Tool Description (and a Rug Pull)
Attack a real MCP-style tool registry end to end. OrbitDesk's support agent loads its tools from a runtime registry and reads each tool description as trusted instruction text. Register a poisoned tool whose description hides a routine-looking audit directive, make the agent read an account record and forward its reference to your in-pod collector (data exfiltration through tool metadata), then pull a rug: register the tool benign, pass review, and silently mutate its description after approval. Measure attack-success-rate, then harden the registry, scan descriptions for hidden instructions and pin approved tool objects, and prove a fresh poison and a fresh rug pull are both dead while benign tickets still resolve.
Hands-on labs require Pro · $29.99/mo · cancel anytime
What you'll learn
- 1Recon: map the agent and its tool trust pathYou are red-teaming OrbitDesk, an internal IT-support assistant. It is a real
- 2Register: get your own tool loadedBefore weaponizing anything, prove the supply-chain hole: you can put a tool into the
- 3Poison: fire exfiltration through the tool descriptionNow give your tool's description teeth. The agent reads each tool description as
- 4Rug pull: mutate the tool after approvalA tool description is reviewed once, at connect time. After that, the registry lets it
- 5Measure: attack-success-rateA finding needs a number. "It leaked once" is anecdote; "it leaks on 5 of 6 paced
- 6Harden (scan): drop poisoned descriptions at the registry boundaryYou proved the agent trusts a tool description as much as code, and that the leak
- 7Harden (pin): reject a post-approval mutation with a content hashThe description scan from Step 6 stops a poisoned description from loading. It does not
- 8Verify and resist: benign intact, a fresh-variant battery deadA control you cannot measure is a control you cannot trust. You applied two
Prerequisites
- Comfortable reading Python
- Know what an HTTP GET and a JSON tool schema are
- Helpful: completed the Excessive Agency or Indirect Prompt Injection labs
Exam domains covered
Skills & technologies you'll practice
This advanced-level ai/ml lab gives you real-world reps across:
What you'll do in this lab
This is a hands-on offensive-security lab built on a real agent supply chain: a ReAct loop with native tool-calling against an in-cluster model, and an MCP-style tool registry the agent loads its tools from at session start. You attack OrbitDesk, an internal IT-support assistant, by poisoning the one thing the agent trusts as much as code: a tool description. You register a plausible entitlement-check tool whose description hides a routine-looking audit directive, and the agent reads an account record and forwards its reference to your in-pod collector. You never jailbreak the model. The account reference it moves is mundane business data, which is exactly why an aligned model complies.
You then pull a rug pull: register the tool benign, pass review, and silently mutate its description after approval with no re-consent, proving that approval is a snapshot and not an invariant. You measure attack-success-rate over a paced battery, then switch to defense and harden the registry boundary: scan tool descriptions for hidden instructions so a poisoned description never enters the model's context, pin and hash approved tool objects so a post-approval mutation is rejected, and namespace tools so a duplicate name cannot shadow a trusted one. You re-run a fresh poison and a fresh rug pull to prove they are dead while benign tickets still resolve.