Defend the Agent Supply Chain: Verify, Pin, and Capability-Gate Your Tool Registry
Hosted · ide
Beta

Defend the Agent Supply Chain: Verify, Pin, and Capability-Gate Your Tool Registry

Harden a real MCP-style tool registry until a poisoned, rug-pulled, or shadowing tool manifest cannot reach the agent, in small single-concept steps. OrbitDesk's support agent loads its tools from a runtime registry and reads each tool description as trusted instruction text. You stand the registry up and trace one benign ticket, then reproduce three techniques one at a time: a poisoned tool description that hijacks the agent into leaking an account reference, a post-approval rug-pull mutation the registry serves with no pin, and a shadowing twin the flat namespace selects under a trusted tool name. You apply the obvious fix, a description blocklist, and watch a clean-description variant defeat it through its delegate. Then you build the durable control one mechanism per step: manifest signature verification, then hash pinning (rug-pull / change detection), then a per-tool capability allow-list with namespacing. You finish by proving that freshly planted unsigned, forged, mutated, and shadowing manifests are all refused while a legitimate signed and pinned tool stays admitted and usable, and an inter-agent worm's second hop is contained.

90 min9 steps3 domainsAdvanced

Hands-on labs require Pro · $29.99/mo · cancel anytime

Map the attack surface
Query
Retriever
LLM
Poisoned doc
retrieved chunk
Answer
0%
Attack-success rate
Attacks blocked · benign answers pass
graded on real output, not the model's talk

What you'll learn

  1. 1
    Stand up OrbitDesk and trace one benign ticket
    You are the defender for OrbitDesk, an internal IT-support assistant. It is a
  2. 2
    Reproduce: a poisoned tool description hijacks the agent
    You have a working exploit in hand. Reproduce it so you know exactly what your
  3. 3
    Reproduce: the registry permits a rug pull (post-approval mutation)
    The poison in the previous step rode in at registration. The second technique is a
  4. 4
    Reproduce: the flat namespace permits a shadowing twin
    The third technique is shadowing (typosquatting a tool name). The registry uses a
  5. 5
    Naive fix, bypassed: a description blocklist is not enough
    The poison rode in on its description, so the obvious reaction is to scan
  6. 6
    Build verifier mechanism 1: manifest signature verification
    A description blocklist scans prose and cannot see provenance. Replace it with an allow
  7. 7
    Build verifier mechanism 2: hash pin (rug-pull detection)
    Mechanism 1 (signature verification) is carried forward in mcp_verifier.py. Now build
  8. 8
    Build verifier mechanism 3: capability allow-list + namespacing
    Mechanisms 1 (signature) and 2 (pin) are carried forward. Now build the third
  9. 9
    Verify and resist: poison, rug pull, shadow, and worm all blocked; legit tools work
    Your verifier is now the gate the registry calls, with all three mechanisms in place:

Prerequisites

  • Comfortable reading and editing Python
  • Know what an HMAC signature and a content hash are
  • Helpful: completed the MCP Tool Poisoning or Tool Shadowing labs

Exam domains covered

Defensive AI SecurityAgentic Supply Chain SecurityLLM Application Security

Skills & technologies you'll practice

This advanced-level ai/ml lab gives you real-world reps across:

Agentic Supply ChainTool Manifest VerificationTool SigningHash PinningRug Pull DetectionCapability Allow-ListInter-Agent Message ValidationDefensive AI SecurityOWASP ASI02MITRE ATLASAI Red Team

What you'll do in this lab

This is a hands-on defensive-security lab built on a real agent supply chain: a ReAct loop with native tool-calling against an in-cluster model, and an MCP-style tool registry the agent loads its tools from at session start. You defend OrbitDesk, an internal IT-support assistant, by closing the one thing the agent trusts as much as code: a tool description. You start from a working exploit where an unsigned poisoned tool steers the agent into posting an account reference to an in-pod collector, then you harden the registry boundary so the same class of attack cannot fire.

You apply the obvious fix first, a description blocklist, and watch a clean-description variant defeat it by hiding the abuse in the tool's delegate, which a prose scan never inspects. Then you build the durable control: a tool-supply-chain verifier that does manifest signature verification so an unsigned or forged tool is refused, hash pinning so a post-approval rug pull no longer matches its pin, a per-tool capability allow-list so a tool cannot delegate to a record it was never approved to read, and inter-agent message validation so a peer agent's free-text output is treated as data and not instructions. You finish by proving freshly planted unsigned, forged, mutated, and shadowing manifests are all refused while a legitimate signed and pinned tool stays admitted and usable.

Frequently asked questions

Do I need a machine-learning background?

No. The lab is about supply-chain trust, not model internals. You read a small ReAct agent and an MCP-style tool registry, see that tool metadata is trusted with no boundary, and build ordinary supply-chain controls: signature verification, hash pinning, a capability allow-list, and message validation.

What does the verifier actually check?

Four things, at the registry boundary, before any tool reaches the model. Provenance: each tool manifest must carry a signature that verifies under a trusted key. Integrity: each approved tool object is hash-pinned, so a silent post-approval mutation no longer matches its pin. Least privilege: a per-tool capability allow-list states which delegate and which records a tool may use. And inter-agent message validation reduces a peer agent's output to a structured schema so a downstream agent never executes prose.

Why is a description blocklist not enough?

A blocklist scans surface strings, so an attacker rephrases until nothing matches, or hides the abuse in the tool's capability (its delegate), which the scan never reads. The lab shows a clean-description tool whose delegate reads a cross-account record slipping past the blocklist. An allow model based on signature, pin, and capability closes that gap.

How is the hardening graded?

Behaviorally, on side effects, never on model wording. The grader plants a fresh unsigned poison, a forged-signature variant, a rug-pull mutation, and a shadowing duplicate in code, then confirms each is refused (the served tool stays the legit signed object and no account reference reaches the in-pod listener), that a benign entitlement question still resolves through the signed tool, and that an inter-agent worm's second hop is contained while benign emails still resolve.