Build & submit taskBetaadvanced

Build an MCP Manifest Verifier with Signing, Pinning, and an Allow-List

Build the defensive control for an agentic tool supply chain: a manifest verifier that checks an HMAC/hashlib signature on every tool definition, pins each approved tool to a content hash so a post-approval mutation is detected, and enforces a capability allow-list so unknown or shadowing tools never load. Start from a vulnerable MCP-style registry and three working PoCs (tool poisoning, a rug pull, and tool shadowing). Wire your verifier in front of the registry, re-run the PoCs to prove all three are now rejected, run a benign control to prove a legitimate signed and pinned tool is still accepted and usable, and write a short remediation rationale with the standards mapping. Submit a single script or notebook for instant, rubric-based feedback.

3 hrs

Est. time

Outcomes

Rubric criteria

65%

Pass score

What you'll learn

Skills you'll have real reps in after shipping this.

The control is the deliverable, the exploit is the oracle

A defensive build is judged by whether the attack is blocked and benign use still works. The provided PoCs are the pass/fail test, so the verifier, not the attack, is where the work goes.

Tool definitions are signed artifacts, not trusted text

An agent pastes each tool's name, description, and schema into the model's context. Verifying an hmac signature over the canonical definition makes a forged or edited definition fail before it ever reaches the model.

Approval is a snapshot, a pin makes it an invariant

A tool reviewed at connect time can be mutated at runtime. Pinning the approved content hash and rejecting any drift is what catches a rug pull deterministically.

An allow-list closes the open namespace

A flat namespace with last-write-wins lets a twin shadow a trusted tool. An allow-list keyed on the approved name and capability rejects the collision regardless of which description is more persuasive.

Defense in depth means independent layers

Signature, pin, and allow-list each fail an attacker on a different axis: forgery, mutation, and identity. One bypass does not collapse the others, which is what makes the control durable.

The scenario

You own platform security for an internal agent that loads its tools at runtime from an MCP-style registry: a list of tool definitions, each with a name, a description, and an input schema, that the agent pastes into the model's context and then calls. The offensive team already proved it is wide open. They handed you a tiny vulnerable registry and three working proofs-of-concept: a poisoned tool description that steers the agent, a rug pull that mutates an approved tool definition after review, and a shadowing tool that registers a colliding name to override a trusted one. The exploits are not your deliverable. They are the oracle that tells you whether your fix actually holds.

Your job is to build the control the offensive findings recommended: a verifier that sits between the registry and the agent. Nothing loads unless its definition carries a valid signature, matches the hash it was pinned to at approval time, and names a capability on the allow-list. You will run the three PoCs through the verifier and show each one rejected in the printed output, then run a benign signed and pinned tool through the same verifier and show it accepted and callable. The center of gravity of this task is the verifier, not the attack.

Your role

You are a platform security engineer hardening an agentic tool supply chain. Your goal is a single, self-contained file that builds the defensive control end to end: a manifest verifier (signature verification + hash pinning + a capability allow-list) wired in front of a vulnerable MCP-style registry, with the provided tool-poisoning, rug-pull, and tool-shadowing proofs-of-concept all shown rejected after your fix, and a legitimate signed and pinned tool still accepted and usable with no over-blocking.

Start the task to unlock the full brief

You'll get the step-by-step requirements, setup commands, the 6-criterion grading rubric, tips, and the ability to submit your solution for instant AI grading.

Free to start · submit when you're ready

Learning resources

OWASP Agentic AI Top 10 (ASI)

The Agentic Security Initiative threat list covering tool misuse, supply-chain integrity, and tool poisoning. Verify the current ASI identifiers before citing.

genai.owasp.org

OWASP LLM03:2025 Supply Chain

The supply-chain risk class this control mitigates: untrusted third-party components and tool definitions.

genai.owasp.org

OWASP LLM01:2025 Prompt Injection

Why a poisoned tool description is a prompt-injection vector that signing and pinning the definition help contain.

genai.owasp.org

MITRE ATLAS

Map the threat to the Publish Poisoned AI Agent Tool technique and supply-chain tactics. Verify the current technique IDs before citing.

atlas.mitre.org

NIST SP 800-218 (SSDF)

Secure software development practices for verifying integrity and provenance of components, the general pattern behind signing and pinning.

csrc.nist.gov

What this task is

This is a build-and-submit defensive-security task, not a quiz about supply-chain security. You build the control: a manifest verifier that sits in front of an MCP-style tool registry and gates every tool definition on three independent checks, an hmac/hashlib signature over the canonical definition, a content-hash pin that detects any post-approval mutation, and a capability allow-list that rejects unknown or colliding tool names. The provided tool-poisoning, rug-pull, and tool-shadowing proofs-of-concept are the oracle: you re-run them through your verifier and show each one rejected, then run a benign signed and pinned tool and show it accepted and usable.

Agentic supply-chain attacks (OWASP LLM03:2025 Supply Chain, OWASP Agentic ASI tool-poisoning entries, and the MITRE ATLAS Publish Poisoned AI Agent Tool technique) are the mechanism behind real incidents in tool registries and plugin catalogs. The skill this task builds is the defensive counterpart to those exploits: treat a tool definition as a signed, pinned artifact, enforce an allow-list at the registry boundary, and prove with the attacker's own proofs-of-concept that the control holds while a legitimate tool still loads and runs.

Grading is rubric-based and explainable. Your submission is scored against weighted criteria (the verifier control built, all three PoCs rejected, the benign tool preserved, the control minimal and at the boundary, the remediation rationale and mapping, and a self-contained run) with per-criterion feedback. The pass threshold is 65 percent and you can resubmit.

Frequently asked questions

Do I need a paid API key?

No. The agent can be a deterministic local stub, or any model you can reach. The graded deliverable is the verifier and its behavior against the three proofs-of-concept, not which model you used.

What exactly am I building?

One control: a manifest verifier that, for every tool definition, verifies an hmac signature over its canonical bytes, compares its current content hash against a pinned approved hash to catch a rug pull, and checks its name and capability against an allow-list to block shadowing. Nothing loads unless all three checks pass.

How is the control graded?

By the oracle. You re-run the provided poisoning, rug-pull, and shadowing proofs-of-concept through your verifier and show each rejected with its reason in the printed output, then run a legitimately signed and pinned tool and show it accepted, loaded, and called to a normal result, proving you did not over-block.

Why signature and pin and allow-list rather than one of them?

Because they are independent layers and that is what makes the control durable. A signature stops a forged definition, a pinned hash stops a post-approval mutation, and an allow-list stops a colliding name or an un-approved capability. Bypassing one does not collapse the others, which is the defense-in-depth standard a remediation is held to.