Build & submit taskBetaadvanced

Audit an Agent's Tool Supply Chain and Multi-Agent Graph

Build a self-contained proof-of-concept that exploits an agent supply chain end to end: poison or shadow a tool the agent loads from a registry, propagate an injection across a two-agent handoff, show the benign baseline, then ship a hardened manifest (pinned and scanned tool definitions, namespaced tools, a schema-constrained inter-agent channel) and re-run both exploits to prove they are blocked without breaking legitimate flows. Submit a single script or notebook for instant, rubric-based feedback.

3 hrs

Est. time

Outcomes

Rubric criteria

65%

Pass score

What you'll learn

Skills you'll have real reps in after shipping this.

Tool metadata is trusted instruction text

An agent pastes each tool's name, description, and schema into the model's context. A directive hidden there steers the model with no jailbreak.

Approval is a snapshot

A tool reviewed at connect time can be mutated at runtime. Pinning and hashing approved definitions is what makes approval an invariant.

Names collide, models pick the better description

A flat namespace plus last-write-wins lets a twin shadow a trusted tool; namespacing and collision detection close it.

Peer messages are data, not commands

A second agent that executes an upstream agent's free-text notes will propagate an injection. A schema-constrained, authenticated channel contains it.

Prove the second hop

Attribution matters: a side channel tagged by the host (not the payload) proves which agent fired, so you measure real propagation.

The scenario

You're on a red-team engagement against an internal agent platform: an IT-support assistant that loads its tools at runtime from an MCP-style registry, plus a two-agent graph where an intake agent hands work orders to a resolver agent. The rules of engagement are simple. You cannot touch the model weights and you cannot social-engineer employees. You can register a tool into the marketplace and you can file an inbound customer email the intake agent ingests, the same footholds an attacker gets from a public plugin catalog and a support inbox.

Your lead wants more than 'a tool description made the model do something.' She wants a reproducible proof-of-concept across the supply chain and the agent graph, a clear statement of impact, and a hardened manifest the platform team can ship and that you have proven defeats your own exploits. That deliverable, two exploits plus a fix you re-test, is this task.

Your role

You are an offensive security engineer auditing an agentic platform's tool supply chain and multi-agent message bus. Your goal is a single, self-contained file that proves a supply-chain exploit and an inter-agent propagation exploit end to end, states impact and severity like a professional finding, and demonstrates a remediation that defeats both of your proofs-of-concept without breaking legitimate tools or flows.

Start the task to unlock the full brief

You'll get the step-by-step requirements, setup commands, the 7-criterion grading rubric, tips, and the ability to submit your solution for instant AI grading.

Free to start · submit when you're ready

Learning resources

OWASP Agentic AI Top 10 (ASI)

The Agentic Security Initiative threat list covering tool misuse, peer-agent trust, cascades, and replication.

genai.owasp.org

OWASP LLM01: Prompt Injection

The metadata-borne foothold for tool poisoning and inter-agent injection.

genai.owasp.org

MITRE ATLAS

Re-verify the Publish Poisoned AI Agent Tool technique and the Morris II case study ids before citing.

atlas.mitre.org

Morris II / ComPromptMized

The self-replicating GenAI worm this propagation finding is modeled on (Cohen et al., USENIX Security 2024).

google.com

Flowise CVE-2025-59528

RCE via unsafe CustomMCP config handling: the host implementation is itself a sink.

nvd.nist.gov

What this task is

This is a build-and-submit offensive-security task, not a quiz about agent security. You produce a single file that proves two exploits end to end against an agentic supply chain: a tool-supply-chain attack (description poisoning, shadowing, or a rug pull) and an inter-agent propagation attack (a payload that enters one agent and executes in a second). You show the benign baseline, ship a hardened manifest with a schema-constrained inter-agent channel, and re-run both exploits to prove the fix holds without breaking legitimate flows.

Agentic supply-chain and multi-agent attacks (OWASP Agentic ASI02/04/07/08/10, LLM01/06, MITRE ATLAS Publish Poisoned AI Agent Tool and the Morris II case study) are the mechanism behind real incidents like the Flowise CustomMCP RCE and the Morris II self-replicating worm. The skill this task builds is the one a real AI red teamer needs for agent platforms: poison or shadow a tool, propagate an injection across an agent boundary, attribute the second hop deterministically, and ship a fix you have proven defeats your own proof-of-concept.

Grading is rubric-based and explainable. Your submission is scored against weighted criteria (runnable harness, supply-chain exploit, inter-agent propagation, benign baseline, hardened manifest, contained re-run, and the written finding) with per-criterion feedback. The pass threshold is 65 percent and you can resubmit.

Frequently asked questions

Do I need a paid API key?

No. You can target any hosted model you have access to, or simulate a naive, context-obeying model deterministically and say so. The rubric rewards the exploits and the hardened re-run, not which model you used.

What counts as a supply-chain exploit here?

Tool-description poisoning, tool shadowing or a name collision/typosquat, or a rug pull (mutating a tool definition after it was approved). Any one of these, demonstrated with a concrete impact, satisfies the criterion.

How do I prove the second hop and not just the first?

Plant the payload only in the first agent's input, and tag each agent's side effects with its agent id in the host, not in the payload text. A side effect attributed to the second agent proves the injection propagated across the handoff.

Why does the task require a hardened re-run?

Because a finding is only useful if it ships a fix. You apply a hardened manifest and a constrained inter-agent channel, then re-run both exploits to prove they are blocked while a benign request still resolves, the same standard a real engagement holds you to.