Build & submit taskBetaadvanced

Re-Scope a Tool Agent to Least Privilege with an Approval Gate

Build the defensive control for an over-privileged tool-using agent: a least-privilege tool policy, an authorization layer that carries the requesting user's identity to every tool call, and a human-in-the-loop approval gate for high-impact actions. Start from the provided DV-ToolAgent-shaped target and its working confused-deputy proof-of-concept, then build the control until the exploit is denied by authorization, an authorized in-scope action still succeeds, and a high-impact action routes through the approval gate. The exploit is only the pass/fail oracle here; the deliverable is the hardened control. Submit a single file for instant, rubric-based feedback.

3 hrs

Est. time

Outcomes

Rubric criteria

65%

Pass score

What you'll learn

Skills you'll have real reps in after shipping this.

The deliverable is the control, the exploit is the oracle

You are not hunting a bug here. You are building a least-privilege policy, an authorization layer, and an approval gate, and you use the provided exploit only to prove the wall holds and benign work still flows.

Carry the caller's identity all the way to the tool

The root cause of the confused deputy is that no authorization flows from the requesting user to the downstream action. The fix is to bind the caller's identity to every tool call and check the required scope at the tool boundary.

Authorization beats prompt-patching

An authorization wall at the tool holds even under a full jailbreak because it never consults the model's decision. A system-prompt rule that says please do not do writes is not a control.

Least privilege plus an approval gate is defense in depth

Scoping tools to the minimum reduces blast radius, per-user authorization stops the confused deputy, and a human-in-the-loop gate on high-impact actions adds oversight where the cost of a mistake is highest. The layers cover each other.

A control only counts when you prove it does not over-block

Re-run the exploit to show it is denied, then run an authorized in-scope action to show it still succeeds, and route a legitimate high-impact action through the gate. Security that breaks normal work gets turned off.

The scenario

You own the platform side of an internal tool-using agent (DV-ToolAgent-shaped: it reads support tickets, queries an accounts database, and fetches URLs). Today it runs with one shared service credential for every user, an over-broad database tool that can run any statement, and an unrestricted HTTP fetch tool. A red-teamer already filed the finding: an ingested ticket framed as a pre-approved finance correction drives a payee redirect through the write-capable tool, a confused deputy where a low-privilege caller performs an action they were never entitled to. You have their working proof-of-concept in hand.

Your job is not to attack the agent again. Your job is to make the attack stop working. You will build a least-privilege tool policy, an authorization layer that carries the requesting user's identity into every tool call, and an approval gate that pauses high-impact actions for a human. Then you re-run the red-teamer's exploit and show authorization denies it, run an authorized in-scope lookup and show it still succeeds, and run a legitimate high-impact write and show it routes through the gate instead of firing silently. That control, proven against the provided exploit, is this task.

Your role

You are a security engineer hardening an agentic application. Your goal is a single, self-contained file that builds a least-privilege tool policy plus an authorization layer plus a human-in-the-loop approval gate around a DV-ToolAgent-shaped target, then proves with printed output that the provided confused-deputy exploit is now denied by authorization, an authorized in-scope action still succeeds, and a high-impact action routes through the approval gate.

Start the task to unlock the full brief

You'll get the step-by-step requirements, setup commands, the 7-criterion grading rubric, tips, and the ability to submit your solution for instant AI grading.

Free to start · submit when you're ready

Learning resources

OWASP LLM06:2025 Excessive Agency

The taxonomy entry covering excessive functionality, permissions, and autonomy, and the least-privilege, authorization, and human-in-the-loop mitigations this task builds.

genai.owasp.org

OWASP Top 10 for Agentic Applications (ASI03, ASI02)

The Agentic Top 10: ASI03 Identity and Privilege Abuse and ASI02 Tool Misuse, the classes this control closes.

genai.owasp.org

MITRE ATLAS: AML.T0053 LLM Plugin Compromise

Browse the ATLAS technique catalog to map the over-broad tool and confused-deputy action to the relevant execution technique, and verify the current technique IDs.

atlas.mitre.org

MITRE ATLAS: AML.T0051 LLM Prompt Injection

The indirect prompt-injection technique behind the ingested-ticket foothold your control is defending against.

atlas.mitre.org

NIST SP 800-53 AC-6 Least Privilege

The control catalog definition of least privilege, the principle your tool policy and authorization layer implement.

csrc.nist.gov

What this task is

This is a build-and-submit defensive-security task that asks you to ship a working control rather than answer a quiz about agent security. You build the control that stops an excessive-agency exploit: a least-privilege tool policy, an authorization layer that carries the requesting user's identity into every tool call, and a human-in-the-loop approval gate for high-impact actions, all applied to a DV-ToolAgent-shaped target. You start from the target and a provided confused-deputy proof-of-concept, then build the control until the exploit is denied by authorization, an authorized in-scope action still succeeds, and a high-impact action routes through the approval gate.

Excessive agency (OWASP LLM06, OWASP Agentic ASI03 Identity and Privilege Abuse, and the indirect prompt-injection foothold tracked in MITRE ATLAS) is the failure behind real agentic incidents: an over-broad tool, one shared credential, and no authorization carried from the requesting user to the action. The skill this task builds is the defender's counterpart to finding that bug. You design the least-privilege scopes, enforce them at the tool boundary so they hold even under a full model jailbreak, gate the highest-impact actions for a human, and prove with your own printed output that the exploit is dead and normal work still flows.

Grading is rubric-based and explainable. Your submission is scored against weighted criteria (the least-privilege policy and authorization layer built, the provided exploit denied by authorization, the approval gate, an authorized in-scope action preserved, the runnable target and provided exploit, the remediation rationale and mapping, and the minimality of the control) with per-criterion feedback. The pass threshold is 65 percent and you can resubmit.

Frequently asked questions

Do I need a paid API key?

No. You can target any OpenAI-compatible tool-calling endpoint you have access to, or simulate a too-trusting model deterministically and say so. The vulnerability and the control both live in the tool layer, so a stub still proves the confused deputy and the fix. The rubric rewards the control you build and the proof it holds, independent of which model you used.

What exactly is the deliverable?

The defensive control itself. You build three things: a least-privilege tool policy that declares the minimum scope each tool needs and grants each user only the scopes their role allows, an authorization layer that carries the caller's identity into every tool call and denies any call whose required scope the caller lacks, and a human-in-the-loop approval gate that holds high-impact actions pending an explicit approval token. The provided exploit is only the pass/fail oracle.

What counts as the control being proven?

Three printed outcomes. The provided confused-deputy exploit is re-run against your hardened agent and denied by the authorization layer before any side effect. An authorized in-scope action by an entitled user still succeeds and returns a real result, proving no over-blocking. A legitimate high-impact action routes through the approval gate, held pending until a valid token is supplied.

Why must the control sit at the tool boundary instead of in the system prompt?

Because a system-prompt rule that asks the model not to perform writes is not a control. It can be talked out of with a plausible, authorized-looking ticket. An authorization wall at the tool carries the requesting user's identity and checks the required scope before any side effect, so it holds even if the model is fully jailbroken. That is what makes least privilege plus an approval gate durable defense in depth.