Re-Scope a Tool Agent to Least Privilege with an Approval Gate
Build the defensive control for an over-privileged tool-using agent: a least-privilege tool policy, an authorization layer that carries the requesting user's identity to every tool call, and a human-in-the-loop approval gate for high-impact actions. Start from the provided DV-ToolAgent-shaped target and its working confused-deputy proof-of-concept, then build the control until the exploit is denied by authorization, an authorized in-scope action still succeeds, and a high-impact action routes through the approval gate. The exploit is only the pass/fail oracle here; the deliverable is the hardened control. Submit a single file for instant, rubric-based feedback.
Est. time: 3 hrs
Outcomes: 5
Rubric criteria: 7
Pass score: 65%
The scenario
You own the platform side of an internal tool-using agent (DV-ToolAgent-shaped: it reads support tickets, queries an accounts database, and fetches URLs). Today it runs with one shared service credential for every user, an over-broad database tool that can run any statement, and an unrestricted HTTP fetch tool. A red-teamer has already filed the finding: an ingested ticket framed as a pre-approved finance correction drives a payee redirect through the write-capable tool. This is a confused deputy: the agent, acting with its shared credential, performs an action on behalf of a low-privilege caller who was never entitled to it. You have the red-teamer's working proof-of-concept in hand.
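To make the over-broad database tool concrete, here is a minimal sketch that contrasts a "run any statement" tool with a narrow, read-only lookup. It is an illustration under stated assumptions, not the provided target: the `accounts` table, the `run_sql` and `lookup_account` names, and the in-memory SQLite database are all hypothetical stand-ins.

```python
import sqlite3

# Hypothetical stand-in for the accounts database (in-memory, illustrative schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, payee TEXT)")
conn.execute("INSERT INTO accounts VALUES ('A-1', 'original-vendor')")
conn.commit()


def run_sql(statement):
    """Over-broad tool: executes whatever statement the agent was steered into."""
    conn.execute(statement)
    conn.commit()


def lookup_account(account_id):
    """Narrow tool: one fixed, parameterized, read-only query."""
    row = conn.execute(
        "SELECT id, payee FROM accounts WHERE id = ?", (account_id,)
    ).fetchone()
    return dict(zip(("id", "payee"), row)) if row else None


# With the over-broad tool exposed, injected ticket text can drive a write.
run_sql("UPDATE accounts SET payee = 'attacker' WHERE id = 'A-1'")
after_exploit = lookup_account("A-1")
print(after_exploit)

# The narrow tool treats its argument as data, so SQL smuggled into it matches nothing.
injection_attempt = lookup_account("A-1' OR '1'='1")
print(injection_attempt)
```

The narrow tool cannot write at all, so a caller who only ever gets `lookup_account` has no path to the payee redirect regardless of what the ticket says.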
Your job is not to attack the agent again. Your job is to make the attack stop working. You will build a least-privilege tool policy, an authorization layer that carries the requesting user's identity into every tool call, and an approval gate that pauses high-impact actions for a human. Then you re-run the red-teamer's exploit and show authorization denies it, run an authorized in-scope lookup and show it still succeeds, and run a legitimate high-impact write and show it routes through the gate instead of firing silently. That control, proven against the provided exploit, is this task.
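The three pieces described above can be sketched together in a few dozen lines. This is a minimal sketch, not the expected solution: the role names, tool names, `POLICY` table, `Principal` type, and `ApprovalGate` class are all hypothetical, and the tools only return strings instead of touching a real database.

```python
from dataclasses import dataclass, field

# Hypothetical least-privilege policy: role -> tools that role may invoke.
POLICY = {
    "support_agent": {"lookup_account"},
    "finance_admin": {"lookup_account", "update_payee"},
}
# Hypothetical set of high-impact tools that always wait for a human.
HIGH_IMPACT = {"update_payee"}


@dataclass(frozen=True)
class Principal:
    """The requesting user, carried into every tool call."""
    user_id: str
    role: str


@dataclass
class ApprovalGate:
    """Holds high-impact calls until a human reviewer approves them."""
    pending: list = field(default_factory=list)

    def submit(self, principal, tool, args):
        self.pending.append((principal.user_id, tool, args))
        return len(self.pending) - 1  # ticket id for the reviewer


def call_tool(principal, tool, args, gate):
    """Authorize against the caller's own role, then gate or execute."""
    if tool not in POLICY.get(principal.role, set()):
        return f"DENIED: {principal.user_id} ({principal.role}) may not call {tool}"
    if tool in HIGH_IMPACT:
        ticket = gate.submit(principal, tool, args)
        return f"PENDING_APPROVAL: ticket {ticket} for {tool}"
    return f"OK: {tool} executed for {principal.user_id}"


gate = ApprovalGate()
low = Principal("u-support-1", "support_agent")
admin = Principal("u-finance-1", "finance_admin")

# 1. The exploit shape: a low-privilege caller reaches for the write tool.
exploit_result = call_tool(low, "update_payee", {"account": "A-1", "payee": "attacker"}, gate)
# 2. An authorized in-scope lookup still works.
lookup_result = call_tool(low, "lookup_account", {"account": "A-1"}, gate)
# 3. A legitimate high-impact write is queued rather than fired silently.
write_result = call_tool(admin, "update_payee", {"account": "A-1", "payee": "new-vendor"}, gate)

print(exploit_result)
print(lookup_result)
print(write_result)
```

The ordering inside `call_tool` is the design choice worth noting: authorization runs before the gate, so an unauthorized request is denied outright and never lands in a reviewer's queue.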
Your role
You are a security engineer hardening an agentic application. Your goal is a single, self-contained file that builds a least-privilege tool policy plus an authorization layer plus a human-in-the-loop approval gate around a DV-ToolAgent-shaped target, then proves with printed output that the provided confused-deputy exploit is now denied by authorization, an authorized in-scope action still succeeds, and a high-impact action routes through the approval gate.
Learning resources
What this task is
This is a build-and-submit defensive-security task that asks you to ship a working control rather than answer a quiz about agent security. You build the control that stops an excessive-agency exploit: a least-privilege tool policy, an authorization layer that carries the requesting user's identity into every tool call, and a human-in-the-loop approval gate for high-impact actions, all applied to a DV-ToolAgent-shaped target. You start from the target and a provided confused-deputy proof-of-concept, then build the control until the exploit is denied by authorization, an authorized in-scope action still succeeds, and a high-impact action routes through the approval gate.
Excessive agency is the failure behind real agentic incidents: an over-broad tool, one shared credential, and no authorization carried from the requesting user to the action. It maps to OWASP LLM06 and OWASP Agentic ASI03 (Identity and Privilege Abuse), with the indirect prompt-injection foothold tracked in MITRE ATLAS. The skill this task builds is the defender's counterpart to finding that bug. You design the least-privilege scopes, enforce them at the tool boundary so they hold even under a full model jailbreak, gate the highest-impact actions for a human, and prove with your own printed output that the exploit is dead and normal work still flows.
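Enforcing scopes at the tool boundary means the check reads the authenticated session, never anything the model wrote. The sketch below shows one way that could look, assuming hypothetical scope strings, a hypothetical `make_tools` factory, and a placeholder allow-listed host (`kb.internal.example`); the tools return strings instead of performing real I/O.

```python
from urllib.parse import urlsplit

# Hypothetical allow-list for the fetch tool; real hosts would come from config.
ALLOWED_FETCH_HOSTS = {"kb.internal.example"}


class ToolDenied(Exception):
    """Raised at the tool boundary, outside the model's control."""


def make_tools(session_user, session_scopes):
    """Bind tools to the authenticated session, not to model output."""

    def require(scope):
        if scope not in session_scopes:
            raise ToolDenied(f"{session_user} lacks scope {scope!r}")

    def fetch_url(url, **model_args):
        # Model-supplied extras such as as_user="admin" are ignored on purpose.
        require("http:fetch")
        parts = urlsplit(url)
        if parts.scheme != "https" or parts.hostname not in ALLOWED_FETCH_HOSTS:
            raise ToolDenied(f"fetch to {url!r} is outside the allow-list")
        return f"would fetch {url} as {session_user}"

    def update_payee(account, payee, **model_args):
        require("accounts:write")
        return f"would queue payee change on {account} as {session_user}"

    return {"fetch_url": fetch_url, "update_payee": update_payee}


tools = make_tools("u-support-1", {"accounts:read", "http:fetch"})

# A jailbroken model claims admin status; the claim never reaches the check.
try:
    tools["update_payee"]("A-1", "attacker", as_user="admin", pre_approved=True)
    jailbreak_outcome = "executed"
except ToolDenied as exc:
    jailbreak_outcome = f"denied: {exc}"

# The fetch tool refuses hosts outside the allow-list.
try:
    tools["fetch_url"]("http://169.254.169.254/latest/meta-data/")
    fetch_outcome = "executed"
except ToolDenied as exc:
    fetch_outcome = f"denied: {exc}"

allowed_outcome = tools["fetch_url"]("https://kb.internal.example/refunds")

print(jailbreak_outcome)
print(fetch_outcome)
print(allowed_outcome)
```

Because the session's user and scopes are captured in the closure when the tools are built, nothing the model emits as a tool argument can change who the call is authorized as.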
Grading is rubric-based and explainable. Your submission is scored against weighted criteria (the least-privilege policy and authorization layer built, the provided exploit denied by authorization, the approval gate, an authorized in-scope action preserved, the runnable target and provided exploit, the remediation rationale and mapping, and the minimality of the control) with per-criterion feedback. The pass threshold is 65 percent and you can resubmit.