Defend Excessive Agency: Re-scope a Tool Agent to Least Privilege (AuthZ + Human Approval Gate)
Harden DV-ToolAgent, a real tool-using ReAct agent, against the confused-deputy and scope-escalation attacks from the offensive lab, in small sequential steps. Stand up the agent and trace one benign ticket, then reproduce the exploit you were handed, one surface at a time: an ingested ticket that makes the agent redirect a billing payee under its own shared credential and read across tenants, then an SSRF reach into an internal-only endpoint and a poisoned-memory replant. Watch a naive SQL denylist get bypassed by a case-folded variant, then build the durable control one mechanism per step: least-privilege tool scope (the ingest role holds no write scope), per-argument authorization decided on the session identity (a caller-supplied identity claim is ignored), and a human-in-the-loop approval gate that holds high-impact writes pending an explicit token. Verify that the exploit is dead and that authorized in-scope work still succeeds, then prove that obfuscated, renamed, and spoofed variants are all blocked.
What you'll learn
- 1. Stand up DV-ToolAgent and trace one benign ticket: You are the defender on DV-ToolAgent, ACME Cloud's internal operations assistant.
- 2. Reproduce attack A: the confused-deputy write fires: Before you defend anything, reproduce the attack so you can see exactly what is open.
- 3. Reproduce attack B: SSRF reach and a poisoned-memory replant: The same over-privileged agent has two more surfaces in the same excessive-agency…
- 4. Watch a naive SQL denylist get bypassed: The obvious reaction to the confused-deputy write is to block the dangerous word:…
- 5. Control mechanism 1: least-privilege tool scope: Time to build the durable control. It has three mechanisms, one per step, and you build…
- 6. Control mechanism 2: server-side authorization on the session identity: Mechanism 1 scoped each role to a set of tools and verbs. That stopped the ingest…
- 7. Control mechanism 3: a human approval gate for high-impact writes: Mechanisms 1 and 2 stopped the low-privilege caller: ticket-bot cannot run a write and…
- 8. Verify: the exploit is blocked, benign in-scope work intact: You built the control over three mechanisms: least privilege (Step 5), per-arg…
- 9. Resist bypass: obfuscated, renamed, and spoofed attacks all blocked: A control that only stops the one payload you tested is the denylist mistake all over…
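The three control mechanisms in Steps 5 through 7 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the lab's actual code: the `ToolGateway` class, the `TOOL_POLICY` table, the `billing-admin` role, and the tenant names are all hypothetical. Only `ticket-bot` and the three mechanisms themselves come from the lab description.

```python
import secrets
from dataclasses import dataclass, field

# Hypothetical policy table: each role maps to the (tool, verb) pairs it may use.
# The ingest role (ticket-bot) holds no write scope at all.
TOOL_POLICY = {
    "ticket-bot": {("sql", "SELECT")},
    "billing-admin": {("sql", "SELECT"), ("sql", "UPDATE")},
}
# Hypothetical set of actions that always need a human decision.
HIGH_IMPACT = {("sql", "UPDATE")}


@dataclass(frozen=True)
class Session:
    """Identity established server-side at login, never taken from tool arguments."""
    role: str
    tenant: str


@dataclass
class ToolGateway:
    # Approval token -> held action. The token goes to a human reviewer
    # out of band; it is never returned to the agent.
    pending: dict = field(default_factory=dict)

    def call(self, session: Session, tool: str, verb: str, args: dict) -> str:
        # Mechanism 1: least-privilege scope, decided by role alone.
        if (tool, verb) not in TOOL_POLICY.get(session.role, set()):
            return "denied: out of scope"
        # Mechanism 2: per-argument authorization on the session identity.
        # Any identity claim inside args (e.g. "claimed_role") is ignored.
        if args.get("tenant") != session.tenant:
            return "denied: cross-tenant"
        # Mechanism 3: high-impact actions are queued, not executed.
        if (tool, verb) in HIGH_IMPACT:
            token = secrets.token_hex(8)
            self.pending[token] = (session, tool, verb, args)
            return "pending"
        return "executed"

    def approve(self, token: str) -> str:
        # A human reviewer presents the explicit token to release the action.
        if self.pending.pop(token, None) is None:
            return "denied: unknown token"
        return "executed"
```

In this sketch the checks run in the gateway, outside the model, so a prompt-injected ticket cannot talk its way past them. A write attempted by `ticket-bot` fails at the first check regardless of what identity the ticket text claims.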
Prerequisites
- Comfortable reading and editing Python
- Know what a SQL UPDATE, an HTTP GET, and an allow-list are
- Helpful (not required): the offensive Tool-Scope Escalation lab
Skills & technologies you'll practice
This is an advanced-level AI/ML lab.
What you'll do in this lab
This is a hands-on defensive-security lab built on a real tool-using agent: a ReAct loop with native tool-calling against an in-cluster model, a write-capable SQLite tool, and an HTTP fetch tool with no allow-list. You are the defender. The red team handed you a working exploit against DV-ToolAgent, ACME Cloud's internal operations assistant: as ticket-bot, the low-privilege ingest account, an ingested ticket makes the agent redirect a billing payee under its own shared credential. You reproduce that confused deputy (OWASP LLM06 Excessive Agency, Agentic ASI03), then watch an obvious SQL-keyword denylist get bypassed by a case-folded and comment-obfuscated variant, learning why shallow filters fail.
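To see why a shallow keyword filter fails, consider the following sketch. The denylist rule, the `billing` table, and the payloads are hypothetical stand-ins, not the lab's actual filter or schema. The sketch only assumes that the SQL tool is backed by SQLite, which treats keywords as case-insensitive and comments as whitespace.

```python
import sqlite3


def naive_denylist_allows(sql: str) -> bool:
    # Hypothetical shallow filter: reject only the exact string "UPDATE ".
    return "UPDATE " not in sql


def fresh_db() -> sqlite3.Connection:
    # Illustrative schema; the lab's real tables may differ.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE billing (tenant TEXT, payee TEXT)")
    conn.execute("INSERT INTO billing VALUES ('acme', 'acme-treasury')")
    return conn


def current_payee(conn: sqlite3.Connection) -> str:
    row = conn.execute("SELECT payee FROM billing WHERE tenant = 'acme'").fetchone()
    return row[0]


PAYLOADS = {
    "original": "UPDATE billing SET payee = 'attacker' WHERE tenant = 'acme'",
    # SQL keywords are case-insensitive, so the lowercase form runs unchanged.
    "case_folded": "update billing SET payee = 'attacker' WHERE tenant = 'acme'",
    # A comment separates tokens just like a space does.
    "comment_obfuscated": "UPDATE/**/billing SET payee = 'attacker' WHERE tenant = 'acme'",
}


def demo() -> dict:
    """Run each payload through the filter and report (verdict, resulting payee)."""
    results = {}
    for name, sql in PAYLOADS.items():
        conn = fresh_db()
        if naive_denylist_allows(sql):
            conn.execute(sql)
            verdict = "executed"
        else:
            verdict = "blocked"
        results[name] = (verdict, current_payee(conn))
        conn.close()
    return results


if __name__ == "__main__":
    for name, outcome in demo().items():
        print(name, outcome)
```

Only the literal payload is stopped. Patching the filter for each new spelling turns into an arms race, which is why the lab moves the decision to who is calling and what they are allowed to do.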
You then build the durable control at the tool boundary, server-side, so it holds regardless of what the model decides: a minimal tool policy that allow-lists the exact tool actions ticket-bot may take, a per-argument authorization check that rejects an out-of-scope write and a cross-tenant read, a human-in-the-loop approval gate that routes a high-impact action to a pending queue instead of auto-executing it (Agentic ASI03 and ASI05), and memory integrity that quarantines recalled notes as data and namespaces them per user (Agentic ASI06). You verify against freshly planted payloads each run: the confused-deputy write is rejected, the SSRF host is denied, the high-impact action lands in the approval queue rather than firing, the poisoned memory cannot re-fire, and an authorized in-scope action still succeeds.
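The SSRF and memory controls described above can be illustrated with a similar sketch. Everything named here is an assumption for illustration: `ALLOWED_HOSTS`, `fetch_allowed`, `MemoryStore`, and the example hostnames are not taken from the lab. The sketch does not cover redirects or DNS rebinding, which a production control would also need to handle.

```python
from urllib.parse import urlsplit

# Hypothetical allow-list for the HTTP fetch tool.
ALLOWED_HOSTS = {"status.example.com"}


def fetch_allowed(url: str) -> bool:
    """Allow only https URLs whose parsed hostname is on the allow-list."""
    parts = urlsplit(url)
    if parts.scheme != "https":
        return False
    # urlsplit lowercases the hostname and drops any userinfo before "@".
    return parts.hostname in ALLOWED_HOSTS


class MemoryStore:
    """Per-user note store that hands notes back as inert data."""

    def __init__(self) -> None:
        self._notes: dict[str, list[str]] = {}

    def remember(self, user: str, text: str) -> None:
        # Notes are namespaced by the session user, never by a claimed identity.
        self._notes.setdefault(user, []).append(text)

    def recall(self, user: str) -> list[dict]:
        # Each note is wrapped and labelled as data so the agent loop
        # can keep it out of the instruction channel.
        return [{"role": "data", "content": t} for t in self._notes.get(user, [])]
```

With these checks in place, a request to a link-local metadata address or to a host smuggled in after an `@` is refused before any connection is made, and a note planted under one user is never recalled for another.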