Build & submit taskBetaadvanced

Exploit an Over-Privileged Agent, Then Re-Scope It to Least Privilege

Build a self-contained proof-of-concept that exploits a tool-using agent through excessive agency: deliver an authorized-looking foothold through an ingested ticket, drive a confused-deputy payee redirect a low-privilege caller is not entitled to, persist it by poisoning the agent's memory so it re-fires in a fresh session, show a benign baseline, then apply a least-privilege re-scope and re-run your own exploit to prove both paths are blocked while normal lookups still work. Write the finding the way a pentester would, with severity and an OWASP LLM06 / ASI02-ASI03 / ASI06 / MITRE ATLAS mapping. Submit a small project (or a single file) for instant, rubric-based feedback.

3 hrs

Est. time

Outcomes

Rubric criteria

65%

Pass score

What you'll learn

Skills you'll have real reps in after shipping this.

Confused deputy means the system is broken even when the model behaves

The exploit succeeds against an aligned model and an ordinary prompt, because you abuse missing authorization and an over-broad tool, not a jailbreak.

Least privilege beats prompt-patching

An authorization wall at the tool holds even under a full jailbreak, because it never consults the model's decision.

Carry the user's identity to the tool

The root cause of the confused deputy is that no authorization flows from the requesting user to the downstream action.

A fix only counts when you re-run the exploit against it

Re-test the same payload after the re-scope and confirm a benign lookup still works.

The scenario

You are on a red-team engagement against an internal tool-using agent. It holds one shared service credential for all users and exposes an over-broad database tool plus an unrestricted HTTP fetch tool. The rules of engagement are simple: you may not jailbreak the model or touch its weights, and you may not social-engineer staff. You can file an inbound ticket the agent ingests, the same foothold an attacker gets from a support ticket, a wiki edit, or an email the agent later processes.

Your lead wants more than 'the agent did something odd.' She wants a reproducible proof-of-concept that drives a privileged action as a low-privilege user (the confused deputy), then a least-privilege re-scope that you prove kills your own exploit while normal lookups still work. That deliverable, an exploit plus a fix you re-test, is this task.

Your role

You are an offensive security engineer auditing an agentic application. Your goal is a small, self-contained project that proves a confused-deputy exploit and a memory-poisoning re-fire end to end, states the impact and severity like a professional finding, and demonstrates a least-privilege remediation that defeats your own proof-of-concept.

Start the task to unlock the full brief

You'll get the step-by-step requirements, setup commands, the 7-criterion grading rubric, tips, and the ability to submit your solution for instant AI grading.

Free to start · submit when you're ready

Learning resources

OWASP LLM06:2025 Excessive Agency

The taxonomy entry covering excessive functionality, permissions, and autonomy, and the mitigations this task applies.

genai.owasp.org

OWASP Top 10 for Agentic Applications (ASI03, ASI02, ASI06)

The Agentic Top 10: ASI03 Identity and Privilege Abuse, ASI02 Tool Misuse, ASI06 Memory and Context Poisoning.

genai.owasp.org

MITRE ATLAS: AML.T0051 LLM Prompt Injection

The technique and its indirect sub-technique (AML.T0051.001), the foothold for this task.

atlas.mitre.org

MINJA: Memory Injection Attack (arXiv 2503.03704)

Poisoning agent memory using only normal query and observation turns, with progressive shortening for stealth.

arxiv.org

AgentPoison (arXiv 2407.12784)

Constrained trigger optimization reaching high attack-success-rate at a tiny poison rate with low benign degradation.

arxiv.org

What this task is

This is a build-and-submit offensive-security task, not a quiz about agent security. You produce a small project that proves an excessive-agency exploit end to end: a tool-using agent target, an indirect foothold delivered through an ingested ticket, a confused-deputy payee redirect a low-privilege caller is not authorized for, a memory-poisoning rule that re-fires the redirect in a fresh session, a benign baseline for contrast, and a least-privilege re-scope you re-run your own exploit against to prove it holds.

Excessive agency (OWASP LLM06, Agentic ASI02 Tool Misuse, ASI03 Identity and Privilege Abuse, ASI06 Memory and Context Poisoning, MITRE ATLAS AML.T0051) is the failure behind real agentic incidents: an over-broad tool, one shared identity, no authorization carried from the requesting user to the action, and a memory that persists a poisoned rule. The skill this task builds is the one that separates an AI red teamer from someone who can make a chatbot misbehave: drive a privileged action the caller was never entitled to, make it persist, then re-scope the system to least privilege and prove the fix ships.

Grading is rubric-based and explainable. Your submission is scored against weighted criteria (runnable PoC, confused-deputy exploit with memory persistence, indirect foothold, benign baseline, the tested least-privilege re-scope, the proven-dead re-test, and the written finding) with per-criterion feedback. The pass threshold is 65 percent and you can resubmit.

Frequently asked questions

Do I need a paid API key?

No. The starter kit runs offline on the standard library with a deterministic too-trusting model, so you can finish with no key. You can also target any OpenAI-compatible endpoint, or simulate the model yourself and say so. The vulnerability is in the tool layer, so a stub still proves the confused deputy. The rubric rewards the exploit and the re-scope, not which model you used.

Why a payee redirect instead of making the agent grant admin?

An aligned model resists an overtly privileged self-serving action, but complies with a routine-looking record correction it believes is authorized. A payee redirect framed as a pre-approved finance correction fires reliably and has identical red-team impact: an unauthorized financial redirect through a tool the agent already believes it may use.

What is the memory-poisoning part?

The same ticket asks the agent to remember a durable routing rule. The agent stores it with no provenance check, and recalls it as a trusted preference in later sessions. In a brand-new session a different legitimate user asks an ordinary invoice question, the agent applies the planted rule, and the redirect re-fires with no new injection. Persistence across the session boundary is the win.

What counts as a least-privilege re-scope?

A server-side boundary, not a prompt. Carry the requesting user's SESSION identity to the tool and reject actions they are not authorized for (never trust a caller-supplied identity claim), and scan memory writes so a durable action-directive is never persisted. The wall must hold even if the model is fully jailbroken, and a legitimate in-scope correction and a benign lookup must still work.