Build & submit taskBetaadvanced

Run a Full LLM and Agent VAPT Engagement, Then Ship the Remediations

The capstone of the AI Red Team path: run a complete vulnerability assessment and penetration test against ACME Cloud's deliberately-vulnerable agent stack (a multi-tenant RAG assistant, a tool-using ReAct agent, a two-agent graph, and an MCP tool server). Recon the stack, exploit at least five DISTINCT OWASP LLM and Agentic classes with working proofs-of-concept delivered indirectly through retrieved, tool, or inter-agent data, rate each finding's severity, ship remediations you re-test against your own exploits, and deliver a professional engagement report. Submit a single annotated file (your engagement.py plus report.md) for instant, rubric-based feedback.

5 hrs

Est. time

Outcomes

Rubric criteria

65%

Pass score

What you'll learn

Skills you'll have real reps in after shipping this.

An engagement is coverage plus depth

A real VAPT is not one clever exploit. It is several distinct bug classes, each proven, each rated, each remediated, and tied together by a report a platform team can ship from.

Indirect delivery is the whole threat model

Every required finding plants its payload in data the system trusts: a retrieved document, an ingested ticket, a tool result, a tool description, or an inter-agent handoff. The victim never types the attack.

Aligned models echo retrieved and tool data, not flagged secrets

The reliable path to impact is to abuse data the model already sees in context, framed as routine policy or ops, rather than to ask it to recite something it is trained to refuse.

Severity is impact times exploitability

Rate each finding on what it actually achieves. A zero-click cross-boundary exfiltration and a code-execution sink are top of the scale; a system-prompt leak on its own is not.

A remediation only counts when you re-test it

For each fix you re-run your own exploit and confirm it is blocked while a benign request still works. The correct fix is almost always server-side and bug-class specific, not a prompt patch.

The scenario

ACME Cloud has shipped an AI-powered support and operations platform and hired your firm for a full red-team engagement before it goes GA. The starter kit ships the whole stack so you can run the engagement offline: DV-RAG-Support, a multi-tenant Retrieval-Augmented Generation assistant that answers staff and customer questions; DV-ToolAgent-Mini, a ReAct agent whose output drives an HTTP fetch tool and a shell-backed provisioning helper; DV-ToolAgent, an operations agent with a write-capable SQL tool and durable memory; a two-agent intake-to-fulfillment graph; and a DV-MCP tool registry whose tool descriptions the agent trusts at connect time. The rules of engagement are the usual ones. You do not jailbreak the model or touch its weights, you do not social-engineer staff, and you stay inside the sandbox. Your foothold is the foothold a real attacker gets: a document the RAG indexes, a ticket the agent ingests, a tool result the agent trusts, a durable memory note, a tool description the registry serves, or an inbound email one agent summarizes for another.

Your lead wants a real engagement, not a list of one-off chatbot quirks. She wants reconnaissance that maps the attack surface, at least five DISTINCT OWASP LLM and Agentic classes proven with reproducible PoCs, a defensible severity rating per finding, remediations you re-tested as code against your own exploits without breaking the benign baseline, and a written report a platform team can act on. You run the whole engagement against the provided stack with engagement.py, confirm each PoC fires and each fix holds with the deterministic selfcheck.py, then zip the project and submit it here for the engagement grade.

Your role

You are the lead on an AI red-team engagement against ACME Cloud's deliberately-vulnerable agent stack. You exploit at least five distinct OWASP LLM and Agentic classes against the provided stack, each delivered indirectly through retrieved, ingested, tool, memory, or inter-agent data rather than a flagged system secret, and each confirmed by a deterministic side-channel. You deliver the engagement: the PoCs, each finding's severity, the OWASP and ATLAS mapping, and the remediations you implemented as tested code and proved hold against your own exploits.

Start the task to unlock the full brief

You'll get the step-by-step requirements, setup commands, the 6-criterion grading rubric, tips, and the ability to submit your solution for instant AI grading.

Free to start · submit when you're ready

Learning resources

OWASP Top 10 for LLM Applications 2025

The LLM Top 10 taxonomy: LLM01 Prompt Injection, LLM02 Sensitive Information Disclosure, LLM04 Data and Model Poisoning, LLM05 Improper Output Handling, LLM06 Excessive Agency, LLM07 System Prompt Leakage, LLM08 Vector and Embedding Weaknesses.

genai.owasp.org

OWASP Top 10 for Agentic Applications

The Agentic Top 10: ASI02 Tool Misuse, ASI03 Identity and Privilege Abuse, ASI04 Agentic Supply Chain, ASI05 Unexpected Code Execution, ASI06 Memory and Context Poisoning, ASI07 Insecure Inter-Agent Communication, ASI08 Cascading Failures.

genai.owasp.org

MITRE ATLAS

The adversarial-ML knowledge base. Map findings to AML.T0051.001 Indirect Prompt Injection, AML.T0057 LLM Data Leakage, AML.T0070 RAG Poisoning, and the agent supply-chain techniques.

atlas.mitre.org

EchoLeak (CVE-2025-32711)

The real-world zero-click exfiltration the RAG markdown-image sink is modeled on, and the headline anchor for the engagement's executive summary.

securityweek.com

Morris II: the first GenAI worm

Self-replicating prompts that propagate across a multi-agent system, the pattern the two-agent inter-agent finding demonstrates at lab scale.

arxiv.org

What this task is

This is the capstone of the AI Red Team path: a build-and-submit engagement, not a quiz about LLM security. You run a full vulnerability assessment and penetration test against ACME Cloud's deliberately-vulnerable agent stack (a multi-tenant RAG assistant, a tool-using ReAct agent, a two-agent graph, and an MCP tool server), exploit at least five distinct OWASP LLM and Agentic classes with reproducible PoCs delivered indirectly through retrieved, tool, or inter-agent data, rate each finding's severity, ship remediations you re-test against your own exploits, and deliver a professional engagement report.

The classes you cover map across the OWASP Top 10 for LLM Applications (LLM01 Prompt Injection, LLM02 Sensitive Information Disclosure, LLM05 Improper Output Handling, LLM06 Excessive Agency, LLM08 Vector and Embedding Weaknesses), the OWASP Top 10 for Agentic Applications (ASI02 through ASI08), and MITRE ATLAS (indirect prompt injection, RAG poisoning, data leakage, and the agent supply-chain techniques), anchored by real incidents like EchoLeak and the Morris II GenAI worm. The skill this capstone proves is the one a hiring manager looks for: not one clever jailbreak, but a full engagement of distinct, proven, rated, and remediated findings tied together by a report a platform team can act on.

Grading is rubric-based and explainable. Your submission is scored against weighted criteria (runnable PoCs across classes, indirect delivery, concrete impact and coverage, severity and mapping, proven remediations, and report quality) with per-criterion feedback. You run and confirm the whole engagement offline with the starter kit's deterministic selfcheck before you submit; this task grades the engagement project you build. The pass threshold is 65 percent and you can resubmit. Questions about the task go to andrew@predica.ai.

Frequently asked questions

Do I need a paid API key?

No. The starter kit runs the whole engagement offline on the Python standard library alone, with deliberately naive stand-in models, so you can finish with no key and no lab pod. For the authentic difficulty (aligned models resist) you can point the components at any OpenAI-compatible endpoint. The bug classes live in the application and tool layer, so the offline kit still proves them. Say so if you stub the model.

What counts as five distinct classes?

Five different bug classes, not five phrasings of one. Good coverage spans indirect-injection exfiltration, retrieval or embedding poisoning, cross-tenant or sensitive-data leakage, a confused deputy or SSRF, a SQL or command or code-execution sink, memory poisoning that persists across sessions, and MCP tool-description poisoning. Each must be delivered indirectly and proven with evidence.

Why must payloads be delivered indirectly and never as a flagged secret?

Because that is the real threat model and the reliable one. An aligned model resists reciting something it is trained to refuse, but readily echoes data it already sees in retrieved documents, tool results, and memory, especially when the instruction is framed as routine policy or ops. Every required finding plants its payload in data the system trusts, so the victim never types the attack.

Why does the capstone require remediations and a report?

Because an engagement is only useful if it ships fixes. For each finding you apply the correct server-side, bug-class-specific remediation and re-run your own exploit against it to prove it is blocked while a benign request still works, then write it up with an executive summary and a per-finding section a platform team can act on. Contact andrew@predica.ai with questions.