Build a CI Red-Team Harness That Gates on Attack-Success-Rate
Build a tested CI red-team harness: it runs a battery of attacks across at least five classes against a target, scores each attempt with a deterministic oracle, triages true vs false positives, computes per-class and overall attack-success-rate, emits a machine-readable redteam_report.json, and exits non-zero when ASR crosses a threshold so the build fails. You prove it with tests and by showing the gate fire RED on a vulnerable build and GREEN on a hardened one. Submit the project as a .zip for instant, rubric-based feedback.
Est. time: 3 hrs
Outcomes: 4
Rubric criteria: 6
Pass score: 65%
The scenario
Manual testing does not scale and does not guard against regression. Your team has a small Retrieval-Augmented Generation (RAG) assistant in production and a habit of shipping prompt and retrieval changes weekly. Every release risks reopening a prompt-injection or data-leak hole someone already fixed, and nobody can prove a fix held until it breaks again in the wild.
Your lead wants the automated half of the red-team practice: a harness that runs a battery of attacks on every commit, computes attack-success-rate, triages the scanner's false positives, writes a report a dashboard can read, and fails CI when the rate crosses an agreed threshold. That harness, red on the vulnerable build and green on the hardened one, is this task.
Your role
You are an offensive security engineer standing up red-team automation for an LLM application. Your goal is a small, tested project: a harness that runs a multi-class attack battery, scores it against a deterministic oracle, triages true vs false positives, measures attack-success-rate, emits a machine-readable report, and acts as a CI gate that fails the build when the application regresses.
Learning resources
What this task is
This is a build-and-submit task, not a quiz about red-team tooling. You produce a tested harness project that runs a battery of attacks across at least five classes against an LLM application, scores each attempt with a deterministic oracle, triages true vs false positives, computes attack-success-rate, emits a machine-readable redteam_report.json, and acts as a CI gate that fails the build when attack-success-rate crosses a threshold. You prove it with tests and by showing the gate red on a vulnerable build and green on a hardened one.
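As a rough illustration of the measurement and gating steps described above, here is a minimal sketch. It assumes each triaged attempt has already been reduced to an (attack_class, succeeded) pair. The function names, the report field names, the 0.10 threshold, and the reading of "crosses" as strictly greater than the threshold are all assumptions made for this example, not requirements from the brief. Only the redteam_report.json filename comes from the task description.

```python
import json
from collections import defaultdict


def compute_asr(results):
    """Return (per_class_asr, overall_asr) from (attack_class, succeeded) pairs."""
    totals, hits = defaultdict(int), defaultdict(int)
    for attack_class, succeeded in results:
        totals[attack_class] += 1
        hits[attack_class] += int(succeeded)
    per_class = {name: hits[name] / totals[name] for name in totals}
    # An empty battery is reported as 0.0 here; a real harness may prefer to fail.
    overall = sum(hits.values()) / len(results) if results else 0.0
    return per_class, overall


def build_report(results, threshold):
    """Assemble the report dict. Field names are hypothetical."""
    per_class, overall = compute_asr(results)
    return {
        "total_attempts": len(results),
        "per_class_asr": per_class,
        "overall_asr": overall,
        "threshold": threshold,
        # Assumption: the gate fires only when ASR is strictly above the threshold.
        "gate": "RED" if overall > threshold else "GREEN",
    }


def run_gate(results, threshold=0.10, report_path="redteam_report.json"):
    """Write the report and return a process exit code (1 fails the build)."""
    report = build_report(results, threshold)
    with open(report_path, "w", encoding="utf-8") as fh:
        # sort_keys keeps the file byte-identical across runs with the same input.
        json.dump(report, fh, indent=2, sort_keys=True)
    return 1 if report["gate"] == "RED" else 0
```

A CI entry point could then end with `sys.exit(run_gate(results))`, so that a RED gate produces the non-zero exit code that fails the build. A per-class threshold is a reasonable variation if one attack class matters more than the others.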
The skill here is measurement and regression discipline. A scan that prints FAIL is not actionable until you can say how often the attack succeeded and how you know it succeeded. You anchor success to an application-level effect (an exfil-listener callback or a sentinel substring) and you drop the scanner's false positives (refusals, verbal play-along), so that the number is the same on every run and your CI gate does not flake. This maps to OWASP LLM01 and LLM05, the OWASP Agentic Top 10 entries ASI01 and ASI06, and MITRE ATLAS AML.T0051.
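The oracle-and-triage idea in the paragraph above can be sketched as follows. This is one possible shape under stated assumptions, not the reference solution. The Attempt record, the SENTINEL value, the listener_hit flag, and the four verdict labels are hypothetical names introduced for illustration. The point it shows is that success depends only on an observable effect, never on what the scanner or the model's wording suggests.

```python
from dataclasses import dataclass

# Hypothetical canary planted in the target's private context for the test run.
SENTINEL = "CANARY-7f3a9c"


@dataclass(frozen=True)
class Attempt:
    attack_class: str
    response: str            # raw text returned by the target
    scanner_flagged: bool    # what the upstream scanner claimed
    listener_hit: bool = False  # True if the exfil listener recorded a callback


def oracle(attempt):
    """Deterministic success check: an application-level effect must be present."""
    return attempt.listener_hit or SENTINEL in attempt.response


def triage(attempt):
    """Compare the scanner's claim with the oracle's verdict."""
    succeeded = oracle(attempt)
    if attempt.scanner_flagged and succeeded:
        return "true_positive"
    if attempt.scanner_flagged:
        return "false_positive"   # e.g. a refusal or verbal play-along
    if succeeded:
        return "false_negative"   # the scanner missed it, but it still counts
    return "true_negative"


def successes(attempts):
    """Reduce attempts to (attack_class, succeeded) pairs for ASR computation."""
    return [(a.attack_class, oracle(a)) for a in attempts]
```

Because the verdict is a pure function of the recorded response and the listener log, replaying the same transcript always yields the same ASR, which is what keeps the gate from flaking.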
Grading is rubric-based and explainable. Your submission is scored against weighted criteria (runnable harness, five-class battery, a tested harness whose ASR is correct and whose gate actually fires on regression, the machine-readable report, a deterministic oracle with triage, and the design rationale) with per-criterion feedback. The pass threshold is 65 percent and you can resubmit.