Build & submit taskBetaadvanced

Build a Measured Control-Coverage Checker for an LLM App

Build a tested control-coverage checker: for each in-scope OWASP LLM / Agentic attack class on a target with four control points (input mediation, retrieval and context controls, output mediation, action authorization), it runs a representative probe, confirms the matching control is present AND actually blocks the probe (verified, not asserted), and emits a coverage report marking each class COVERED or a GAP, exiting non-zero if any in-scope class is an uncovered gap. The target ships with one control deliberately disabled (a planted gap) so the checker has a real gap to catch. You prove it with tests and by showing the checker flag the gap and exit non-zero, then report full coverage and exit zero once the control is added. Submit the project as a .zip for instant, rubric-based feedback.

3 hrs

Est. time

Outcomes

Rubric criteria

65%

Pass score

What you'll learn

Skills you'll have real reps in after shipping this.

Coverage is a claim you have to prove

Saying a layer is guarded is not the same as proving the guard is reached and effective. A coverage checker that re-runs the matching attack against the live target and fails loudly when a control is missing is what turns a guard into a control.

Verified beats asserted

A static coverage map that lists every control as COVERED reads the same whether or not a control is actually wired in. A checker that runs each probe is sensitive to the real control state, so it catches the disabled control the map would have missed.

No over-blocking is half the job

A control that breaks normal use gets disabled in production. The benign request passing clean through all four controls is as load-bearing as the four blocks, so the checker must fail on over-blocking too. Coverage is measured on both axes.

Sink-side primitives beat model-side filters

Each control should use the right primitive for its layer: provenance and quarantine at input, a tenant-scoped query at retrieval, a safe-markdown subset plus egress allow-list at output, and a tool-and-argument allow-list at action. These hold when a model is talked into misbehaving; a classifier does not.

The scenario

You own the platform side of an internal LLM application. It reads user input, retrieves context from a knowledge store, calls a model, and lets the model trigger actions through tools. There is a control at each place trust changes hands: input mediation, retrieval and context controls, output mediation, and action authorization. Each control is supposed to stop one in-scope attack class your offensive teammates already proved: an indirect instruction at input, a cross-tenant retrieval, a markdown-image exfil at output, and an out-of-scope tool call at action.

Your lead does not want a written claim that the layers are guarded. She wants evidence: a coverage checker that, for each in-scope class, runs a representative probe against the live target and confirms the matching control actually blocks it, not merely that someone wired a function in. The deliverable is a control-coverage checker that marks each class COVERED only when its probe is verifiably blocked, flags any uncovered class as a GAP, and exits non-zero so a regression fails the build. The target ships with one control deliberately disabled, so your checker has a real gap to catch. Coverage is a claim you have to prove.

Your role

You are a security engineer building the measurement half of a secure-by-design practice for an LLM application with four control points: input mediation, retrieval and context controls, output mediation, and action authorization. Your goal is a small, tested project: a coverage checker that, for each in-scope OWASP LLM / Agentic attack class, runs a representative probe against the target, marks the class COVERED only when the probe is verifiably blocked, flags uncovered classes as GAPs, emits a machine-readable coverage report, and acts as a gate that fails the build when any in-scope class is an uncovered gap.

Start the task to unlock the full brief

You'll get the step-by-step requirements, setup commands, the 7-criterion grading rubric, tips, and the ability to submit your solution for instant AI grading.

Free to start · submit when you're ready

Learning resources

OWASP Top 10 for LLM Applications (2025)

The taxonomy to map your four in-scope classes to control points (LLM01 injection at input, LLM02 sensitive information disclosure at retrieval, LLM05 output handling at output, LLM06 excessive agency at action).

genai.owasp.org

OWASP Agentic Security Initiative (ASI)

Agentic threats and mitigations to map the retrieval and action-authorization control points against.

genai.owasp.org

MITRE ATLAS

Map each attack class and its mitigation to the relevant tactics and techniques (verify the current IDs).

atlas.mitre.org

NIST AI Risk Management Framework

Defense-in-depth and trust-boundary framing for the remediation rationale.

nist.gov

OWASP LLM05:2025 Improper Output Handling

Primary reference for the output-mediation control point: encode, allow-list, constrain egress.

genai.owasp.org

What this task is

This is a build-and-submit defensive-security task, not a quiz about guardrails. You produce a tested coverage checker project that, for each in-scope OWASP LLM / Agentic attack class on a four-control-point LLM application (input mediation, retrieval and context controls, output mediation, action authorization), runs a representative probe against the live target, marks the class COVERED only when the probe is verifiably blocked, flags any uncovered class as a GAP, emits a machine-readable coverage report, and exits non-zero if any in-scope class is an uncovered gap. The target ships with one control deliberately disabled, so your checker has a real gap to catch.

The skill here is the difference between a guard and a control: verified, not asserted. A static coverage map that lists every layer as covered reads the same whether or not a control is wired in. A coverage checker that runs each probe against the live target is sensitive to the real control state, so it catches the disabled control the map would have missed. You anchor each probe to a deterministic application effect (the account sentinel left through a loaded image URL, a cross-tenant sentinel in the answer, a tool that actually executed), never an LLM judge, so the coverage verdict is identical on every run. This maps to OWASP LLM01, LLM02, LLM05, and LLM06, the OWASP Agentic Top 10, and MITRE ATLAS AML.T0051.

Grading is rubric-based and explainable. Your submission is scored against weighted criteria (a runnable checker, the four in-scope classes mapped to their control points, a tested checker that verifies each class by running its probe and correctly flags the planted gap with a gate that actually fires, the benign request passing clean, the machine-readable report, the coverage matrix and methodology write-up, and the standards mapping) with per-criterion feedback. The pass threshold is 65 percent and you can resubmit.

Frequently asked questions

Do I need a paid API key?

No. The starter kit ships a deterministic stub target and runs offline on the standard library alone, so the coverage verdict is repeatable with no key. You can also point it at any OpenAI-compatible endpoint. The rubric rewards the verified coverage checker, the caught gap, and the benign pass, not which model you used.

What are the four control points?

Input mediation (quarantine instruction-shaped text in untrusted content), retrieval and context controls (a query scoped to the caller tenant or provenance), output mediation (a safe-markdown subset plus an egress host allow-list), and action authorization (a tool-and-argument allow-list with validation). Each owns one place trust changes hands, and one in-scope attack class is aimed at each.

What does it mean to verify coverage instead of asserting it?

A checker that trusts a written coverage table reports a layer COVERED even when its control is disabled. Verifying means running the representative attack against the live target for each in-scope class and marking it COVERED only when the deterministic application effect (an exfil URL loaded, a cross-tenant record leaked, a tool executed) did not occur. That is what catches the planted gap the kit ships.

Why is this the defense counterpart to an attack task?

The offensive tasks lead with the exploit; here the probe battery is only the oracle that tells you whether each control is real. The center of gravity is the measurement: a checker that maps every in-scope class to a control point, proves coverage by running the probe rather than trusting a claim, flags the gap, and gates the build. That is the secure-by-design coverage discipline the module teaches.