Build a Measured Control-Coverage Checker for an LLM App
Build a tested control-coverage checker for a target with four control points: input mediation, retrieval and context controls, output mediation, and action authorization. For each in-scope OWASP LLM / Agentic attack class, the checker runs a representative probe, confirms that the matching control is present and actually blocks the probe (verified, not asserted), and emits a coverage report marking the class COVERED or GAP. It exits non-zero if any in-scope class is a GAP. The target ships with one control deliberately disabled (a planted gap), so the checker has a real gap to catch. You prove it with tests and by showing the checker flag the gap and exit non-zero, then report full coverage and exit zero once the control is added. Submit the project as a .zip for instant, rubric-based feedback.
Est. time: 3 hrs · Outcomes: 4 · Rubric criteria: 7 · Pass score: 65%
What you'll learn
Skills you'll have real reps in after shipping this.
The scenario
You own the platform side of an internal LLM application. It reads user input, retrieves context from a knowledge store, calls a model, and lets the model trigger actions through tools. There is a control at each place trust changes hands: input mediation, retrieval and context controls, output mediation, and action authorization. Each control is supposed to stop one in-scope attack class your offensive teammates already proved: an indirect instruction at input, a cross-tenant retrieval, a markdown-image exfil at output, and an out-of-scope tool call at action.
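The pairing of control points and attack classes described above can be written down as data before any probe exists. The following is a minimal sketch, not the task's required layout: the identifiers (`ControlPoint`, `AttackClass`, `IN_SCOPE`, the `class_id` strings) are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class ControlPoint(Enum):
    INPUT_MEDIATION = "input mediation"
    RETRIEVAL_CONTEXT = "retrieval and context controls"
    OUTPUT_MEDIATION = "output mediation"
    ACTION_AUTHORIZATION = "action authorization"


@dataclass(frozen=True)
class AttackClass:
    class_id: str  # hypothetical short identifier, not defined by the brief
    description: str  # wording taken from the scenario
    control_point: ControlPoint  # the control expected to stop this class


# One in-scope class per control point, as the scenario describes.
IN_SCOPE: Tuple[AttackClass, ...] = (
    AttackClass("indirect-instruction", "indirect instruction at input",
                ControlPoint.INPUT_MEDIATION),
    AttackClass("cross-tenant-retrieval", "cross-tenant retrieval",
                ControlPoint.RETRIEVAL_CONTEXT),
    AttackClass("markdown-image-exfil", "markdown-image exfil at output",
                ControlPoint.OUTPUT_MEDIATION),
    AttackClass("out-of-scope-tool-call", "out-of-scope tool call at action",
                ControlPoint.ACTION_AUTHORIZATION),
)


def control_point_for(class_id: str) -> ControlPoint:
    """Look up which control point is responsible for an attack class."""
    for attack in IN_SCOPE:
        if attack.class_id == class_id:
            return attack.control_point
    raise KeyError(f"not an in-scope class: {class_id}")
```

Keeping this mapping separate from the probes makes it easy to check that every control point has exactly one in-scope class assigned to it.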
Your lead does not want a written claim that the layers are guarded. She wants evidence: a coverage checker that, for each in-scope class, runs a representative probe against the live target and confirms the matching control actually blocks it, not merely that someone wired a function in. The deliverable is a control-coverage checker that marks each class COVERED only when its probe is verifiably blocked, flags any uncovered class as a GAP, and exits non-zero so a regression fails the build. The target ships with one control deliberately disabled, so your checker has a real gap to catch. Coverage is a claim you have to prove.
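The gate behaviour can be sketched in a few lines. This is an illustrative outline under stated assumptions, not the reference solution: each probe is modelled as a callable that returns `True` when the attack's effect was observed, the stub probes stand in for calls against the live target, and the choice of which class is the planted gap is made up for the example.

```python
from typing import Callable, Dict

# A probe returns True when the attack's effect was observed (NOT blocked).
Probe = Callable[[], bool]


def run_probes(probes: Dict[str, Probe]) -> Dict[str, str]:
    """Run every probe and derive the verdict from what actually happened."""
    verdicts = {}
    for class_id, probe in probes.items():
        effect_observed = probe()
        verdicts[class_id] = "GAP" if effect_observed else "COVERED"
    return verdicts


def gate_exit_code(verdicts: Dict[str, str]) -> int:
    """Return 1 if any in-scope class is a GAP, otherwise 0."""
    return 1 if any(v == "GAP" for v in verdicts.values()) else 0


# Stub probes standing in for real calls to the target. Which class is the
# planted gap is an assumption here; the brief does not say.
STUB_PROBES: Dict[str, Probe] = {
    "indirect-instruction": lambda: False,
    "cross-tenant-retrieval": lambda: False,
    "markdown-image-exfil": lambda: True,  # control disabled: effect observed
    "out-of-scope-tool-call": lambda: False,
}


def main(probes: Dict[str, Probe]) -> int:
    verdicts = run_probes(probes)
    for class_id, verdict in sorted(verdicts.items()):
        print(f"{class_id}: {verdict}")
    return gate_exit_code(verdicts)

# In a real entry point the result becomes the process exit status:
#     raise SystemExit(main(STUB_PROBES))
```

With the stub probes above, `main` returns 1, which is what makes a CI job fail. Replacing the third probe with one that observes no effect makes it return 0.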
Your role
You are a security engineer building the measurement half of a secure-by-design practice for an LLM application with four control points: input mediation, retrieval and context controls, output mediation, and action authorization. Your goal is a small, tested project: a coverage checker that, for each in-scope OWASP LLM / Agentic attack class, runs a representative probe against the target, marks the class COVERED only when the probe is verifiably blocked, flags uncovered classes as GAPs, emits a machine-readable coverage report, and acts as a gate that fails the build when any in-scope class is an uncovered gap.
Learning resources
What this task is
This is a build-and-submit defensive-security task, not a quiz about guardrails. You produce a tested coverage checker project that, for each in-scope OWASP LLM / Agentic attack class on a four-control-point LLM application (input mediation, retrieval and context controls, output mediation, action authorization), runs a representative probe against the live target, marks the class COVERED only when the probe is verifiably blocked, flags any uncovered class as a GAP, emits a machine-readable coverage report, and exits non-zero if any in-scope class is an uncovered gap. The target ships with one control deliberately disabled, so your checker has a real gap to catch.
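As an illustration of what a machine-readable coverage report might look like, the sketch below serializes per-class verdicts to JSON. The field names (`classes`, `gaps`, `passed`) and the function `build_report` are assumptions for this example; the full brief may specify a different schema.

```python
import json
from typing import Dict


def build_report(verdicts: Dict[str, str], control_points: Dict[str, str]) -> str:
    """Serialize verdicts into a JSON report (hypothetical schema)."""
    classes = [
        {
            "class": class_id,
            "control_point": control_points[class_id],
            "verdict": verdict,
        }
        for class_id, verdict in sorted(verdicts.items())
    ]
    gaps = [entry["class"] for entry in classes if entry["verdict"] == "GAP"]
    report = {"classes": classes, "gaps": gaps, "passed": not gaps}
    # sort_keys keeps the output stable so reports diff cleanly between runs.
    return json.dumps(report, indent=2, sort_keys=True)


# Example input; which class is the gap is made up for illustration.
EXAMPLE_REPORT = build_report(
    {"markdown-image-exfil": "GAP", "cross-tenant-retrieval": "COVERED"},
    {
        "markdown-image-exfil": "output mediation",
        "cross-tenant-retrieval": "retrieval and context controls",
    },
)
```

A report in this shape lets a CI step or dashboard read the `gaps` list directly instead of parsing console output.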
The skill here is telling a guard from a control: a control is verified, not asserted. A static coverage map that lists every layer as covered reads the same whether or not a control is wired in. A coverage checker that runs each probe against the live target is sensitive to the real control state, so it catches the disabled control the map would have missed. You anchor each probe to a deterministic application effect (an account sentinel that leaves through a loaded image URL, a cross-tenant sentinel that appears in the answer, a tool that actually executed), never to an LLM judge, so the coverage verdict is identical on every run. This maps to OWASP LLM01, LLM02, LLM05, and LLM06, the OWASP Agentic Top 10, and MITRE ATLAS AML.T0051.
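To make the idea of a deterministic effect concrete, here is a minimal sketch of an oracle for the markdown-image exfil class. It assumes the client would load any image URL present in the rendered output, so it inspects the output text rather than observing a network request. The sentinel value, the regular expression, and the toy mediation function are all hypothetical.

```python
import re
from urllib.parse import unquote

# Hypothetical sentinel planted in the account data before the probe runs.
SENTINEL = "ACCT-SENTINEL-7f3a"

# Matches markdown images: ![alt](url "optional title"); group 1 is the URL.
IMAGE_MD = re.compile(r"!\[[^\]]*\]\(([^)\s]+)[^)]*\)")


def exfil_effect_observed(rendered_output: str, sentinel: str = SENTINEL) -> bool:
    """True if any image URL in the output carries the sentinel.

    Pure string inspection: same input, same verdict, no model in the loop.
    """
    for url in IMAGE_MD.findall(rendered_output):
        # Decode percent-escapes so an encoded sentinel is still detected.
        if sentinel in unquote(url):
            return True
    return False


def toy_output_mediation(raw_output: str) -> str:
    """Toy stand-in for the output control: drops all markdown images."""
    return IMAGE_MD.sub("[image removed]", raw_output)
```

Because the verdict depends only on whether the sentinel appears in an image URL, running the probe with the control disabled always yields a GAP and running it with the control enabled always yields COVERED.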
Grading is rubric-based and explainable. Your submission is scored against weighted criteria (a runnable checker, the four in-scope classes mapped to their control points, a tested checker that verifies each class by running its probe and correctly flags the planted gap with a gate that actually fires, the benign request passing clean, the machine-readable report, the coverage matrix and methodology write-up, and the standards mapping) with per-criterion feedback. The pass threshold is 65 percent and you can resubmit.