Question 1

Why do I need both keyword checks and LLM-based safety checks?

Accepted Answer

Keyword checks run in microseconds and catch the obvious stuff — they're the cheap first line of defense. LLM-based checks understand intent, so they catch rephrased attacks ("please disregard the system prompt" has none of the usual trigger words but the intent is identical). In production you run keyword matching first as a fast rejection path and only pay for the LLM call when the message gets past it. That's the defense-in-depth pattern Step 6 composes.

Question 2

What's Colang and why does NeMo Guardrails use it?

Accepted Answer

Colang is NVIDIA's declarative language for conversation flows inside NeMo Guardrails. You define user intents (`define user express_greeting`), bot actions (`define bot greet`), and flows that connect them (`when user express_greeting → bot greet`). The rails engine uses an embedding-based intent matcher under the hood, so `"hi"`, `"hey there"`, and `"howdy"` all route to the same intent without you enumerating every variant. It's the mechanism behind topical rails in Step 5.

Question 3

What does the self_check_input rail actually do?

Accepted Answer

NeMo Guardrails ships a `self check input` output parser backed by a prompt template (defined in `prompts.yml`). When a user message arrives, the rails engine substitutes the message into `{{ user_input }}`, calls the LLM with the safety prompt, and checks whether the reply contains `is_content_safe: true`. If not, the message is blocked before it ever reaches the agent — you don't handle parsing or routing, just declare the rail in `config.yml`.

Question 4

How is a topical rail different from a safety rail?

Accepted Answer

A safety rail blocks harmful content (jailbreaks, prompt injection, profanity). A topical rail blocks out-of-scope content that is perfectly safe but shouldn't be answered — an IT bot shouldn't discuss recipes, a finance bot shouldn't give medical advice. Topical rails use Colang intent matching (`define user ask_about_cooking`) to route off-topic requests to a polite refusal. Step 5 implements exactly this pattern for the Acme IT bot.

Question 5

Do the guardrails sit inside the agent or outside it?

Accepted Answer

Outside, deliberately. The whole point is that an agent's LLM can be manipulated by the user — that's what prompt injection is — so the safety logic has to live in code or config the LLM can't rewrite. In Step 6's architecture the request flows `user → input guardrails → agent → output guardrails → user`. The agent never sees a blocked message, and a blocked response never reaches the user. This separation is what the NCP-AAI exam domain on Safety, Ethics, and Compliance is testing your mental model of.

Question 6

What proves the guarded agent actually works at the end?

Accepted Answer

Step 6 runs a red-team test matrix of benign IT questions plus a range of adversarial inputs — direct jailbreak attempts, rephrased versions, off-topic cooking questions, and clean password-reset requests. The check script asserts that benign inputs return a real answer and that every adversarial class is blocked either by the keyword rail, the LLM rail, or the topical rail. A passing Step 6 means your whole pipeline — not just one layer — holds up against the matrix.

Build NeMo Guardrails for an AI Agent: Jailbreak & Topical Rails

What you'll learn

Prerequisites

Exam domains covered

Skills & technologies you'll practice

What you'll build in this agent safety lab

Frequently asked questions