Question 1

What's the difference between `create_agent` and building the loop yourself in LangGraph?

Accepted Answer

`create_agent` gives you a prebuilt ReAct graph for the common case, with nodes, edges, and routing already wired up. LangGraph's `StateGraph` is the underlying primitive: you define the state schema, nodes, and conditional edges yourself, which lets you add things `create_agent` doesn't expose, like tool-call budgets, custom routing to specialist sub-agents, side-effect nodes (logging, metrics), or early-exit conditions based on state. Step 5 of this lab uses `StateGraph` to cap tool calls at 4 per run, something you can't do with `create_agent` today. Production agents almost always end up on `StateGraph` eventually; `create_agent` is for prototypes.

Question 2

Why do I need `bind_tools` if `create_agent` exists?

Accepted Answer

You don't need to call `bind_tools` directly when you use `create_agent` — it does that internally. Step 3 of this lab walks through `bind_tools` manually so you understand what `create_agent` is doing under the hood. Specifically: `bind_tools(tools)` attaches each tool's JSON schema to the LLM client, so when you invoke it, the model can respond with a structured `tool_calls` list (name + args + id) instead of free-form text. Your code dispatches each call, wraps the result in a `ToolMessage` with the matching `tool_call_id`, and feeds the updated conversation back in. That's the entire ReAct loop in manual form.
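The manual loop described above can be sketched without any framework. This is a minimal illustration, not the lab's Step 3 code: `AIMessage`, `ToolMessage`, and `ToolCall` here are small local dataclasses that only mirror the shape described in the answer (name + args + id), while `fake_model` and `search_papers` are hypothetical stand-ins for a tool-bound LLM and a real tool.

```python
from dataclasses import dataclass, field


@dataclass
class ToolCall:
    name: str
    args: dict
    id: str


@dataclass
class AIMessage:  # local stand-in, not the LangChain class
    content: str = ""
    tool_calls: list = field(default_factory=list)


@dataclass
class ToolMessage:  # local stand-in, not the LangChain class
    content: str
    tool_call_id: str


def search_papers(topic: str) -> str:
    """Hypothetical tool standing in for the lab's paper search."""
    return f"1 result for {topic!r}: LoRA"


TOOLS = {"search_papers": search_papers}


def fake_model(messages):
    """Stand-in for a tool-bound LLM: request one tool call, then answer."""
    tool_results = [m for m in messages if isinstance(m, ToolMessage)]
    if not tool_results:
        call = ToolCall("search_papers", {"topic": "fine-tuning"}, "call_1")
        return AIMessage(tool_calls=[call])
    return AIMessage(content=f"Answer based on: {tool_results[-1].content}")


def run_loop(question, model=fake_model, max_turns=5):
    messages = [question]
    for _ in range(max_turns):
        reply = model(messages)
        messages.append(reply)
        if not reply.tool_calls:  # plain text reply: the loop is done
            return reply.content, messages
        for call in reply.tool_calls:
            # Dispatch by name, then tie the result back to the request by id.
            result = TOOLS[call.name](**call.args)
            messages.append(ToolMessage(content=result, tool_call_id=call.id))
    return "Stopped: turn limit reached.", messages


answer, history = run_loop("Find a paper about fine-tuning.")
print(answer)
```

The part worth noticing is the `tool_call_id` match: every tool result is paired with the call that requested it, so the model can tell which result answers which request when it sees the updated conversation.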

Question 3

Why does YAML (NeMo Agent Toolkit) matter if I already have Python code that works?

Accepted Answer

The real production value isn't that YAML is easier to write; it's that YAML is easier to *change*. Swap the LLM from a 120B model to an 8B model by editing one line and rerunning `nat run`, with no code deploy. Hot-swap a tool implementation for a stubbed version during offline evals. Version the exact config that produced last month's eval scores so you can rerun that evaluation from the same settings. Python gives you flexibility; YAML gives you reproducibility and ops-friendliness. Mature teams use both: Python for novel logic, YAML for deployment-time configuration.
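To make the "config as data" idea concrete, here is a rough Python sketch. It does not use the NeMo Agent Toolkit or its YAML schema: the config keys, model names, `TOOL_REGISTRY`, and `build_agent` are all hypothetical, and a plain dict stands in for the parsed YAML file. It only shows why swapping a model or a tool becomes a data change, and how a config can be fingerprinted for versioning.

```python
import hashlib
import json


def real_search(query: str) -> str:
    return f"live results for {query}"


def stub_search(query: str) -> str:
    return "canned result"


# Hypothetical registry: the config refers to tools by name only.
TOOL_REGISTRY = {"search": real_search, "search_stub": stub_search}


def fingerprint(config: dict) -> str:
    """Stable short hash of a config, independent of key order."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


def build_agent(config: dict):
    """Assemble a toy 'agent' purely from config values."""
    tool = TOOL_REGISTRY[config["tool"]]

    def agent(question: str) -> str:
        return f"[{config['model']}] {tool(question)}"

    return agent


prod_config = {"model": "large-model-120b", "tool": "search"}
# Offline eval: smaller model and a stubbed tool, with no code change.
eval_config = {**prod_config, "model": "small-model-8b", "tool": "search_stub"}

print(build_agent(prod_config)("LoRA"))
print(build_agent(eval_config)("LoRA"))
print(fingerprint(prod_config) != fingerprint(eval_config))
```

Storing the fingerprint next to each eval run is one simple way to know exactly which configuration produced a given set of scores.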

Question 4

Why would I ever need a tool-call budget in a production agent?

Accepted Answer

Because LLMs don't always know when to stop. An agent can genuinely get stuck in a loop — call search, reason, call search again with a slightly different query, reason, ad infinitum — especially when the correct answer isn't in the tool's corpus. Each loop iteration eats API quota, delays the user's response, and in the worst case exhausts your NIM rate limit for the whole fleet. A hard cap at 4-6 tool calls per run is the cheapest production guard. Step 5 of this lab wires it up as a LangGraph state counter + conditional edge, so the graph forcibly exits when the budget is exhausted.
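The counter-plus-conditional-edge idea can be sketched in plain Python. This is an illustration under assumptions, not the lab's Step 5 code: the state keys, node functions, and the `run` driver are hypothetical, and the stand-in agent node deliberately never stops asking for tools so the cap is what ends the run. In LangGraph, a routing function like `route_after_agent` is the kind of callable you would pass to `add_conditional_edges`.

```python
MAX_TOOL_CALLS = 4  # the hard cap; the answer above suggests 4 to 6


def agent_node(state: dict) -> dict:
    # Stand-in for the LLM: it always wants another search (the stuck case).
    return {**state, "wants_tool": True}


def tool_node(state: dict) -> dict:
    # Running a tool spends one unit of budget.
    return {
        **state,
        "tool_calls_used": state["tool_calls_used"] + 1,
        "wants_tool": False,
    }


def finalize_node(state: dict) -> dict:
    used = state["tool_calls_used"]
    return {**state, "answer": f"Best effort after {used} tool calls."}


def route_after_agent(state: dict) -> str:
    """Conditional edge: the budget check wins over the model's wishes."""
    if state["tool_calls_used"] >= MAX_TOOL_CALLS:
        return "finalize"
    return "tools" if state["wants_tool"] else "finalize"


def run(state: dict) -> dict:
    """Tiny driver standing in for the compiled graph."""
    while True:
        state = agent_node(state)
        if route_after_agent(state) == "tools":
            state = tool_node(state)
        else:
            return finalize_node(state)


final = run({"tool_calls_used": 0, "wants_tool": False})
print(final["answer"])
```

Because the check lives in the routing function rather than in the prompt, the exit is enforced by code and does not depend on the model deciding to stop.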

Question 5

What kinds of questions can the research librarian actually answer?

Accepted Answer

The lab ships with 10 seeded ML papers: LoRA, QLoRA, FlashAttention, vLLM PagedAttention, RAG, DPO, ReAct, PEFT, Attention Is All You Need, and Nemotron-4. The librarian can search by topic, read abstracts, compare key ideas, report years and authors, and handle multi-hop questions that chain multiple tool calls. Example queries: 'Compare the key ideas of FlashAttention and PagedAttention.' 'What year was QLoRA published, and how does it differ from LoRA?' 'Find papers about parameter-efficient fine-tuning, and summarize the newest one.' The corpus is small by design: big enough to demonstrate every tool-calling pattern, yet small enough that you can read every entry and verify the agent's answers by hand.

Question 6

What do I need to know before starting this lab?

Accepted Answer

You should be comfortable with Python (functions, classes, decorators, dicts), have basic familiarity with REST APIs and JSON schemas, and have a rough sense of what an LLM is and how you call one. You don't need prior LangChain, LangGraph, or NIM experience; we build those up from scratch. You don't need a GPU or any local setup either: the lab pod runs in a cloud environment we provision per session, with `langchain_nvidia_ai_endpoints`, `langgraph`, and the NeMo Agent Toolkit (`nat` CLI) all preinstalled against a live NIM proxy. Checks run after each step and verify real agent behavior, not just string matching.

Build a ReAct Agent with NVIDIA NIM

What you'll learn

Prerequisites

Exam domains covered

Skills & technologies you'll practice

What you'll build in this ReAct agent lab

Frequently asked questions

