Build a ReAct Agent with NVIDIA NIM
Build a working AI research librarian — an agent that can search a corpus of ML papers, read abstracts, compare methods, and reason over them to answer multi-step questions. Uses LangChain, LangGraph, and NVIDIA NeMo Agent Toolkit on real NIM endpoints.
What you'll learn
- 1Connect to NVIDIA NIMIn the next 8 steps, you'll build your own AI research librarian — an agent that can search a corpus of machine-learning papers, read abstracts, compare methods, and reason over them to answer questions like:
- 2Define Research ToolsAn LLM by itself can only generate text. Tools give it the ability to *do things* — search databases, call APIs, run calculations, look up real data. For our research librarian, tools are what let the model stop guessing and start looking things up.
- 3Tool Calling with bind_toolsWhen you give an LLM access to tools, the conversation flow changes. Instead of always responding with text, the model can respond with a tool call request — a structured message saying *"I want to call function X with arguments Y."*
- 4Build a ReAct AgentReAct (Reasoning + Acting) is the foundational agent architecture. It works in a loop:
- 5Build an Explicit LangGraph Agentcreate_agent gave you a prebuilt ReAct loop. Convenient, but it's a black box — you can't add custom guards, side effects, or routing logic. In this step you build the same loop from scratch using LangGraph's StateGraph, then bolt on something create_agent can't give you: a tool-call budget.
- 6NeMo Agent Toolkit: YAML WorkflowEverything you've built so far is Python code. NeMo Agent Toolkit (NAT) takes a different approach — you define agents in YAML configuration files and run them with the nat CLI.
- 7Test and Evaluate Your AgentBuilding an agent is step one. Knowing when it fails is step two — and it's the line between a demo and a production system.
- 8Evaluate with nat eval + meet your librarianStep 7 used a hand-written test loop with substring matching. NAT has a built-in evaluation system that runs the same kind of loop at scale, with standardized metrics, structured datasets, and output reports.
Prerequisites
- Basic Python (functions, classes, dicts)
- Familiarity with REST APIs
- Understanding of what LLMs are
Exam domains covered
Skills & technologies you'll practice
This intermediate-level ai/ml lab gives you real-world reps across:
What you'll build in this ReAct agent lab
ReAct is the foundational agent architecture every production LLM stack is built on — the think-act-observe loop that lets an LLM reason over tools it doesn't have baked into its weights. In roughly 35 minutes on real NVIDIA NIM endpoints, you'll build an AI research librarian that can search a corpus of ML papers, read abstracts, compare methods, and answer multi-step questions like 'Which paper introduced LoRA, and how does QLoRA improve on it?' You walk away with a working agent in three framings — create_agent (prebuilt), an explicit LangGraph StateGraph with a custom tool-call budget, and a YAML-driven NeMo Agent Toolkit workflow — plus an interactive chat loop where you can talk to your librarian directly and a full evaluation pipeline that tells you when it's failing.
The technical substance is the three-way decomposition of the agent loop. llm.bind_tools(tools) attaches a JSON Schema for each tool to the model's system prompt, so the model can emit a structured tool_calls list instead of free-form text. You dispatch each call, wrap the result in a ToolMessage with the matching tool_call_id, feed the conversation back in, and repeat until the model stops asking for tools. create_agent does this loop automatically. LangGraph's StateGraph lets you build the same loop from scratch with custom routing — and you'll use that power to bolt on a tool-call budget (hard-cap at 4 calls per run) that protects your NIM proxy from runaway loops. The NeMo Agent Toolkit takes the opposite tack: the same agent expressed as 20 lines of YAML, driven by the nat run CLI, with model swaps handled by editing model_name and reloading — the production win is config-driven A/B testing without code changes.
Prerequisites: comfort with Python classes and decorators, basic understanding of LLM APIs (prompt → response), and knowing what a JSON schema looks like. The sandbox is a real NIM-backed environment with langchain_nvidia_ai_endpoints, langgraph, langchain.agents.create_agent, and the NeMo Agent Toolkit (nat) CLI all preinstalled. Lab pods never see the NVIDIA API key — the in-cluster proxy injects it at the network edge. Every step is graded against real tool behavior, not static string matching: step 3 requires an actual tool_calls message, step 5 verifies the budget counter incremented correctly, step 7 requires the agent to answer at least 2 of 5 real research questions correctly, and step 8 requires a nat eval run against a populated JSON dataset to complete successfully. At the end you get a chat.py that lets you talk to your librarian for as long as the session lasts.
Frequently asked questions
What's the difference between create_agent and building the loop yourself in LangGraph?
create_agent and building the loop yourself in LangGraph?create_agent gives you a prebuilt ReAct graph — nodes, edges, routing all wired up — for the common case. LangGraph's StateGraph is the underlying primitive: you define the state schema, nodes, conditional edges, and you can add things create_agent doesn't expose, like tool-call budgets, custom routing to specialist sub-agents, side-effect nodes (logging, metrics), or early-exit conditions based on state. Step 5 of this lab uses StateGraph to cap tool calls at 4 per run — something you can't do with create_agent today. Production agents almost always end up on StateGraph eventually; create_agent is for prototypes.Why do I need bind_tools if create_agent exists?
bind_tools if create_agent exists?bind_tools directly when you use create_agent — it does that internally. Step 3 of this lab walks through bind_tools manually so you understand what create_agent is doing under the hood. Specifically: bind_tools(tools) attaches each tool's JSON schema to the LLM client, so when you invoke it, the model can respond with a structured tool_calls list (name + args + id) instead of free-form text. Your code dispatches each call, wraps the result in a ToolMessage with the matching tool_call_id, and feeds the updated conversation back in. That's the entire ReAct loop in manual form.Why does YAML (NeMo Agent Toolkit) matter if I already have Python code that works?
nat run; no code deploy. Hot-swap a tool implementation for a stubbed version during offline evals. Version the exact config that produced last month's eval scores and reproduce them byte-for-byte. Python gives you flexibility; YAML gives you reproducibility and ops-friendliness. Mature teams use both — Python for novel logic, YAML for deployment-time configuration.Why would I ever need a tool-call budget in a production agent?
What kinds of questions can the research librarian actually answer?
What do I need to know before starting this lab?
langchain_nvidia_ai_endpoints, langgraph, and the NeMo Agent Toolkit (nat CLI) all preinstalled against a live NIM proxy. Checks run after each step and verify real agent behavior, not just string matching.