Build a ReAct Agent with NVIDIA NIM
Hosted · Beta · Free

Build a working AI research librarian — an agent that can search a corpus of ML papers, read abstracts, compare methods, and reason over them to answer multi-step questions. Uses LangChain, LangGraph, and NVIDIA NeMo Agent Toolkit on real NIM endpoints.

35 min · 8 steps · 4 domains · Intermediate · NCP-AAI

What you'll learn

  1. Connect to NVIDIA NIM
    In the next 8 steps, you'll build your own AI research librarian: an agent that can search a corpus of machine-learning papers, read abstracts, compare methods, and reason over them to answer questions like:
  2. Define Research Tools
    An LLM by itself can only generate text. Tools give it the ability to *do things*: search databases, call APIs, run calculations, look up real data. For our research librarian, tools are what let the model stop guessing and start looking things up.
  3. Tool Calling with bind_tools
    When you give an LLM access to tools, the conversation flow changes. Instead of always responding with text, the model can respond with a tool call request, a structured message saying *"I want to call function X with arguments Y."*
  4. Build a ReAct Agent
    ReAct (Reasoning + Acting) is the foundational agent architecture. It works in a loop:
  5. Build an Explicit LangGraph Agent
    create_agent gave you a prebuilt ReAct loop. Convenient, but it's a black box: you can't add custom guards, side effects, or routing logic. In this step you build the same loop from scratch using LangGraph's StateGraph, then bolt on something create_agent can't give you: a tool-call budget.
  6. NeMo Agent Toolkit: YAML Workflow
    Everything you've built so far is Python code. NeMo Agent Toolkit (NAT) takes a different approach: you define agents in YAML configuration files and run them with the nat CLI.
  7. Test and Evaluate Your Agent
    Building an agent is step one. Knowing when it fails is step two, and it's the line between a demo and a production system.
  8. Evaluate with nat eval + meet your librarian
    Step 7 used a hand-written test loop with substring matching. NAT has a built-in evaluation system that runs the same kind of loop at scale, with standardized metrics, structured datasets, and output reports.

Prerequisites

  • Basic Python (functions, classes, dicts)
  • Familiarity with REST APIs
  • Understanding of what LLMs are

Exam domains covered

Agent Architecture and Design · Agent Development · Cognition, Planning, and Memory · NVIDIA Platform Implementation

Skills & technologies you'll practice

This intermediate-level AI/ML lab gives you real-world reps across:

ReAct · LangChain · LangGraph · NeMo Agent Toolkit · Tool Calling · NIM

What you'll build in this ReAct agent lab

ReAct is a foundational agent architecture: the think-act-observe loop that lets an LLM use tools to reach information that isn't baked into its weights. In roughly 35 minutes on real NVIDIA NIM endpoints, you'll build an AI research librarian that can search a corpus of ML papers, read abstracts, compare methods, and answer multi-step questions like 'Which paper introduced LoRA, and how does QLoRA improve on it?' You walk away with a working agent in three framings: create_agent (prebuilt), an explicit LangGraph StateGraph with a custom tool-call budget, and a YAML-driven NeMo Agent Toolkit workflow. You also get an interactive chat loop where you can talk to your librarian directly and a full evaluation pipeline that tells you when it's failing.

The technical substance is the agent loop expressed three ways. llm.bind_tools(tools) attaches a JSON Schema for each tool to the model's requests, so the model can emit a structured tool_calls list instead of free-form text. You dispatch each call, wrap the result in a ToolMessage with the matching tool_call_id, feed the conversation back in, and repeat until the model stops asking for tools. create_agent runs this loop automatically. LangGraph's StateGraph lets you build the same loop from scratch with custom routing, and you'll use that to bolt on a tool-call budget (a hard cap of 4 calls per run) that protects your NIM proxy from runaway loops. The NeMo Agent Toolkit takes the opposite tack: the same agent expressed as 20 lines of YAML, driven by the nat run CLI, with model swaps handled by editing model_name and rerunning. The production win is config-driven A/B testing without code changes.
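
To make the "JSON Schema for each tool" idea concrete, here is a minimal, framework-free sketch of how a schema could be derived from a plain Python function. It does not use LangChain; tool_schema, search_papers, and the output layout (an OpenAI-style function description) are assumptions for illustration, and the exact schema that bind_tools sends may differ.

```python
import inspect
from typing import get_type_hints

# Map Python annotations to JSON Schema type names (only the few used here).
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def tool_schema(fn):
    """Hypothetical helper: describe a function as a JSON-Schema-style tool."""
    hints = get_type_hints(fn)
    hints.pop("return", None)
    params = inspect.signature(fn).parameters
    return {
        "name": fn.__name__,
        "description": inspect.getdoc(fn) or "",
        "parameters": {
            "type": "object",
            "properties": {
                name: {"type": _JSON_TYPES[hints[name]]} for name in params
            },
            # Parameters without a default value are treated as required.
            "required": [
                name
                for name, p in params.items()
                if p.default is inspect.Parameter.empty
            ],
        },
    }


def search_papers(query: str, max_results: int = 3) -> str:
    """Search the paper corpus by topic."""
    return ""  # stub body; only the signature matters for the schema


schema = tool_schema(search_papers)
```

The model never sees the function body, only this description, which is why a clear docstring and typed arguments matter for tool selection.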

Prerequisites: comfort with Python classes and decorators, basic understanding of LLM APIs (prompt → response), and knowing what a JSON schema looks like. The sandbox is a real NIM-backed environment with langchain_nvidia_ai_endpoints, langgraph, langchain.agents.create_agent, and the NeMo Agent Toolkit (nat) CLI all preinstalled. Lab pods never see the NVIDIA API key — the in-cluster proxy injects it at the network edge. Every step is graded against real tool behavior, not static string matching: step 3 requires an actual tool_calls message, step 5 verifies the budget counter incremented correctly, step 7 requires the agent to answer at least 2 of 5 real research questions correctly, and step 8 requires a nat eval run against a populated JSON dataset to complete successfully. At the end you get a chat.py that lets you talk to your librarian for as long as the session lasts.
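
The step list describes Step 7 as a hand-written test loop with substring matching. A rough sketch of that idea, under stated assumptions: evaluate, stub_agent, and CASES are hypothetical names, the agent is stubbed so the code runs offline, and the lab's real test harness may look quite different.

```python
def evaluate(agent, cases, min_correct=2):
    """Run each question and check the answer for an expected substring."""
    results = []
    for question, expected in cases:
        answer = agent(question)
        results.append(expected.lower() in answer.lower())
    correct = sum(results)
    return {
        "correct": correct,
        "total": len(cases),
        "passed": correct >= min_correct,
    }


def stub_agent(question):
    """Hypothetical agent that only knows one fact."""
    if "QLoRA" in question:
        return "QLoRA was published in 2023."
    return "I don't know."


CASES = [
    ("What year was QLoRA published?", "2023"),
    ("What year was LoRA published?", "2021"),
]

report = evaluate(stub_agent, CASES, min_correct=1)
```

Substring checks are cheap but brittle: an answer that says "twenty twenty-three" would fail, which is part of the motivation for the structured evaluation in Step 8.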

Frequently asked questions

What's the difference between create_agent and building the loop yourself in LangGraph?

create_agent gives you a prebuilt ReAct graph — nodes, edges, routing all wired up — for the common case. LangGraph's StateGraph is the underlying primitive: you define the state schema, nodes, conditional edges, and you can add things create_agent doesn't expose, like tool-call budgets, custom routing to specialist sub-agents, side-effect nodes (logging, metrics), or early-exit conditions based on state. Step 5 of this lab uses StateGraph to cap tool calls at 4 per run — something you can't do with create_agent today. Production agents almost always end up on StateGraph eventually; create_agent is for prototypes.

Why do I need bind_tools if create_agent exists?

You don't need to call bind_tools directly when you use create_agent — it does that internally. Step 3 of this lab walks through bind_tools manually so you understand what create_agent is doing under the hood. Specifically: bind_tools(tools) attaches each tool's JSON schema to the LLM client, so when you invoke it, the model can respond with a structured tool_calls list (name + args + id) instead of free-form text. Your code dispatches each call, wraps the result in a ToolMessage with the matching tool_call_id, and feeds the updated conversation back in. That's the entire ReAct loop in manual form.
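
The manual loop described above can be sketched without any framework. This is an illustration only: Message, fake_model, lookup_year, and run_loop are hypothetical stand-ins (not LangChain classes), and the fake model is scripted to request one tool call and then answer.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    role: str                                        # "user", "assistant", or "tool"
    content: str = ""
    tool_calls: list = field(default_factory=list)   # set on assistant messages
    tool_call_id: str = ""                           # set on tool messages


def lookup_year(title: str) -> str:
    """Hypothetical tool: publication year for a paper title."""
    return {"LoRA": "2021", "QLoRA": "2023"}.get(title, "unknown")


TOOLS = {"lookup_year": lookup_year}


def fake_model(messages):
    """Stand-in for a tool-bound LLM: ask for one tool call, then answer."""
    tool_results = [m for m in messages if m.role == "tool"]
    if not tool_results:
        call = {"name": "lookup_year", "args": {"title": "QLoRA"}, "id": "call_1"}
        return Message("assistant", tool_calls=[call])
    year = tool_results[-1].content
    return Message("assistant", content=f"QLoRA was published in {year}.")


def run_loop(question, model=fake_model):
    messages = [Message("user", question)]
    while True:
        reply = model(messages)
        messages.append(reply)
        if not reply.tool_calls:          # no tool request means a final answer
            return messages
        for call in reply.tool_calls:     # dispatch each requested call
            result = TOOLS[call["name"]](**call["args"])
            # Echo the call id back so the model can match result to request.
            messages.append(Message("tool", result, tool_call_id=call["id"]))


history = run_loop("What year was QLoRA published?")
```

The resulting history has four messages (user, assistant tool request, tool result, assistant answer), which is the same shape a single-tool ReAct turn takes in a real conversation.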

Why does YAML (NeMo Agent Toolkit) matter if I already have Python code that works?

The real production value isn't that YAML is easier to write — it's that YAML is easier to change. Swap the LLM from a 120B model to an 8B model by editing one line and rerunning nat run; no code deploy. Hot-swap a tool implementation for a stubbed version during offline evals. Version the exact config that produced last month's eval scores and reproduce them byte-for-byte. Python gives you flexibility; YAML gives you reproducibility and ops-friendliness. Mature teams use both — Python for novel logic, YAML for deployment-time configuration.

Why would I ever need a tool-call budget in a production agent?

Because LLMs don't always know when to stop. An agent can genuinely get stuck in a loop — call search, reason, call search again with a slightly different query, reason, ad infinitum — especially when the correct answer isn't in the tool's corpus. Each loop iteration eats API quota, delays the user's response, and in the worst case exhausts your NIM rate limit for the whole fleet. A hard cap at 4-6 tool calls per run is the cheapest production guard. Step 5 of this lab wires it up as a LangGraph state counter + conditional edge, so the graph forcibly exits when the budget is exhausted.
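
The "state counter + conditional edge" pattern can be sketched in plain Python. This is not LangGraph code: agent_node, tool_node, route, and run are hypothetical names that mimic the graph's nodes and routing function, and the stubbed agent always asks for another tool call in order to simulate a runaway loop.

```python
MAX_TOOL_CALLS = 4  # the cap the lab uses in Step 5


def agent_node(state):
    """Stand-in model that always wants to search again (a runaway loop)."""
    state["wants_tool"] = True
    return state


def tool_node(state):
    """Run the (stubbed) tool and count the call against the budget."""
    state["tool_calls_used"] += 1
    return state


def route(state):
    """Conditional edge: decide which node runs after the agent node."""
    if not state["wants_tool"]:
        return "end"                      # model produced a final answer
    if state["tool_calls_used"] >= MAX_TOOL_CALLS:
        return "end"                      # budget exhausted: force exit
    return "tools"


def run(state):
    while True:
        state = agent_node(state)
        if route(state) == "end":
            return state
        state = tool_node(state)


final = run({"tool_calls_used": 0, "wants_tool": False})
```

Because the counter lives in the state and the check lives in the routing function, the cap holds no matter what the model asks for.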

What kinds of questions can the research librarian actually answer?

The lab ships with 10 seeded ML papers — LoRA, QLoRA, FlashAttention, vLLM PagedAttention, RAG, DPO, ReAct, PEFT, Attention Is All You Need, Nemotron-4. The librarian can search by topic, read abstracts, compare key ideas, report years and authors, and handle multi-hop questions that chain multiple tool calls. Example queries: 'Compare the key ideas of FlashAttention and PagedAttention.' 'What year was QLoRA published, and how does it differ from LoRA?' 'Find papers about parameter-efficient fine-tuning, and summarize the newest one.' The corpus is small by design — it's big enough to demonstrate every tool-calling pattern, small enough that you can read every entry and verify the agent's answers by hand.
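
A topic-search tool over a small seeded corpus could look roughly like this. The record layout, the topic labels, and the function names are assumptions for illustration, and only three of the ten papers are shown; the lab's actual data and tools may differ.

```python
PAPERS = [  # hypothetical records; the lab's real fields may differ
    {"title": "LoRA", "year": 2021,
     "topics": ["parameter-efficient fine-tuning"]},
    {"title": "QLoRA", "year": 2023,
     "topics": ["parameter-efficient fine-tuning", "quantization"]},
    {"title": "FlashAttention", "year": 2022,
     "topics": ["attention", "gpu memory"]},
]


def search_papers(topic):
    """Return papers whose topics mention the query, newest first."""
    needle = topic.lower()
    hits = [p for p in PAPERS if any(needle in t for t in p["topics"])]
    return sorted(hits, key=lambda p: p["year"], reverse=True)


def newest(topic):
    """Title of the most recent matching paper, or None if nothing matches."""
    hits = search_papers(topic)
    return hits[0]["title"] if hits else None
```

A query such as 'Find papers about parameter-efficient fine-tuning, and summarize the newest one' would chain a search like this with a second call that reads the chosen paper's abstract.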

What do I need to know before starting this lab?

You should be comfortable with Python (functions, classes, decorators, dicts), have basic familiarity with REST APIs and JSON schemas, and have a rough sense of what an LLM is and how you call one. You don't need prior LangChain, LangGraph, or NIM experience; the lab builds those up from scratch. You don't need a GPU or any local setup either: the lab pod runs in a cloud environment we provision per session, with langchain_nvidia_ai_endpoints, langgraph, and the NeMo Agent Toolkit (nat CLI) all preinstalled against a live NIM proxy. Checks run after each step and verify real agent behavior, not just string matching.