Build a ReAct Agent with NVIDIA NIM
Build a complete reasoning + acting agent from scratch using LangChain, LangGraph, and NeMo Agent Toolkit — the three pillars of the NCP-AAI exam.
What you'll learn
1. **Connect to NVIDIA NIM**: NVIDIA NIM (NVIDIA Inference Microservices) packages AI models as optimized, containerized microservices. Instead of loading a model onto a GPU yourself, you call a NIM endpoint — just like calling a REST API.
2. **Define Tools**: An LLM by itself can only generate text. Tools give it the ability to *do things* — search databases, call APIs, run calculations, or interact with external systems.
3. **Tool Calling with bind_tools**: When you give an LLM access to tools, the conversation flow fundamentally changes. Instead of always responding with text, the model can respond with a tool call request — a structured message saying "I want to call function X with arguments Y."
4. **Build a ReAct Agent**: ReAct (Reasoning + Acting) is the most common agent architecture. It works in a loop: the model reasons about what to do, acts by calling a tool, observes the result, and repeats until it can answer.
5. **Build an Explicit LangGraph Agent**: The create_agent() function from step 4 is convenient, but the NCP-AAI exam tests your understanding of what's inside it. Here you build the same agent from scratch using LangGraph's StateGraph.
6. **NeMo Agent Toolkit: YAML Workflow**: Everything built so far uses Python code. NeMo Agent Toolkit (NAT) takes a different approach — you define agents in YAML configuration files and run them with the nat CLI.
7. **Test and Evaluate Your Agent**: Building an agent is step one. Knowing whether it works reliably is step two — and it's 13% of the NCP-AAI exam.
8. **Evaluate with nat eval**: Step 7 used a hand-written test loop. NeMo Agent Toolkit has a built-in evaluation system that does this at scale with standardized metrics.
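Steps 2 and 3 rest on one mechanism: a plain Python function's name, docstring, and type hints become the schema the model sees. Here is a hand-rolled, stdlib-only sketch of that conversion (to_tool_schema and type_map are our illustrative names; LangChain's @tool decorator does this for you):

```python
import inspect
from typing import get_type_hints

def multiply(a: int, b: int) -> int:
    """Multiply two integers and return the product."""
    return a * b

def to_tool_schema(fn):
    # Build an OpenAI-style tool schema from the function's name,
    # docstring, and type hints, mimicking what @tool generates.
    hints = get_type_hints(fn)
    hints.pop("return", None)
    type_map = {int: "integer", float: "number", str: "string", bool: "boolean"}
    return {
        "type": "function",
        "function": {
            "name": fn.__name__,
            "description": inspect.getdoc(fn),
            "parameters": {
                "type": "object",
                "properties": {n: {"type": type_map.get(t, "string")}
                               for n, t in hints.items()},
                "required": list(hints),
            },
        },
    }

schema = to_tool_schema(multiply)
print(schema["function"]["name"])  # prints: multiply
```

The real decorator produces a richer object, but the contract is the same: what the model receives is this JSON shape, not your Python code.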
Prerequisites
- Basic Python (functions, classes, dicts)
- Familiarity with REST APIs
- Understanding of what LLMs are
Exam domains & skills you'll practice
This intermediate-level AI/ML lab gives you real-world reps across agent architecture, tool calling, orchestration frameworks, and evaluation: the skills the eight steps above walk through.
What you'll build in this ReAct agent lab
ReAct is the foundational agent architecture every production LLM stack is built on — tool calling, multi-agent orchestration, and agentic RAG all compose the same Thought-Action-Observation loop. This lab builds a ReAct agent from scratch against NVIDIA NIM endpoints we provision, then rebuilds the same agent to expose the abstraction layers: once with LangChain's create_agent, once as an explicit LangGraph StateGraph with an llm_node, a tool_node, and a should_continue edge, and once as a NeMo Agent Toolkit YAML workflow you run with the nat CLI. You leave with working code in all three shapes, a mental model of which abstraction fits which problem, and a test harness that mirrors what real teams run before shipping.
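Stripped of every framework, the loop all three builds share fits in a screen of plain Python. In this sketch, fake_model stands in for the NIM-backed llm_with_tools.invoke and TOOLS for the registered @tool functions; both are our stubs, not lab code:

```python
def fake_model(messages):
    # First turn: request a tool call. After a tool result arrives: answer.
    if not any(m["role"] == "tool" for m in messages):
        return {"role": "ai", "content": "",
                "tool_calls": [{"name": "multiply", "args": {"a": 6, "b": 7}}]}
    return {"role": "ai", "content": "6 x 7 = 42", "tool_calls": []}

TOOLS = {"multiply": lambda a, b: a * b}

def should_continue(messages):
    # The conditional edge: route to tools if the last AI message asked for any.
    return bool(messages[-1]["tool_calls"])

def run_agent(question, max_turns=5):
    messages = [{"role": "human", "content": question}]
    for _ in range(max_turns):  # plays the role of recursion_limit
        ai = fake_model(messages)
        messages.append(ai)
        if not should_continue(messages):
            return ai["content"]  # terminal answer: content, no tool_calls
        for call in ai["tool_calls"]:  # act, then feed the observation back
            result = TOOLS[call["name"]](**call["args"])
            messages.append({"role": "tool", "content": str(result)})
    raise RuntimeError("recursion limit hit before a final answer")

print(run_agent("What is 6 times 7?"))  # prints: 6 x 7 = 42
```

create_agent hides this loop, StateGraph makes each piece an explicit node or edge, and NAT moves the wiring into YAML; the control flow itself never changes.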
The technical core is the think-act-observe loop: a HumanMessage goes in, the LLM emits either a final AIMessage or a tool_call, your code executes the tool and returns a ToolMessage, and the loop continues until the model produces a terminal answer. You see why llm.bind_tools([...]) is a schema contract enforced server-side rather than prompt engineering, how LangChain turns a Python function's docstring and type hints into an LLM-readable tool schema, and why StateGraph's conditional edges make the termination logic inspectable instead of hidden behind a black-box loop. The NAT YAML section demystifies the declarative layer platform teams prefer — same functions, same workflow, same LLM, just in a config file version control can review.
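That declarative layer is small. A sketch of what such a workflow file can look like, assuming NAT's llms / functions / workflow sections (exact _type values and key names vary by NAT release, and the tool name here is illustrative):

```yaml
llms:
  nim_llm:
    _type: nim                       # a NIM-served endpoint
    base_url: http://nim-proxy.labs.svc:8080/v1
    model_name: <nemotron model served by the proxy>

functions:
  multiply:
    _type: multiply                  # a registered tool, referenced by name

workflow:
  _type: react_agent                 # the agent architecture to run
  llm_name: nim_llm
  tool_names: [multiply]
```

Reviewers can diff this file without reading Python, which is the point of the abstraction.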
Python comfort with functions and type hints is the only prerequisite; no prior LangChain or LangGraph knowledge is assumed. The hosted environment has langchain-nvidia-ai-endpoints, LangGraph, and NeMo Agent Toolkit (nat) preinstalled and pointed at our managed NIM proxy serving the Nemotron reasoning family — no API keys, no GPU provisioning, no local runtime setup. About 35 minutes of focused work, finishing with a custom test harness and the same tests run through nat eval to produce an accuracy score on your dataset — the exact shape NeMo Evaluator uses as a managed service.
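The hand-written harness mentioned above reduces to a substring check over question/expected pairs. A minimal sketch, with stub_agent standing in for the real ReAct agent (run_eval and stub_agent are our illustrative names, not lab APIs):

```python
def run_eval(agent_fn, dataset):
    # Score an agent on {question, expected} pairs by substring match,
    # the same shape nat eval standardizes with datasets and evaluators.
    passed = 0
    for row in dataset:
        answer = agent_fn(row["question"])
        ok = row["expected"].lower() in answer.lower()
        passed += ok
        print(f"{'PASS' if ok else 'FAIL'}: {row['question']!r} -> {answer!r}")
    return passed / len(dataset)

def stub_agent(question):
    # Stand-in for the real agent's invoke call.
    return "6 times 7 is 42." if "6 times 7" in question else "I'm not sure."

dataset = [
    {"question": "What is 6 times 7?", "expected": "42"},
    {"question": "What is the capital of France?", "expected": "Paris"},
]
print(f"accuracy: {run_eval(stub_agent, dataset):.2f}")  # prints: accuracy: 0.50
```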
Frequently asked questions
Why build the same agent three ways — create_agent, LangGraph, and NAT YAML?
create_agent is the fastest path for experimentation — one line and you have a ReAct loop. LangGraph's StateGraph is what you reach for when you need non-trivial routing, checkpointing, or a supervisor pattern. NeMo Agent Toolkit YAML is what DevOps and platform teams prefer: agent configs live in version control, reviewers don't have to read Python, and the nat CLI handles running and evaluating workflows. Seeing all three side by side makes it obvious which abstraction fits which problem.

What does llm.bind_tools actually do under the hood?
bind_tools returns a wrapped model that, on every .invoke(), passes a tools=[...] parameter to the underlying OpenAI-compatible chat endpoint. The tool schema is generated from each @tool function's name, docstring, and type hints. The server enforces the schema: when the model decides to call a tool, the reply contains a structured tool_calls field, not free-form JSON embedded in the text. That's why tool calling is more reliable than prompting the model to emit JSON yourself.

How does the ReAct loop decide when to stop?
The loop stops when the model returns an AIMessage with a non-empty content field and an empty tool_calls list — the model has decided it has enough information to answer. In the explicit LangGraph version, that logic lives in the should_continue edge: if the last message has tool_calls, route to the tool node; otherwise route to END. If you set recursion_limit too low on an agent that keeps wanting more tools, you'll see the loop terminate early with an error, which is the framework's safety net against infinite back-and-forth.

What's in the NAT workflow YAML and how does it map to LangChain code?
Three top-level sections: llms (model endpoints), functions (tool definitions, referenced by name), and workflows (the agent type plus which functions and llm it uses). The _type: react_agent block is the YAML equivalent of create_agent(model=llm, tools=[...]). When nat run executes it, NAT resolves the names, instantiates the same LangChain components under the hood, and runs the same ReAct loop — the abstraction is declarative, but the execution is identical.

What is nat eval doing that my Step 7 test harness isn't?
The Step 7 harness does the basics: iterate over {question, expected} pairs, call the agent, and check whether the expected substring appears in the answer. nat eval standardizes that into a dataset format plus an evaluator configuration, can run multiple evaluators (accuracy, RAGAS metrics, a judge LLM) on the same pass, logs per-row outputs for inspection, and integrates with NeMo Evaluator for managed runs. Same concept, production scaffolding.

What models does this lab use via the NIM proxy?
Models from the Nemotron reasoning family, served through LangChain's ChatNVIDIA and reached at http://nim-proxy.labs.svc:8080/v1. You don't need an API key — the proxy injects one — and you don't provision any GPU. Because every call goes through the same OpenAI-compatible surface, swapping to a different NIM-served model is a one-line change and the same ReAct loop keeps working, which is one of the things the lab is designed to demonstrate.
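Because the surface is OpenAI-compatible, the request body is the same whichever NIM model sits behind the proxy. A stdlib-only sketch of that payload shape (chat_payload and the model-name placeholders are illustrative; ChatNVIDIA builds and sends this body for you):

```python
import json

BASE_URL = "http://nim-proxy.labs.svc:8080/v1"  # the lab's proxy endpoint

def chat_payload(model, user_text, tools=None):
    # The OpenAI-compatible /chat/completions body every call becomes;
    # bind_tools is what adds the "tools" key on each request.
    body = {"model": model,
            "messages": [{"role": "user", "content": user_text}]}
    if tools:
        body["tools"] = tools
    return body

payload = chat_payload("nemotron-model-served-by-proxy", "What is 6 times 7?")
# Swapping to any other NIM-served model is a one-field change:
payload["model"] = "another-nim-served-model"
print(json.dumps(payload, indent=2))
```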