Multi-Agent Orchestration with LangGraph
Build a supervisor agent that routes queries to specialist agents — a core architecture pattern tested on the NCP-AAI exam.
What you'll learn
- 1. Create Specialist Tools: A single agent with many tools becomes unreliable as complexity grows. It struggles to choose the right tool from a large set, its prompts become bloated, and errors in one area affect everything.
- 2. Create Specialist Agents: Each specialist is a complete ReAct agent with its own tools. When the router sends a query to a specialist, it handles the full Thought → Act → Observe loop independently.
- 3. Build a Router: The router is the brain of the multi-agent system. It analyzes each incoming query and decides which specialist should handle it.
- 4. Build the Multi-Agent Graph: Now we combine the router and specialists into a single LangGraph StateGraph. This is the complete supervisor architecture.
- 5. Add a Third Agent: A key advantage of the supervisor pattern is modularity — adding a new specialist doesn't require changing existing agents, only the router.
- 6. Test the Multi-Agent System: Testing a multi-agent system is harder than testing a single agent. You need to verify both routing accuracy and end-to-end answer quality.
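The steps above can be sketched framework-free before touching LangGraph. The following is a minimal Python sketch of the supervisor pattern, with a keyword router standing in for the lab's LLM classifier; all names and tools here are illustrative, not the lab's actual code:

```python
# Minimal, framework-free sketch of the supervisor pattern the lab builds.
# Specialist "agents" are plain functions with their own focused tool sets;
# a keyword router stands in for the LLM classifier (illustrative only).

def search_knowledge(query: str) -> str:
    # Researcher tool: stub knowledge lookup.
    return f"notes on {query}"

def safe_calculate(expression: str) -> float:
    # Calculator tool: arithmetic only, no access to builtins or names.
    return float(eval(expression, {"__builtins__": {}}, {}))

def researcher(query: str) -> str:
    return f"Research result: {search_knowledge(query)}"

def calculator(query: str) -> str:
    # Pull out the arithmetic part of the query (deliberately naive).
    expr = "".join(ch for ch in query if ch in "0123456789.+-*/() ")
    return f"Calculation result: {safe_calculate(expr)}"

SPECIALISTS = {"researcher": researcher, "calculator": calculator}

def router(query: str) -> str:
    # Step 3: classify the query to a specialist label.
    if any(ch.isdigit() for ch in query):
        return "calculator"
    return "researcher"

def supervise(query: str) -> str:
    # Step 4: dispatch on the router's label, run one specialist, finish.
    return SPECIALISTS[router(query)](query)
```

A query like `supervise("what is 6 * 7")` is routed to the calculator specialist, while a prose question falls through to the researcher — the same fan-out-and-return shape the LangGraph version implements with nodes and conditional edges.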
Prerequisites
- Completed Lab 1 (ReAct Agent) or equivalent
- Understanding of LangGraph StateGraph basics
- Familiarity with tool calling and @tool decorator
Exam domains covered
What you'll build in this multi-agent orchestration lab
LangGraph multi-agent orchestration is the pattern that unlocks complex agentic workflows past the point where a single ReAct agent breaks down: beyond roughly 8-12 tools the LLM starts picking the wrong one, prompts balloon past their token budget, and errors in one tool contaminate unrelated queries. This lab builds a supervisor-plus-specialists architecture from the tools up against NVIDIA NIM endpoints we provision, then proves it scales by adding a third specialist without touching the first two. You finish with a working LangGraph StateGraph doing conditional routing, three independent ReAct specialists, an LLM-based classifier, and the intuition to decide when this pattern beats a single agent with many tools.
The substance is the primitives you'll reuse across every LangGraph agent: a TypedDict shared state with an add_messages reducer on the message list, nodes returning state updates that LangGraph merges per field, conditional edges keyed on a state field (state['next_agent']), and the difference between 'supervisor' (fan-out to specialists that return to END) and 'agent-as-node' graphs. Each specialist is a create_agent instance with its own focused system prompt and tool set — researcher with a knowledge-search tool, calculator with safe arithmetic, writer with a report template. The router is a classifier LLM call — swap it for a keyword filter or fine-tuned model when latency matters. You also internalize the main failure mode (pathological routing loops) and how to cap it with a hop counter before it eats your inference budget.
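The per-field merge behaviour described above can be imitated without LangGraph. A hypothetical sketch, assuming an add_messages-style reducer that appends to the message list and a plain-overwrite default for scalar fields (all names here are illustrative):

```python
# Sketch of LangGraph-style per-field state merging (framework-free).
# Each node returns only the fields it changes; the graph merges that
# partial update into shared state using the field's reducer: append
# for `messages` (add_messages-style), plain overwrite for scalars.
from typing import TypedDict

class AgentState(TypedDict):
    messages: list[str]   # merged with an append reducer
    next_agent: str       # merged by overwrite

REDUCERS = {"messages": lambda old, new: old + new}  # default: overwrite

def merge(state: AgentState, update: dict) -> AgentState:
    merged = dict(state)
    for field, value in update.items():
        reducer = REDUCERS.get(field)
        merged[field] = reducer(merged[field], value) if reducer else value
    return merged  # type: ignore[return-value]

def router_node(state: AgentState) -> dict:
    # A node never returns the whole state, only its updates.
    return {"messages": ["router: sending to researcher"],
            "next_agent": "researcher"}

state: AgentState = {"messages": ["user: hi"], "next_agent": ""}
state = merge(state, router_node(state))
```

After the merge, `state["messages"]` holds both entries while `state["next_agent"]` has simply been overwritten — which is exactly the distinction the conditional edge relies on when it reads `state['next_agent']`.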
Prerequisites: Python, the react-agent-nim lab (so create_agent, @tool, bind_tools, and ToolMessage are familiar), and basic LangGraph StateGraph awareness. The hosted environment ships with LangGraph, langchain.agents, and the LangChain NIM integration preinstalled, running against our managed NIM proxy serving meta/llama-3.3-70b-instruct — no keys, no GPU pod. About 40 minutes of focused work. You leave with a working three-specialist router, a behavioural test matrix that verifies routing accuracy plus end-to-end answer quality, and a scaling story you can point at: adding the writer specialist is strictly additive — one new tool, one new agent, one new router label — zero changes to existing nodes.
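The behavioural test matrix mentioned above can be as simple as a table of (query, expected specialist) pairs run against the router. A sketch with a stand-in keyword router — in the lab the router is an LLM call, but the matrix shape is the same either way:

```python
# Sketch of a behavioural test matrix for routing accuracy.
# The router here is a stand-in keyword classifier (illustrative);
# swap in the real router node and the matrix still applies.

def route(query: str) -> str:
    q = query.lower()
    if any(ch.isdigit() for ch in q):
        return "calculator"
    if any(word in q for word in ("report", "summary", "draft")):
        return "writer"
    return "researcher"

ROUTING_MATRIX = [
    ("what is 15% of 80", "calculator"),
    ("draft a summary of the findings", "writer"),
    ("who created LangGraph", "researcher"),
]

def routing_accuracy() -> float:
    hits = sum(route(q) == expected for q, expected in ROUTING_MATRIX)
    return hits / len(ROUTING_MATRIX)
```

End-to-end answer quality needs a second layer of checks on the specialist outputs themselves, but a matrix like this catches routing regressions cheaply every time the router prompt or label set changes.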
Frequently asked questions
When does a single agent with many tools outperform a supervisor?
How does the router node actually classify?
The router makes a single LLM classification call that writes a specialist label into state['next_agent']. That's cheap and flexible but stochastic; for latency-critical paths you can replace it with a keyword classifier, a small fine-tuned model, or even a regex-based pre-filter that only falls through to the LLM for ambiguous queries. The conditional edge after the router reads state['next_agent'] and dispatches to the right specialist node, so swapping the classifier is a one-node change.
What does the shared state object look like in LangGraph?
A TypedDict with the fields every node needs to read or update — typically at minimum messages: list[BaseMessage] (appended by every node) and routing fields like next_agent: str. Each node returns a dict of updates and LangGraph merges them according to each field's annotated reducer (add_messages for the messages list, overwrite for scalar fields). The lab keeps the state small and typed on purpose so the graph's control flow stays legible; production systems often carry user id, thread id, and per-specialist scratchpads in the same state object.
How do I handle a query that should touch multiple specialists?
What's the failure mode to watch for in production?
Pathological routing loops, where a query bounces between the router and specialists indefinitely. The mitigations are to make the END transition explicit in each specialist (the ReAct loop terminates when the LLM returns text without tool calls) and to add a 'max specialist hops' counter to state. The reflection step pushes you toward instrumenting exactly this so you can detect and cap it before it eats your NIM quota.
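The hop cap described above can be sketched as a guard the router checks before dispatching again; an illustrative sketch, not the lab's actual code:

```python
# Sketch of a 'max specialist hops' guard against routing loops.
# The router increments a hop counter carried in shared state and
# forces END once the cap is hit, so a pathological loop cannot
# run unbounded against your inference budget.

MAX_HOPS = 3

def route_with_cap(state: dict) -> str:
    # Force termination once the budget of specialist hops is spent.
    if state.get("hops", 0) >= MAX_HOPS:
        return "END"
    state["hops"] = state.get("hops", 0) + 1
    return state["next_agent"]

state = {"next_agent": "researcher", "hops": 0}
labels = [route_with_cap(state) for _ in range(5)]
```

Here the first three dispatches go through and every call after that returns the END label; in the LangGraph version the same check lives in the router node and the conditional edge maps the END label to the graph's END node.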