Multi-Agent Orchestration with LangGraph

Build a supervisor agent that routes queries to specialist agents — a core architecture pattern tested on the NCP-AAI exam.

40 min · 6 steps · 3 domains · Intermediate · ncp-aai

What you'll learn

  1. Create Specialist Tools
    A single agent with many tools becomes unreliable as complexity grows. It struggles to choose the right tool from a large set, its prompts become bloated, and errors in one area affect everything.
  2. Create Specialist Agents
    Each specialist is a complete ReAct agent with its own tools. When the router sends a query to a specialist, it handles the full Thought → Act → Observe loop independently.
  3. Build a Router
    The router is the brain of the multi-agent system. It analyzes each incoming query and decides which specialist should handle it.
  4. Build the Multi-Agent Graph
    Now we combine the router and specialists into a single LangGraph StateGraph. This is the complete supervisor architecture.
  5. Add a Third Agent
    A key advantage of the supervisor pattern is modularity — adding a new specialist doesn't require changing existing agents, only the router.
  6. Test the Multi-Agent System
    Testing a multi-agent system is harder than testing a single agent. You need to verify both routing accuracy and end-to-end answer quality.
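A minimal sketch of such a behavioural test matrix (the cases and the stub keyword router are illustrative; in the lab you would call the real graph in place of `stub_route`):

```python
# Hypothetical routing-accuracy matrix: (query, expected specialist) pairs.
CASES = [
    ("what is 12 * 7", "calculator"),
    ("who invented the transistor", "researcher"),
    ("draft a three-paragraph report", "writer"),
]

def stub_route(query):
    """Toy stand-in for the real router, used only to show the harness shape."""
    if any(ch.isdigit() for ch in query):
        return "calculator"
    if "report" in query or "draft" in query:
        return "writer"
    return "researcher"

def routing_accuracy(route, cases):
    """Fraction of queries dispatched to the expected specialist."""
    return sum(route(q) == want for q, want in cases) / len(cases)
```

Swapping `stub_route` for the compiled graph's router turns this into an end-to-end routing check you can run after every change.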

Prerequisites

  • Completed Lab 1 (ReAct Agent) or equivalent
  • Understanding of LangGraph StateGraph basics
  • Familiarity with tool calling and @tool decorator

Exam domains covered

  • Agent Architecture and Design
  • Cognition, Planning, and Memory
  • Agent Development

What you'll build in this multi-agent orchestration lab

LangGraph multi-agent orchestration is the pattern that unlocks complex agentic workflows past the point where a single ReAct agent breaks down: somewhere around 8–12 tools, the LLM starts picking the wrong tool, prompts balloon past the token budget, and errors in one tool contaminate unrelated queries. This lab builds a supervisor-plus-specialists architecture from the tools up against NVIDIA NIM endpoints we provision, then proves it scales by adding a third specialist without touching the first two. You finish with a working LangGraph StateGraph doing conditional routing, three independent ReAct specialists, an LLM-based classifier, and the intuition to decide when this pattern beats a single agent with many tools.

The substance is the primitives you'll reuse across every LangGraph agent: a TypedDict shared state with an add_messages reducer on the message list, nodes returning state updates that LangGraph merges per field, conditional edges keyed on a state field (state['next_agent']), and the difference between 'supervisor' (fan-out to specialists that return to END) and 'agent-as-node' graphs. Each specialist is a create_agent instance with its own focused system prompt and tool set — researcher with a knowledge-search tool, calculator with safe arithmetic, writer with a report template. The router is a classifier LLM call — swap it for a keyword filter or fine-tuned model when latency matters. You also internalize the main failure mode (pathological routing loops) and how to cap it with a hop counter before it eats your inference budget.
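The per-field merge and conditional dispatch described above can be sketched in plain Python (a dependency-free simulation of the control flow, not the LangGraph API; `merge`, `run_graph`, and the keyword stand-in for the LLM classifier are all illustrative):

```python
def add_messages(existing, update):
    """Reducer for the messages field: append rather than overwrite."""
    return existing + update

REDUCERS = {"messages": add_messages}  # every other field is overwritten

def merge(state, update):
    """Merge a node's partial update into state, field by field."""
    merged = dict(state)
    for key, value in update.items():
        reducer = REDUCERS.get(key)
        merged[key] = reducer(merged[key], value) if reducer else value
    return merged

def router(state):
    # Stand-in for the LLM classifier: write the routing label to state.
    query = state["messages"][-1]
    label = "calculator" if any(ch.isdigit() for ch in query) else "researcher"
    return {"next_agent": label}

def researcher(state):
    return {"messages": ["[researcher] facts for: " + state["messages"][0]]}

def calculator(state):
    return {"messages": ["[calculator] result for: " + state["messages"][0]]}

SPECIALISTS = {"researcher": researcher, "calculator": calculator}

def run_graph(query):
    state = {"messages": [query], "next_agent": ""}
    state = merge(state, router(state))         # router node runs first
    handler = SPECIALISTS[state["next_agent"]]  # conditional edge on next_agent
    state = merge(state, handler(state))        # specialist node, then END
    return state
```

Nodes return partial updates and the framework merges them; that is the whole supervisor fan-out in miniature, with each specialist returning to END rather than back into the graph.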

Prerequisites: Python, the react-agent-nim lab (so create_agent, @tool, bind_tools, and ToolMessage are familiar), and basic LangGraph StateGraph awareness. The hosted environment ships with LangGraph, langchain.agents, and the LangChain NIM integration preinstalled, running against our managed NIM proxy serving meta/llama-3.3-70b-instruct — no keys, no GPU pod. About 40 minutes of focused work. You leave with a working three-specialist router, a behavioural test matrix that verifies routing accuracy plus end-to-end answer quality, and a scaling story you can point at: adding the writer specialist is strictly additive — one new tool, one new agent, one new router label — zero changes to existing nodes.

Frequently asked questions

When does a single agent with many tools outperform a supervisor?

Below roughly 6–8 tools, with a well-written system prompt and clear tool descriptions, a single agent is usually simpler, cheaper, and just as accurate — the supervisor pattern adds an extra LLM call per query for the routing decision. The multi-agent win kicks in around 10+ tools, or whenever tools cluster into clearly different domains (research, math, writing, code, retrieval). If your tools are all variants of 'query one of five databases', a single agent is fine; if they span modalities, specialists start to pay off.

How does the router node actually classify?

In the lab it's a small LLM call with a classification-style prompt — 'Given the user's query, return exactly one of {researcher, calculator, writer}' — parsed and written to state['next_agent']. That's cheap and flexible but stochastic; for latency-critical paths you can replace it with a keyword classifier, a small fine-tuned model, or even a regex-based pre-filter that only falls through to the LLM for ambiguous queries. The conditional edge after the router reads state['next_agent'] and dispatches to the right specialist node, so swapping the classifier is a one-node change.
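The pre-filter-with-LLM-fallback variant might look like this sketch (the rules, labels, and the `llm_classify` placeholder are assumptions, not the lab's exact code):

```python
import re

LABELS = {"researcher", "calculator", "writer"}

# Cheap regex pre-filter: only ambiguous queries fall through to the LLM.
RULES = [
    (re.compile(r"\d|calculate|multiply|divide", re.I), "calculator"),
    (re.compile(r"summar|draft|write|report", re.I), "writer"),
]

def llm_classify(query: str) -> str:
    # Placeholder for the NIM chat call whose prompt is
    # "return exactly one of {researcher, calculator, writer}".
    return "researcher"

def route(query: str) -> str:
    for pattern, label in RULES:
        if pattern.search(query):
            return label
    label = llm_classify(query)
    return label if label in LABELS else "researcher"  # guard bad LLM output
```

The final guard matters in production: a stochastic classifier can emit a label outside the allowed set, and the conditional edge needs a valid node name to dispatch to.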

What does the shared state object look like in LangGraph?

A TypedDict with the fields every node needs to read or update — typically at minimum messages: list[BaseMessage] (appended by every node) and routing fields like next_agent: str. Each node returns a dict of updates and LangGraph merges them according to each field's annotated reducer (add_messages for the messages list, overwrite for scalar fields). The lab keeps the state small and typed on purpose so the graph's control flow stays legible; production systems often carry user id, thread id, and per-specialist scratchpads in the same state object.
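A stdlib-only sketch of that shape, where `append` stands in for LangGraph's `add_messages` and the helper shows the mechanism of attaching a reducer through `Annotated` metadata (helper name and reducer are illustrative):

```python
from typing import Annotated, TypedDict, get_type_hints

def append(old: list, new: list) -> list:
    """Stand-in reducer: append updates rather than overwrite."""
    return old + new

class AgentState(TypedDict):
    messages: Annotated[list, append]  # in the lab: Annotated[..., add_messages]
    next_agent: str                    # no reducer, so last write wins

def reducer_for(state_cls, field):
    """Read the reducer attached to a field's Annotated metadata, if any."""
    hint = get_type_hints(state_cls, include_extras=True)[field]
    meta = getattr(hint, "__metadata__", ())
    return meta[0] if meta else None
```

Keeping the state this small is deliberate: every field either has an explicit merge rule or is a plain overwrite, so you can read the graph's data flow straight off the type definition.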

Why add a writer as a third specialist instead of letting the researcher also write?

Because 'write a three-paragraph summary' and 'find five facts from the knowledge base' have different success criteria and need different prompts. The researcher is tuned for breadth and recall; the writer is tuned for structure, concision, and style. Mixing them creates a prompt that's mediocre at both. The lab specifically has you add the writer as a third node to make the scaling property concrete — adding a specialist is strictly additive: one new tool, one new agent, one new router label, zero changes to existing nodes.
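The additive property can be made concrete with a small registry sketch (the decorator and names are hypothetical, not the lab's exact code):

```python
SPECIALISTS = {}    # label -> agent callable
ROUTER_LABELS = []  # labels the router prompt is allowed to emit

def register(label):
    """Register a specialist: one entry for dispatch, one router label."""
    def deco(fn):
        SPECIALISTS[label] = fn
        ROUTER_LABELS.append(label)
        return fn
    return deco

@register("researcher")
def researcher(query):
    return "facts: " + query

@register("calculator")
def calculator(query):
    return "result: " + query

# Adding the third specialist is one new function plus one decorator line;
# the researcher and calculator above are untouched.
@register("writer")
def writer(query):
    return "report: " + query
```
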

How do I handle a query that should touch multiple specialists?

Two options. The simple one is to have the router emit a sequence of specialists instead of one, and loop: the researcher returns facts, then the writer reads the messages state and drafts the report. The more robust one is a reducer-style graph where each specialist writes into a shared scratchpad and a synthesizer node at the end merges everything. The lab sticks with single-specialist routing to keep the pattern crisp; once you're comfortable, extending to multi-step routing is a small modification to the conditional edge logic and the state shape.
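The simple sequence-routing option, sketched without dependencies (the plan rule and specialist lambdas are illustrative):

```python
def router(query):
    """Emit an ordered plan of specialists instead of a single label."""
    if "report" in query and "facts" in query:
        return ["researcher", "writer"]  # gather facts first, then write
    return ["researcher"]

SPECIALISTS = {
    "researcher": lambda msgs: msgs + ["facts gathered"],
    "writer": lambda msgs: msgs + ["report drafted from: " + msgs[-1]],
}

def run(query):
    messages = [query]
    for label in router(query):  # loop until the plan is drained
        messages = SPECIALISTS[label](messages)
    return messages
```
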

What's the failure mode to watch for in production?

Pathological looping — specialist A returns an answer the router doesn't recognise as final, routes back into specialist B, specialist B produces output that re-routes to A, and so on. LangGraph's recursion limit saves you in principle but the correct fix is to make the END transition explicit in each specialist (the ReAct loop terminates when the LLM returns text without tool calls) and to add a 'max specialist hops' counter to state. The reflection step pushes you toward instrumenting exactly this so you can detect and cap it before it eats your NIM quota.
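A dependency-free sketch of that hop cap (the `MAX_HOPS` value, the node-returns-next-name convention, and the field names are all illustrative):

```python
MAX_HOPS = 4

def run(state, nodes):
    """Drive router/specialist hops; stop at END or at the hop cap."""
    current = "router"
    while current != "END":
        if state["hops"] >= MAX_HOPS:
            state["capped"] = True  # surfaced in state so you can alert on it
            break
        state["hops"] += 1
        current = nodes[current](state)  # each node returns the next node name

    return state

# Pathological pair: router and specialist bounce forever, never emitting END.
nodes = {"router": lambda s: "specialist", "specialist": lambda s: "router"}
final = run({"hops": 0, "capped": False}, nodes)
```

Because the counter lives in state, the cap survives however the graph is wired, and the `capped` flag gives you exactly the signal the reflection step asks you to instrument.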