Structured Output & Function Calling with NIM
Get reliable machine-parseable data out of an LLM. Compare prompt-only JSON extraction against the function-calling API, chain two tools, and measure the reliability gap on a real extraction task.
What you'll learn
1. **Prompt-only JSON extraction (baseline)**: The easiest way to get structured data out of an LLM is to ask for JSON in the prompt. It mostly works — but *mostly* isn't good enough for production, and you'll quantify why in step 4.
2. **Function calling with a JSON Schema**: Instead of asking the model to write JSON in its reply, you tell it: *"here is a function, here is its schema, call it with the right arguments."* The model then returns a `tool_call` with arguments already validated against your schema — no regex, no `json.loads`.
3. **Chain two tools with validation**: Real agents use more than one tool per turn. A classifier decides *what kind of message this is*, then a specialized extractor does the extraction that fits. This lets you handle multiple schemas without cramming them all into one.
4. **Measure prompt-only vs. tools reliability**: Prompt-only JSON extraction works until the message is messy. Tools mode keeps working because the schema is enforced at the API boundary.
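The step-1 baseline above boils down to a cleanup-and-parse routine. A minimal sketch, assuming nothing about the lab's exact code: the prompt text and the `parse_model_json` helper name are illustrative.

```python
import json
import re

# Illustrative prompt for the prompt-only baseline: we can only *ask* for JSON.
EXTRACT_PROMPT = (
    "Extract the contact as JSON with keys name, email, phone. "
    "Reply with JSON only, no prose.\n\nMessage: {message}"
)

def parse_model_json(text: str) -> dict:
    """Best-effort parse of a free-text model reply that should be JSON."""
    # Models often wrap JSON in markdown code fences; strip them first.
    fenced = re.search(r"`{3}(?:json)?\s*(.*?)`{3}", text, re.DOTALL)
    if fenced:
        text = fenced.group(1)
    # Still raises on trailing commas, unescaped quotes, stray prose, etc.:
    # exactly the failure modes step 4 counts.
    return json.loads(text)
```

Even with the fence-stripping, everything after the cleanup rides on the model emitting syntactically valid JSON, which is the fragility this lab quantifies.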
Prerequisites
- Completed `react-agent-nim` or comparable NIM exposure
- Basic Python (functions, dataclasses or dicts)
- Familiarity with JSON Schema
Skills & technologies you'll practice
This intermediate-level AI/ML lab gives you real-world reps across JSON Schema, OpenAI-style function calling, tool chaining, and NVIDIA NIM endpoints.
What you'll build in this function-calling lab
Function calling is what separates LLM demoware from production — the moment you need machine-readable output that validates on the first try, prompting the model for JSON stops being good enough. This lab builds the same contact-and-invoice extraction pipeline two ways and measures the reliability gap: prompt-only JSON extraction as the naive baseline, versus schema-enforced function calling via the `tools` parameter on NVIDIA NIM endpoints we provision. You walk away with a structured-output pattern you can drop into any LangChain agent, intuition for when prompt mode is fine and when it actively fails, and a harness that gives you a concrete valid-vs-broken count on your own data.
The technical substance is the distinction between asking nicely for JSON and enforcing a schema at the API boundary. You define a `save_contact` tool with a formal JSON Schema, pass it via the `tools=[...]` parameter, set `tool_choice` to force the call, and read `message.tool_calls[0].function.arguments` — already valid against your schema, no regex, no markdown-fence stripping. You then compose two tools — a classifier that returns `"contact" | "invoice"` and a specialized `save_invoice` with its own schema — and see why a router-plus-specialist pipeline is cleaner than one fat union schema. The final step runs both strategies over intentionally awkward inputs (trailing commas, apostrophes in names, emoji, multi-line bodies) and surfaces the reliability gap quantitatively.
Prerequisites: Python with dicts or dataclasses, prior exposure to a NIM-backed agent (the react-agent-nim lab is the natural entry point), and a rough mental model of JSON Schema. The hosted environment ships with the OpenAI Python SDK pointed at our managed NIM proxy serving `meta/llama-3.3-70b-instruct` — the same OpenAI-compatible `tool_calls` surface you'd use against the real API, no keys, no GPU provisioning. About 30 minutes of focused work, ending with a message-by-message report of which extractions parsed cleanly against the schema and which didn't — the same accounting real teams run before picking the default extraction mode for their pipelines.
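The forced-call flow described above can be sketched roughly as follows, assuming the OpenAI Python SDK pointed at an OpenAI-compatible NIM endpoint. The exact contact fields and the `extract_contact` helper are illustrative, not the lab's code:

```python
import json

# Illustrative save_contact tool definition; the field names are an assumption.
SAVE_CONTACT_TOOL = {
    "type": "function",
    "function": {
        "name": "save_contact",
        "description": "Persist a contact extracted from a message.",
        "parameters": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "email": {"type": "string"},
                "phone": {"type": "string"},
            },
            "required": ["name", "email"],
        },
    },
}

def extract_contact(client, message: str) -> dict:
    """Force a save_contact call and return its schema-shaped arguments."""
    resp = client.chat.completions.create(
        model="meta/llama-3.3-70b-instruct",
        messages=[{"role": "user", "content": message}],
        tools=[SAVE_CONTACT_TOOL],
        # Force this specific tool so the model can't reply in prose.
        tool_choice={"type": "function", "function": {"name": "save_contact"}},
    )
    call = resp.choices[0].message.tool_calls[0]
    return json.loads(call.function.arguments)  # arguments arrive as a JSON string
```

Note the one remaining `json.loads`: the `arguments` field is a JSON string by contract, but one the endpoint has already constrained to match your schema.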
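The valid-vs-broken accounting behind that report can be sketched as a small checker. The required-field map below is an assumption about the contact schema, not the lab's exact rules:

```python
import json

# Hypothetical validity rules for the step-4 harness: the parse must succeed,
# required fields must be present and non-null, and types must match.
REQUIRED_FIELDS = {"name": str, "email": str, "phone": str}

def is_valid(raw: str) -> bool:
    """True iff a raw extraction string passes the schema-shaped checks."""
    try:
        data = json.loads(raw)
    except ValueError:  # covers json.JSONDecodeError
        return False
    if not isinstance(data, dict):
        return False
    # A missing field, a null, or a wrong type (e.g. phone as a number) all fail.
    return all(isinstance(data.get(f), t) for f, t in REQUIRED_FIELDS.items())

def score(raw_outputs: list[str]) -> tuple[int, int]:
    """Count (valid, broken) over a batch of raw model outputs."""
    valid = sum(is_valid(r) for r in raw_outputs)
    return valid, len(raw_outputs) - valid
```

Running `score` over the same messy inputs for both extraction modes is what turns "tools mode feels more reliable" into a concrete count.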
Frequently asked questions
What's the difference between a tool call and a function call in the OpenAI schema?
The response carries `message.tool_calls`, a list of objects, each containing a `type: "function"` wrapper and a `function` object with `name` and `arguments` (a JSON string). "Function calling" is the older name — the API was originally one function per call — and "tool calling" is the current name that generalizes to multiple callable tools per turn. NIM's OpenAI-compatible endpoints expose both fields, and this lab uses the modern `tool_calls` shape throughout.
Why is function calling more reliable than prompting for JSON?
When you pass `tools=[...]` with a JSON Schema, the endpoint constrains generation to produce arguments that match the schema — required fields are present, types are correct, and the output parses. Prompt-only extraction depends on the model having seen enough JSON in training to emit valid JSON for your specific shape, and it breaks on edge cases: trailing commas, unescaped quotes in names, markdown code fences, etc. Step 4 measures the gap concretely on a noisy test set.
What does tool_choice control and when should I set it?
`tool_choice` tells the endpoint how aggressively to call tools. `"auto"` (the default) lets the model decide whether to emit a tool call or a text reply. `"required"` forces a tool call. `{"type": "function", "function": {"name": "save_contact"}}` forces a specific tool. Use `"auto"` for agents that may or may not need the tool. Use `"required"` when your extraction pipeline must produce structured output, as in Step 2 — you don't want the model to chat back "sure, here's the info" in prose.
Why classify first, then pick a schema, instead of using one giant schema with all fields?
One giant schema with every field forces everything to be optional: the model has to guess which subset applies, and validation can't tell a genuinely missing field from one that simply doesn't apply. Classifying first lets each specialist enforce a tight schema whose required fields are actually required. That's the router → specialist tool pattern real agents use. Step 3 walks you through both tools and the routing shim.
What counts as a "valid" extraction in Step 4's comparison?
An extraction counts as broken if `json.loads` fails, if a required field is missing or null, or if a type doesn't match (e.g., phone came back as a number instead of a string). The Step 4 harness counts valid vs broken for both the prompt-only path and the tools path over the same messy input set. The expected outcome is that tools mode maintains a near-perfect valid rate while prompt-only degrades visibly on the awkward inputs.
Does every NIM model support function calling?
Most current instruct models served through NIM accept `tools` and return `tool_calls`. But not every model in the NIM catalog exposes function calling: some older vision-language models (for example `meta-llama/llama-3.2-11b-vision-instruct`) will return `404 No endpoints found that support tool use` when you pass `tools`. The vlm-visual-qa lab explores that split explicitly; here, the contact-extraction models are all on the supported list.
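Finally, the router → specialist pattern from Step 3, referenced in the FAQ above, reduces to a tiny routing shim. The tool names mirror the lab's `save_contact`/`save_invoice`, but the helper itself is an illustrative sketch:

```python
# Illustrative routing shim for the classify-then-extract pipeline (step 3):
# the classifier's label picks which specialist tool the second call forces.
SPECIALISTS = {"contact": "save_contact", "invoice": "save_invoice"}

def route(kind: str) -> dict:
    """Turn a classifier label into a forced tool_choice for the next call."""
    name = SPECIALISTS[kind]  # KeyError here means the classifier went off-script
    return {"type": "function", "function": {"name": name}}
```

Passing the returned dict as `tool_choice` on the second request keeps each extraction call locked to exactly one tight schema.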