Agent Patterns: ReAct vs Tool Calling vs Plan-and-Execute
Build the same SaaS customer support agent three different ways — ReAct, direct tool calling, and plan-and-execute — then compare them on speed, reasoning quality, and reliability to learn when to use each pattern in production.
What you'll learn
- 1. The Support Tools. All three agent patterns use the same tools, which is critical for a fair comparison. The tools represent real SaaS support operations.
- 2. Pattern 1: ReAct Agent. ReAct is the most widely used agent pattern: the LLM follows a Thought/Action/Observation loop, reasoning after every tool call.
- 3. Pattern 2: Direct Tool Calling. Simpler than ReAct: instead of an explicit Thought/Action/Observation loop, the LLM emits tool calls until it returns plain text as its final answer.
- 4. Pattern 3: Plan-and-Execute (ReWOO). ReWOO (Reasoning WithOut Observation) flips the ReAct pattern on its head: instead of reasoning after each tool call, the LLM creates a complete plan upfront, executes all steps in batch, and finally synthesizes the results.
- 5. Benchmark: Comparing the Patterns. Run the three agents on identical queries and measure what matters in production: speed, cost (LLM calls), and answer quality.
- 6. Decision Framework: When to Use Which Pattern. Formalize the benchmarked comparison into a reusable framework — the kind of analysis the NCP-AAI exam expects.
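To give a sense of the shared tool layer's shape, here is a stdlib-only sketch. These are hypothetical stubs: in the lab, search_kb runs a Milvus similarity search and check_account hits a real account store, while here everything is in-memory so the sketch runs anywhere.

```python
# Hypothetical stub versions of the lab's three support tools. The real
# search_kb is backed by Milvus; here it is a keyword match over an
# in-memory list so the sketch is self-contained.

ACCOUNTS = {"dana@example.com": {"plan": "pro", "status": "active", "seats": 12}}
KB = [
    {"id": "KB-101", "title": "Resetting your password", "text": "Use the reset link on the login page."},
    {"id": "KB-204", "title": "Exporting billing invoices", "text": "Invoices live under Settings > Billing."},
]
_ticket_counter = 0

def check_account(email: str) -> dict:
    """Look up a customer's account record by email."""
    return ACCOUNTS.get(email, {"error": "account not found"})

def search_kb(query: str) -> list:
    """Return KB articles whose title or body mentions any query term."""
    terms = query.lower().split()
    return [a for a in KB if any(t in (a["title"] + a["text"]).lower() for t in terms)]

def create_ticket(subject: str, body: str) -> dict:
    """Open a support ticket and return its id."""
    global _ticket_counter
    _ticket_counter += 1
    return {"ticket_id": f"T-{_ticket_counter:04d}", "subject": subject}

# One registry imported by every agent, so the benchmark variable is the
# reasoning loop, not the tool environment.
TOOLS = {"check_account": check_account, "search_kb": search_kb, "create_ticket": create_ticket}
```

Because every agent imports the same registry, any latency or quality difference in the benchmark is attributable to the reasoning pattern, not the tools.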
Prerequisites
- Completed Lab 1 (ReAct Agent with NIM) or equivalent
- Completed Lab 2 (RAG Pipeline with NIM) — needed for Milvus vector search
- Understanding of tool calling (bind_tools, ToolMessage)
- Basic LangGraph knowledge (StateGraph, nodes, edges)
Exam domains covered
What you'll build in this agent patterns lab
Picking the right agent pattern is the difference between a 400ms support bot and a 12-second one that burns three times the tokens per query — and it's one of the most consequential architecture decisions an LLM app team makes. This lab hands you the same SaaS support agent built three ways — ReAct, direct tool calling, and ReWOO plan-and-execute — plus a live benchmark that surfaces the trade-offs in wall-clock latency, LLM call count, and answer quality. You walk away with working implementations of all three reasoning strategies, a clear mental model of when each wins, and a decision framework you can point at in a design review. The whole thing runs on NVIDIA NIM endpoints we provision, so there's no key management, no GPU pod, just code.
Pattern 1 is a canonical ReAct agent driven by a LangGraph StateGraph — Thought, Action, Observation, loop — the right default when steps depend on each other and the plan has to adapt mid-execution. Pattern 2 is direct tool calling via llm.bind_tools(...): no reasoning trace, the LLM emits tool calls until it returns plain text, and it dominates on single-hop queries. Pattern 3 is ReWOO — a planner commits to the full tool sequence upfront, an executor runs them in batch (parallel where possible), and a synthesizer merges observations into the answer; the win is on independent lookups that can fan out. All three share one tool layer — check_account(email), search_kb(query) backed by Milvus similarity search, create_ticket(subject, body) — so the benchmark variable is the reasoning loop, not the environment.
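The ReAct loop described above can be sketched without any framework. The `scripted_llm` below is a stand-in for the model (in the lab this loop is a LangGraph StateGraph calling meta/llama-3.3-70b-instruct); the stub tools and transcript format are illustrative, but the control flow is the Thought/Action/Observation loop itself.

```python
# Framework-free sketch of the ReAct loop with a scripted stand-in for
# the LLM. Tools and transcript format are hypothetical simplifications.

def check_account(email):                      # stub tool
    return {"email": email, "plan": "pro", "status": "active"}

def search_kb(query):                          # stub tool
    return [{"id": "KB-204", "title": "Exporting billing invoices"}]

TOOLS = {"check_account": check_account, "search_kb": search_kb}

def scripted_llm(transcript):
    """Stand-in for the model: picks the next step from the transcript."""
    if "check_account" not in transcript:
        return {"thought": "I need the account first.",
                "action": ("check_account", "dana@example.com")}
    if "search_kb" not in transcript:
        return {"thought": "Account is active; find the billing article.",
                "action": ("search_kb", "billing invoices")}
    return {"thought": "I have everything I need.",
            "final": "Your account is active; see KB-204 for invoice export."}

def react_agent(question, max_steps=5):
    transcript, llm_calls = f"Question: {question}", 0
    for _ in range(max_steps):
        step = scripted_llm(transcript)
        llm_calls += 1
        if "final" in step:                    # model answered in plain text
            return step["final"], llm_calls
        name, arg = step["action"]             # otherwise: run the tool,
        obs = TOOLS[name](arg)                 # append the observation,
        transcript += (f"\nThought: {step['thought']}"
                       f"\nAction: {name}({arg!r})\nObservation: {obs}")
    return "Step limit reached.", llm_calls

answer, calls = react_agent("Why can't I export my invoices?")
```

Direct tool calling deletes the Thought field from this loop (the model just emits tool calls until it returns text), and ReWOO replaces the per-step calls with one planning call up front — which is exactly where the LLM-call counts in the benchmark come from.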
Prerequisites: Python, prior exposure to a ReAct agent, and a rough feel for LangGraph — @tool, bind_tools, ToolMessage, StateGraph. The hosted environment ships with LangGraph, langchain-nvidia-ai-endpoints, pymilvus, and the benchmark harness preinstalled; every call hits our managed NIM proxy running meta/llama-3.3-70b-instruct. About 35 minutes of focused work, finishing with a benchmark run that prints ReAct's LLM-call count, tool-calling's single-hop latency win, and ReWOO's parallel-step speedup side by side — the kind of number you quote in an architecture doc.
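The benchmark harness has a simple shape: time each agent on the same queries and count its LLM calls. A minimal sketch, with hypothetical stand-in agents that each return an `(answer, llm_calls)` pair:

```python
# Sketch of the benchmark harness shape: wall-clock latency plus LLM-call
# count per (agent, query) pair. Agents and queries are stand-ins.
import time

def bench(agents, queries):
    rows = []
    for name, agent in agents.items():
        for q in queries:
            t0 = time.perf_counter()
            answer, llm_calls = agent(q)       # each agent returns (answer, call count)
            rows.append({"agent": name, "query": q,
                         "latency_s": time.perf_counter() - t0,
                         "llm_calls": llm_calls})
    return rows

# Usage with trivial stand-in agents:
agents = {
    "tool_calling": lambda q: ("answer", 2),   # single-hop: one tool call + final
    "react": lambda q: ("answer", 4),          # thought/observe loop, more calls
}
rows = bench(agents, ["Why can't I export invoices?"])
```

Answer quality is the one column this sketch omits; the lab layers a scoring rubric on top of the same rows.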
Frequently asked questions
Isn't ReAct strictly better than direct tool calling?
No. Direct tool calling wins on single-hop queries: fewer LLM calls, lower latency, fewer tokens. ReAct earns its extra calls only when steps depend on each other and the plan has to adapt mid-execution.
What's the difference between ReWOO and plan-and-execute?
ReWOO is a specific plan-and-execute variant: the planner commits to the full tool sequence without waiting for intermediate observations, which lets the executor run independent steps in parallel before a synthesizer merges the results.
Why benchmark all three on the same tools?
check_account, search_kb, and create_ticket are a single module imported by every agent: identical behaviour, identical latency, identical Milvus index. The differences you see in the benchmark trace back to the agent's decision-making loop, which is the whole point of the comparison.
How do I score 'answer quality' fairly across patterns?
A mix of programmatic checks (did the agent call create_ticket for ticket-creation queries? did it cite the right KB article?) and human review on a sample. The lab shows the pattern so you can swap in your own rubric.
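A minimal sketch of such a rubric, assuming each benchmark run records which tools were called and the final answer text (the field names here are illustrative, not the lab's actual schema):

```python
# Hypothetical programmatic quality checks over a recorded benchmark run.
# Each check is a boolean; the score is the fraction that pass.

def score_run(run, expectations):
    checks = {
        # Did the agent invoke the tool this query requires?
        "called_required_tool": expectations["required_tool"] in run["tools_called"],
        # Did the final answer cite the expected KB article id?
        "cited_right_article": expectations["kb_id"] in run["answer"],
    }
    return sum(checks.values()) / len(checks), checks

run = {"tools_called": ["search_kb", "create_ticket"],
       "answer": "Opened ticket T-0042; see KB-204 for invoice export."}
score, checks = score_run(run, {"required_tool": "create_ticket", "kb_id": "KB-204"})
```

Because the rubric is just a dict of named booleans, swapping in your own criteria (tone checks, length limits, human labels) means adding entries, not rewriting the harness.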