Build & submit task · Beta · Intermediate

Build a Tool-Using ReAct Agent

An LLM that can call tools stops being a debate partner and starts being useful. Build a ReAct agent with LangChain or LangGraph that reasons, calls 2-3 real tools, stays inside a tool-call budget so it cannot loop forever, and solves multi-step questions. Submit a single script or notebook for instant, rubric-based feedback.

Est. time: 4 hrs · Pass score: 65%

What you'll learn

Skills you'll have real reps in after shipping this.

The ReAct loop

Interleave reasoning and tool calls until the agent can answer.

Tool design

Define tools with clear names, descriptions, and typed inputs the model can call correctly.

Budgets and safety

Cap iterations so a confused agent cannot loop forever or run up cost.

Observability

Surface intermediate steps so you can see why the agent did what it did.

See how it works

The ReAct loop


THOUGHT: I need the user's order total. I should look up their latest order.

ACTION: get_orders(user="u_42", limit=1)

OBSERVATION: [{"id": "o_991", "status": "shipped", "total": null}]

Thought, action, observation, repeat. ReAct interleaves reasoning and acting. The model thinks out loud about what to do, takes one action, and reads the observation that comes back. Crucially, that observation feeds the next thought. Here the first lookup returns a null total, so the agent reasons that it needs the line items, fetches them, sums them, and only then answers. A model that only reasoned would have guessed; a model that only acted could not adapt. Alternating the two is what makes it an agent.

Reason, act, observe, repeat. The agent thinks about what it needs, calls a tool, reads the result, and loops until it can answer.
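The loop can be sketched without any framework. In this minimal example the "model" is a scripted stand-in for the LLM, and the data, the get_line_items tool, and the amounts are all hypothetical, made up to mirror the null-total case above. A real agent would replace scripted_model with a LangChain or LangGraph model call.

```python
# Hypothetical in-memory data standing in for a real order service.
ORDERS = {"u_42": [{"id": "o_991", "status": "shipped", "total": None}]}
LINE_ITEMS = {"o_991": [12.50, 7.25, 30.00]}


def get_orders(user, limit=1):
    """Return the user's most recent orders."""
    return ORDERS[user][:limit]


def get_line_items(order_id):
    """Return the line-item amounts for one order (hypothetical tool)."""
    return LINE_ITEMS[order_id]


TOOLS = {"get_orders": get_orders, "get_line_items": get_line_items}


def scripted_model(history):
    """Stand-in for the LLM: choose the next step from the last observation."""
    if not history:
        return {"thought": "I need the latest order.",
                "action": "get_orders", "args": {"user": "u_42", "limit": 1}}
    last_action, last_obs = history[-1]
    if last_action == "get_orders":
        order = last_obs[0]
        if order["total"] is None:
            return {"thought": "The total is null, so I need the line items.",
                    "action": "get_line_items", "args": {"order_id": order["id"]}}
        return {"thought": "The total is present.", "answer": order["total"]}
    return {"thought": "Summing the line items gives the total.",
            "answer": sum(last_obs)}


def run_react(max_steps=5):
    """Alternate thought, action, and observation until an answer or the cap."""
    history, trace = [], []
    for _ in range(max_steps):
        step = scripted_model(history)
        trace.append(("THOUGHT", step["thought"]))
        if "answer" in step:
            return step["answer"], trace
        observation = TOOLS[step["action"]](**step["args"])
        trace.append(("ACTION", step["action"]))
        trace.append(("OBSERVATION", observation))
        history.append((step["action"], observation))  # feeds the next thought
    return None, trace


answer, trace = run_react()
for kind, value in trace:
    print(kind, value)
print("ANSWER", answer)
```

The trace list is what makes the run inspectable: each observation is appended to the history before the next thought is produced, which is the feedback step the description relies on.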

Why agents need a budget

Loop guard example: laps used 4 / 10 (steps), cost used $0.8 / $2.0 ($ cost). Decision: continue, run another lap.

Finished beats a limit; a limit beats forever. Check the conditions before each lap. A finished answer ends the run even on the last allowed lap: you want the answer, not a limit error, so that condition is checked first. Otherwise the cost ceiling and the step cap each halt a stuck agent, and both use a "reached the cap" boundary so a budget of N stops at N. Without this guard, an agent retrying a failing tool calls the model on every lap forever, which is the single most common way an agent run turns into a runaway bill.

Without a cap on tool calls, a confused agent loops forever and burns money. A step budget is the safety belt.
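The ordering of those checks can be written as one small function. This is a sketch under assumptions: the function name, parameter names, and return strings are illustrative and do not come from LangChain or LangGraph.

```python
def next_move(finished, steps_used, max_steps, cost_used, max_cost):
    """Decide what to do before each lap of the agent loop."""
    # A finished answer wins, even if a limit was also reached on this lap.
    if finished:
        return "answer"
    # ">=" is the "reached the cap" boundary: a budget of N stops at N.
    if cost_used >= max_cost:
        return "stop: cost ceiling reached"
    if steps_used >= max_steps:
        return "stop: step cap reached"
    return "continue"


print(next_move(False, 4, 10, 0.8, 2.0))   # mid-run, both budgets have room
print(next_move(True, 10, 10, 2.0, 2.0))   # finished on the last allowed lap
print(next_move(False, 10, 10, 0.8, 2.0))  # stuck agent hits the step cap
```

Checking finished first is the design choice that matters: swapping the order would turn a correct answer produced on the final lap into a limit error.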

The scenario

Your product needs an assistant that can actually do things: look a value up, run a calculation, search a small knowledge source, and combine the results to answer a question no single tool could. A plain chat completion cannot; it can only talk. You need an agent that reasons about which tool to call, calls it, reads the result, and decides what to do next.

You have been asked to build a ReAct agent: a reason-and-act loop over a small set of real tools, with a hard budget on tool calls so a confused agent cannot spin forever and burn money.

Your role

You are an AI Engineer building a tool-using agent. Your goal is a single module with a ReAct loop, two or three real tools, a tool-call budget, and a demonstration of the agent solving multi-step questions while showing its reasoning and tool calls.

Start the task to unlock the full brief

You'll get the step-by-step requirements, setup commands, the 5-criterion grading rubric, tips, and the ability to submit your solution for instant AI grading.

Free to start · submit when you're ready

Learning resources

LangGraph documentation

Build agents as graphs with explicit state and limits.

langchain-ai.github.io

LangChain tool calling

Define tools and let the model call them.

python.langchain.com

ReAct: Synergizing Reasoning and Acting

The paper behind the reason-and-act loop.

arxiv.org

What you'll build in this agent task

This is a build-and-submit task. You build a tool-using ReAct agent with LangChain or LangGraph: a reason-and-act loop over two or three real tools, with a tool-call budget so it cannot spin forever, solving questions no single call could answer. The deliverable is a single script or notebook (.py or .ipynb) that shows the agent's reasoning and tool calls.

An LLM on its own can only talk. The moment you give it tools and a loop, it can look things up, compute, and combine results into an answer. The engineering is in the loop and its limits: clear tool descriptions so the model calls the right one, visible intermediate steps so you can debug it, graceful handling of tool errors, and a hard budget so a confused agent does not run away.
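One common way to handle tool errors gracefully is to turn every failure into an observation the model can read, instead of letting the exception end the run. The sketch below assumes a plain dictionary of tools; call_tool and divide are hypothetical names for illustration, not framework APIs.

```python
def call_tool(tools, name, args):
    """Run a tool and always return an observation string; never raise."""
    if name not in tools:
        return f"ERROR: unknown tool '{name}'. Available: {sorted(tools)}"
    try:
        return str(tools[name](**args))
    except Exception as exc:  # any failure becomes text the agent can react to
        return f"ERROR: {name} failed with {type(exc).__name__}: {exc}"


def divide(a, b):
    """Divide a by b."""
    return a / b


tools = {"divide": divide}

print(call_tool(tools, "divide", {"a": 6, "b": 3}))
print(call_tool(tools, "divide", {"a": 1, "b": 0}))
print(call_tool(tools, "sqrt", {"x": 4}))
```

Because the error comes back as an ordinary observation, the agent can choose a different tool or different arguments on its next step, and the budget still bounds how many times it may try.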

Grading is rubric-based and explainable. Your submission is scored against weighted criteria (framework, real tools, the ReAct loop, the tool-call budget, and the multi-step demonstration) and returns per-criterion feedback with evidence quoted from your code. The pass threshold is 65 percent and you can resubmit.

Frequently asked questions

ReAct or tool-calling, which should I use?

Either satisfies the rubric. A classic ReAct loop (Thought, Action, Observation) and a modern tool-calling agent both interleave reasoning with tool calls. LangGraph's prebuilt ReAct agent and LangChain's AgentExecutor are both fine; what matters is a real loop with a budget, not a single fixed call.

What tools should I build?

Small, real ones: a calculator, a lookup over a little built-in dataset, a date or string utility, a search over a few documents. Pick at least two whose combination is needed to answer your demo questions, so the agent has to actually chain tool calls.
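As an illustration of two small tools that must be chained, here is a framework-free sketch: a price lookup over a tiny built-in dataset and a calculator that only accepts arithmetic. The dataset, prices, and function names are hypothetical. In a real submission each function would be registered with your framework's tool mechanism, with the docstring serving as the description the model reads.

```python
import ast
import operator

# Hypothetical built-in dataset; the prices are made up for the example.
PRICES = {"widget": 4.50, "gadget": 12.00}


def lookup_price(item: str) -> float:
    """Return the unit price of a catalog item. Input: the item name."""
    return PRICES[item.strip().lower()]


_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv}


def calculator(expression: str) -> float:
    """Evaluate arithmetic with +, -, *, / and parentheses. Input: an expression."""
    def ev(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -ev(node.operand)
        raise ValueError("unsupported expression")  # rejects calls, names, etc.
    return ev(ast.parse(expression, mode="eval").body)


# Name and description are what the model would see when choosing a tool.
TOOLS = {fn.__name__: {"fn": fn, "description": fn.__doc__}
         for fn in (lookup_price, calculator)}

# "What do 3 widgets plus 1 gadget cost?" needs both tools, in sequence.
widget = TOOLS["lookup_price"]["fn"]("widget")
gadget = TOOLS["lookup_price"]["fn"]("gadget")
total = TOOLS["calculator"]["fn"](f"3 * {widget} + {gadget}")
print(total)
```

Neither tool can answer the question alone, which is the property the demo questions should have. The calculator walks the parsed expression rather than calling eval, so model-written input cannot execute arbitrary code.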

How is the budget graded?

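The grader checks that you cap iterations (max_iterations on LangChain's AgentExecutor, recursion_limit in the LangGraph run config) and that your demonstration shows what happens when the cap is reached. Note that recursion_limit counts graph steps rather than tool calls, so size it accordingly. The point is proving the agent cannot loop forever.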
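A cap-reached demonstration can be as simple as pointing the agent at a tool that always fails and showing that the run ends with a clear stop reason. The sketch below is a framework-free stand-in, not the LangChain or LangGraph mechanism: flaky_search, run_with_cap, and the result fields are hypothetical names.

```python
def flaky_search(query):
    """Hypothetical tool that always fails, to force the retry path."""
    raise TimeoutError("search backend unavailable")


def run_with_cap(max_tool_calls):
    """Retry a failing tool until the tool-call cap ends the run."""
    steps = []
    while True:
        # Check the budget before each call so a cap of N allows exactly N calls.
        if len(steps) >= max_tool_calls:
            return {"answer": None, "stopped": "tool-call cap reached", "steps": steps}
        try:
            observation = flaky_search("order status")
        except TimeoutError as exc:
            observation = f"ERROR: {exc}"
        steps.append(("flaky_search", observation))
        # A stubborn agent would decide to retry here; only the cap stops it.


result = run_with_cap(max_tool_calls=3)
print(result["stopped"], "after", len(result["steps"]), "tool calls")
```

Returning the recorded steps alongside the stop reason gives the demonstration both things it needs to show: that the cap fired, and what the agent was doing when it fired.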

What counts as a complete submission?

A single .py or .ipynb that defines two or more tools, runs a ReAct or tool-calling loop with an enforced budget, surfaces intermediate steps, and demonstrates at least two multi-step questions.