Issue 03 · Tool use

Tool use,
step by step.

A model can reason, but it cannot act. Tools close that gap. This essay walks the whole mechanism: how you define a tool, what the model actually sees, the call-and-respond loop, the four message types that carry it, parallel calls, structured output, and what happens when the model hands you broken JSON.

What this is: a from-scratch tour of how a model goes from reasoning to acting. Tool calling is the atom every agent is built from. Section 10 shows how the manual loop you build here becomes an autonomous agent. No model-internals math required: this is about the mechanism around the model, not what is inside it.

Ground truth: every code snippet on this page uses a real, current API. The framework examples use LangChain's tool and agent interfaces, drawn from its documentation, not from memory. The interactive panels run a small deterministic stand-in for a model, so the numbers and JSON are identical every time you load the page. If you spot something wrong, the colophon has my contact.

Step 1

1. Why tools?

A language model predicts text. It has no hands. Tools are the hands.

What is this? A tool is an ordinary function you make available to the model: check the weather, query a database, run a calculation, search the web. The model never runs the function itself. Instead it decides when to call it and with what arguments, emits that request as structured data, and waits. Your code runs the function and hands the result back. The model reads the result and continues.

Why do we need it? A model's only output is tokens. It can write "I should check the weather in Tokyo," but it cannot reach a weather API. It can describe a SQL query but cannot execute it. Tools turn a text predictor into something that can affect the world and read fresh facts back. This is the single mechanism underneath every agent.

The loop, in one breath

Four beats. The model thinks, asks to call a tool, your code executes it, the model reads the result. Repeat until the model has nothing left to ask.

think → call → execute → observe → (repeat) → answer

think: the model reads the conversation so far. call: it emits a tool_call (a name plus arguments). execute: your code runs the real function. observe: the result goes back to the model as a message. The model stops calling tools when it decides it can answer.

The model proposes; your code disposes. The model chooses the action, but execution always happens in your runtime, never inside the model.

Step 2

2. Anatomy of a tool

The model has never seen your code. All it gets is a name, a description, and an argument schema.

What is this? You write a normal function. The framework turns it into a tool the model can see. In LangChain, the @tool decorator does this automatically: the function name becomes the tool name, the docstring becomes the description, and the type hints become the argument schema.

Why does the docstring matter so much? Because it is the entire user manual the model gets. The model cannot read your function body. A vague description ("does weather stuff") makes the model misuse or avoid the tool. A precise one directly raises tool-calling accuracy. The description is not a comment; it is part of the prompt.
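To make that mapping concrete, here is a minimal plain-Python sketch of what a decorator like @tool does conceptually. It reads the function name, docstring, and type hints and emits a JSON-Schema-shaped dict. This is not LangChain's implementation. `schema_from_function` and its type table are hypothetical, and the sketch only handles a few primitive types.

```python
import inspect
from typing import get_type_hints

# Hypothetical mapping from Python annotations to JSON Schema type names.
JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def schema_from_function(fn):
    """Build a tool schema from a function's name, docstring, and type hints."""
    hints = get_type_hints(fn)
    properties, required = {}, []
    for name, param in inspect.signature(fn).parameters.items():
        prop = {"type": JSON_TYPES[hints[name]]}
        if param.default is inspect.Parameter.empty:
            required.append(name)  # no default: the model must supply it
        else:
            prop["default"] = param.default
        properties[name] = prop
    return {
        "name": fn.__name__,                      # tool name <- function name
        "description": inspect.getdoc(fn) or "",  # description <- docstring
        "parameters": {                           # schema <- type hints
            "type": "object",
            "properties": properties,
            "required": required,
        },
    }


def get_weather(city: str, units: str = "celsius") -> str:
    """Get the current weather for a city."""
    return f"Weather in {city}: 22 degrees {units}, partly cloudy"


print(schema_from_function(get_weather))
```

The function body never appears in the output. Whatever the model knows about the tool has to come from these three fields.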

What you write, and what the model sees

Left: the Python you write. Right: the JSON-shaped schema the framework extracts and shows the model. Name, description, and typed arguments, nothing else.

You write

@tool
def get_weather(
    city: str,
    units: str = "celsius",
) -> str:
    """Current weather
    for a city."""
    # real work here
    return result

→

The model sees

{
  "name": "get_weather",
  "description":
    "Current weather for a city",
  "parameters": {
    "city": "string, required",
    "units": "string = celsius"
  }
}

The code

The @tool decorator. Name from the function, description from the docstring, schema from the type hints.

from langchain.tools import tool

@tool
def get_weather(city: str, units: str = "celsius") -> str:
    """Get the current weather for a city.

    Args:
        city: The city name to look up weather for.
        units: Temperature units, 'celsius' or 'fahrenheit'.
    """
    return f"Weather in {city}: 22 degrees {units}, partly cloudy"

For tools with many parameters or constrained values, type hints are not expressive enough. Attach a Pydantic model as args_schema to give the model field descriptions, valid values, and defaults:

from langchain.tools import tool
from pydantic import BaseModel, Field
from typing import Literal

class WeatherInput(BaseModel):
    location: str = Field(description="City name or coordinates")
    units: Literal["celsius", "fahrenheit"] = Field(
        default="celsius", description="Temperature unit preference")
    include_forecast: bool = Field(
        default=False, description="Whether to include a 5-day forecast")

@tool(args_schema=WeatherInput)
def get_weather(location: str, units: str = "celsius",
                include_forecast: bool = False) -> str:
    """Get current weather and optional forecast for a location."""
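    # Placeholder: call a weather API here and return a formatted report
    return f"Weather in {location}: 22 degrees {units}, partly cloudy"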

Use	When
Type hints only	Simple tools with one or two obvious parameters.
Pydantic `args_schema`	Many parameters, constrained values (`Literal`), or when the model keeps getting arguments wrong.

A tool is a name, a description, and a typed schema. Write the description as if it is the only documentation the model will ever read, because it is.

Step 3

3. The tool-calling loop, by hand

Before any agent framework hides it, run the loop yourself. This is exactly what happens under the hood.

What is this? Five steps. You send the user's message with tools bound to the model. The model replies with one or more tool_calls instead of a final answer. You execute each requested function and send the results back as tool messages. The model reads them and produces its answer. If it needs more, it calls again.

Why learn it manually? Because tool calling will go wrong, and when it does you need to know which step failed. Did the model pick the wrong tool? Pass bad arguments? Ignore the result? Knowing the loop is how you debug it.

The code

Bind tools to the model, invoke, branch on tool_calls, execute, send results back, invoke once more.

from langchain.chat_models import init_chat_model
from langchain.tools import tool
from langchain.messages import HumanMessage

@tool
def search(query: str) -> str:
    """Search the web for current information."""
    return f"Top result for '{query}': Python 3.13 was released in October 2024."

model = init_chat_model("openai:gpt-5.4")
model_with_tools = model.bind_tools([search])

# Step 1: send the user message, the model decides whether to call a tool
messages = [HumanMessage(content="What's the latest Python version?")]
response = model_with_tools.invoke(messages)
messages.append(response)

# Step 2: did the model ask for tools?
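if response.tool_calls:
    for tool_call in response.tool_calls:
        # Step 3: execute the requested function
        # (only `search` is bound, so there is no need to dispatch on tool_call["name"])
        result = search.invoke(tool_call)   # invoking with a tool call returns a ToolMessage
        messages.append(result)
    # Step 4: send results back, the model writes the final answer
    final = model_with_tools.invoke(messages)
    print(final.content)
else:
    # No tool needed: the first reply is already the answer
    print(response.content)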

Live: run the loop

Pick a request. Watch the model emit a tool_call, your runtime execute the tool, the result return as a ToolMessage, and the loop close with an answer. Every JSON payload is the real shape LangChain produces. Deterministic: the same request always traces the same way.
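The panel's stand-in can be approximated in a few lines of plain Python. This sketch assumes messages are plain dicts with the same fields as the payloads above (role, content, tool_calls, tool_call_id). `fake_model` and `TOOLS` are hypothetical: a deterministic replacement for a real chat model and a name-to-function table. Because nothing is random, the trace is identical on every run.

```python
import json


def search(query: str) -> str:
    """Stub tool: returns a canned search result."""
    return f"Top result for '{query}': Python 3.13 was released in October 2024."


TOOLS = {"search": search}  # hypothetical name -> function table


def fake_model(messages):
    """Deterministic stand-in: request a search first, answer once a result is present."""
    last = messages[-1]
    if last["role"] == "tool":
        return {"role": "ai", "content": f"According to the search: {last['content']}",
                "tool_calls": []}
    call = {"name": "search", "args": {"query": last["content"]}, "id": "call_001"}
    return {"role": "ai", "content": "", "tool_calls": [call]}


messages = [{"role": "human", "content": "latest Python version"}]
response = fake_model(messages)                      # think + call
messages.append(response)
for call in response["tool_calls"]:
    result = TOOLS[call["name"]](**call["args"])     # execute, in *your* runtime
    messages.append({"role": "tool", "content": result, "tool_call_id": call["id"]})  # observe
final = fake_model(messages)                         # the model reads the result and answers
messages.append(final)

for m in messages:
    print(m["role"], json.dumps({k: v for k, v in m.items() if k != "role"}))
```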

The model never executes anything. It emits a request, your code runs the function, the result re-enters the conversation, and the model continues. That round trip is the whole game.

Step 4

4. The four message types

A conversation is a list of messages, each with a role. Tool calling uses all four.

What is this? Models have no memory. Every call sees only the list of messages you send. Each message has a role that tells the model who spoke: the system sets the rules, the human asks, the model answers, and a tool reports a result. Tool calling is just these four roles in sequence.

The link that makes it work. When the model requests a tool, its message carries a tool_call with a unique id. Your tool result comes back as a ToolMessage carrying the same tool_call_id. That id is how the model matches a result to the request it made, even when several calls are in flight at once.
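Here is a small sketch of that bookkeeping. It assumes plain-dict messages with the field names used in the panel below. It pairs each tool result with the request that minted its id and flags two failure cases: orphaned results and calls that never got an answer. `pair_tool_results` is a hypothetical helper, not a LangChain function.

```python
def pair_tool_results(messages):
    """Match tool results to tool calls by id.

    Returns (pairs, orphans, unanswered):
      pairs      -- {call_id: (tool_call, tool_result_message)}
      orphans    -- tool results whose tool_call_id matches no request
      unanswered -- tool calls that never received a result
    """
    calls = {}
    for m in messages:
        if m["role"] == "ai":
            for call in m.get("tool_calls", []):
                calls[call["id"]] = call
    pairs, orphans = {}, []
    for m in messages:
        if m["role"] == "tool":
            call = calls.get(m.get("tool_call_id"))
            if call is None:
                orphans.append(m)
            else:
                pairs[call["id"]] = (call, m)
    unanswered = [c for cid, c in calls.items() if cid not in pairs]
    return pairs, orphans, unanswered


conversation = [
    {"role": "human", "content": "Weather in Tokyo and Paris?"},
    {"role": "ai", "content": "", "tool_calls": [
        {"name": "get_weather", "args": {"city": "Tokyo"}, "id": "call_001"},
        {"name": "get_weather", "args": {"city": "Paris"}, "id": "call_002"},
    ]},
    # Illustrative results, arriving in the opposite order: the ids still line up.
    {"role": "tool", "content": "Paris: 18 degrees C", "tool_call_id": "call_002"},
    {"role": "tool", "content": "Tokyo: 22 degrees C", "tool_call_id": "call_001"},
]
pairs, orphans, unanswered = pair_tool_results(conversation)
print(pairs["call_001"][1]["content"])
```

Position in the list carries no meaning here. Only the id ties a result to its request.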

One tool call, four messages

Read top to bottom. Notice the call_001 id minted by the AI message and echoed by the tool message: that is the thread tying request to result.

System

You are a helpful assistant. Use tools when they help.

Human

What's the weather in Tokyo?

AI

(no text, this turn is a tool request)

tool_calls: [ { "name": "get_weather", "args": { "city": "Tokyo" }, "id": "call_001" } ]

Tool

{ "content": "Tokyo: 22 degrees C, partly cloudy", "tool_call_id": "call_001" }

AI

It is currently 22 degrees C and partly cloudy in Tokyo.

Role	Who	Carries
SystemMessage	You, once, up front	The rules and persona. Shapes every reply.
HumanMessage	The user	The request, as text or content blocks.
AIMessage	The model	Text, and/or `tool_calls` with ids.
ToolMessage	Your runtime	A tool result plus its `tool_call_id`.

Build a list, send it, append the reply, repeat. The tool_call_id is the single thread that keeps requests and results from getting crossed.

Step 5

5. Reading a tool_call

Three fields: a name, the arguments, and an id. Each has a single job.

What is this? When the model wants to act, each request in response.tool_calls is a small dictionary. There are only three fields you need: the tool name, the arguments, and an id. Master this shape and the rest of tool calling is bookkeeping.

A tool_call is a name, an args dict, and an id. Your job is to dispatch on the name, validate the args, run the function, and return a result tagged with the id.
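Those four duties fit in one function. This sketch assumes tool calls are plain dicts and uses a simple registry that maps tool names to Python callables. It validates arity with `inspect.signature(...).bind`, so it checks only for missing or unexpected arguments, not their types. `execute_tool_call` and `REGISTRY` are hypothetical names. Problems come back as error strings rather than exceptions, for the reason Section 9 explains.

```python
import inspect


def get_weather(city: str, units: str = "celsius") -> str:
    return f"{city}: 22 degrees {units}, partly cloudy"


REGISTRY = {"get_weather": get_weather}  # hypothetical name -> function table


def execute_tool_call(tool_call, registry=REGISTRY):
    """Dispatch on name, validate args, run, and tag the result with the call's id."""
    fn = registry.get(tool_call["name"])
    if fn is None:
        content = f"Error: unknown tool '{tool_call['name']}'. Available: {sorted(registry)}."
    else:
        try:
            # Missing or unexpected arguments raise TypeError here, before the tool runs
            inspect.signature(fn).bind(**tool_call["args"])
        except TypeError as exc:
            content = f"Error: bad arguments for {tool_call['name']}: {exc}"
        else:
            content = fn(**tool_call["args"])
    return {"role": "tool", "content": content, "tool_call_id": tool_call["id"]}


print(execute_tool_call({"name": "get_weather", "args": {"city": "Tokyo"}, "id": "call_001"}))
```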

Step 6

6. Parallel tool calls

One turn can request several tools at once. They fan out, run independently, and fan back in.

What is this? Nothing forces a turn to contain a single tool call. When a request needs two independent facts, the model can emit two (or more) tool_calls in one AIMessage. You execute them (in parallel if you like), and return one ToolMessage per call, each tagged with its own id. The model then has every result in hand for its answer.

Why it matters. Independent lookups should not be serialized. If the user asks for the weather in Paris and a calculation, there is no reason to wait for one before starting the other. Parallel calls cut latency, and the ids keep the results straight.
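Here is a sketch of the fan-out and fan-in. It assumes tool calls arrive as plain dicts and each tool is an ordinary blocking function. It uses the standard library's ThreadPoolExecutor. `get_weather` and `multiply` are hypothetical stand-ins that sleep to simulate I/O latency.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def get_weather(city: str) -> str:
    time.sleep(0.2)  # simulate a network call
    return f"{city}: 18 degrees C"


def multiply(a: int, b: int) -> str:
    time.sleep(0.2)  # simulate another slow call
    return str(a * b)


TOOLS = {"get_weather": get_weather, "multiply": multiply}


def run_parallel(tool_calls):
    """Run independent tool calls concurrently; return one result per call, tagged by id."""
    with ThreadPoolExecutor() as pool:
        # Fan out: submit every call before waiting on any of them
        futures = {c["id"]: pool.submit(TOOLS[c["name"]], **c["args"]) for c in tool_calls}
        # Fan in: one tool-result message per call, each carrying its own id
        return [{"role": "tool", "content": f.result(), "tool_call_id": cid}
                for cid, f in futures.items()]


calls = [
    {"name": "get_weather", "args": {"city": "Paris"}, "id": "call_001"},
    {"name": "multiply", "args": {"a": 18, "b": 24}, "id": "call_002"},
]
start = time.perf_counter()
results = run_parallel(calls)
elapsed = time.perf_counter() - start  # close to one sleep, not two, since the calls overlap
print(results, f"{elapsed:.2f}s")
```

Threads suit I/O-bound tools like these. CPU-heavy tools would call for a process pool or an async design instead.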

Live: a request that fans out

One AI turn, two tool calls, two results, one answer. Press run and watch the single message branch into independent calls and rejoin.

Parallel calls are still one loop iteration: a single AI turn proposes N actions, N results return, the model reads them together. Disable it with parallel_tool_calls=False when a tool's result must inform the next call.

Step 7

7. Controlling tool use

By default the model decides. Sometimes you want to insist, or forbid.

What is this? When you bind tools, the tool_choice argument controls how freely the model may call them. The default lets the model decide. You can force it to call some tool, force a specific one, or forbid tool calls entirely. This is how you keep a model from answering from memory when a tool is mandatory, or from reaching for a tool when you want a plain reply.
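One way to pin down what each setting permits is a small checker that inspects an AI turn's tool_calls after the fact. This is a hypothetical helper for tests or logging. It is not how providers enforce the setting, since they constrain generation itself. The policy strings mirror the values this section discusses: "auto", "any", "none", or a tool name.

```python
def complies(tool_choice, tool_calls, bound_tools):
    """Return True if an AI turn's tool_calls are allowed under `tool_choice`.

    Hypothetical semantics:
      "auto" -> the model decides (anything goes)
      "any"  -> at least one tool call
      "none" -> no tool calls, text only
      name   -> at least one call, and every call is to that tool
    """
    called = [c["name"] for c in tool_calls]
    if any(name not in bound_tools for name in called):
        return False  # calling a tool that was never bound is never allowed
    if tool_choice == "auto":
        return True
    if tool_choice == "any":
        return len(called) >= 1
    if tool_choice == "none":
        return not called
    if tool_choice in bound_tools:
        return bool(called) and all(name == tool_choice for name in called)
    raise ValueError(f"unknown tool_choice: {tool_choice!r}")


bound = {"search", "calculator"}
text_reply = []
one_search = [{"name": "search", "args": {"query": "latest Python"}, "id": "call_001"}]
for policy in ("auto", "any", "none", "search", "calculator"):
    print(f"{policy:>10}: text reply ok={complies(policy, text_reply, bound)}, "
          f"search call ok={complies(policy, one_search, bound)}")
```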

The code

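from langchain.chat_models import init_chat_model

# Assumes `search` (Section 3) and a `calculator` tool (like the one in Section 12)
model = init_chat_model("openai:gpt-5.4")

# Default ("auto"): the model decides whether to use tools
model_with_tools = model.bind_tools([search, calculator])

# Force the model to call at least one tool
model_must_use = model.bind_tools([search, calculator], tool_choice="any")

# Force a specific tool
model_must_search = model.bind_tools([search, calculator], tool_choice="search")

# Forbid tool calls: text-only reply
model_text_only = model.bind_tools([search, calculator], tool_choice="none")

# One tool at a time (no parallel calls)
model_sequential = model.bind_tools([search, calculator], parallel_tool_calls=False)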

Live: pick a policy

Same request, same tools. Change tool_choice and see what the model is allowed to do.

Use "auto" for general agents, "any" when a tool is definitely required, a specific name when you know exactly which, and "none" to force a text-only reply.

Step 8

8. Structured output is tool calling in disguise

Force the model to answer as a typed object, not free text. Underneath, it is often a forced tool call.

What is this? Where tools let the model call a function, structured output lets you extract a typed object from the model's answer. You give a schema (a Pydantic model, a TypedDict, or raw JSON Schema) and with_structured_output returns parsed data instead of a text message.

The connection. One of the two strategies LangChain uses is "function calling": it turns your schema into a single fake tool and forces the model to call it. The arguments of that call are your structured object. So structured output and tool calling are the same machinery pointed at two different goals: acting versus extracting.
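The function-calling strategy can be sketched with the standard library alone. The sketch assumes an OpenAI-style tool definition (name, description, JSON Schema parameters) and uses a dataclass where the essay uses Pydantic. `schema_as_tool`, `parse_forced_call`, and the `raw_arguments` string are illustrative. They are not LangChain internals or real model output.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class TicketClassification:
    category: str
    priority: int
    summary: str


JSON_TYPES = {str: "string", int: "integer"}


def schema_as_tool(cls):
    """Turn a dataclass into a single 'fake tool' definition the model is forced to call."""
    return {
        "name": cls.__name__,
        "description": f"Return the answer as a {cls.__name__}.",
        "parameters": {
            "type": "object",
            "properties": {f.name: {"type": JSON_TYPES[f.type]} for f in fields(cls)},
            "required": [f.name for f in fields(cls)],
        },
    }


def parse_forced_call(cls, raw_arguments):
    """The forced call's arguments *are* the structured object: parse and type-check them."""
    data = json.loads(raw_arguments)
    for f in fields(cls):
        if not isinstance(data.get(f.name), f.type):
            raise ValueError(f"field {f.name!r} is missing or not {f.type.__name__}")
    return cls(**{f.name: data[f.name] for f in fields(cls)})


tool_def = schema_as_tool(TicketClassification)
# Illustrative arguments string, in the shape a forced tool call would carry:
raw_arguments = '{"category": "bug", "priority": 5, "summary": "Crash on submit at payment page."}'
ticket = parse_forced_call(TicketClassification, raw_arguments)
print(tool_def["name"], ticket)
```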

The code

from langchain.chat_models import init_chat_model
from pydantic import BaseModel, Field

class TicketClassification(BaseModel):
    category: str = Field(description="One of: bug, feature, question, docs")
    priority: int = Field(description="1 (low) to 5 (critical)")
    summary: str  = Field(description="One-sentence summary")

model = init_chat_model("openai:gpt-5.4", temperature=0)
classifier = model.with_structured_output(TicketClassification)

result = classifier.invoke("The app crashes when I click submit on the payment page")
# result is a TicketClassification instance, not a string
print(result.category, result.priority, result.summary)

Free text in, typed object out

The same morph as a tool: a schema constrains the output, and you get a validated object your downstream code can trust.

Input (free text)

"The app crashes when I click submit on the payment page"

→

Output (typed object)

{
  "category": "bug",
  "priority": 5,
  "summary": "Crash on submit at payment page."
}

Strategy	`method`	How it works
Native JSON schema	`"json_schema"`	The provider constrains generation to match the schema directly. Faster, fewer parse failures (OpenAI, Google).
Function calling	`"function_calling"`	The schema becomes a fake tool the model is forced to call. Works on any provider with tool support.

In production, pass include_raw=True. You get back a dict with parsed, raw, and parsing_error, so a malformed response can be logged or retried instead of crashing. For agents built with create_agent, use response_format instead of with_structured_output.

If you can describe the answer as a schema, ask for the schema. Structured output removes the regex-the-text step that breaks every brittle pipeline.

Step 9

9. When the model hands you bad arguments

Models are not perfect callers. The fix is to make failure legible, not fatal.

What is this? Sometimes the model calls a tool with arguments that do not work: a zero where a divisor is needed, a malformed date, a missing field. The instinct is to raise an exception. Resist it. When a tool raises, the loop either crashes or surfaces a useless generic error. When a tool returns a descriptive error string, the model reads it, reasons about what went wrong, and tries again with better arguments.

The pattern. Validate inside the tool. On bad input, return a clear message that names the problem and the fix. The error becomes just another ToolMessage, and the loop self-corrects.

The code

from langchain.tools import tool

@tool
def divide(numerator: float, denominator: float) -> str:
    """Divide two numbers.

    Args:
        numerator: The number to divide.
        denominator: The number to divide by. Must not be zero.
    """
    if denominator == 0:
        return "Error: Cannot divide by zero. Provide a non-zero denominator."
    return str(numerator / denominator)
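The divide tool checks its one known failure case explicitly. A complementary approach is to wrap every tool so that any exception comes back as a descriptive string instead of escaping the loop. `errors_as_results` is a hypothetical plain-Python decorator, not a LangChain API. `days_until` is an illustrative tool with a fixed reference date. The retry trace uses hardcoded arguments in place of a real model's second attempt.

```python
import functools
from datetime import date


def errors_as_results(fn):
    """Hypothetical decorator: turn exceptions into error strings the model can read."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        try:
            return fn(*args, **kwargs)
        except Exception as exc:  # deliberately broad: nothing escapes the loop
            return (f"Error in {fn.__name__}: {type(exc).__name__}: {exc}. "
                    "Fix the arguments and try again.")
    return wrapper


@errors_as_results
def days_until(target: str) -> str:
    """Days from 2025-01-01 to a date given as YYYY-MM-DD."""
    return str((date.fromisoformat(target) - date(2025, 1, 1)).days)


# Hardcoded stand-in for the model: a malformed date first, then a corrected
# retry after it has read the error string.
for target in ["03/15/2025", "2025-03-15"]:
    result = days_until(target)
    print(target, "->", result)
    if not result.startswith("Error"):
        break
```

Catching broadly is a design choice. It keeps the loop alive but can hide genuine bugs, so in real code you would likely log the exception before returning the string.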

Live: watch the model recover

The model calls divide with a zero denominator. The tool returns an error string instead of raising. The model reads it and retries with a valid argument. Press run.

Return errors, do not raise them. A descriptive error string is a second chance; an exception is a dead end. (For flaky external calls, ToolRetryMiddleware retries automatically.)

Step 10

10. From manual loop to agent

Writing the loop by hand teaches you what happens. An agent writes the loop for you.

What is this? The hand-written loop in Section 3 works, but it is tedious, and it only handles one round. What if the model needs to call a tool, read the result, then call a different tool? An agent runs the loop automatically: reason, call, observe, repeat, until the model produces an answer with no tool calls. You supply the model, the tools, and a system prompt; the agent handles the orchestration.

The loop's control flow deserves its own treatment: how it decides, how it stops, and how it can run forever if you let it. Here we just close the gap from "I call tools by hand" to "the loop runs itself".
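Stripped of the framework, that automation is a loop with a hard cap. This sketch assumes plain-dict messages and uses `scripted_model`, a hypothetical stand-in that asks for population, then GDP, and then answers. `run_agent` and `max_model_calls` are illustrative names, not create_agent parameters.

```python
def get_population(country: str) -> str:
    return {"France": "67 million", "Japan": "125 million"}.get(country, f"No data for {country}")


def get_gdp(country: str) -> str:
    return {"France": "$3.05T", "Japan": "$4.41T"}.get(country, f"No data for {country}")


TOOLS = {"get_population": get_population, "get_gdp": get_gdp}


def scripted_model(messages):
    """Hypothetical stand-in: one tool per turn until both facts are in, then answer."""
    seen = [m["content"] for m in messages if m["role"] == "tool"]
    plan = [("get_population", "France"), ("get_gdp", "France")]
    if len(seen) < len(plan):
        name, country = plan[len(seen)]
        call = {"name": name, "args": {"country": country}, "id": f"call_{len(seen) + 1:03d}"}
        return {"role": "ai", "content": "", "tool_calls": [call]}
    return {"role": "ai", "content": "France: " + ", ".join(seen), "tool_calls": []}


def run_agent(model, messages, max_model_calls=10):
    """Reason, call, observe, repeat, until a reply has no tool_calls or the cap is hit."""
    for _ in range(max_model_calls):
        reply = model(messages)
        messages.append(reply)
        if not reply["tool_calls"]:  # natural completion
            return reply["content"]
        for call in reply["tool_calls"]:
            result = TOOLS[call["name"]](**call["args"])
            messages.append({"role": "tool", "content": result, "tool_call_id": call["id"]})
    return "Stopped: model call limit reached."  # iteration limit


print(run_agent(scripted_model, [{"role": "user", "content": "Population and GDP of France?"}]))
```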

The code

from langchain.agents import create_agent
from langchain.chat_models import init_chat_model
from langchain.tools import tool

@tool
def get_population(country: str) -> str:
    """Get the population of a country."""
    data = {"France": "67 million", "Japan": "125 million"}
    return data.get(country, f"No data for {country}")

@tool
def get_gdp(country: str) -> str:
    """Get the GDP of a country in USD."""
    data = {"France": "$3.05T", "Japan": "$4.41T"}
    return data.get(country, f"No data for {country}")

agent = create_agent(
    model=init_chat_model("openai:gpt-5.4"),
    tools=[get_population, get_gdp],
    system_prompt="You are an economics analyst. Gather data before answering.",
)

result = agent.invoke(
    {"messages": [{"role": "user", "content": "Compare France and Japan economically"}]}
)
print(result["messages"][-1].content)

How an agent stops

The loop must end. By default it ends when the model answers without calling a tool. Always add a hard limit too.

Stop condition	What triggers it
Natural completion	The model replies with no `tool_calls`. It has decided it can answer.
Structured output	A `response_format` schema is satisfied.
Iteration limit	`ModelCallLimitMiddleware(run_limit=N)` (or `thread_limit=N` across runs) halts a runaway loop. Use it in production, always.
Routing	A tool returns a `Command` that exits the loop.

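from langchain.agents import create_agent
from langchain.agents.middleware import ModelCallLimitMiddleware
from langchain.chat_models import init_chat_model

# get_population and get_gdp are the tools defined above
agent = create_agent(
    model=init_chat_model("openai:gpt-5.4"),
    tools=[get_population, get_gdp],
    system_prompt="You are an economics analyst.",
    # never loop forever: cap model calls per invocation
    middleware=[ModelCallLimitMiddleware(run_limit=10)],
)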

An agent is the loop from Section 3, automated, with stop conditions bolted on. The control flow that decides when to call a tool and when to stop is where the real subtlety lives.

Step 11

11. Common mistakes

Most tool-calling bugs are description bugs or error-handling bugs.

Mistake	What goes wrong	Fix
Vague docstring	The model is the only reader of the description, and a vague one makes it misuse or skip the tool.	Write specific, action-first descriptions. State what the tool does and what each argument means.
Raising instead of returning errors	An exception crashes the loop or yields a generic failure the model cannot reason about.	Return a descriptive error string. The model reads it and retries.
Orphaned tool result	A `ToolMessage` without the matching `tool_call_id`: the model cannot correlate it with its request.	Always echo the `id` from the originating tool call.
Mega-tools	One tool that does many unrelated jobs confuses the model about when to call it.	One tool, one job. Compose many small focused tools.
Spaces in tool names	Some providers reject names with spaces or special characters.	Use `snake_case` tool names.
No iteration limit	An agent can loop indefinitely, burning tokens and money.	Add `ModelCallLimitMiddleware` in production.
Over-engineered schema	A Pydantic model for a one-argument tool is noise.	Use plain type hints for simple tools; reach for `args_schema` only when the model gets arguments wrong.
Stateful tools via globals	Hidden state through closures or globals makes tools unpredictable and untestable.	Use `ToolRuntime` for state, store, and context.
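The naming rule is easy to check before tools are ever bound. This sketch assumes OpenAI's documented constraint on function names: letters, digits, underscores, and dashes, up to 64 characters. Other providers may differ, so treat the pattern as an example rather than a universal rule. `check_tool_names` is a hypothetical helper.

```python
import re

# Based on OpenAI's function-name rule; an assumption for other providers.
NAME_PATTERN = re.compile(r"[A-Za-z0-9_-]{1,64}")


def check_tool_names(names):
    """Return the names a provider with this rule would likely reject."""
    return [n for n in names if not NAME_PATTERN.fullmatch(n)]


print(check_tool_names(["get_weather", "get weather", "search.web", "calculator"]))
```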

Write the description for the model, return errors for the model, and keep each tool small. That covers most of the failures.

Step 12

12. End-to-end: tools, a loop, structured output

A complete, runnable example tying the whole essay together.

Three focused tools, an agent that orchestrates the loop, a hard iteration limit, and a typed final report. This is the shape of a real tool-using application.

from langchain.agents import create_agent
from langchain.agents.middleware import ModelCallLimitMiddleware
from langchain.chat_models import init_chat_model
from langchain.tools import tool
from pydantic import BaseModel, Field
from typing import List

# === tools: one job each, clear descriptions ===
@tool
def search(query: str) -> str:
    """Search the web for current information about a topic."""
    return f"Top results for '{query}': ..."

@tool
def get_weather(city: str, units: str = "celsius") -> str:
    """Get the current weather for a city. units: 'celsius' or 'fahrenheit'."""
    return f"{city}: 22 degrees {units}, partly cloudy"

@tool
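def calculator(expression: str) -> str:
    """Evaluate a simple arithmetic expression, e.g. '18 * 24'."""
    if not expression or not all(c in "0123456789+-*/. ()" for c in expression):
        return "Error: expression may only contain numbers and + - * / ( )."
    if "**" in expression:
        return "Error: exponentiation (**) is not supported."
    try:
        # The character filter rules out names and calls, so eval sees plain arithmetic
        return str(eval(expression))
    except (SyntaxError, ValueError, ArithmeticError) as exc:
        return f"Error: {exc}. Provide a valid arithmetic expression."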

# === a typed final answer ===
class TripBrief(BaseModel):
    city: str
    weather: str = Field(description="One-line current conditions")
    notes: List[str] = Field(description="Any extra facts gathered")

# === the agent: model + tools + prompt + stop limit + response schema ===
agent = create_agent(
    model=init_chat_model("openai:gpt-5.4"),
    tools=[search, get_weather, calculator],
    system_prompt="You are a travel assistant. Use tools to gather facts before answering.",
    response_format=TripBrief,
    middleware=[ModelCallLimitMiddleware(run_limit=8)],
)

result = agent.invoke(
    {"messages": [{"role": "user",
                   "content": "What's the weather in Tokyo, and what is 14 * 3?"}]}
)
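# the agent runs the loop, gathering facts; the final answer, shaped as a
# TripBrief per response_format, is under the "structured_response" key
brief = result["structured_response"]
print(brief.city, brief.weather, brief.notes)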
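The calculator above still relies on eval behind a character filter. If you would rather avoid eval entirely, one alternative is to parse the expression with Python's ast module and evaluate only whitelisted arithmetic nodes. This is a swapped-in technique, not part of the original example. `safe_calculator` is a hypothetical drop-in for the tool's body, and it keeps the same return-errors-as-strings convention.

```python
import ast
import operator

# Whitelisted operators; any other syntax is rejected.
OPERATORS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.UAdd: operator.pos,
    ast.USub: operator.neg,
}


def _evaluate(node):
    if isinstance(node, ast.Expression):
        return _evaluate(node.body)
    if isinstance(node, ast.Constant) and type(node.value) in (int, float):
        return node.value
    if isinstance(node, (ast.BinOp, ast.UnaryOp)):
        op = OPERATORS.get(type(node.op))
        if op is None:
            raise ValueError(f"unsupported operator: {type(node.op).__name__}")
        if isinstance(node, ast.BinOp):
            return op(_evaluate(node.left), _evaluate(node.right))
        return op(_evaluate(node.operand))
    raise ValueError(f"unsupported syntax: {type(node).__name__}")


def safe_calculator(expression: str) -> str:
    """Evaluate + - * / arithmetic without eval, returning errors as strings."""
    try:
        return str(_evaluate(ast.parse(expression, mode="eval")))
    except (SyntaxError, ValueError, ArithmeticError) as exc:
        return f"Error: {exc}"


print(safe_calculator("14 * 3"))
print(safe_calculator("2 ** 10"))
```

Because anything outside the whitelist raises, function calls, attribute access, and exponentiation are refused by construction rather than by a character filter.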

Small tools, a bounded loop, a typed result. Everything else in agent engineering is an elaboration of this.

Step 13

13. What we left out

Real things, deferred to keep this essay about the core mechanism.

ToolRuntime. Tools can receive a runtime parameter (invisible to the model) for the calling user's context, a persistent key-value store, and a stream writer for progress updates.
Runtime context. create_agent(..., context_schema=...) injects per-run data (user id, role) that tools read via runtime.context, without putting it in the prompt.
Streaming from tools. Long-running tools can emit intermediate status through runtime.stream_writer so the user is not staring at a spinner.
Middleware. Beyond the call limit: retries, model fallback, and PII filtering all hook into the agent before and after each model call.
Rate limiting. Attach an InMemoryRateLimiter to a tool that calls an external API so you do not overwhelm it.
MCP. The Model Context Protocol lets you import whole catalogs of tools from external servers instead of writing each by hand. A topic in its own right.
Retrieval as a tool. A search over your own documents (RAG) is just another tool the model can call. Same loop, a vector store behind the function.
Custom agent state. When tools need to share more than messages, extend AgentState and have tools return Command objects to update it.

Each of these is a deep topic on its own. The biggest deferred topic, though, is the loop itself: how an agent runs tools autonomously, decides what to do next, and knows when to stop. That control flow is the natural next thing to learn after the mechanism here.

The mechanism is small: a name, a schema, a call, a result, a loop. Everything above is how production teams make that mechanism safe, fast, and observable.
