Tool use, step by step.
A model can reason, but it cannot act. Tools close that gap. This essay walks through the whole mechanism: how you define a tool, what the model actually sees, the call-and-respond loop, the four message types that carry it, parallel calls, structured output, and what happens when the model hands you broken JSON.
1. Why tools?
A language model predicts text. It has no hands. Tools are the hands.
What is this? A tool is an ordinary function you make available to the model: check the weather, query a database, run a calculation, search the web. The model never runs the function itself. Instead it decides when to call it and with what arguments, emits that request as structured data, and waits. Your code runs the function and hands the result back. The model reads the result and continues.
Why do we need it? A model's only output is tokens. It can write "I should check the weather in Tokyo," but it cannot reach a weather API. It can describe a SQL query but cannot execute it. Tools turn a text predictor into something that can affect the world and read fresh facts back. This is the single mechanism underneath every agent.
The loop, in one breath
Four beats. The model thinks, asks to call a tool, your code executes it, the model reads the result. Repeat until the model has nothing left to ask.
think: the model reads the conversation and decides it needs a tool.
call: the model emits a tool_call (a name plus arguments).
execute: your code runs the real function.
observe: the result goes back to the model as a message.
The model stops calling tools when it decides it can answer.
The model proposes; your code disposes. The model chooses the action, but execution always happens in your runtime, never inside the model.
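The four beats fit in a few lines of plain Python. This is a framework-free sketch, not LangChain's implementation. scripted_model is a hypothetical stand-in for a real chat model, and the message dicts only loosely follow common chat-API field names.

```python
# Hypothetical tool: an ordinary function the model can only ask for.
def get_weather(city: str) -> str:
    return f"{city}: 22 degrees, partly cloudy"

TOOLS = {"get_weather": get_weather}

def scripted_model(messages: list[dict]) -> dict:
    """Stand-in for a real model: requests the weather once, then answers."""
    results = [m for m in messages if m["role"] == "tool"]
    if not results:
        # think + call: emit a structured request instead of an answer
        return {"role": "assistant", "content": "",
                "tool_calls": [{"name": "get_weather",
                                "args": {"city": "Tokyo"}, "id": "call_1"}]}
    # the tool result is now in the conversation, so the model can answer
    return {"role": "assistant", "tool_calls": [],
            "content": f"It's {results[-1]['content']}."}

def run_loop(user_text: str, max_turns: int = 5) -> tuple[str, list[dict]]:
    messages = [{"role": "user", "content": user_text}]
    for _ in range(max_turns):
        reply = scripted_model(messages)
        messages.append(reply)
        if not reply["tool_calls"]:            # nothing left to ask: done
            return reply["content"], messages
        for call in reply["tool_calls"]:       # execute: your code runs it
            output = TOOLS[call["name"]](**call["args"])
            # observe: the result goes back as a tool message
            messages.append({"role": "tool", "content": output,
                             "tool_call_id": call["id"]})
    raise RuntimeError("model kept asking for tools")

answer, transcript = run_loop("What's the weather in Tokyo?")
print(answer)  # It's Tokyo: 22 degrees, partly cloudy.
```

The model function never calls get_weather itself. It only returns data describing the call, and run_loop decides whether and how to execute it.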
2. Anatomy of a tool
The model has never seen your code. All it gets is a name, a description, and an argument schema.
What is this? You write a normal function. The framework turns it into a tool the model can see. In LangChain, the @tool decorator does this automatically: the function name becomes the tool name, the docstring becomes the description, and the type hints become the argument schema.
Why does the docstring matter so much? Because it is the entire user manual the model gets. The model cannot read your function body. A vague description ("does weather stuff") makes the model misuse or avoid the tool. A precise one directly raises tool-calling accuracy. The description is not a comment; it is part of the prompt.
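Here is where that schema comes from. Below is a rough, framework-free sketch of what a decorator like @tool does, using only the standard library's inspect module. to_tool_schema and its type mapping are hypothetical and much simpler than what LangChain generates. Notice that the function body never appears in the output.

```python
import inspect

# Hypothetical mapping from Python annotations to JSON Schema type names.
PY_TO_JSON = {str: "string", int: "integer", float: "number", bool: "boolean"}

def to_tool_schema(fn) -> dict:
    """Build a JSON-Schema-like tool definition from name, docstring, and hints."""
    properties, required = {}, []
    for name, param in inspect.signature(fn).parameters.items():
        prop = {"type": PY_TO_JSON.get(param.annotation, "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)              # no default: the model must supply it
        else:
            prop["default"] = param.default
        properties[name] = prop
    return {
        "name": fn.__name__,
        "description": inspect.getdoc(fn) or "",   # the docstring is the manual
        "parameters": {"type": "object", "properties": properties,
                       "required": required},
    }

def get_weather(city: str, units: str = "celsius") -> str:
    """Current weather for a city."""
    return f"Weather in {city}: 22 degrees {units}"   # never shown to the model

schema = to_tool_schema(get_weather)
print(schema)
```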
What you write, and what the model sees
First, the Python you write; then the JSON-shaped schema the framework extracts and shows the model: name, description, and typed arguments, nothing else.
@tool
def get_weather(
    city: str,
    units: str = "celsius",
) -> str:
    """Current weather for a city."""
    # real work here
    return result
{
"name": "get_weather",
"description":
"Current weather for a city",
"parameters": {
"city": "string, required",
"units": "string = celsius"
}
}
The code
The @tool decorator. Name from the function, description from the docstring, schema from the type hints.
from langchain.tools import tool
@tool
def get_weather(city: str, units: str = "celsius") -> str:
    """Get the current weather for a city.
    Args:
        city: The city name to look up weather for.
        units: Temperature units, 'celsius' or 'fahrenheit'.
    """
    return f"Weather in {city}: 22 degrees {units}, partly cloudy"
For tools with many parameters or constrained values, type hints are not expressive enough. Attach a Pydantic model as args_schema to give the model field descriptions, valid values, and defaults:
from langchain.tools import tool
from pydantic import BaseModel, Field
from typing import Literal
class WeatherInput(BaseModel):
    location: str = Field(description="City name or coordinates")
    units: Literal["celsius", "fahrenheit"] = Field(
        default="celsius", description="Temperature unit preference")
    include_forecast: bool = Field(
        default=False, description="Whether to include a 5-day forecast")
@tool(args_schema=WeatherInput)
def get_weather(location: str, units: str = "celsius",
                include_forecast: bool = False) -> str:
    """Get current weather and optional forecast for a location."""
    ...
| Use | When |
|---|---|
| Type hints only | Simple tools with one or two obvious parameters. |
| Pydantic args_schema | Many parameters, constrained values (Literal), or when the model keeps getting arguments wrong. |
A tool is a name, a description, and a typed schema. Write the description as if it is the only documentation the model will ever read, because it is.
3. The tool-calling loop, by hand
Before any agent framework hides it, run the loop yourself. This is exactly what happens under the hood.
What is this? Five steps. You send the user's message with tools bound to the model. The model replies with one or more tool_calls instead of a final answer. You execute each requested function and send the results back as tool messages. The model reads them and produces its answer. If it needs more, it calls again.
Why learn it manually? Because tool calling will go wrong, and when it does you need to know which step failed. Did the model pick the wrong tool? Pass bad arguments? Ignore the result? Knowing the loop is how you debug it.
The code
Bind tools to the model, invoke, branch on tool_calls, execute, send results back, invoke once more.
from langchain.chat_models import init_chat_model
from langchain.tools import tool
from langchain.messages import HumanMessage
@tool
def search(query: str) -> str:
    """Search the web for current information."""
    return f"Top result for '{query}': Python 3.13 was released in October 2024."
model = init_chat_model("openai:gpt-5.4")
model_with_tools = model.bind_tools([search])
# Step 1: send the user message, the model decides whether to call a tool
messages = [HumanMessage(content="What's the latest Python version?")]
response = model_with_tools.invoke(messages)
messages.append(response)
# Step 2: did the model ask for tools?
if response.tool_calls:
    for tool_call in response.tool_calls:
        # Step 3: execute the requested function
        # (only one tool is bound here; with several, look it up by tool_call["name"])
        result = search.invoke(tool_call)  # returns a ToolMessage
        messages.append(result)
    # Step 4: send results back, the model writes the final answer
    final = model_with_tools.invoke(messages)
else:
    final = response  # no tool needed: the first reply is already the answer
print(final.content)
The model never executes anything. It emits a request, your code runs the function, the result re-enters the conversation, and the model continues. That round trip is the whole game.
4. The four message types
A conversation is a list of messages, each with a role. Tool calling uses all four.
What is this? Models have no memory. Every call sees only the list of messages you send. Each message has a role that tells the model who spoke: the system sets the rules, the human asks, the model answers, and a tool reports a result. Tool calling is just these four roles in sequence.
The link that makes it work. When the model requests a tool, its message carries a tool_call with a unique id. Your tool result comes back as a ToolMessage carrying the same tool_call_id. That id is how the model matches a result to the request it made, even when several calls are in flight at once.
One tool call, four messages
Read the sequence top to bottom. The AI message mints an id (say, call_001) and the tool message echoes it. That id is the thread tying request to result.
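Here is the same sequence written out as plain dictionaries. It assumes OpenAI-style role names (system, user, assistant, tool) in place of LangChain's message classes. unmatched_results is a hypothetical helper that flags any tool result whose id does not match an earlier request.

```python
# One tool call, four messages. Field names loosely follow OpenAI-style chat
# messages; LangChain's message classes carry the same information.
conversation = [
    {"role": "system", "content": "You are a helpful weather assistant."},
    {"role": "user", "content": "What's the weather in Paris?"},
    {"role": "assistant", "content": "",
     "tool_calls": [{"id": "call_001", "name": "get_weather",
                     "args": {"city": "Paris"}}]},
    {"role": "tool", "tool_call_id": "call_001",
     "content": "Paris: 18 degrees, light rain"},
]

def unmatched_results(messages: list[dict]) -> list[dict]:
    """Return tool results whose tool_call_id matches no earlier request."""
    requested, orphans = set(), []
    for msg in messages:
        for call in msg.get("tool_calls", []):
            requested.add(call["id"])          # ids minted by the model
        if msg["role"] == "tool" and msg["tool_call_id"] not in requested:
            orphans.append(msg)                # echoes an id nobody asked for
    return orphans

print(unmatched_results(conversation))  # []
```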
| Role | Who | Carries |
|---|---|---|
| SystemMessage | You, once, up front | The rules and persona. Shapes every reply. |
| HumanMessage | The user | The request, as text or content blocks. |
| AIMessage | The model | Text, and/or tool_calls with ids. |
| ToolMessage | Your runtime | A tool result plus its tool_call_id. |
Build a list, send it, append the reply, repeat. The tool_call_id is the single thread that keeps requests and results from getting crossed.
5. Reading a tool_call
Three fields, each with one job.
What is this? When the model wants to act, each request in response.tool_calls is a small dictionary. There are only three fields you need: the tool name, the arguments, and an id. Master this shape and the rest of tool calling is bookkeeping.
A tool_call is a name, an args dict, and an id. Your job is to dispatch on the name, validate the args, run the function, and return a result tagged with the id.
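That job description translates almost line for line into code. This is a minimal sketch, assuming tool calls arrive as plain dicts with name, args, and id keys; REGISTRY, add, and dispatch are hypothetical names. It validates only argument names, via inspect.signature(...).bind. Checking types is what a Pydantic args_schema would add.

```python
import inspect

def add(a: float, b: float) -> str:
    """Add two numbers."""
    return str(a + b)

REGISTRY = {"add": add}   # hypothetical: tool name -> function

def dispatch(tool_call: dict) -> dict:
    """Run one tool_call and return a tool result tagged with the same id."""
    name, args, call_id = tool_call["name"], tool_call["args"], tool_call["id"]
    fn = REGISTRY.get(name)                      # dispatch on the name
    if fn is None:
        content = f"Error: unknown tool '{name}'. Available: {sorted(REGISTRY)}."
    else:
        try:
            # Validate: do the argument names fit the signature?
            inspect.signature(fn).bind(**args)
        except TypeError as exc:
            content = f"Error: bad arguments for '{name}': {exc}"
        else:
            content = fn(**args)                 # run the function
    return {"role": "tool", "tool_call_id": call_id, "content": content}

print(dispatch({"name": "add", "args": {"a": 2, "b": 3}, "id": "call_42"}))
```

Unknown tools and bad arguments come back as error strings rather than exceptions, so the model gets a readable result either way.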
6. Parallel tool calls
One turn can request several tools at once. They fan out, run independently, and fan back in.
What is this? Nothing forces a turn to contain a single tool call. When a request needs two independent facts, the model can emit two (or more) tool_calls in one AIMessage. You execute them (in parallel if you like), and return one ToolMessage per call, each tagged with its own id. The model then has every result in hand for its answer.
Why it matters. Independent lookups should not be serialized. If the user asks for the weather in Paris and a calculation, there is no reason to wait for one before starting the other. Parallel calls cut latency, and the ids keep the results straight.
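Fanning out can be as simple as a thread pool. This is a stdlib sketch that assumes I/O-bound tools (simulated here with time.sleep) and tool calls shaped as plain dicts. The tool functions are hypothetical stand-ins for network calls.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical slow tools standing in for network calls.
def get_weather(city: str) -> str:
    time.sleep(0.2)
    return f"{city}: 22 degrees"

def calculator(expression: str) -> str:
    time.sleep(0.2)
    left, right = expression.split("*")         # toy parser: "a * b" only
    return str(int(left) * int(right))

TOOLS = {"get_weather": get_weather, "calculator": calculator}

# One AI turn that requested two independent tools.
tool_calls = [
    {"id": "call_a", "name": "get_weather", "args": {"city": "Paris"}},
    {"id": "call_b", "name": "calculator", "args": {"expression": "18 * 24"}},
]

def run_parallel(calls: list[dict]) -> list[dict]:
    with ThreadPoolExecutor() as pool:
        # Fan out: every call starts immediately.
        futures = {c["id"]: pool.submit(TOOLS[c["name"]], **c["args"]) for c in calls}
        # Fan back in: one tool message per call, tagged with its own id.
        return [{"role": "tool", "tool_call_id": cid, "content": f.result()}
                for cid, f in futures.items()]

start = time.perf_counter()
results = run_parallel(tool_calls)
elapsed = time.perf_counter() - start
print(results, f"{elapsed:.2f}s")  # roughly 0.2s, not 0.4s
```

Both tools sleep concurrently, so wall time should stay close to one tool's latency rather than the sum. Results happen to come back in request order, but the tool_call_id on each one is what actually ties it to its call.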
Parallel calls are still one loop iteration: a single AI turn proposes N actions, N results return, the model reads them together. Disable it with parallel_tool_calls=False when a tool's result must inform the next call.
7. Controlling tool use
By default the model decides. Sometimes you want to insist, or forbid.
What is this? When you bind tools, the tool_choice argument controls how freely the model may call them. The default lets the model decide. You can force it to call some tool, force a specific one, or forbid tool calls entirely. This is how you keep a model from answering from memory when a tool is mandatory, or from reaching for a tool when you want a plain reply.
The code
from langchain.chat_models import init_chat_model
# assumes search and calculator are @tool functions defined elsewhere
model = init_chat_model("openai:gpt-5.4")
# Default: the model decides whether to use tools
model_with_tools = model.bind_tools([search, calculator])
# Force the model to call at least one tool
model_must_use = model.bind_tools([search, calculator], tool_choice="any")
# Force a specific tool
model_must_search = model.bind_tools([search, calculator], tool_choice="search")
# One tool at a time (no parallel calls)
model_sequential = model.bind_tools([search, calculator], parallel_tool_calls=False)
Use "auto" for general agents, "any" when a tool is definitely required, a specific name when you know exactly which, and "none" to force a text-only reply.
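The provider enforces these policies, not your code. Still, it helps to spell out what each one permits in a single AI turn. allowed is a hypothetical checker written only for illustration.

```python
def allowed(tool_choice: str, tool_calls: list[dict]) -> bool:
    """Does this AI turn respect the tool_choice policy? (illustrative only)"""
    names = [call["name"] for call in tool_calls]
    if tool_choice == "auto":
        return True                    # the model decides: tools or plain text
    if tool_choice == "none":
        return not names               # text-only reply, no tool calls
    if tool_choice == "any":
        return len(names) >= 1         # at least one call, to any tool
    # Otherwise tool_choice names a specific tool: only that tool may be called.
    return bool(names) and all(name == tool_choice for name in names)

turn = [{"name": "search", "args": {"query": "latest Python"}, "id": "call_1"}]
for policy in ["auto", "any", "search", "calculator", "none"]:
    print(policy, allowed(policy, turn))
```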
8. Structured output is tool calling in disguise
Force the model to answer as a typed object, not free text. Underneath, it is often a forced tool call.
What is this? Where tools let the model call a function, structured output lets you extract a typed object from the model's answer. You give a schema (a Pydantic model, a TypedDict, or raw JSON Schema) and with_structured_output returns parsed data instead of a text message.
The connection. One of the two strategies LangChain uses is "function calling": it turns your schema into a single fake tool and forces the model to call it. The arguments of that call are your structured object. So structured output and tool calling are the same machinery pointed at two different goals: acting versus extracting.
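The function-calling strategy can be sketched without any framework. Turn the schema into one tool definition, force the model to call it, and parse the call's arguments back into the typed object. This sketch swaps Pydantic for a standard-library dataclass and hard-codes the forced call instead of querying a model. schema_as_tool is a hypothetical helper.

```python
import json
from dataclasses import dataclass, fields

@dataclass
class TicketClassification:
    category: str
    priority: int
    summary: str

def schema_as_tool(cls) -> dict:
    """Turn a dataclass into a single tool definition (hypothetical helper)."""
    types = {str: "string", int: "integer"}
    return {
        "name": cls.__name__,
        "description": f"Return the answer as a {cls.__name__}.",
        "parameters": {
            "type": "object",
            "properties": {f.name: {"type": types[f.type]} for f in fields(cls)},
            "required": [f.name for f in fields(cls)],
        },
    }

tool_def = schema_as_tool(TicketClassification)

# Pretend the model was forced to call that tool; its arguments arrive as JSON.
forced_call = {"name": "TicketClassification",
               "args": json.dumps({"category": "bug", "priority": 5,
                                   "summary": "Crash on submit at payment page."})}

# The "tool call" arguments are the structured object.
result = TicketClassification(**json.loads(forced_call["args"]))
print(result)
```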
The code
from langchain.chat_models import init_chat_model
from pydantic import BaseModel, Field
class TicketClassification(BaseModel):
    category: str = Field(description="One of: bug, feature, question, docs")
    priority: int = Field(description="1 (low) to 5 (critical)")
    summary: str = Field(description="One-sentence summary")
model = init_chat_model("openai:gpt-5.4", temperature=0)
classifier = model.with_structured_output(TicketClassification)
result = classifier.invoke("The app crashes when I click submit on the payment page")
# result is a TicketClassification instance, not a string
print(result.category, result.priority, result.summary)
Free text in, typed object out
As with a tool, a schema constrains the output, and you get back a validated object your downstream code can trust.
"The app crashes when I click submit on the payment page"
{
"category": "bug",
"priority": 5,
"summary": "Crash on submit at payment page."
}
| Strategy | method | How it works |
|---|---|---|
| Native JSON schema | "json_schema" | The provider constrains generation to match the schema directly. Faster, fewer parse failures (OpenAI, Google). |
| Function calling | "function_calling" | The schema becomes a fake tool the model is forced to call. Works on any provider with tool support. |
Handling parse failures. Pass include_raw=True to with_structured_output and you get back a dict with parsed, raw, and parsing_error keys, so a malformed response can be logged or retried instead of crashing. For agents built with create_agent, use response_format instead of with_structured_output.
If you can describe the answer as a schema, ask for the schema. Structured output removes the regex-the-text step that breaks every brittle pipeline.
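The include_raw pattern is easy to mimic when you parse model output yourself. parse_with_raw below is a hypothetical stdlib helper, not LangChain's implementation. It returns the same three keys and never raises on truncated JSON, missing fields, or out-of-range values.

```python
import json
from dataclasses import dataclass

@dataclass
class TicketClassification:
    category: str
    priority: int
    summary: str

def parse_with_raw(raw: str) -> dict:
    """Return parsed, raw, and parsing_error instead of raising."""
    try:
        data = json.loads(raw)                     # fails on broken JSON
        parsed = TicketClassification(**data)      # TypeError on missing/extra keys
        if not isinstance(parsed.priority, int) or not 1 <= parsed.priority <= 5:
            raise ValueError(f"priority must be 1 to 5, got {parsed.priority!r}")
        return {"parsed": parsed, "raw": raw, "parsing_error": None}
    except (TypeError, ValueError) as exc:         # JSONDecodeError is a ValueError
        return {"parsed": None, "raw": raw, "parsing_error": exc}

outcome = parse_with_raw('{"category": "bug", "priority": 5')   # truncated JSON
if outcome["parsing_error"] is not None:
    print("log or retry:", type(outcome["parsing_error"]).__name__)
```

Keeping raw alongside the error means a retry prompt can show the model exactly what it produced.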
9. When the model hands you bad arguments
Models are not perfect callers. The fix is to make failure legible, not fatal.
What is this? Sometimes the model calls a tool with arguments that do not work: a zero where a divisor is needed, a malformed date, a missing field. The instinct is to raise an exception. Resist it. When a tool raises, the loop either crashes or surfaces a useless generic error. When a tool returns a descriptive error string, the model reads it, reasons about what went wrong, and tries again with better arguments.
The pattern. Validate inside the tool. On bad input, return a clear message that names the problem and the fix. The error becomes just another ToolMessage, and the loop self-corrects.
The code
from langchain.tools import tool
@tool
def divide(numerator: float, denominator: float) -> str:
    """Divide two numbers.
    Args:
        numerator: The number to divide.
        denominator: The number to divide by. Must not be zero.
    """
    if denominator == 0:
        return "Error: Cannot divide by zero. Provide a non-zero denominator."
    return str(numerator / denominator)
Watching the model recover
Suppose the model calls divide with a zero denominator. The tool returns an error string instead of raising. The model reads that string like any other result and retries with a valid argument.
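The recovery can be traced end to end with a scripted stand-in for the model. scripted_model is hypothetical. It deliberately passes zero first and corrects itself only after reading the error string, which is the behavior this pattern relies on from a real model.

```python
def divide(numerator: float, denominator: float) -> str:
    """Divide two numbers. The denominator must not be zero."""
    if denominator == 0:
        return "Error: Cannot divide by zero. Provide a non-zero denominator."
    return str(numerator / denominator)

def scripted_model(messages: list[dict]) -> dict:
    """Hypothetical model: tries 0 first, then fixes it after reading the error."""
    last = messages[-1]
    if last["role"] == "user":
        return {"tool_calls": [{"id": "c1", "name": "divide",
                                "args": {"numerator": 10, "denominator": 0}}]}
    if last["content"].startswith("Error:"):
        return {"tool_calls": [{"id": "c2", "name": "divide",
                                "args": {"numerator": 10, "denominator": 4}}]}
    return {"tool_calls": [], "content": f"The answer is {last['content']}."}

messages = [{"role": "user", "content": "What is 10 divided by 4?"}]
for _ in range(5):                       # bounded, as a real agent loop should be
    reply = scripted_model(messages)
    messages.append({"role": "assistant", **reply})
    if not reply["tool_calls"]:
        break
    for call in reply["tool_calls"]:
        # The error string is just another tool result; nothing raises.
        messages.append({"role": "tool", "tool_call_id": call["id"],
                         "content": divide(**call["args"])})

print(messages[-1]["content"])  # The answer is 2.5.
```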
Return errors, do not raise them. A descriptive error string is a second chance; an exception is a dead end. (For flaky external calls, ToolRetryMiddleware retries automatically.)
10. From manual loop to agent
Writing the loop by hand teaches you what happens. An agent writes the loop for you.
What is this? The hand-written loop in Section 3 works, but it is tedious, and it only handles one round. What if the model needs to call a tool, read the result, then call a different tool? An agent runs the loop automatically: reason, call, observe, repeat, until the model produces an answer with no tool calls. You supply the model, the tools, and a system prompt; the agent handles the orchestration.
The loop's control flow is a topic of its own: how it decides, how it stops, and how it can run forever if you let it. Here we only close the gap from "I call tools by hand" to "the loop runs itself".
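Stripped of any framework, the loop an agent automates looks roughly like this. It is a sketch under simple assumptions (dict messages, models as plain functions, hypothetical names throughout), not how create_agent is built. It shows the two stop conditions that matter most: natural completion and a hard cap on model calls.

```python
def run_agent(model, tools, user_text, max_model_calls=10):
    """Call the model until it stops asking for tools or the cap is hit."""
    messages = [{"role": "user", "content": user_text}]
    for _ in range(max_model_calls):
        reply = model(messages)
        messages.append(reply)
        if not reply.get("tool_calls"):
            return "completed", messages            # natural completion
        for call in reply["tool_calls"]:
            messages.append({"role": "tool", "tool_call_id": call["id"],
                             "content": tools[call["name"]](**call["args"])})
    return "call_limit", messages                   # runaway loop halted

def lookup(country):
    return {"France": "67 million", "Japan": "125 million"}.get(country, "no data")

def careful_model(messages):
    # Hypothetical: asks for France, then Japan, then answers.
    asked = [m for m in messages if m["role"] == "tool"]
    if len(asked) < 2:
        country = ["France", "Japan"][len(asked)]
        return {"role": "assistant", "tool_calls": [
            {"id": f"call_{len(asked)}", "name": "lookup", "args": {"country": country}}]}
    return {"role": "assistant", "content": "Japan has the larger population."}

def runaway_model(messages):
    # Hypothetical: never satisfied, always asks for another lookup.
    return {"role": "assistant", "tool_calls": [
        {"id": f"call_{len(messages)}", "name": "lookup", "args": {"country": "France"}}]}

tools = {"lookup": lookup}
print(run_agent(careful_model, tools, "Compare populations")[0])     # completed
print(run_agent(runaway_model, tools, "Compare populations", 3)[0])  # call_limit
```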
The code
from langchain.agents import create_agent
from langchain.chat_models import init_chat_model
from langchain.tools import tool
@tool
def get_population(country: str) -> str:
    """Get the population of a country."""
    data = {"France": "67 million", "Japan": "125 million"}
    return data.get(country, f"No data for {country}")
@tool
def get_gdp(country: str) -> str:
    """Get the GDP of a country in USD."""
    data = {"France": "$3.05T", "Japan": "$4.41T"}
    return data.get(country, f"No data for {country}")
agent = create_agent(
    model=init_chat_model("openai:gpt-5.4"),
    tools=[get_population, get_gdp],
    system_prompt="You are an economics analyst. Gather data before answering.",
)
result = agent.invoke(
    {"messages": [{"role": "user", "content": "Compare France and Japan economically"}]}
)
print(result["messages"][-1].content)
How an agent stops
The loop must end. By default it ends when the model answers without calling a tool. Always add a hard limit too.
| Stop condition | What triggers it |
|---|---|
| Natural completion | The model replies with no tool_calls. It has decided it can answer. |
| Structured output | A response_format schema is satisfied. |
| Iteration limit | ModelCallLimitMiddleware(run_limit=N) halts a runaway loop. Use it in production, always. |
| Routing | A tool returns a Command that exits the loop. |
from langchain.agents.middleware import ModelCallLimitMiddleware
agent = create_agent(
    model=init_chat_model("openai:gpt-5.4"),
    tools=[get_population, get_gdp],
    system_prompt="You are an economics analyst.",
    middleware=[ModelCallLimitMiddleware(run_limit=10)],  # never loop forever
)
An agent is the loop from Section 3, automated, with stop conditions bolted on. The control flow that decides when to call a tool and when to stop is where the real subtlety lives.
11. Common mistakes
Most tool-calling bugs are description bugs or error-handling bugs.
| Mistake | What goes wrong | Fix |
|---|---|---|
| Vague docstring | The model is the only reader of the description, and a vague one makes it misuse or skip the tool. | Write specific, action-first descriptions. State what the tool does and what each argument means. |
| Raising instead of returning errors | An exception crashes the loop or yields a generic failure the model cannot reason about. | Return a descriptive error string. The model reads it and retries. |
| Orphaned tool result | Without the matching tool_call_id, the model cannot correlate a ToolMessage with its request, and some providers reject the conversation outright. | Always echo the id from the originating tool call. |
| Mega-tools | One tool that does many unrelated jobs confuses the model about when to call it. | One tool, one job. Compose many small focused tools. |
| Spaces in tool names | Some providers reject names with spaces or special characters. | Use snake_case tool names. |
| No iteration limit | An agent can loop indefinitely, burning tokens and money. | Add ModelCallLimitMiddleware in production. |
| Over-engineered schema | A Pydantic model for a one-argument tool is noise. | Use plain type hints for simple tools; reach for args_schema only when the model gets arguments wrong. |
| Stateful tools via globals | Hidden state through closures or globals makes tools unpredictable and untestable. | Use ToolRuntime for state, store, and context. |
Write the description for the model, return errors for the model, and keep each tool small. That covers most of the failures.
12. End-to-end: tools, a loop, structured output
A complete, runnable example tying the whole essay together.
Three focused tools, an agent that orchestrates the loop, a hard iteration limit, and a typed final report. This is the shape of a real tool-using application.
from langchain.agents import create_agent
from langchain.agents.middleware import ModelCallLimitMiddleware
from langchain.chat_models import init_chat_model
from langchain.tools import tool
from pydantic import BaseModel, Field
from typing import List
# === tools: one job each, clear descriptions ===
@tool
def search(query: str) -> str:
    """Search the web for current information about a topic."""
    return f"Top results for '{query}': ..."
@tool
def get_weather(city: str, units: str = "celsius") -> str:
    """Get the current weather for a city. units: 'celsius' or 'fahrenheit'."""
    return f"{city}: 22 degrees {units}, partly cloudy"
@tool
def calculator(expression: str) -> str:
    """Evaluate a simple arithmetic expression, e.g. '18 * 24'."""
    if not all(c in "0123456789+-*/. ()" for c in expression):
        return "Error: expression may only contain numbers and + - * / ( )."
    try:
        # demo only: the character filter blocks names and calls, but eval is
        # still not a hardened evaluator ('9**9**9' passes the filter)
        return str(eval(expression))
    except (SyntaxError, TypeError, ArithmeticError) as exc:
        return f"Error: could not evaluate '{expression}': {exc}"  # return, don't raise
# === a typed final answer ===
class TripBrief(BaseModel):
    city: str
    weather: str = Field(description="One-line current conditions")
    notes: List[str] = Field(description="Any extra facts gathered")
# === the agent: model + tools + prompt + stop limit + response schema ===
agent = create_agent(
    model=init_chat_model("openai:gpt-5.4"),
    tools=[search, get_weather, calculator],
    system_prompt="You are a travel assistant. Use tools to gather facts before answering.",
    response_format=TripBrief,
    middleware=[ModelCallLimitMiddleware(run_limit=8)],
)
result = agent.invoke(
    {"messages": [{"role": "user",
                   "content": "What's the weather in Tokyo, and what is 14 * 3?"}]}
)
# the agent runs the loop, gathering facts, then returns its
# final answer shaped as a TripBrief (per response_format)
print(result["structured_response"])
Small tools, a bounded loop, a typed result. Everything else in agent engineering is an elaboration of this.
13. What we left out
Real things, deferred to keep this essay about the core mechanism.
- ToolRuntime. Tools can receive a runtime parameter (invisible to the model) for the calling user's context, a persistent key-value store, and a stream writer for progress updates.
- Runtime context. create_agent(..., context_schema=...) injects per-run data (user id, role) that tools read via runtime.context, without putting it in the prompt.
- Streaming from tools. Long-running tools can emit intermediate status through runtime.stream_writer so the user is not staring at a spinner.
- Middleware. Beyond the call limit: retries, model fallback, and PII filtering all hook into the agent before and after each model call.
- Rate limiting. Attach an InMemoryRateLimiter to a tool that calls an external API so you do not overwhelm it.
- MCP. The Model Context Protocol lets you import whole catalogs of tools from external servers instead of writing each by hand. A topic in its own right.
- Retrieval as a tool. A search over your own documents (RAG) is just another tool the model can call. Same loop, a vector store behind the function.
- Custom agent state. When tools need to share more than messages, extend AgentState and have tools return Command objects to update it.
Each of these is a deep topic on its own. The biggest one is the loop itself: how an agent runs tools autonomously, decides what to do next, and knows when to stop. That control flow is the natural next thing to learn after the mechanism here.
The mechanism is small: a name, a schema, a call, a result, a loop. Everything above is how production teams make that mechanism safe, fast, and observable.