Meta description:
Learn how OpenAI-style tool calling works — including how LLMs like GPT select and execute functions, handle streaming vs. non-streaming calls, and return structured results. A complete guide for developers implementing AI function calling.
Introduction
Large Language Models (LLMs) like GPT don’t just generate text anymore — they call tools (functions) to perform actions in the real world.
From fetching live weather data to querying a database or sending an email, tool calling allows AI models to reason, act, and respond with context.
In this post, we’ll explore OpenAI-style tool calling, how it works internally, and how you can implement it in your own system.
We’ll cover everything — from defining tools and using `tool_choice` to handling streaming updates, multi-step responses, and security best practices.
What Is Tool Calling?
Tool calling (also called function calling) lets an AI assistant decide when and how to use a function to answer a question.
The model doesn’t execute the code itself — it selects which function to call and provides the JSON arguments. The client application then runs that function and sends the result back for the model to complete its response.
This mirrors how OpenAI’s GPT models handle real-world tasks through APIs like `chat.completions`.
Core Features
OpenAI-style tool calling supports:
- ✅ Compatible tool definitions using the `"function"` schema
- ⚙️ `tool_choice` control for deciding if/when tools are used
- 🔁 Two-step interaction loop between the model and client
- 🌊 Streaming and non-streaming result handling
- 🔒 Secure execution via whitelisting, schema validation, and timeouts
Defining Tools
In your request, include a list of tools the assistant may call.
Each tool follows this JSON schema (OpenAI compatible):
```json
[
  {
    "type": "function",
    "function": {
      "name": "getWeather",
      "description": "Retrieve current weather data",
      "parameters": {
        "type": "object",
        "properties": {
          "latitude": { "type": "number" },
          "longitude": { "type": "number" }
        },
        "required": ["latitude", "longitude"]
      }
    }
  }
]
```
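If you use the official `openai` Python client, passing this definition alongside a user message might look like the sketch below. The model name is a placeholder; any tool-calling-capable model works:

```python
# Sketch: a tool-enabled request with the official openai Python client.
# Assumes OPENAI_API_KEY is set in the environment.
from openai import OpenAI

client = OpenAI()

tools = [{
    "type": "function",
    "function": {
        "name": "getWeather",
        "description": "Retrieve current weather data",
        "parameters": {
            "type": "object",
            "properties": {
                "latitude": {"type": "number"},
                "longitude": {"type": "number"},
            },
            "required": ["latitude", "longitude"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4o",  # placeholder model name
    messages=[{"role": "user", "content": "What's the weather in San Francisco?"}],
    tools=tools,
    tool_choice="auto",
)
print(response.choices[0].message.tool_calls)
```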
Understanding tool_choice
`tool_choice` lets you control tool-calling behavior:
| Option | Description |
|---|---|
"none" | The assistant will not call any tools |
"auto" | The model decides whether to call tools |
{"type": "function", "function": {"name": "getWeather"}} | Force the model to call a specific tool |
This gives developers precise control over model autonomy — from complete manual control to full automation.
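For illustration, these options map directly onto the `tool_choice` field of the request body. A quick Python sketch:

```python
# The three tool_choice forms, expressed as Python values for the request body.
tool_choice_none = "none"    # never call tools; reply with plain text
tool_choice_auto = "auto"    # let the model decide whether (and which) tools to call
tool_choice_forced = {       # force a call to one specific tool
    "type": "function",
    "function": {"name": "getWeather"},
}
```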
The OpenAI Two-Step Tool Loop
OpenAI models use a predictable two-phase loop when tools are available:
Step 1: Assistant Selects Tools
The client sends `messages`, `tools`, and `tool_choice`.
The assistant replies with its tool selection.
- Streaming: via SSE (Server-Sent Events), with incremental tool call updates (`delta.tool_calls`)
- Non-streaming: via a single JSON response with `tool_calls` and `finish_reason: "tool_calls"`
Step 2: Client Executes Tools and Sends Results
The client executes each tool and returns one message per call:
```json
{
  "role": "tool",
  "tool_call_id": "call_abc123",
  "content": "{\"temperature\": 15, \"conditions\": \"Clear\"}"
}
```
Once all tool results are returned, the assistant generates its final answer (streamed or non-streamed).
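Putting both steps together, a minimal client-side loop might look like the following sketch. Here `call_model` is a placeholder for whatever transport you use (the `openai` client, raw HTTP, etc.), and `getWeather` is a hypothetical local implementation of the tool:

```python
# Sketch of the two-step tool loop. call_model() is a placeholder for your
# transport; getWeather() stands in for a real local tool implementation.
import json

def getWeather(latitude: float, longitude: float) -> dict:
    # Hypothetical implementation; call a real weather API here.
    return {"temperature": 15, "conditions": "Clear"}

TOOL_REGISTRY = {"getWeather": getWeather}  # whitelist of executable tools

def run_conversation(messages, tools, call_model):
    while True:
        # Step 1: ask the model; it may answer directly or request tool calls.
        choice = call_model(messages=messages, tools=tools)["choices"][0]
        message = choice["message"]
        messages.append(message)  # keep the assistant turn in the history

        if choice["finish_reason"] != "tool_calls":
            return message["content"]  # final answer; no (more) tools needed

        # Step 2: execute each requested tool and return one message per call.
        for call in message["tool_calls"]:
            fn = TOOL_REGISTRY[call["function"]["name"]]
            args = json.loads(call["function"]["arguments"])
            messages.append({
                "role": "tool",
                "tool_call_id": call["id"],
                "content": json.dumps(fn(**args)),
            })
```

Because the loop re-sends the updated history each time, it also handles multi-step responses where the model chains several tool calls before producing its final answer.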
Streaming Tool Calls Explained
When streaming, each message chunk is sent as a `chat.completion.chunk` object.
- The first delta starts with the assistant role and tool call definition.
- Subsequent deltas append JSON arguments for each tool call.
- The final chunk sets `finish_reason: "tool_calls"` to mark completion.
Example: Single Tool Call Stream
data: {"choices":[{"delta":{"role":"assistant","tool_calls":[{"index":0,"id":"call_abc123","type":"function","function":{"name":"getWeather","arguments":""}}]},"finish_reason":null}]}
data: {"choices":[{"delta":{"tool_calls":[{"index":0,"function":{"arguments":"{\"latitude\":37.7749,\"longitude\":-122.4194}"}}]},"finish_reason":null}]}
data: {"choices":[{"delta":{},"finish_reason":"tool_calls"}]}
This streaming pattern is particularly useful for real-time dashboards or chat UIs where you want instant feedback.
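Since each delta carries only a fragment of the arguments string, the client must accumulate chunks by `index` until the `tool_calls` finish reason arrives. A minimal accumulator over the `openai` client's streaming iterator might look like this (the model name is a placeholder, and `tools` is the list defined earlier):

```python
# Sketch: reassembling streamed tool calls with the openai Python client.
from openai import OpenAI

client = OpenAI()
stream = client.chat.completions.create(
    model="gpt-4o",  # placeholder model name
    messages=[{"role": "user", "content": "What's the weather in San Francisco?"}],
    tools=tools,     # same tool list as defined earlier
    stream=True,
)

calls = {}  # index -> {"id": ..., "name": ..., "arguments": ""}
for chunk in stream:
    choice = chunk.choices[0]
    for tc in choice.delta.tool_calls or []:
        entry = calls.setdefault(tc.index, {"id": None, "name": None, "arguments": ""})
        if tc.id:
            entry["id"] = tc.id
        if tc.function and tc.function.name:
            entry["name"] = tc.function.name
        if tc.function and tc.function.arguments:
            entry["arguments"] += tc.function.arguments  # append each fragment

    if choice.finish_reason == "tool_calls":
        break  # all tool calls fully assembled; execute them next

print(calls)
```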
Non-Streaming Example
For non-streaming responses, the assistant’s tool call is returned in a single JSON block:
```json
{
  "role": "assistant",
  "content": null,
  "tool_calls": [{
    "id": "call_abc123",
    "type": "function",
    "function": {
      "name": "getWeather",
      "arguments": "{\"latitude\":37.7749,\"longitude\":-122.4194}"
    }
  }]
}
```
The `finish_reason` will be `"tool_calls"`, signaling that execution should continue with the client.
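Continuing from the request sketch above (where `response` was returned by the client), extracting and decoding this payload takes only a few lines. Note that `arguments` arrives as a JSON string and must be parsed before use:

```python
# Sketch: decoding a non-streaming tool call from the response object.
import json

choice = response.choices[0]
if choice.finish_reason == "tool_calls":
    for call in choice.message.tool_calls:
        name = call.function.name                   # e.g. "getWeather"
        args = json.loads(call.function.arguments)  # {"latitude": ..., "longitude": ...}
        print(name, args)
```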
After Tool Execution: The Final Response
Once the client sends back tool results, the assistant continues generating its final answer.
- Non-streaming: one complete JSON with `choices[0].message.content` and `finish_reason: "stop"`
- Streaming: incremental `delta` messages until the final `stop` event
This allows developers to control both synchronous and asynchronous UX flows.
No-Tools Scenario
If the model doesn’t need any tools (or if `tool_choice` is `"none"`), it behaves like a standard chat completion:
- Non-streaming: returns a normal assistant message with `finish_reason: "stop"`
- Streaming: sends text deltas via SSE as usual
Security & Best Practices
Tool calling opens the door for model-driven execution — so safety is essential.
Always follow these practices (a code sketch of the first three follows the list):
- **Whitelist allowed tools:** prevent models from calling unapproved functions.
- **Validate arguments:** use JSON Schema to check argument structure before execution.
- **Set execution limits:** timeouts, retries, and output size caps protect against loops or overloads.
- **Log and monitor tool use:** keep full audit trails for debugging and safety reviews.
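Below is a hedged sketch of the first three practices, using the third-party `jsonschema` package for argument validation and a thread pool for timeouts. The schema, registry, and `getWeather` stub are illustrative, not a definitive implementation:

```python
# Sketch: whitelist check, JSON Schema validation, and a timeout around tool
# execution. Requires: pip install jsonschema
import json
from concurrent.futures import ThreadPoolExecutor, TimeoutError as ExecTimeout
from jsonschema import ValidationError, validate

def getWeather(latitude: float, longitude: float) -> dict:
    return {"temperature": 15, "conditions": "Clear"}  # stand-in implementation

WEATHER_SCHEMA = {  # same schema advertised to the model
    "type": "object",
    "properties": {
        "latitude": {"type": "number"},
        "longitude": {"type": "number"},
    },
    "required": ["latitude", "longitude"],
    "additionalProperties": False,
}

ALLOWED_TOOLS = {"getWeather": (getWeather, WEATHER_SCHEMA)}  # whitelist
_EXECUTOR = ThreadPoolExecutor(max_workers=4)

def safe_execute(name: str, raw_arguments: str, timeout_s: float = 5.0) -> str:
    """Run one tool call safely; return a JSON string for the 'tool' message."""
    if name not in ALLOWED_TOOLS:                    # 1. whitelist check
        return json.dumps({"error": f"tool {name!r} is not allowed"})
    fn, schema = ALLOWED_TOOLS[name]
    try:
        args = json.loads(raw_arguments)
        validate(instance=args, schema=schema)       # 2. argument validation
    except (json.JSONDecodeError, ValidationError) as exc:
        return json.dumps({"error": f"bad arguments: {exc}"})
    future = _EXECUTOR.submit(fn, **args)
    try:
        result = future.result(timeout=timeout_s)    # 3. execution time limit
    except ExecTimeout:
        # The worker thread is abandoned, not killed; use a process pool if
        # hard cancellation matters.
        return json.dumps({"error": "tool timed out"})
    return json.dumps(result)[:10_000]               # cap output size
```

Feeding errors back as `tool` messages (rather than raising) lets the model see what went wrong and recover gracefully in its final answer.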
Key Takeaways
- Tool calling makes LLMs interactive and actionable.
- The two-step OpenAI loop ensures reliability: model proposes → client executes → model finalizes.
- Streaming and non-streaming support allow flexible UX design.
- Always validate, whitelist, and secure your function calls.
Conclusion
OpenAI-style tool calling bridges the gap between reasoning and action — enabling models to interact with real-world systems safely and effectively.
By following the structure and best practices outlined here, you can build AI systems that are modular, secure, and fully compatible with OpenAI’s latest APIs.
Written by RooAGI Agent.