OpenAI Tool Calling - Complete Guide

Overview

OpenAI’s tool calling (formerly “function calling”) allows the AI to request execution of functions/tools and receive results. This document explains how it works in simple terms.


The Basic Flow

User asks question
    ↓
AI decides: "I need to use a tool"
    ↓
AI responds with one or both of:
  - A text message (explaining what it's doing)
  - Tool call request(s) (the actual function to execute)
    ↓
Your app executes the tool
    ↓
Your app sends result back
    ↓
AI processes result and responds to user
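
To make this concrete, here is a minimal sketch of the first round trip using the official openai Python SDK (v1). The bash tool and its schema are hypothetical examples for illustration, not part of the API:

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Declare the tools the model is allowed to request (hypothetical "bash" tool)
tools = [{
    "type": "function",
    "function": {
        "name": "bash",
        "description": "Run a shell command and return its output",
        "parameters": {
            "type": "object",
            "properties": {
                "command": {"type": "string", "description": "The command to run"}
            },
            "required": ["command"]
        }
    }
}]

response = client.chat.completions.create(
    model="gpt-4o",  # any tool-capable model works here
    messages=[{"role": "user", "content": "list all files in this directory"}],
    tools=tools,
)

message = response.choices[0].message
print(message.content)     # the text part (may be None)
print(message.tool_calls)  # the tool call part (may be None)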

Message Types in Conversation History

1. System Message (role: “system”)

  • Sets up the AI’s behavior and context
  • Appears once, at the start of the conversation history
  • Example: “You are Shello CLI - an AI-powered terminal assistant…”

2. User Message (role: “user”)

  • What the user types
  • Example: “list all files in this directory”

3. Assistant Message (role: “assistant”)

  • The AI’s response
  • Can contain:
    • Text content (what the AI says to the user)
    • Tool calls (functions the AI wants to execute)
    • Or both at the same time! ← This is the key point

4. Tool Message (role: “tool”)

  • The result from executing a tool
  • Must reference the tool_call_id
  • Your application sends this
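
Put together, a history with all four roles looks like this (a sketch; the tool call ID and tool output are invented for illustration):

messages = [
    {"role": "system", "content": "You are Shello CLI - an AI-powered terminal assistant..."},
    {"role": "user", "content": "list all files in this directory"},
    {"role": "assistant",
     "content": "Let me check that for you...",
     "tool_calls": [{"id": "call_1", "type": "function",
                     "function": {"name": "bash", "arguments": "{\"command\": \"ls\"}"}}]},
    {"role": "tool", "tool_call_id": "call_1", "content": "file1.py\nfile2.py"},
]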

Key Concept: Assistant Can Send BOTH Text AND Tool Calls

YES! The AI can send a message with:

  • Content: “Let me check that for you…”
  • Tool calls: [{function: "bash", arguments: {"command": "ls"}}]

This is what you see in your logs:

[9] Role: ASSISTANT
────────────────────────────────────────────────────────────────────────────
🔧 Tool Calls: 1

  • Function: bash
    Call ID: rYEbckb86
    Arguments:
      {
        "command": "python -c \"import json; data = {...}; print(json.dumps(data))\""
      }

Sure! Let me test the `analyze_json` tool with a Python code snippet...

Both are present in the same message!


Real Example from Your Logs

Request 1: User asks to test analyze_json

Message [8] - User:

"okay try to use the analyze_json with some input as python code to test out"

Message [9] - Assistant (with tool call):

{
  "role": "assistant",
  "content": "Sure! Let me test the analyze_json tool... Let's start by creating a Python script to generate JSON:",
  "tool_calls": [
    {
      "id": "rYEbckb86",
      "type": "function",
      "function": {
        "name": "bash",
        "arguments": "{\"command\": \"python -c ...\"}"
      }
    }
  ]
}

Message [10] - Tool Result:

{
  "role": "tool",
  "tool_call_id": "rYEbckb86",
  "content": "{\"success\": true, \"output\": \"{\\\"name\\\": \\\"Shello\\\", ...}\"}"
}

Message [11] - Assistant (with another tool call):

{
  "role": "assistant",
  "content": "Now, let's analyze the JSON structure using the analyze_json tool:",
  "tool_calls": [
    {
      "id": "QqRiXviO9",
      "type": "function",
      "function": {
        "name": "analyze_json",
        "arguments": "{\"command\": \"python -c ...\"}"
      }
    }
  ]
}

Message [12] - Tool Result:

{
  "role": "tool",
  "tool_call_id": "QqRiXviO9",
  "content": "{\"success\": true, \"output\": \"jq path | data type...\"}"
}

When Does AI Send What?

Scenario 1: AI just talks (no tools needed)

{
  "role": "assistant",
  "content": "Hello! How can I help you today?"
}

Scenario 2: AI uses tool WITH explanation

{
  "role": "assistant",
  "content": "Let me check that for you...",
  "tool_calls": [{"function": {"name": "bash", "arguments": "{...}"}}]
}

Scenario 3: AI uses tool WITHOUT explanation

{
  "role": "assistant",
  "content": null,
  "tool_calls": [{"function": {"name": "bash", "arguments": "{...}"}}]
}

Most models (including the one you’re using) prefer Scenario 2 - they explain what they’re doing while making the tool call.
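
In code, you can tell the three scenarios apart by checking which fields are present. A minimal sketch, assuming the dict-style message shape shown above (run_tool is a hypothetical executor):

message = response["choices"][0]["message"]

if message.get("tool_calls"):
    if message.get("content"):               # Scenario 2: explanation + tool call
        print(message["content"])
    for tool_call in message["tool_calls"]:  # Scenarios 2 and 3: execute the tools
        run_tool(tool_call)
else:
    print(message["content"])                # Scenario 1: plain text reply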


The Complete Conversation Flow

Here’s what happens in your Shello CLI:

1. User: "test analyze_json with python code"
2. AI Response (Message 9):
   - Content: "Sure! Let me test... Let's start by creating a Python script..."
   - Tool Call: bash(command="python -c ...")

3. Your App:
   - Executes: python -c "..."
   - Gets output: {"name": "Shello", ...}

4. Your App Sends (Message 10):
   - Role: tool
   - Tool Call ID: rYEbckb86
   - Content: {"success": true, "output": "..."}

5. AI Response (Message 11):
   - Content: "Now, let's analyze the JSON structure..."
   - Tool Call: analyze_json(command="python -c ...")

6. Your App:
   - Executes: analyze_json tool
   - Gets output: jq paths

7. Your App Sends (Message 12):
   - Role: tool
   - Tool Call ID: QqRiXviO9
   - Content: {"success": true, "output": "jq path | data type..."}

8. AI Final Response:
   - Content: "Here's the analysis! The JSON has these fields..."
   - No tool calls (done!)
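
In code, this whole flow is usually driven by a loop: call the model, execute any requested tools, append the results, and repeat until a response arrives with no tool calls. A sketch using the openai Python SDK (execute_tool is a hypothetical dispatcher into your own tool implementations):

import json

def run_turn(client, messages, tools):
    while True:
        response = client.chat.completions.create(
            model="gpt-4o", messages=messages, tools=tools
        )
        message = response.choices[0].message

        if not message.tool_calls:
            return message.content  # final answer - done!

        # Keep the assistant message (with its tool calls) in the history
        messages.append(message)

        # Execute each requested tool and send back a matching result
        for tool_call in message.tool_calls:
            args = json.loads(tool_call.function.arguments)
            result = execute_tool(tool_call.function.name, args)  # hypothetical
            messages.append({
                "role": "tool",
                "tool_call_id": tool_call.id,   # must match the request ID
                "content": json.dumps(result),  # tool results are strings
            })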

Important Rules

1. Tool Call IDs Must Match

  • AI generates a unique ID for each tool call
  • Your tool result MUST reference that exact ID
  • This allows multiple tool calls in parallel

2. Content Can Be Null

  • If content is null but tool_calls exists, AI is just calling tools
  • If both exist, AI is explaining AND calling tools

3. Tool Results Are Always Strings

  • The content field in tool messages must be a string
  • Even if your tool returns JSON, stringify it (see the sketch after these rules)

4. Conversation History Includes Everything

  • System message
  • All user messages
  • All assistant messages (with tool calls)
  • All tool results
  • This maintains context for the AI
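
Rule 3 in practice: even structured results must be serialized before they go into the tool message. A small sketch, reusing the tool call ID from the logs above (the result value is invented):

import json

result = {"success": True, "output": "file1.py\nfile2.py"}  # invented output

tool_message = {
    "role": "tool",
    "tool_call_id": "rYEbckb86",    # must match the assistant's tool call ID
    "content": json.dumps(result),  # a string, never a raw dict
}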

Why This Design?

Q: Why does the AI send text AND tool calls together?

A: It makes the conversation feel natural! The AI can:

  • Explain what it’s about to do
  • Execute the tool
  • Then explain the results

This is better than:

  • Silent tool execution (confusing for users)
  • Explaining after (feels disconnected)

Q: Why separate tool result messages?

A: Because:

  • Tool execution happens OUTSIDE the AI
  • Results come back asynchronously
  • Multiple tools can be called in parallel
  • Each result needs to be matched to its request (via ID)

Debugging Tips

Check Your Logs For:

  1. Assistant messages with tool_calls - What is the AI requesting?
  2. Tool messages with tool_call_id - What results are being sent back?
  3. Matching IDs - Do the IDs match between request and result?
  4. Content field - Is the AI explaining what it’s doing?

Common Issues:

  • Missing tool_call_id: Tool result won’t be matched to request
  • Wrong ID: AI won’t know which tool call this result is for
  • Non-string content: API will reject the message
  • Missing tool result: AI will wait forever (or timeout)
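
A quick way to catch the first three issues is to validate the history before sending it. A debugging sketch, assuming dict-style messages:

def check_history(messages):
    requested = set()
    answered = set()
    for msg in messages:
        if msg.get("role") == "assistant":
            for tc in msg.get("tool_calls") or []:
                requested.add(tc["id"])
        elif msg.get("role") == "tool":
            call_id = msg.get("tool_call_id")
            assert call_id in requested, f"result references unknown ID: {call_id}"
            assert isinstance(msg.get("content"), str), "tool content must be a string"
            answered.add(call_id)
    missing = requested - answered
    assert not missing, f"tool calls still awaiting results: {missing}"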

Summary

The key insight: An assistant message can contain BOTH:

  • Human-readable text (content)
  • Machine-executable requests (tool_calls)

This allows the AI to:

  1. Tell the user what it’s doing
  2. Actually do it
  3. Process the results
  4. Explain the outcome

All in a natural, conversational flow!


Streaming with Tool Calls

How Streaming Works

When you enable streaming (stream: true), the API sends the response in chunks instead of all at once. This allows you to display the AI’s response as it’s being generated.
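
With the official openai Python SDK, streaming is a single flag; you then iterate over chunks as they arrive. A minimal sketch (tools is the same hypothetical schema shown earlier):

stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "list files"}],
    tools=tools,
    stream=True,  # deltas arrive as they are generated
)

for chunk in stream:
    delta = chunk.choices[0].delta
    if delta.content:
        print(delta.content, end="", flush=True)  # show text as it streams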

Streaming Response Format

Instead of getting one complete message, you get multiple delta chunks:

// Chunk 1: Role
{
  "choices": [{
    "delta": {
      "role": "assistant"
    }
  }]
}
 
// Chunk 2: Content starts
{
  "choices": [{
    "delta": {
      "content": "Sure! "
    }
  }]
}
 
// Chunk 3: More content
{
  "choices": [{
    "delta": {
      "content": "Let me test "
    }
  }]
}
 
// Chunk 4: Tool call starts
{
  "choices": [{
    "delta": {
      "tool_calls": [{
        "index": 0,
        "id": "call_abc123",
        "type": "function",
        "function": {
          "name": "bash",
          "arguments": ""
        }
      }]
    }
  }]
}
 
// Chunk 5: Tool arguments (streamed!)
{
  "choices": [{
    "delta": {
      "tool_calls": [{
        "index": 0,
        "function": {
          "arguments": "{\"command\""
        }
      }]
    }
  }]
}
 
// Chunk 6: More arguments
{
  "choices": [{
    "delta": {
      "tool_calls": [{
        "index": 0,
        "function": {
          "arguments": ": \"ls\"}"
        }
      }]
    }
  }]
}
 
// Final chunk
{
  "choices": [{
    "delta": {},
    "finish_reason": "tool_calls"
  }]
}

Key Points About Streaming

1. Content and Tool Calls Can Be Interleaved

The AI might stream:

"Let me check..." → [content chunks]
→ [tool call starts]
→ [tool arguments stream]
→ "that for you." → [more content chunks]

Or it might send all content first, then tool calls:

"Let me check that for you." → [all content]
→ [tool call starts]
→ [tool arguments stream]

2. Tool Arguments Are Streamed in Small Fragments

The JSON arguments don’t come all at once:

Chunk 1: "{"
Chunk 2: "\"command\""
Chunk 3: ": "
Chunk 4: "\"ls -la\""
Chunk 5: "}"

You must accumulate these chunks to build the complete JSON string!

3. Multiple Tool Calls Can Be Streamed

If the AI calls multiple tools, they’re indexed:

{
  "delta": {
    "tool_calls": [{
      "index": 0,  // First tool call
      "function": {"name": "bash", "arguments": "..."}
    }]
  }
}
 
{
  "delta": {
    "tool_calls": [{
      "index": 1,  // Second tool call
      "function": {"name": "analyze_json", "arguments": "..."}
    }]
  }
}

4. Finish Reasons Tell You What Happened

  • finish_reason: "stop" - AI finished naturally (no more to say)
  • finish_reason: "tool_calls" - AI made tool call(s), waiting for results
  • finish_reason: "length" - Hit token limit
  • finish_reason: null - Still streaming (not done yet)
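
In practice you branch on finish_reason once the stream ends. A tiny sketch (tool_calls is the accumulated dict built in the next section; run_tools_and_continue is a hypothetical follow-up step):

if finish_reason == "tool_calls":
    run_tools_and_continue(tool_calls)  # execute, append results, call the API again
elif finish_reason == "stop":
    pass  # this turn is complete
elif finish_reason == "length":
    print("⚠️ Response was cut off by the token limit")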

Accumulating Streamed Data

You need to build up the complete message from chunks. (This sketch assumes raw dict-style JSON chunks; with the official SDK you would read chunk.choices[0].delta as object attributes instead.)

# Initialize accumulators
content = ""
tool_calls = {}  # Dictionary indexed by tool call index
 
for chunk in stream:
    delta = chunk['choices'][0]['delta']
 
    # Accumulate content
    if 'content' in delta and delta['content']:
        content += delta['content']
        print(delta['content'], end='', flush=True)  # Display as it arrives
 
    # Accumulate tool calls
    if 'tool_calls' in delta:
        for tc_delta in delta['tool_calls']:
            index = tc_delta['index']
 
            # Initialize this tool call if first time seeing it
            if index not in tool_calls:
                tool_calls[index] = {
                    'id': tc_delta.get('id', ''),
                    'type': tc_delta.get('type', 'function'),
                    'function': {
                        'name': tc_delta.get('function', {}).get('name', ''),
                        'arguments': ''
                    }
                }
 
            # Accumulate function arguments
            if 'function' in tc_delta and 'arguments' in tc_delta['function']:
                tool_calls[index]['function']['arguments'] += tc_delta['function']['arguments']
 
    # Check if done
    finish_reason = chunk['choices'][0].get('finish_reason')
    if finish_reason:
        break
 
# Now you have:
# - content: Complete text message
# - tool_calls: Complete tool call objects with full JSON arguments
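
One step that's easy to forget: before sending tool results back, the accumulated pieces must be appended to the history as a normal assistant message. A sketch continuing from the accumulators above (messages is assumed to be your conversation history list):

import json

assistant_message = {
    "role": "assistant",
    "content": content or None,  # null if the AI sent no text
    "tool_calls": [tool_calls[i] for i in sorted(tool_calls)],
}
messages.append(assistant_message)

# Only now is it safe to parse the (complete) arguments and execute each tool
for tc in assistant_message["tool_calls"]:
    args = json.loads(tc["function"]["arguments"])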

Real Example from Your Logs

When you see this in your terminal:

Sure! Let me test the `analyze_json` tool with a Python code snippet...
┌─[💻 mapar@Omputer]─[C:\REPO\shello-cli-python]
└─$ python -c "import json; data = {...}"

Here’s what actually happened behind the scenes:

Streaming chunks received:

  1. "Sure! "
  2. "Let me "
  3. "test the "
  4. "`analyze_json`"
  5. " tool..."
  6. Tool call starts: {"index": 0, "id": "rYEbckb86", "function": {"name": "bash"}}
  7. Arguments chunk: "{\"command\""
  8. Arguments chunk: ": \"python"
  9. Arguments chunk: " -c ...\""
  10. Arguments chunk: "}"
  11. Finish reason: "tool_calls"

Your app accumulated all chunks into:

  • Content: "Sure! Let me test the analyze_json tool..."
  • Tool call: {id: "rYEbckb86", function: {name: "bash", arguments: "{\"command\": \"python -c ...\"}"}}

Why Stream Tool Calls?

Q: Why not just send the complete tool call at once?

A: Consistency and flexibility!

  • Same streaming mechanism for everything
  • Allows for very long tool arguments (e.g., large JSON payloads)
  • You can start processing as soon as you have enough data
  • Better user experience (shows progress)

Common Streaming Pitfalls

1. Incomplete JSON Arguments

# ❌ BAD: Using arguments before streaming is complete
for chunk in stream:
    delta = chunk['choices'][0]['delta']
    if 'tool_calls' in delta:
        args = delta['tool_calls'][0]['function']['arguments']
        json.loads(args)  # ERROR! Incomplete JSON!

# ✅ GOOD: Accumulate first, parse after
accumulated_args = ""
for chunk in stream:
    delta = chunk['choices'][0]['delta']
    if 'tool_calls' in delta:
        accumulated_args += delta['tool_calls'][0]['function']['arguments']
    if chunk['choices'][0].get('finish_reason'):
        parsed_args = json.loads(accumulated_args)  # Now it's complete!

2. Not Handling Multiple Tool Calls

# ❌ BAD: Assuming only one tool call
tool_call = {'arguments': ''}
for chunk in stream:
    delta = chunk['choices'][0]['delta']
    if 'tool_calls' in delta:
        tool_call['arguments'] += delta['tool_calls'][0]['function']['arguments']

# ✅ GOOD: Track by index
tool_calls = {}
for chunk in stream:
    delta = chunk['choices'][0]['delta']
    if 'tool_calls' in delta:
        for tc in delta['tool_calls']:
            index = tc['index']
            if index not in tool_calls:
                tool_calls[index] = {'arguments': ''}
            tool_calls[index]['arguments'] += tc.get('function', {}).get('arguments', '')

3. Displaying Tool Calls to User

# ❌ BAD: Showing raw JSON chunks
for chunk in stream:
    delta = chunk['choices'][0]['delta']
    if 'tool_calls' in delta:
        print(delta['tool_calls'][0]['function']['arguments'])
        # Output: {"command"
        #         : "ls -la"
        #         }
        # Looks broken!

# ✅ GOOD: Wait until complete, then display nicely
# Accumulate silently (as in the accumulation example), then show formatted output
if finish_reason == 'tool_calls':
    for tc in tool_calls.values():
        print(f"🔧 Calling {tc['function']['name']}({tc['function']['arguments']})")

Streaming Flow Diagram

User: "list files"
    ↓
[Stream starts]
    ↓
Chunk: {"delta": {"content": "Let "}}
    → Display: "Let "
    ↓
Chunk: {"delta": {"content": "me check..."}}
    → Display: "me check..."
    ↓
Chunk: {"delta": {"tool_calls": [{"index": 0, "id": "abc", "function": {"name": "bash"}}]}}
    → Store: tool_calls[0] = {id: "abc", name: "bash", args: ""}
    ↓
Chunk: {"delta": {"tool_calls": [{"index": 0, "function": {"arguments": "{\"command\""}}]}}
    → Accumulate: tool_calls[0].args += "{\"command\""
    ↓
Chunk: {"delta": {"tool_calls": [{"index": 0, "function": {"arguments": ": \"ls\"}"}}]}}
    → Accumulate: tool_calls[0].args += ": \"ls\"}"
    ↓
Chunk: {"finish_reason": "tool_calls"}
    → Parse: json.loads(tool_calls[0].args) = {"command": "ls"}
    → Execute: bash("ls")
    ↓
[Stream ends]

Summary

Streaming with tool calls means:

  1. Content is streamed word-by-word (or token-by-token)
  2. Tool calls are streamed piece-by-piece
  3. Tool arguments (JSON) are streamed in small fragments
  4. You must accumulate chunks before parsing
  5. Multiple tool calls are tracked by index
  6. finish_reason tells you when streaming is complete

The benefit: Users see progress in real-time, even when the AI is preparing to call tools!