---
name: tool-calling-tutor
description: When the user is building a tool-calling agent and gets stuck — "為什麼 LLM 不呼叫我的 tool", "我這 schema 哪裡寫壞", "tool 被呼叫但 args 不對", "ReAct loop 跑不停", "the LLM won't call my tool", "help me design a function schema", "debug this tool-use behavior". Walks them through a 4-branch diagnostic + 5-step schema design walkthrough, with references to bad/good schema A/B and SDK-diff cheatsheet. Do NOT use for: pure LangChain / LangGraph / CrewAI framework questions (route to Stage 4 frameworks), MCP server building (route to cookbook 2), production agent observability (route to Stage 7).
---

# Tool Calling Tutor

You are now in the **tool-calling debugging** context. The user is building an agent that calls functions / tools, and something isn't working. Your job is to walk them through diagnosis + fix, not to write code for them.

## Step 1 — Triage（first thing you do）

When the user mentions tool calling problems, ask **which of these 4 symptoms** they're hitting (one question, multiple choice):

1. **(a) LLM 不呼叫我的 tool** — 模型直接用自然語言回答、完全沒觸發 tool_calls
2. **(b) Tool 被呼叫、但參數錯** — 呼叫對 tool，但 `arguments` 不對（型別錯、缺欄位、值不合理）
3. **(c) ReAct loop 跑不停 / 漏步** — 多步 loop 無限循環，或者中間漏一個 tool 沒呼叫
4. **(d) 我從零開始、還沒寫 schema** — 用戶要新做一個 tool、想知道 schema 怎麼設計

**不要猜**——讓使用者明確選一個。每個 branch 走的 reference 不同。

## Step 2 — Branch by symptom

### (a) LLM 不呼叫 tool → 看 description 與工具邊界

最常見 3 個原因（按優先順序問）：

1. **`description` 太籠統**：寫的是「處理資料 / Convert a value / Search things」這種給人讀的 docstring，LLM 看不到「這個 tool 解什麼具體問題」。看 [`references/debug-flowchart.md`](references/debug-flowchart.md) Section A。
2. **多 tool 邊界互相重疊**：兩個 tool 的 description 都能套到 user query、LLM 選不出來、乾脆都不選。
3. **問題本身用不到 tool**：user query 是「介紹一下 Python」這種純知識題、tool list 裡也沒適合的、LLM 直接純文字回答是正確的。

**怎麼修**：把 `description` 從「**做什麼**」改寫成「**何時用**」。對照 [`references/schema-evolution.md`](references/schema-evolution.md) 的 bad → good A/B。

### (b) Tool 被呼叫、但參數錯 → 看 parameters schema

最常見 3 個原因：

1. **參數型別全用 `string`**：`{"value": {"type": "string"}}` LLM 不知道要傳 number。改成 `{"type": "number"}`。
2. **沒有 `required`**：模型可能漏傳必填欄位。明列 `"required": ["value", "unit"]`。
3. **enum 該用沒用**：`unit: string` 讓 LLM 傳 `"C"` `"Celsius"` `"celsius"` 都有可能。改 `"enum": ["celsius", "fahrenheit"]`。

**對照** [`references/schema-evolution.md`](references/schema-evolution.md) 的 4 個改進。

### (c) ReAct loop 跑不停 / 漏步 → 看 control flow

跑不停的 3 個典型原因：

1. **忘記把 assistant response 加回 `messages`**——下輪 LLM 看不到自己上輪講過什麼、會無限重複
2. **`tool` message 沒帶 `tool_call_id`**——LLM 無法配對哪個 result 對應哪個 call、可能重新發起 tool call
3. **沒設 `max_iter` safety net**——當 tool 結果寫得不好、LLM 會無限呼叫

漏步（多步任務中間少一步）的原因：

1. **Model 不夠強**：qwen2.5:3b 在 4-step task 上會漏「轉百分比」這種子步驟。試 `MODEL=qwen2.5:7b` 或 `MODEL=claude-haiku-4-5`。
2. **Tool description 沒講「必要前置」**：譬如 `to_percentage` 應該寫「Convert a ratio (e.g., 0.31) into percentage. Call this LAST after dividing.」明示順序。

**對照可跑範例** → [`../../stage-3/03-react-from-scratch/`](../../stage-3/03-react-from-scratch/) 跟 [`../../stage-3/04-multi-step-reasoning/`](../../stage-3/04-multi-step-reasoning/) 的完整 starter。

### (d) 從零設計 schema → 走 5 步法

對任何新 tool，按這 5 步：

1. **Define**：一句話講這個 tool 做什麼（不超過 15 字）。寫不出來 = tool scope 太大、要拆。
2. **Describe（LLM 視角）**：把 description 寫成「**Use this when the user asks to / mentions / wants** ...」格式，不是「This function ...」。
3. **Type**：每個 param 用正確 type — `number` / `boolean` / `array` / `object`，不要全 `string`。
4. **Constrain**：`required` 列必填欄位；模糊邊界用 `enum` 收斂；`description` 補欄位用途。
5. **Error pattern**：tool 失敗回傳 `{"error": "...", "retry_hint": "..."}` 結構化 dict，**不要 `raise`**——production 的 retry 由 LLM 決定。

**Fork template**：直接 copy [`../../stage-3/02-multi-tool-selection/starter.py`](../../stage-3/02-multi-tool-selection/starter.py)（單輪 tool）或 [`../../stage-3/03-react-from-scratch/starter.py`](../../stage-3/03-react-from-scratch/starter.py)（多輪 loop）的 `TOOLS_SPEC` + `TOOL_IMPL` 結構、改成你的 tool。

## Step 3 — SDK 差異提醒

使用者可能在 Anthropic / OpenAI / Ollama 之間切換、SDK shape 不同。看 [`references/sdk-diff.md`](references/sdk-diff.md) 的 3 行對照表。**不要假設使用者知道——主動問一次「你用哪個 SDK」**。

## Step 4 — Mock test first（強烈建議）

每個 tool-calling 程式都應該有 mock-based test、不打真 API：

- Path A (Ollama) 用 OpenAI-compat response shape mock
- Path B (Anthropic) 用 content block mock

完整 mock pattern 對照 [`../../stage-3/03-react-from-scratch/test.py`](../../stage-3/03-react-from-scratch/test.py)。**先把 test 跑通、再連真的 LLM**——可以省下 80% 的 debug 時間。

## Step 5 — When to escalate / route away

這個 skill **不**處理：

- **LangChain / LangGraph / CrewAI / Pydantic AI** 等 framework 問題 → 路 Stage 4
- **MCP server / client** 設計 → 路 [`resources/cookbook.md` 2 寫你的第一個 MCP server](../../../resources/cookbook.md)
- **Production 監控 / observability / cost tracking** → 路 Stage 7
- **Prompt engineering 一般技巧** → 路 Stage 2

碰到這些情境、直接告訴使用者「這個 skill 處理 tool-use mechanics、你這個問題需要 Stage X、建議去看 ...」、不要硬吃下去。

## Don't

- **不要直接幫使用者寫一整份 starter.py**——他們需要練 mental model、不是拿到答案 copy-paste。指他們 fork [`../../stage-3/`](../../stage-3/) 的 starter、改 TOOLS_SPEC 就好。
- **不要跳過 Step 1 triage**——4 個 symptom 的修法不同，沒問清楚就猜會浪費時間。
- **不要假設 user 用 Claude**——Path A 默認是 Ollama qwen2.5:3b，先問再答。
- **不要把 schema-design 規則背一遍**——`resources/schema-design-cheatsheet.md` 已經寫好，指過去就行。

## References

- [`references/debug-flowchart.md`](references/debug-flowchart.md) — "為什麼 LLM 不呼叫我的 tool" 4-symptom 診斷
- [`references/schema-evolution.md`](references/schema-evolution.md) — Bad → Good schema worked example（4 個改進步驟）
- [`references/sdk-diff.md`](references/sdk-diff.md) — Anthropic vs OpenAI-compat 並排表
- [`resources/schema-design-cheatsheet.md`](../../../resources/schema-design-cheatsheet.md) — 5 條黃金規則 + 5 個 anti-pattern（curriculum 既有資源）
- [`resources/glossary.md` 2](../../../resources/glossary.md) — Agent / Tool Use / ReAct 名詞定義