---
name: claude-tool-profiling
description: Measure how long Claude Code tool calls take using the built-in OpenTelemetry console exporter. Use when asked to profile a tool, benchmark a tool, measure tool duration, time a tool call, compare tool overhead, or investigate why a tool feels slow.
---

# Claude Tool Profiling

> **Keep the gist in sync.** This skill directory is a git clone of a public gist:
> https://gist.github.com/gurdiga/3ce759edce07d2ae60baa12d4b21edae
>
> After editing, commit and push:
>
> ```bash
> cd ~/.claude/skills/claude-tool-profiling
> git add -A && git commit -m "Update" && git push
> ```

Time individual tool calls (Bash, Edit, MCP, etc.) via Claude Code's built-in OTel instrumentation. No Docker, no collector — console exporter dumps spans to stdout, then a small awk extractor pulls out `tool.execution` durations.

## When to reach for this

- User says "measure how long X takes" where X is a Claude tool
- Comparing overhead between two tools (e.g. `Bash` vs an MCP tool)
- Checking if a tool call's cost justifies a workflow choice (e.g. batching vs. many small calls)

Not for: profiling user scripts, profiling LLM latency (it's captured but isn't the focus), production observability (use a real OTel collector instead).

## Quick start

Run the prompt 5 times, extract durations, print stats:

```bash
bash ~/.claude/skills/claude-tool-profiling/run-experiment.sh \
  5 \
  "Run echo hello via Bash, then say done" \
  Bash
```

Args: `<runs> <prompt> <tool_name_substring>`.

Output: per-run `tool.execution` ms, then min/max/avg across runs.

## How it works

Claude Code emits OTel spans when these env vars are set:

```bash
CLAUDE_CODE_ENABLE_TELEMETRY=1
CLAUDE_CODE_ENHANCED_TELEMETRY_BETA=1   # gates per-tool spans
OTEL_TRACES_EXPORTER=console            # JSON spans → stdout
OTEL_LOG_TOOL_DETAILS=1                 # include tool_name, full_command, file_path
```

Each tool call produces a three-span tree:

```
claude_code.tool               tool_name=X   duration_ms=T          ← total
├── claude_code.tool.blocked_on_user         duration_ms=P          ← permission wait
└── claude_code.tool.execution               duration_ms=E, success  ← actual run
```

`tool.execution` is the number you want — permission wait and parent overhead are captured separately.

## Running `claude -p` non-interactively

**Pass the prompt on stdin, not as a positional arg** — in some sandboxed shells the positional arg is dropped:

```bash
echo "<your prompt>" | \
  CLAUDE_CODE_ENABLE_TELEMETRY=1 \
  CLAUDE_CODE_ENHANCED_TELEMETRY_BETA=1 \
  OTEL_TRACES_EXPORTER=console \
  OTEL_LOG_TOOL_DETAILS=1 \
  claude -p --permission-mode bypassPermissions \
    > run.log 2>&1
```

Gotchas:

- `--permission-mode bypassPermissions` removes the permission prompt path (tests the fast path)
- Spans land in **stdout**, interleaved with the model's final text answer
- MCP tools configured globally in `~/.claude.json` are inherited by `claude -p`
- Deferred tools (those needing `ToolSearch`) cost one extra LLM round-trip per fresh session — factor that into first-call numbers

## Extracting durations

Extract all tool spans from a log:

```bash
awk '
  /^  name: "claude_code\.tool"/ { in_tool=1; tool=""; dur="" }
  in_tool && /^    tool_name:/ { sub(/^    tool_name: /,""); sub(/,$/,""); gsub(/"/,""); tool=$0 }
  in_tool && /^    duration_ms:/ { gsub(/,/,"",$2); dur=$2 }
  in_tool && /^}$/ {
    printf "  tool=%-48s  duration_ms=%s\n", tool, dur
    in_tool=0
  }
' run.log
```

For just `tool.execution` durations:

```bash
awk '/name: "claude_code\.tool\.execution"/,/^}$/' run.log \
  | awk '/duration_ms:/ {gsub(/,/,"",$2); print $2}'
```

## Calibration numbers

Baseline measurements on Claude Code 2.1.123, macOS arm64 (April 2026):

| Tool | avg `tool.execution` | n |
|---|---|---|
| `Bash "echo hello"` | 3,410 ms | 5 |
| `mcp__chrome-devtools__take_screenshot` (viewport PNG) | 190 ms | 5 |

Bash has a ~3s floor per call from subprocess + hook + IPC overhead. MCP stdio tools don't pay this cost. Rule of thumb: for many tiny shell ops, chain with `&&` in a single Bash call rather than making many calls.

## Interpreting results

A full run decomposes into:

```
claude_code.interaction              ← wall clock
├── claude_code.llm_request × N      ← model thinking (usually dominates)
└── claude_code.tool × M             ← each tool call
    ├── .blocked_on_user              ← permission wait (~ms)
    └── .execution                    ← the number that matters
```

If `.execution` is fast but wall time is slow, the overhead is in the LLM, not the tool. Don't blame the tool for round-trip latency.

## See also

- [`run-experiment.sh`](run-experiment.sh) — the runner + extractor

Docs: [code.claude.com/docs/en/monitoring-usage.md](https://code.claude.com/docs/en/monitoring-usage.md)
