---
name: session-metrics
model: sonnet
description: >
  Tally Claude Code session token usage and cost estimates from the raw JSONL
  conversation log. Trigger when the user asks about session cost, token usage,
  API spend, cache hit rate, input/output tokens, or wants a breakdown of how
  much a Claude Code session has cost. Also trigger for "how much have we spent",
  "show me token usage", "session summary", "cost so far", or any request to
  analyse or display per-turn metrics from the current or a past session.

  Do NOT auto-dispatch compare mode (--compare / --compare-prep / --compare-run
  / --count-tokens-only) from natural-language phrases. The skill body uses
  $ARGUMENTS[0] as the dispatch key — compare mode is reached only when the
  first positional argument is literally "compare", "compare-prep",
  "compare-run", or "count-tokens"; any other value routes through the
  body's dispatch table (default: single-session report).
---

# Session Metrics

Runs `scripts/session-metrics.py` against the Claude Code JSONL log to produce
a timeline-ordered cost summary with per-turn and cumulative totals.

## Dispatch — how to route this invocation

**First positional argument received:** `$ARGUMENTS[0]`
**Full argument string:** `$ARGUMENTS`

Read `$ARGUMENTS[0]` above and match it by **literal equality** against the
table below. Claude Code already tokenized the arguments shell-style, so no
parsing is required — just compare strings.

| `$ARGUMENTS[0]`     | Route                                     | Then read |
|---------------------|-------------------------------------------|-----------|
| `all-projects`      | Instance-wide dashboard aggregating every project under `~/.claude/projects` | `## Instance dashboard (all projects)` below |
| `compare`           | Two-session compare on JSONLs that already exist | `## Model comparison` below, then [`references/model-compare.md`](references/model-compare.md) before running |
| `compare-run`       | Fully automated capture: spawns two `claude -p` sessions, feeds the suite, then runs `--compare` | `## Model comparison` below, then [`references/model-compare.md`](references/model-compare.md) "Workflow A — automated" |
| `compare-prep`      | Print manual capture protocol + 10-prompt suite (fallback when headless is unavailable) | `## Model comparison` below |
| `count-tokens`      | API-key-only tokenizer check              | `## Model comparison` below |
| *(empty, or any other value)* | Default single-session report   | `## Quick usage` below |

This is the single gate that keeps compare mode off the natural-language
path. **Do not infer the route from the user's chat history; only use the
literal value of `$ARGUMENTS[0]` above.**

When the skill auto-triggers from a natural-language question ("how much did
this session cost?", "show me token usage"), there are no positional
arguments — `$ARGUMENTS[0]` is empty — and you always route to the default.
Phrases like "compare 4.6 vs 4.7 cost" arriving as natural language do NOT
produce `$ARGUMENTS[0] = compare` and must not route into compare mode;
answer them by running the default report on the current session and
suggesting `/session-metrics compare-prep` if the user wants a real
benchmark.
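
Expressed as shell logic, the gate looks like the sketch below. This is illustrative only — no shell runs; Claude performs the comparison by reading the table:

```bash
# Illustrative only: the literal-equality gate from the dispatch table.
case "${ARG0}" in
  all-projects)                                  route="instance-dashboard" ;;
  compare|compare-run|compare-prep|count-tokens) route="compare-mode" ;;
  *)                                             route="default-report" ;;
esac
```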

## Pre-flight context

- skill-dir: ${CLAUDE_SKILL_DIR}
- session-id: ${CLAUDE_SESSION_ID}

## Quick usage

```bash
# Current session (pinned to session ID — no heuristic)
uv run python ${CLAUDE_SKILL_DIR}/scripts/session-metrics.py --session ${CLAUDE_SESSION_ID}

# Specific session ID
uv run python ${CLAUDE_SKILL_DIR}/scripts/session-metrics.py --session <uuid>

# Specific project slug (use = when slug starts with "-")
uv run python ${CLAUDE_SKILL_DIR}/scripts/session-metrics.py --slug=-home-user-projects-myapp
# Or via env var (always safe):
CLAUDE_PROJECT_SLUG="-home-user-projects-myapp" uv run python ${CLAUDE_SKILL_DIR}/scripts/session-metrics.py

# List available sessions for this project
uv run python ${CLAUDE_SKILL_DIR}/scripts/session-metrics.py --list

# All sessions — timeline + per-session subtotals + grand project total
uv run python ${CLAUDE_SKILL_DIR}/scripts/session-metrics.py --project-cost

# Export to exports/session-metrics/ (one or more formats)
uv run python ${CLAUDE_SKILL_DIR}/scripts/session-metrics.py --output json
uv run python ${CLAUDE_SKILL_DIR}/scripts/session-metrics.py --output json csv md html
uv run python ${CLAUDE_SKILL_DIR}/scripts/session-metrics.py --project-cost --output html
```

> `${CLAUDE_SKILL_DIR}` is expanded by Claude Code to the skill's install directory (plugin cache, project-local copy, or bundled template — whichever applies). When running the script manually from a shell, substitute the actual path.

## Export formats

`--output` accepts one or more of: `json` `csv` `md` `html`

Text is always printed to stdout. Exports go to `exports/session-metrics/` in the
project root, named `session_<id8>_<YYYYMMDD_HHMMSS>.<ext>` (single) or
`project_<YYYYMMDD_HHMMSS>.<ext>` (project mode).

| Format | Contents |
|--------|----------|
| `json` | Full structured report with all turns, subtotals, model rates |
| `csv`  | One row per turn: session_id, index, timestamp, model, tokens, cost |
| `md`   | Summary table + per-session Markdown tables |
| `html` | Dark-theme report with summary cards + insights + chart. 2-page split by default (`<stem>_dashboard.html` + `<stem>_detail.html`); pass `--single-page` for one file. |

### HTML-specific flags

| Flag | Purpose |
|------|---------|
| `--single-page`              | Emit one self-contained HTML instead of the dashboard+detail split. |
| `--chart-lib {highcharts,uplot,chartjs,none}` | Choose the chart renderer. Default `highcharts` (richest visualization, vendored, SHA-256-verified, non-commercial license). `uplot` and `chartjs` are MIT-licensed alternatives. `none` emits a detail page with no JS dependency. See [`scripts/vendor/charts/README.md`](scripts/vendor/charts/README.md) for per-library license terms. |
| `--peak-hours H-H`           | Translucent band on the hour-of-day chart (e.g. `5-11`). Community-reported, not an Anthropic SLA. |
| `--peak-tz <IANA>`           | Timezone the peak hours are defined in (default `America/Los_Angeles`). |
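
An illustrative combination of the flags above:

```bash
# Single-page report, MIT-licensed chart renderer, 05:00–11:00 peak band in UTC
uv run python ${CLAUDE_SKILL_DIR}/scripts/session-metrics.py \
  --output html --single-page --chart-lib uplot --peak-hours 5-11 --peak-tz UTC
```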

### Other useful flags

| Flag | Purpose |
|------|---------|
| `--tz <IANA>`                | IANA timezone for time-of-day bucketing **and timeline/export timestamps**. Defaults to the system local tz (auto-detected via `TZ` env var or the OS setting). |
| `--utc-offset <H>`           | Fixed UTC offset, DST-naive. Use `--tz` for DST-aware. |
| `--no-cache`                 | Skip `~/.cache/session-metrics/parse/` and always re-parse from scratch. |
| `--no-include-subagents`     | Skip spawned subagent JSONL files. Subagents are included by default; use this for faster runs when subagent detail is not needed. |
| `--cache-break-threshold <N>` | Turns whose `input + cache_creation` exceed N are flagged as **cache-break events** (default 100 000). Matches Anthropic's `session-report` convention. |
| `--no-subagent-attribution`  | Disable Phase-B subagent → parent-prompt token attribution. Default behaviour rolls every subagent's tokens up onto the user prompt that spawned the chain (additional `attributed_subagent_*` fields, no double-counting). |
| `--sort-prompts-by {total,self}` | How to rank top prompts in HTML/MD output. `total` (default) = parent + attributed subagent cost, surfaces cheap-prompt-spawning-expensive-subagent turns. `self` = parent only (pre-Phase-B order). CSV/JSON keep `self` ordering for stability regardless of this flag. |

> **Invocation note for the AI.** Don't pass `--tz` or `--utc-offset` unless the user explicitly asks for a specific timezone. The script auto-detects the user's system tz and renders all human-facing timestamps (timeline, session headers, generated-at banner, block anchors) in that tz. JSON/CSV raw `timestamp` fields stay UTC ISO-8601 as a machine-readable audit trail. Don't pass `--include-subagents` — subagents are included by default. Only pass `--no-include-subagents` if the user explicitly asks for a faster/leaner run without subagent detail.
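
For example, combining the flags above:

```bash
# Lower the cache-break threshold and rank prompts by parent-only cost
uv run python ${CLAUDE_SKILL_DIR}/scripts/session-metrics.py \
  --cache-break-threshold 50000 --sort-prompts-by self --output html
```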

## Output columns

| Column    | Meaning                                      |
|-----------|----------------------------------------------|
| `#`       | Deduplicated turn index                      |
| `Time`    | Timestamp of the turn in the user's local timezone (auto-detected; override with `--tz` / `--utc-offset`). Header shows the active tz label. Raw `timestamp` fields in JSON/CSV exports remain UTC ISO-8601 (`...Z`) for machine-readability. |
| `Input`   | Net new input tokens (uncached portion only — cache reads/writes are shown separately) |
| `Output`  | Output tokens generated (includes thinking + tool_use block tokens) |
| `CacheRd` | Tokens served from prompt cache (cheap)      |
| `CacheWr` | Tokens written to prompt cache (one-time). `1h` / `mix` badge marks turns that used the 1-hour TTL tier; CSV/JSON expose `cache_write_5m_tokens` and `cache_write_1h_tokens` as dedicated columns. |
| `Content` | Per-turn content-block distribution. Letter encoding `T` thinking, `u` tool_use, `x` text, `r` tool_result, `i` image (zero counts omitted). Renders only when at least one turn carries any content block. CSV/JSON expose `thinking_blocks` / `tool_use_blocks` / `text_blocks` / `tool_result_blocks` / `image_blocks` as dedicated per-turn columns. |
| `Total`   | Sum of the four billable token buckets       |
| `Cost $`  | Estimated USD for this turn                  |

Deep-dive on exact column semantics, JSON keys, and detection rules:
[`references/jsonl-schema.md`](references/jsonl-schema.md).
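
To confirm the dedicated CSV columns named above, inspecting the export header is enough (filename follows the pattern from Export formats):

```bash
# List the cache-TTL and content-block columns in a per-turn CSV export
head -n1 exports/session-metrics/session_<id8>_<ts>.csv | tr ',' '\n' | grep -E 'cache_write_|_blocks'
```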

### Per-turn drill-down (HTML only)

In the Detail page and single-page variants, every Timeline row is
clickable (keyboard: Enter / Space) and opens a right-side **drawer**
showing the user's actual prompt, any slash command, the tools that were
called (with one-line input previews), the content-block mix, the
per-turn cost/token breakdown, and the assistant's reply. Prompt and
assistant text are truncated to ~240 characters with a **Show full
prompt** / **Show full response** toggle that reveals the full text (up
to a 2 KB cap on assistant text). Close with the × button, Esc, or a
backdrop click — focus returns to the originating row. Below the
Timeline, a collapsible **Prompts** section lists the top-20
most-expensive prompts; each row opens the same drawer. The Dashboard
variant has no Timeline, so it's unaffected.

Footer shows session totals + **cache savings** vs a hypothetical no-cache
run. Conditional dashboard cards appear when their feature was used in the
session: **Cache TTL mix** (when any 1h-tier cache writes happened),
**Extended thinking engagement** (when any turn carried a `thinking` block),
**Tool calls** (top-3 tool names), **Session resumes** (timeline divider at
each `claude -c` resume point, detected from `/exit` + synthetic-turn
fingerprint — lower-bound count), and the **Usage Insights** panel
(prose-style pattern characterisations inspired by Anthropic's `/usage`
command, auto-hide below threshold, exposed in JSON under `usage_insights`
and in Markdown under `## Usage Insights`).

### Cross-cutting sections (v1.6.0 — inspired by Anthropic's session-report)

Three additional sections auto-hide when empty and render at every scope
(single session / `--project-cost` / `--all-projects`). All three feed the
same `by_skill`, `by_subagent_type`, and `cache_breaks` keys in the JSON
export.

- **Cache breaks** — single turns whose `input + cache_creation` exceeds
  the threshold (configurable via `--cache-break-threshold`, default
  100 000). Each row names **which turn** lost the cache; the HTML version
  is expandable and shows ±2 user-prompt context around the event.
  Complements the overall cache-hit % with actionable "here is where it
  blew up" detail.
- **Skills & slash commands** — one row per named skill or `/slash`
  command, columns: invocations, turns attributed, input / output / cache
  tokens, % cached, cost, % of total. Attribution model: a slash-prefixed
  user prompt sets the "current skill" for that prompt and its follow-up
  turns; a Skill-tool invocation overrides attribution for its own turn.
  Slash-prefixed keys (e.g. `/session-metrics`) are de-slashed so they
  merge with Skill-tool invocations of the same name.
- **Subagent types** — one row per resolved `subagent_type` (from
  `Agent` / `Task` tool_use `input.subagent_type`). Shows spawn count
  always; token/cost columns populate from subagent JSONLs (default behaviour —
  pass `--no-include-subagents` to skip).

**UUID-based dedup** runs at project and instance scope to prevent
resumed-session replays from double-counting. Session scope keeps the
existing `message.id` streaming-split dedup.
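
The three JSON keys can be pulled straight from an export for scripted checks (a sketch using the key names listed above):

```bash
# Extract the cross-cutting sections from a JSON export
jq '{by_skill, by_subagent_type, cache_breaks}' exports/session-metrics/session_<id8>_<ts>.json
```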

### Subagent → parent-prompt attribution (v1.7.0 — Phase B)

Subagent token usage is included by default and rolls up onto the **user prompt** that originally triggered the chain:

- Three new turn-record fields populate on the spawning prompt's row —
  `attributed_subagent_tokens`, `attributed_subagent_cost`,
  `attributed_subagent_count`. They are **purely additive**:
  `cost_usd` / `total_tokens` on every turn (parent and subagent) are
  unchanged, so existing aggregators and the session total are
  untouched. Display layers read both columns separately.
- **Nested chains** (subagent A → subagent B) attribute B's tokens onto
  the **root** user prompt, not onto A. Implemented via an iterative
  resolve over `(tool_use.id, agentId)` linkage extracted from the
  parent's `toolUseResult.agentId` fields, with a cycle guard.
- **HTML prompts table** sorts by `cost_usd + attributed_subagent_cost`
  by default (configurable via `--sort-prompts-by`). A new "Subagents
  +$" column is auto-shown when at least one top-20 row has attributed
  cost. The prompt snippet gains a "+N subagents" badge.
- **CSV per-turn export** always includes the three attribution columns
  (zero on rows without spawn activity) — column count stays stable.
- **JSON report** exposes top-level `subagent_attribution_summary`
  with `attributed_turns`, `orphan_subagent_turns`,
  `nested_levels_seen`, `cycles_detected`. Useful for sanity checks
  when pointing at unfamiliar history.

Disable subagent loading entirely with `--no-include-subagents` (fastest, no subagent token detail).
Disable only the attribution rollup while still loading subagents with `--no-subagent-attribution` (compare against pre-Phase-B reports).
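
A quick sanity check on the rollup, using the documented top-level key:

```bash
# Expect attributed_turns > 0 and cycles_detected == 0 on healthy history
jq '.subagent_attribution_summary' exports/session-metrics/session_<id8>_<ts>.json
```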

## Instance dashboard (all projects)

Reached when `$ARGUMENTS[0]` is `all-projects`, or when the user asks
for the **total cost across every project** ("how much have I spent on
Claude Code overall?", "what's my total spend across all projects?",
"which project is costing me the most?").

Aggregates every project under `~/.claude/projects/` (or
`CLAUDE_PROJECTS_DIR`, or the `--projects-dir` override) into a single
dashboard with instance-wide totals, a daily cost timeline, and a
per-project breakdown table sorted by cost descending. Each project row
hyperlinks to a pre-rendered per-project HTML drilldown that carries
the full session/turn detail (same report as `--project-cost <slug>`).

```bash
# Instance-wide dashboard — HTML + MD + CSV + JSON
uv run python ${CLAUDE_SKILL_DIR}/scripts/session-metrics.py --all-projects --output html md csv json

# Fast path — no per-project drilldown HTMLs (rows render as plain text)
uv run python ${CLAUDE_SKILL_DIR}/scripts/session-metrics.py --all-projects --no-project-drilldown --output html

# Multi-instance: point at a non-default Claude Code install
uv run python ${CLAUDE_SKILL_DIR}/scripts/session-metrics.py --all-projects --projects-dir /opt/claude-work/projects
```

### Output layout

Exports write to a dated subfolder so successive runs don't overwrite
each other and the whole bundle stays portable (zip it, move it, serve
it as static files — relative drilldown links keep working):

```
exports/session-metrics/instance/YYYY-MM-DD-HHMMSS/
  index.html    # entry point — instance dashboard
  index.md
  index.csv     # one row per session, with a project_slug column
  index.json    # full instance report (no per-turn records — only per-session summaries)
  projects/
    <slug-1>.html   # full per-project HTML, same as --project-cost <slug-1>
    <slug-2>.html
    ...
```

`--no-project-drilldown` skips the `projects/` folder entirely and
renders `index.html` with project rows as plain text (no hyperlinks) —
useful for CI or quick-glance runs. Per-turn data is always suppressed
at the instance scope; users drill down by clicking into a project HTML.

### Multi-instance Claude Code setups

Three layered overrides pick the projects directory (highest precedence first):

1. `--projects-dir <path>` CLI flag
2. `CLAUDE_PROJECTS_DIR` environment variable
3. Default `~/.claude/projects`

The resolved projects directory is rendered into the HTML header byline
so output from multiple instances is self-documenting when viewed
side-by-side.
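
The environment-variable layer is equivalent to the flag, which is convenient in CI where editing the command line is awkward:

```bash
# Same resolution as --projects-dir, via layer 2 of the precedence list
CLAUDE_PROJECTS_DIR=/opt/claude-work/projects \
  uv run python ${CLAUDE_SKILL_DIR}/scripts/session-metrics.py --all-projects --output html
```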

## Model comparison

Reached only when `$ARGUMENTS[0]` is `compare`, `compare-run`,
`compare-prep`, or `count-tokens` (see the Dispatch section at the top
of this file).

**Default pair is `claude-opus-4-6[1m]` vs `claude-opus-4-7[1m]`** —
the 1M-context tier, because that matches Claude Code's shipping Opus
routing. Users opt into the 200k-context variants by passing the
unsuffixed IDs explicitly. Mixed-tier pairs are accepted and surface
the existing `context-tier-mismatch` advisory on the report.

| Mode | Use when |
|------|----------|
| `compare-run` *(preferred)* | User wants the report with no manual capture. Orchestrator spawns two `claude -p` sessions via subscription auth, feeds the canonical suite, then runs `--compare`. Zero API key. |
| `compare`                   | Two session JSONLs already exist (either from a prior `compare-run`, or from manual `/model` + paste). Input is a path / UUID / `last-<family>` / `all-<family>` token. |
| `compare-prep`              | Print the manual capture protocol. Only suggest when `claude -p` is unavailable (e.g. CI container without the CLI). |
| `count-tokens`              | API-key tokenizer smoke test. NOT a subscription path; do not suggest to users comparing two subscription sessions. |

**Output formats for compare mode.** Compare mode supports `--output text md json csv html`. The HTML report is always single-page (a compact scored per-prompt table); `--single-page` and `--chart-lib` are ignored when `--compare` is active.

**`compare-run` auto-extras.** When `compare-run` is invoked with
`--output <fmt>`, it emits the compare report *and* five companion
files in `exports/session-metrics/`: a per-session dashboard + detail
(HTML) and raw JSON for each side, plus a Markdown analysis scaffold
(`compare_<a8>_vs_<b8>_<ts>_analysis.md`) with headline ratios,
per-prompt table, cost decomposition, and a bolded decision-framework
verdict. Prose sections carry `{{TODO}}` placeholders so a follow-up
chat can fill them in. Pass `--no-compare-run-extras` to skip the
companions and emit only the compare report. Without `--output`,
nothing writes to disk (text-only stdout path preserved).

**Custom prompts.** Users can add their own prompts to the comparison suite with
`--compare-add-prompt "text"` — no file format knowledge required. The prompt is
saved to `~/.session-metrics/prompts/` and runs automatically on every subsequent
`--compare-run`. Use `--compare-list-prompts` to preview the active suite and call
count before spending on inference. Use `--compare-remove-prompt <name>` to remove.
Full guide: [`references/custom-prompts.md`](references/custom-prompts.md).
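
The full lifecycle with the three documented flags (the prompt text is illustrative):

```bash
# Add a prompt, preview the active suite (and call count), then remove it
uv run python ${CLAUDE_SKILL_DIR}/scripts/session-metrics.py --compare-add-prompt "Summarise RFC 2119 in three bullets"
uv run python ${CLAUDE_SKILL_DIR}/scripts/session-metrics.py --compare-list-prompts
uv run python ${CLAUDE_SKILL_DIR}/scripts/session-metrics.py --compare-remove-prompt <name>
```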

**Prompt steering.** `--compare-run-prompt-steering <variant>` wraps every
prompt in the suite with a steering instruction before feeding it to
`claude -p`. The four built-in variants are `concise`,
`think-step-by-step`, `ultrathink`, and `no-tools`. The wrapper is
applied symmetrically to both sides so the A/B stays clean — what shifts
is the model's behaviour under the same instruction, surfaced as token /
cost / thinking / tool-call deltas. Use
`--compare-run-prompt-steering-position {prefix,append,both}` (default
`prefix`) to control where the steering text lands relative to the body.
IFEval pass rates may differ from the unsteered baseline by design —
predicate breakage (e.g. "be concise" violating the 50-word stacked
constraint) is the *measurement*, not a regression. For multi-variant
sweeps with auto-rendered comparison articles + cross-variant summary,
use the `benchmark-effort-prompt` skill instead.
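
For example (a sketch — confirm the full invocation against [`references/model-compare.md`](references/model-compare.md) before running):

```bash
# Concise-steered run, steering text appended after each prompt body
uv run python ${CLAUDE_SKILL_DIR}/scripts/session-metrics.py --compare-run \
  --compare-run-prompt-steering concise \
  --compare-run-prompt-steering-position append \
  --output html md
```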

**Before proposing any compare-mode command, read
[`references/model-compare.md`](references/model-compare.md).** That
doc has the full flag table, four workflow recipes, 4-way Opus combo
matrix, IFEval predicates, advisory semantics, and troubleshooting.
The eager content in this file deliberately stays minimal so
single-session reports don't pay for compare-mode context they don't
use.

## Reference files

- [`references/pricing.md`](references/pricing.md) — Per-model token prices used
  for cost calculation. Read when the user asks about pricing or you need to
  add a new model.
- [`references/jsonl-schema.md`](references/jsonl-schema.md) — JSONL entry
  structure + full output-column semantics, cache-TTL split rationale, content-
  block distribution, resume detection. Read when debugging missing data,
  extending the script, or interpreting any non-obvious column/key.
- [`references/model-compare.md`](references/model-compare.md) — `--compare`
  workflow, prompt-suite catalogue, IFEval predicates, interpretation guide.
  Read when `$ARGUMENTS[0]` routes into compare mode.
- [`references/custom-prompts.md`](references/custom-prompts.md) — Step-by-step
  guide for adding, removing, and previewing custom prompts for `--compare-run`.
  Read when the user asks how to add their own prompts or customise the suite.
- [`references/platform-notes.md`](references/platform-notes.md) — Windows
  `tzdata` caveat for IANA `--tz` names, the `--strict-tz` escape hatch,
  and timezone-contract summary. Read when the user reports a timezone
  warning, asks about Windows support, or wants CI-safe tz handling.

## How session detection works

1. Derives the project slug from `cwd`: replaces each `/` with `-`, so the leading `/` becomes the leading `-` seen in slugs like `-home-user-projects-myapp`.
2. Scans `~/.claude/projects/<slug>/` for `*.jsonl` files (excludes `subagents/`).
3. Picks the most recently modified file as the current session.
4. Override with `--session <uuid>` or `--slug <slug>` when needed.
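
Steps 1–3 as a shell sketch (illustrative — the script implements this in Python):

```bash
# Derive the slug from cwd and pick the newest top-level JSONL as "current".
# The top-level *.jsonl glob naturally excludes the subagents/ subdirectory.
slug="$(pwd | tr '/' '-')"
ls -t ~/.claude/projects/"$slug"/*.jsonl 2>/dev/null | head -n1
```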

## Deduplication

Each API response is written to the JSONL multiple times (streaming, tool
completion, final). The script deduplicates on `message.id` — keeping only the
**last** occurrence so token counts reflect the final settled value.
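
A minimal jq sketch of the last-wins behaviour (assumes one JSON object per line; jq's `INDEX` keeps first-seen key order while later occurrences overwrite earlier values):

```bash
# Last occurrence of each message.id wins; first-seen turn order is preserved
jq -s 'map(select(.message.id != null)) | INDEX(.message.id) | [.[]]' <session>.jsonl
```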
