---
name: metric-analyst
description: Use when the task involves defining, calculating, or implementing business metrics or KPIs. Triggers include KPI definition, SQL metric logic, Excel formula for a business metric, churn, retention, revenue, conversion, funnel, cohort analysis, ARPU, LTV, CAC, denominator clarification, numerator clarification, join logic for metrics, or "how do we calculate X." Also use when the user suspects a metric is wrong and wants the logic reviewed. Do NOT use for broad EDA, model validation, dashboard design, or final stakeholder memos unless only metric explanation is needed.
---

> Part of the [data-scientist](https://github.com/DAlanMtz/data-scientist) skill suite. Install `data-scientist` for full lifecycle methodology, routing, and review orchestration.

# Metric Analyst

## Purpose

Define business metrics precisely, implement them correctly, and prevent the denominator errors, grain mismatches, and join pitfalls that produce misleading numbers. The posture is precise and skeptical: metric definitions are ambiguous by default, and the job is to make them unambiguous before writing any query.

This skill covers metric logic, not visual presentation. For dashboards, route to `dashboard-designer`. For communicating metric results to stakeholders, route to `insight-reporter`.

## When To Use This Skill

Use `metric-analyst` when:

- The user asks to define a business metric or KPI (churn, retention, conversion, revenue, ARPU, LTV, CAC, funnel, cohort, DAU, MAU, engagement, etc.).
- The user asks for SQL logic, Excel formula, or pseudocode to implement a metric.
- The user suspects a metric is wrong or is getting inconsistent numbers across teams.
- The user needs denominator or numerator clarification for a rate or ratio metric.
- The user asks about join logic, grain, filters, or time windows for a metric calculation.
- The user asks to implement a cohort, retention, or funnel analysis.

## When Not To Use This Skill

| Situation | Use instead |
|---|---|
| Broad EDA, data profiling, schema inspection | `data-explorer` |
| A/B testing, experiment significance | `experiment-analyst` |
| Model validation, leakage | `model-auditor` |
| Dashboard or visual layout | `dashboard-designer` |
| Stakeholder summary or decision memo | `insight-reporter` |
| Production pipeline for recurring metric runs | `production-analytics` |

## Relationship to Parent Skill

| Responsibility | Owner |
|---|---|
| Routing to this skill | Parent `data-scientist` (`workflow/specialist-routing.md`) |
| Metric definition (numerator, denominator, grain, filters, time window) | **This skill** |
| SQL / Excel implementation of metric logic | **This skill** |
| Cohort, retention, funnel, churn, revenue logic | **This skill** |
| Join safety and duplicate risk assessment | **This skill** |
| Visual presentation of metric results | `dashboard-designer` |
| Stakeholder communication of metric results | `insight-reporter` |
| Recurring metric pipeline operationalization | `production-analytics` |

## Entry Gates

Before defining or implementing a metric, confirm or state as assumptions:

1. **Metric name or business question** — What metric is being defined or implemented?
2. **Source tables or fields** — What data is available? (Use stated schema, or note as "schema unknown — define with placeholder names.")
3. **Time window and grain** — Rolling period or calendar period? Per customer, per order, per account?
4. **Filters or population scope** — Which records are in scope? Which are excluded?
5. **Business intent** — What decision will this metric inform? (Influences denominator and scope choices.)

State any missing items as assumptions and proceed. Ask only when the absence would produce a query that is structurally incorrect, not just imprecise.

## Required Workflow

1. **Identify the business question.** What decision does this metric support? This often reveals the correct scope and denominator.
2. **Define the metric in plain language.** "The percentage of customers who [event] out of all customers who [condition] in [time window]." Write this before writing any SQL.
3. **Define numerator, denominator, grain, filters, and time window.**
   - Numerator: what events/records are counted?
   - Denominator: what is the base population? Who is included and excluded?
   - Grain: one row = one what in the result?
   - Filters: what cohort, segment, status, or date filter applies?
   - Time window: calendar month, rolling 30 days, fiscal quarter? How are boundaries handled?
4. **Check join logic and duplicate risk.** If tables are joined, confirm the join key is correct, the cardinality is expected, and a fan-out will not inflate numerator or denominator counts.
5. **Provide SQL, Excel, or pseudocode as requested.** Include comments on the logic — not the code mechanics. Flag edge cases inline.
6. **Validate edge cases.** NULL handling, divide-by-zero, first-period cohorts with incomplete data, re-activation vs. new customers, refunds vs. gross revenue.
7. **State interpretation guidance.** What does a change in this metric mean? What does it not mean? Prevent overclaiming in how the number is used.
8. **Handoff.** If the metric feeds a dashboard, route to `dashboard-designer`. If it feeds stakeholder communication, route to `insight-reporter`. If it needs to run on a schedule, route to `production-analytics`.

## Output Formats

| Format | Use when |
|---|---|
| **Metric definition** | Plain-language specification of numerator, denominator, grain, filters, time window |
| **KPI calculation logic** | Step-by-step implementation logic before code |
| **SQL query** | Runnable SQL implementing the metric with edge cases handled |
| **Excel / pivot guidance** | Formula, pivot table structure, or step-by-step Excel logic |
| **Cohort / retention logic** | Day-0 cohort definition, follow-up window, activity definition |

## Standard Metric Definition Format

```
**Metric: [Metric Name]**
Business question: [What decision does this inform?]

**Definition:**
[Plain-language sentence: "The [rate/count/sum] of [numerator] out of [denominator] for [population] in [time window]."]

**Numerator:** [What events or records are counted / summed / averaged]
**Denominator:** [Base population — include/exclude criteria]
**Grain:** One row in the result = one [unit]
**Filters:** [Cohort, segment, status, or date filter]
**Time window:** [Calendar / rolling — boundary handling]

**Edge cases:**
- [Edge case 1 — e.g., NULL handling]
- [Edge case 2 — e.g., first-period incomplete cohort]

**SQL:**
[Query]

**Interpretation note:** [What a change in this metric means and does not mean]
```

## Common Metric Patterns

### Retention (N-day or Month-over-Month)

- **Cohort definition:** Users who first performed [qualifying event] in [period].
- **Retention window:** Did they return within [N days / next period]?
- **Grain:** One row per cohort × period.
- **Denominator:** Cohort size at day 0 (not current active users).
- **Pitfall:** Denominator shrinkage if using current-period population instead of original cohort.

### Churn Rate

- **Definition:** Customers who [churned event] in the period / customers at risk at period start.
- **Denominator:** Active customers at the start of the period — not end-of-period count.
- **Pitfall:** Including reactivations in the denominator inflates the base.

### Conversion Rate

- **Numerator:** Sessions / users who completed [target event].
- **Denominator:** Sessions / users who entered the funnel at the defined step.
- **Pitfall:** Denominator definition determines whether conversion is "of all visitors" vs. "of qualified entrants."

### Revenue Metrics (MRR, ARR, ARPU, LTV)

- **MRR:** Sum of monthly-equivalent recurring revenue — exclude one-time charges.
- **ARPU:** Total revenue / active users in period — clarify "active" definition.
- **LTV:** Requires churn rate and gross margin assumptions — flag as a model, not a factual number.
- **Pitfall:** Including refunds, discounts, or non-recurring charges in recurring revenue.

## Review Checklist

Run before delivering any metric definition or query:

| # | Check | Pass condition |
|---|---|---|
| MA1 | Numerator matches the intended events | The counted records match the business intent |
| MA2 | Denominator is correct and explicit | The base population is stated; includes/excludes are intentional |
| MA3 | Grain is clear | One row in the result = one [unit] is explicit |
| MA4 | Filters are explicit | Every applied filter is named and justified |
| MA5 | Time window is unambiguous | Calendar vs. rolling, boundary handling is specified |
| MA6 | Join logic is safe | No fan-out inflation; join cardinality is confirmed or flagged |
| MA7 | Edge cases are handled | NULLs, divide-by-zero, incomplete periods are addressed |
| MA8 | Interpretation is appropriate | Result does not overclaim; "more" or "less" is in context |

**Common failure modes:**
- Using end-of-period population as denominator for a retention or churn metric
- Fan-out from a one-to-many join inflating a SUM
- Calendar-month rolling window confusion at period boundaries
- Treating LTV as a factual number rather than a model with assumptions
- Omitting first-period cohort incomplete-data caveat

## Handoff Back to `data-scientist`

After metric definition and implementation:

- If results need visual presentation, route to `dashboard-designer` with the metric definitions and grain.
- If results need stakeholder communication, route to `insight-reporter` with findings.
- If the metric will run on a schedule, route to `production-analytics`.
- Return to parent `data-scientist` for full-lifecycle analysis or modeling that uses this metric as a target or feature.
