---
name: production-analytics
description: Use when the task is to operationalize analytics for repeatable, scheduled, or monitored production use. Triggers include production pipeline, scheduled analytics, monitoring plan, data contract, refresh cadence, drift, alerts, backfill, retry, SLA, ownership, operational dashboard, recurring report, or "make this run automatically." Also use when a dashboard or analysis needs to be handed off to another team for ongoing operation. Do NOT use for one-off analysis, exploratory work, simple metric definitions, or stakeholder summaries without operational requirements.
---

> Part of the [data-scientist](https://github.com/DAlanMtz/data-scientist) skill suite. Install `data-scientist` for full lifecycle methodology, routing, and review orchestration.

# Production Analytics

## Purpose

Turn validated analytics — a dashboard, a metric, a report, a model scoring pipeline — into a production-ready, owned, and monitored operational asset. The posture is defensive: assume failures will happen, data will drift, and the person who built the original analysis will not always be available. Define contracts, failure behavior, and ownership before anything goes live.

This skill covers operationalization, not analysis. The analysis must be validated before this skill applies. Use `data-explorer`, `metric-analyst`, `experiment-analyst`, or `model-auditor` first when the analytical work is still in progress.

## When To Use This Skill

Use `production-analytics` when:

- The user asks to schedule, automate, or productionize an analysis, report, or pipeline.
- The user asks about monitoring, drift detection, or alerting for a running analytics system.
- The user needs a data contract for a pipeline's inputs or outputs.
- The user asks about refresh cadence, freshness SLAs, or scheduled job behavior.
- The user is handing off a dashboard or recurring report to another team for ongoing operation.
- The user asks about backfill behavior, retry logic, or failure handling for a data job.
- The user asks who owns a pipeline and what happens when it fails.

## When Not To Use This Skill

| Situation | Use instead |
|---|---|
| One-off analysis or exploratory work | Parent `data-scientist` |
| Simple metric definition | `metric-analyst` |
| Dashboard layout and design | `dashboard-designer` |
| Stakeholder summary or report | `insight-reporter` |
| Model validation or leakage check | `model-auditor` |
| Experiment analysis | `experiment-analyst` |
| First-pass EDA | `data-explorer` |

## Relationship to Parent Skill

| Responsibility | Owner |
|---|---|
| Routing to this skill | Parent `data-scientist` (`workflow/specialist-routing.md`) |
| Validating analysis before productionization | Appropriate specialist (`metric-analyst`, `model-auditor`, etc.) |
| Defining production inputs, outputs, and contracts | **This skill** |
| Specifying refresh cadence, failure behavior, alerts | **This skill** |
| Monitoring and drift detection planning | **This skill** |
| Backfill and reprocessing design | **This skill** |
| Ownership and escalation definition | **This skill** |
| Production-impacting actions (requiring approval) | Level 4 gate — explicit user approval required |

## Entry Gates

Before beginning production planning, confirm or state as assumptions:

1. **Operational objective** — What does this pipeline or report produce, and who uses it?
2. **Expected consumers** — Who or what reads the output? (Dashboard, downstream table, API, email, team.)
3. **Input sources** — What data does the pipeline read? What are the upstream dependencies?
4. **Output definition** — What does the pipeline produce? Table, file, dashboard refresh, API response?
5. **Refresh cadence or trigger** — Daily batch, real-time, event-triggered, manual?

If two or more of these are missing and their absence would materially change the design, apply Level 2 (Clarify Then Proceed). Apply Level 4 (Approval Required) before recommending or specifying any action that writes to production systems, drops tables, modifies schedules, or changes live behavior.
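The entry-gate rule can be read as a small decision function. The sketch below is illustrative only. The gate keys, the `material` set, and the returned strings are hypothetical. It also assumes that "materially change the design" has already been judged by a person, because code cannot make that call.

```python
from __future__ import annotations

# Hypothetical keys mirroring the five entry gates listed above.
ENTRY_GATES = (
    "operational_objective",
    "expected_consumers",
    "input_sources",
    "output_definition",
    "refresh_cadence",
)


def entry_gate_action(answers: dict[str, str | None], material: set[str]) -> str:
    """Suggest how to proceed under the entry-gate rule (illustrative sketch).

    answers  -- gate key -> confirmed value (None or "" means missing)
    material -- gates whose absence would materially change the design,
                as judged by a person, not by this function
    """
    missing = [g for g in ENTRY_GATES if not answers.get(g)]
    material_missing = [g for g in missing if g in material]
    if len(material_missing) >= 2:
        return "Level 2: clarify then proceed"
    # Otherwise proceed, stating each missing gate as an explicit assumption.
    return "proceed; state assumptions for: " + (", ".join(missing) or "none")
```

The Level 4 approval check is deliberately not part of this function. It applies per action, regardless of how complete the entry gates are.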

## Required Workflow

1. **Define the production objective and consumers.** What is this pipeline's job? Who depends on its output? What breaks if it fails?
2. **Define inputs, outputs, owners, and dependencies.** Name every upstream source. State the owner of each. Name the downstream consumers. Define what the pipeline produces.
3. **Define the refresh cadence and freshness expectations.** How often does it run? What is the maximum acceptable lag? What freshness does the consumer expect?
4. **Define failure behavior, retries, and alerts.** What happens when the pipeline fails? Who is notified? How many retries before alerting? Is there a fallback output?
5. **Define monitoring, drift, and data quality checks.** What row counts, metric bounds, and data quality checks run on every execution? What triggers an alert vs. a warning vs. a pass?
6. **Define backfill and reprocessing behavior.** How are missed runs handled? Is backfilling idempotent? Are there gaps or overlaps in backfilled data?
7. **Produce the operational handoff summary** (see Output Formats below).
8. **Run the approval gate for production-impacting actions.** Any action that writes to production, changes a schedule, or modifies live behavior requires explicit user confirmation (Level 4) before proceeding.
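Step 4 asks for a retry count, a backoff, and an alert once retries are exhausted. Most schedulers have their own retry settings. The sketch below shows the same policy in plain Python only so that the behavior is explicit. `run_with_retries`, its parameters, and the default delay are hypothetical. `sleep` and `alert` are injectable so the policy can be tested without waiting.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_retries(
    job: Callable[[], T],
    max_retries: int = 3,
    base_delay_s: float = 60.0,
    alert: Callable[[str], None] = print,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `job`, retrying up to `max_retries` times with exponential backoff.

    The first attempt is not a retry, so the job runs at most max_retries + 1
    times. If every attempt fails, `alert` is called once and the last error is
    re-raised, so the scheduler records a hard failure instead of a silent one.
    """
    for attempt in range(max_retries + 1):
        try:
            return job()
        except Exception as exc:
            if attempt == max_retries:
                alert(f"job failed after {attempt + 1} attempts: {exc!r}")
                raise
            # Backoff doubles on each retry: base, 2*base, 4*base, ...
            sleep(base_delay_s * 2 ** attempt)
    raise AssertionError("unreachable")
```

This covers retries within a single run. An escalation threshold such as "X consecutive failed runs → escalate to Y" needs failure state that persists across runs. That state usually lives in the scheduler or alerting tool and is not shown here.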

## Output Formats

| Format | Use when |
|---|---|
| **Production analytics checklist** | Structured readiness review of a pipeline before go-live |
| **Monitoring plan** | Specific checks, thresholds, alert routing, and cadence |
| **Data contract outline** | Formal or informal spec of inputs, outputs, schema, freshness, and SLAs |
| **Operational handoff** | End-of-project document for the team taking over ongoing operation |
| **Alert and failure behavior plan** | Decision tree for failure scenarios, fallbacks, and escalation paths |
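A data contract outline can also be made machine-checkable. The sketch below is one hypothetical shape, not a standard format. A contract names an owner, the expected columns and types, and a maximum freshness lag. A validator returns every violation it finds instead of stopping at the first. `DataContract`, `contract_violations`, and the type strings are assumptions chosen for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class DataContract:
    """Hypothetical minimal contract for one pipeline input or output."""
    name: str
    owner: str
    schema: dict[str, str]  # column name -> expected type name
    max_lag: timedelta      # freshness SLA: oldest acceptable data age


def contract_violations(
    contract: DataContract,
    observed_schema: dict[str, str],
    last_updated: datetime,
    now: datetime | None = None,
) -> list[str]:
    """List schema and freshness violations; an empty list means the contract holds."""
    now = now or datetime.now(timezone.utc)
    problems = []
    for col, expected in contract.schema.items():
        if col not in observed_schema:
            problems.append(f"missing column {col}")
        elif observed_schema[col] != expected:
            problems.append(f"column {col}: expected {expected}, got {observed_schema[col]}")
    lag = now - last_updated  # both datetimes are assumed timezone-aware
    if lag > contract.max_lag:
        problems.append(f"stale: lag {lag} exceeds SLA {contract.max_lag}")
    return problems
```

Extra columns that are not in the contract are tolerated here, which treats additive schema changes as non-breaking. Whether that is acceptable is a decision for the contract owner.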

## Standard Operational Handoff Format

```
**Operational Handoff: [Pipeline / Report Name]**
Owner: [primary owner name and contact]
Escalation: [escalation path if owner is unavailable]
Status: [Live / Shadow period / Staging]

**What it does:** [1–2 sentence description of the pipeline's job]
**Consumers:** [who/what uses the output]

**Inputs:**
| Source | Table / path | Owner | Freshness lag | SLA |
|---|---|---|---|---|
| [source] | [location] | [owner] | [lag] | [SLA] |

**Output:**
| Artifact | Location | Format | Grain | Freshness |
|---|---|---|---|---|
| [output] | [location] | [format] | [grain] | [freshness target] |

**Refresh cadence:** [schedule or trigger]
**Idempotent:** [Yes / No — describe behavior on re-run]

**Failure behavior:**
- Retry policy: [N retries, backoff]
- Alert: [who gets notified, how]
- Fallback: [stale output / no output / hard fail]
- Escalation threshold: [X consecutive failures → escalate to Y]

**Monitoring checks:**
| Check | Threshold | Action |
|---|---|---|
| Row count | [min/max] | [alert / warn / pass] |
| Null rate on [key field] | [< X%] | [alert / warn] |
| [Metric bound] | [range] | [alert] |

**Drift detection:** [how feature / metric drift is monitored]

**Backfill behavior:** [how missed runs are handled; idempotency status]

**Known limitations and risks:**
- [Risk 1]
- [Risk 2]

**Open items before full production:**
- [ ] [Item 1]
- [ ] [Item 2]
```
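The monitoring-checks table in the template maps an observed value to alert, warn, or pass. The sketch below is one hypothetical way to grade a run: each check has an optional alert band and an optional warn band, and the alert band is tested first. The class and function names and the example thresholds are assumptions. The constructor refuses a check with no thresholds, which guards against the failure mode where checks exist but never fire.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Check:
    """One monitoring check with hypothetical alert/warn bands (None = unbounded)."""
    name: str
    alert_min: float | None = None
    alert_max: float | None = None
    warn_min: float | None = None
    warn_max: float | None = None

    def __post_init__(self) -> None:
        # Guard against "checks exist but thresholds are never set".
        bounds = (self.alert_min, self.alert_max, self.warn_min, self.warn_max)
        if all(b is None for b in bounds):
            raise ValueError(f"check {self.name!r} has no thresholds")


def _outside(value: float, lo: float | None, hi: float | None) -> bool:
    return (lo is not None and value < lo) or (hi is not None and value > hi)


def evaluate(check: Check, value: float) -> str:
    """Return 'alert', 'warn', or 'pass'; the alert band takes precedence."""
    if _outside(value, check.alert_min, check.alert_max):
        return "alert"
    if _outside(value, check.warn_min, check.warn_max):
        return "warn"
    return "pass"


def run_checks(rows: list[dict], key_field: str, checks: list[Check]) -> dict[str, str]:
    """Compute row count and key-field null rate for one run, then grade each check.

    A check whose name is not an observed metric raises KeyError instead of
    being skipped silently.
    """
    n = len(rows)
    observed = {
        "row_count": float(n),
        # An empty batch counts as 100% null so it cannot pass silently.
        "null_rate": sum(r.get(key_field) is None for r in rows) / n if n else 1.0,
    }
    return {c.name: evaluate(c, observed[c.name]) for c in checks}
```

For example, `Check("row_count", alert_min=100, warn_min=900, warn_max=1200)` alerts on a near-empty load and only warns on a moderate dip.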

## Review Checklist

Run before declaring any pipeline production-ready:

| # | Check | Pass condition |
|---|---|---|
| PA1 | Inputs and outputs are defined | Every input source and output artifact is named and located |
| PA2 | Owners are named | A primary owner and escalation path exist for every input and output |
| PA3 | Refresh cadence is specified | Schedule or trigger is defined; expected freshness is stated |
| PA4 | Failure behavior is defined | Retry policy, alert routing, and fallback behavior are specified |
| PA5 | Monitoring checks are in place | Row count, data quality, and metric bound checks run on every execution |
| PA6 | Backfill/retry behavior is documented | Behavior for missed or failed runs is defined; idempotency status is stated |
| PA7 | Approval obtained for production-impacting actions | Any write-to-production action has explicit user confirmation before proceeding |
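The review checklist is all-or-nothing: a pipeline is production-ready only when every check passes. A minimal sketch of that rule is shown below. The `readiness` function is hypothetical. It treats a check with no recorded result as failing, so an unreviewed item blocks go-live. It also assumes PA7 is recorded as passing when the plan contains no production-impacting actions.

```python
# Check IDs and descriptions copied from the review checklist above.
PA_CHECKS = {
    "PA1": "Inputs and outputs are defined",
    "PA2": "Owners are named",
    "PA3": "Refresh cadence is specified",
    "PA4": "Failure behavior is defined",
    "PA5": "Monitoring checks are in place",
    "PA6": "Backfill/retry behavior is documented",
    "PA7": "Approval obtained for production-impacting actions",
}


def readiness(results: dict[str, bool]) -> tuple[bool, list[str]]:
    """Return (ready, failing checks); missing results count as failures."""
    failing = [
        f"{check_id}: {desc}"
        for check_id, desc in PA_CHECKS.items()
        if not results.get(check_id, False)
    ]
    return (not failing, failing)
```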

**Common failure modes:**
- Declaring a pipeline production-ready because the notebook runs successfully
- No alert routing — failures go undetected until a consumer notices
- Backfill creates duplicate rows because the load is not idempotent
- No owner named — the pipeline orphans when the original analyst moves on
- Monitoring checks exist but thresholds are never set, so alerts never fire
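Duplicate rows from a non-idempotent backfill are usually avoided by overwriting a whole partition instead of appending to it. The sketch below uses the standard-library `sqlite3` module only so that it runs anywhere. The `daily_revenue` table, its columns, and the `compute` callback are hypothetical. Each run date's rows are deleted and re-inserted inside one transaction, so re-running a day replaces it rather than adding to it.

```python
import sqlite3
from datetime import date
from typing import Callable


def load_partition(conn: sqlite3.Connection, run_date: date,
                   rows: list[tuple[str, float]]) -> None:
    """Replace one day's partition atomically so re-runs never duplicate rows."""
    with conn:  # commits on success, rolls back if anything raises
        conn.execute("DELETE FROM daily_revenue WHERE run_date = ?",
                      (run_date.isoformat(),))
        conn.executemany(
            "INSERT INTO daily_revenue (run_date, region, revenue) VALUES (?, ?, ?)",
            [(run_date.isoformat(), region, revenue) for region, revenue in rows],
        )


def backfill(conn: sqlite3.Connection, dates: list[date],
             compute: Callable[[date], list[tuple[str, float]]]) -> None:
    """Recompute and reload each missed day; safe to re-run for the same dates."""
    for d in dates:
        load_partition(conn, d, compute(d))
```

In a warehouse, the same idea is typically expressed as a partition overwrite or a `MERGE` on the partition key. The property to verify is the same either way: running the backfill twice leaves the table exactly as running it once.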

## Production-Impacting Action Gate

**Before recommending or specifying any action that:**
- Writes to a production database, table, or object store
- Changes a live schedule or cron job
- Drops, truncates, or overwrites production data
- Publishes to an external endpoint or sends automated notifications

**State the action, its effect, and whether it is reversible. Wait for explicit user confirmation before proceeding.**

This gate applies to each production-impacting action individually, even when the user has already said "go ahead" on the overall task.
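A minimal sketch of the per-action gate follows. Each action carries its description, its effect, and whether it is reversible. It runs only if the `confirm` callback approves that specific action, so approving one action never carries over to the next. `ProductionAction`, `execute_with_gate`, and the `confirm` callback are hypothetical names. In an agent setting, `confirm` stands in for asking the user and waiting for an explicit answer.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ProductionAction:
    """Hypothetical description of one production-impacting action."""
    description: str  # the action, e.g. "TRUNCATE daily_revenue"
    effect: str       # what changes in production
    reversible: bool


def execute_with_gate(action: ProductionAction, run: Callable[[], None],
                      confirm: Callable[[str], bool]) -> None:
    """State the action, effect, and reversibility; run only if explicitly approved."""
    prompt = (
        f"Action: {action.description}\n"
        f"Effect: {action.effect}\n"
        f"Reversible: {'yes' if action.reversible else 'NO'}\n"
        "Approve this specific action?"
    )
    if not confirm(prompt):
        raise PermissionError(f"not approved: {action.description}")
    run()
```

Raising on refusal, rather than returning quietly, makes an unapproved action visible to whatever orchestrates the steps.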

## Handoff Back to `data-scientist`

After production planning:

- Return to parent `data-scientist` for project closeout, using the Production Handoff closeout variant from `workflow/control-tree.md`.
- If monitoring surfaces anomalies, route to `data-explorer` for data quality investigation, `metric-analyst` for metric logic review, or `model-auditor` for model performance concerns.
- Include the Operational Handoff document as the artifact for the closeout.
