---
name: customer-segmentation-clustering
description: >
  Performs behavioral customer segmentation (K-means, mixture models), RFM analysis,
  and persona-based targeting. Use when asked to: divide customers into interpretable
  segments or personas, build clustering models for dynamic segment assignment, score
  RFM (recency-frequency-monetary), perform loyalty-monetary segmentation, build
  segment-level propensity models, or identify high-value vs churn-risk cohorts. Also
  trigger for: "customer segments", "personas", "RFM", "behavioral clustering", "K-means
  customers", "churn segments", "loyalty tiers", "segment model", "customer profiles",
  or "segment-level targeting". Always renders an inline HTML dashboard as the primary output.
  Includes marketer next-best-action (NBA) recommendations and industry benchmarks via web search.
metadata:
  author: Axiom Nexar / Polanyi
  version: 1.0.0
  reference: Katsov §2.5.2 (clustering) + §3.5.3 (RFM) + §3.5.5 (persona segmentation)
---

# Customer Segmentation & Clustering

Four segmentation methods in one skill: **RFM** → **Loyalty-Monetary** → **Behavioral
Clustering (K-means/EM)** → **Segment Model** (dynamic classifier). Always output the
dashboard first, then the persona descriptions and the marketer next best actions (NBA).

---

## Plain Language: What This Does

```
RFM ANALYSIS:
  "Who are my best customers right now?" → Score R/F/M 1–5, sum to rank, pick top decile.
  Use for: quick wins, campaign targeting without ML infrastructure.

LOYALTY-MONETARY:
  "Who's loyal to our brand AND a heavy category spender?" → 2×2 grid segmentation.
  Use for: manufacturer-sponsored campaigns, retention vs acquisition strategy.

BEHAVIORAL CLUSTERING (K-means / Mixture Models):
  "What types of shoppers do we actually have?" → Unsupervised → interpret each cluster.
  Key rule: EXCLUDE spending/financial outcomes from features → segment on behavioral cause,
  not financial result (Katsov §3.5.5: spend is the outcome, not the driver).

SEGMENT MODEL (classifier):
  "Assign ANY new customer to a segment in real time." → Train decision tree on cluster labels.
  Use for: personalization engines, real-time API scoring, CRM dynamic segments.
```

---

## Core Equations & Rules (Katsov §2.5.2 + §3.5.3 + §3.5.5)

### Mixture Model / EM (eq. 2.82)
```python
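# Gaussian mixture model (GMM) density:
#   p(x) = Σ_{k=1..K} w_k · N(x | μ_k, Σ_k)
#   w_k      = mixing weight of component k (w_k ≥ 0, weights sum to 1)
#   μ_k, Σ_k = mean and covariance of component k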

# EM algorithm alternates:
# E-step: compute posterior probability that x_i belongs to cluster k
# M-step: update w_k, μ_k, Σ_k to maximize expected log-likelihood
```
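A minimal sketch of the E-step/M-step loop for a one-dimensional mixture, in pure Python. The function name `em_gmm_1d`, the starting parameters, and the sample `visits_per_month` values are illustrative assumptions, not part of the skill's scripts; a production run would more likely rely on a library implementation.

```python
import math

def normal_pdf(x, mean, var):
    """Density of N(mean, var) at x."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_gmm_1d(xs, means, variances, weights, n_iter=50):
    """Fit a 1-D Gaussian mixture by EM from the given starting parameters.

    Assumes every component keeps some probability mass (no empty components).
    """
    means, variances, weights = list(means), list(variances), list(weights)
    k = len(means)
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability that x_i belongs to component j
        resp = []
        for x in xs:
            joint = [weights[j] * normal_pdf(x, means[j], variances[j]) for j in range(k)]
            total = sum(joint)
            resp.append([p / total for p in joint])
        # M-step: re-estimate w_j, mu_j, var_j from the responsibilities
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(xs)
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var, 1e-6)  # floor keeps a component from collapsing
    return weights, means, variances, resp

# Hypothetical behavioral feature: store visits per month for ten customers
visits_per_month = [1.0, 1.2, 0.8, 1.1, 0.9, 5.0, 5.2, 4.8, 5.1, 4.9]
weights, means, variances, resp = em_gmm_1d(
    visits_per_month, means=[0.0, 6.0], variances=[1.0, 1.0], weights=[0.5, 0.5]
)
```

Each row of `resp` is a soft assignment: the probabilities of one customer belonging to each component, summing to 1.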

### RFM Scoring (§3.5.3)
```python
# Three metrics, each scored 1–5 by quintile:
R = recency_score    # 5 = most recent 20%, 1 = oldest 20%
F = frequency_score  # 5 = most frequent, 1 = least frequent
M = monetary_score   # 5 = highest spender, 1 = lowest

# Combined ranking: cut a corner of the RFM cube
rfm_score = R + F + M   # range 3–15
# Segment: top score = champions, bottom = hibernating

# Katsov §3.5.3 canonical scoring:
# Sort by metric → assign: top 20% → 5, next 20% → 4, ..., bottom 20% → 1
```
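The quintile rule above can be sketched in a few lines of pure Python. The field names (`recency_days`, `frequency`, `monetary`) and the tie-breaking by input order are assumptions for illustration; `scripts/rfm_segmentation.py` may handle both differently.

```python
def quintile_scores(values, higher_is_better=True):
    """Score each value 1-5 by rank quintile (5 = best 20%, 1 = worst 20%).

    Ties are broken by input order (an assumption of this sketch).
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i], reverse=higher_is_better)
    scores = [0] * n
    for rank, i in enumerate(order):
        scores[i] = 5 - (rank * 5) // n
    return scores

def rfm_scores(customers):
    """Return R, F, M scores and their sum (range 3-15) per customer."""
    # Recency is measured in days since last purchase, so lower is better
    r = quintile_scores([c["recency_days"] for c in customers], higher_is_better=False)
    f = quintile_scores([c["frequency"] for c in customers])
    m = quintile_scores([c["monetary"] for c in customers])
    return [
        {"id": c["id"], "R": ri, "F": fi, "M": mi, "rfm_score": ri + fi + mi}
        for c, ri, fi, mi in zip(customers, r, f, m)
    ]

# Hypothetical sample data
customers = [
    {"id": "a", "recency_days": 3, "frequency": 20, "monetary": 900.0},
    {"id": "b", "recency_days": 40, "frequency": 8, "monetary": 300.0},
    {"id": "c", "recency_days": 200, "frequency": 1, "monetary": 25.0},
    {"id": "d", "recency_days": 15, "frequency": 12, "monetary": 450.0},
    {"id": "e", "recency_days": 90, "frequency": 3, "monetary": 80.0},
]
scored = rfm_scores(customers)
```

With this sample, customer `a` scores 5 on all three metrics (`rfm_score` 15) and customer `c` scores 1 on all three (`rfm_score` 3).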

### Loyalty-Monetary Grid (§3.5)
```
High loyalty / High spend  → Loyalists     → Retain + reward
High loyalty / Low spend   → Devotees      → Upsell
Low loyalty  / High spend  → Switchers     → Trial offers, acquisition
Low loyalty  / Low spend   → Light users   → Low priority / winback
```
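A minimal sketch of the 2×2 assignment. It assumes loyalty is measured as the brand's share of the customer's category spend and that both cut-offs (`loyalty_cut`, `spend_cut`) are chosen by the analyst; neither definition comes from the source, so treat the defaults as placeholders.

```python
def loyalty_monetary_segment(brand_spend, category_spend, loyalty_cut=0.5, spend_cut=500.0):
    """Place one customer in the loyalty-monetary grid.

    loyalty = brand_spend / category_spend (assumed definition);
    thresholds are hypothetical defaults.
    """
    loyalty = brand_spend / category_spend if category_spend > 0 else 0.0
    high_loyalty = loyalty >= loyalty_cut
    high_spend = category_spend >= spend_cut
    if high_loyalty and high_spend:
        return "Loyalists"
    if high_loyalty:
        return "Devotees"
    if high_spend:
        return "Switchers"
    return "Light users"

# A customer who spends 1,000 in the category but only 100 on our brand
segment = loyalty_monetary_segment(brand_spend=100.0, category_spend=1000.0)
```

In practice the cut-offs are often set at the median of each axis so the four cells are comparable in size.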

### Behavioral Feature Engineering (§3.5.5)
```python
# DO include:   category mix, channel preference, purchase day/time, brand variety,
#               promotion sensitivity, weekend vs weekday ratio, category breadth
# DO NOT include: total spend, revenue, margin  → these are outcomes, not drivers
# Katsov rule: "spending is deliberately excluded to segment on behavioral cause,
#               not the financial outcome"
```
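One way to apply the include/exclude rule before clustering: drop the outcome columns, then z-score what remains so that no feature dominates the distance calculation because of its scale. The column names and the helper `behavioral_matrix` are hypothetical.

```python
import statistics

# Outcome columns named in the rule above; never used as clustering features
OUTCOME_COLS = {"total_spend", "revenue", "margin"}

def behavioral_matrix(profiles, id_col="customer_id", exclude=OUTCOME_COLS):
    """Return (feature_names, rows) with outcome columns removed and features z-scored."""
    feature_names = sorted(k for k in profiles[0] if k != id_col and k not in exclude)
    columns = {}
    for name in feature_names:
        vals = [p[name] for p in profiles]
        mean = statistics.fmean(vals)
        std = statistics.pstdev(vals) or 1.0  # constant column: avoid dividing by zero
        columns[name] = [(v - mean) / std for v in vals]
    rows = [[columns[name][i] for name in feature_names] for i in range(len(profiles))]
    return feature_names, rows

# Hypothetical profiles
profiles = [
    {"customer_id": 1, "weekend_ratio": 0.2, "promo_sensitivity": 0.1, "total_spend": 1200.0},
    {"customer_id": 2, "weekend_ratio": 0.8, "promo_sensitivity": 0.5, "total_spend": 300.0},
    {"customer_id": 3, "weekend_ratio": 0.5, "promo_sensitivity": 0.9, "total_spend": 150.0},
]
feature_names, rows = behavioral_matrix(profiles)
```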

### Segment Model (§3.5.5)
```python
# Step 1: Run clustering on historical profiles → cluster labels y
# Step 2: Train classifier f: profile_features → cluster_label
# Step 3: Score any new customer: segment = f(customer_profile)
# Typical classifier: decision tree (interpretable) or logistic regression (probabilistic)
```
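The three steps can be sketched end to end in pure Python. Note the substitution: this sketch clusters with a basic K-means and then assigns new customers with a nearest-centroid rule, which stands in for the decision tree or logistic regression named above. The starting centroids and sample points are hypothetical.

```python
import math

def kmeans(points, centroids, n_iter=20):
    """Lloyd's algorithm from given starting centroids; returns (centroids, labels)."""
    centroids = [list(c) for c in centroids]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point goes to its closest centroid
        labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels

def assign_segment(profile, centroids):
    """Step 3 stand-in: label a new customer with the nearest cluster centroid."""
    return min(range(len(centroids)), key=lambda j: math.dist(profile, centroids[j]))

# Step 1: cluster historical profiles (two hypothetical behavioral features)
history = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.15), (0.9, 0.8), (0.8, 0.9), (0.85, 0.85)]
centroids, labels = kmeans(history, centroids=[history[0], history[3]])

# Step 3: score a customer who was not in the training data
new_segment = assign_segment((0.7, 0.95), centroids)
```

A tree trained on `labels` would add human-readable split rules, which is the reason the skill prefers it for CRM use; the nearest-centroid rule only reproduces the assignment.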

---

## Katsov Example — Segment Profiles (Table 3.4)

| Metric | Seg 1 Convenience Seekers | Seg 2 Casual Buyers | Seg 3 Bargain Hunters |
|---|---|---|---|
| % of market | 20% | 50% | 30% |
| % of revenue | 40% | 40% | 20% |
| Share clothing | 40% | 60% | 60% |
| Share electronics | 50% | 20% | 10% |
| Redemption rate | 0.02 | 0.05 | 0.08 |

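*Convenience Seekers: 20% of customers but 40% of revenue, i.e. 2× their share of the market and the highest revenue per customer of the three segments.*
*Bargain Hunters: redemption rate 4× that of Convenience Seekers (0.08 vs 0.02), so much of that redemption may not be incremental; run an uplift model before promoting to them.*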
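The ratios behind these two notes can be checked directly from the table. The helper names below are illustrative; the numbers are the Table 3.4 values quoted above.

```python
# Values copied from the table above
table_3_4 = {
    "Convenience Seekers": {"market_pct": 20, "revenue_pct": 40, "redemption": 0.02},
    "Casual Buyers": {"market_pct": 50, "revenue_pct": 40, "redemption": 0.05},
    "Bargain Hunters": {"market_pct": 30, "revenue_pct": 20, "redemption": 0.08},
}

def revenue_index(seg):
    """Revenue share divided by customer share; 1.0 is an average-revenue customer."""
    return seg["revenue_pct"] / seg["market_pct"]

def redemption_ratio(seg, baseline):
    """How many times higher a segment's redemption rate is than the baseline's."""
    return seg["redemption"] / baseline["redemption"]

for name, seg in table_3_4.items():
    print(f"{name}: revenue index {revenue_index(seg):.2f}")
```

The revenue index is 2.00 for Convenience Seekers, 0.80 for Casual Buyers, and 0.67 for Bargain Hunters; Bargain Hunters redeem at about 4 times the Convenience Seekers rate.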

---

## Workflow

### Step 1 — RFM Analysis (quick start)
```bash
python scripts/rfm_segmentation.py \
    --transactions data/transactions.csv \
    --reference-date 2024-12-31 \
    --n-quintiles 5 \
    --output results/rfm_segments.json
```

### Step 2 — Loyalty-Monetary Grid
```bash
python scripts/loyalty_monetary_seg.py \
    --transactions data/transactions.csv \
    --brand-col brand_id \
    --output results/loyalty_monetary.json
```

### Step 3 — Behavioral Clustering
```bash
python scripts/behavioral_clustering.py \
    --profiles data/customer_profiles.csv \
    --exclude-cols "total_spend,revenue,margin" \
    --method kmeans \
    --k 4 \
    --output results/clusters.json
```

### Step 4 — Segment Model (classifier)
```bash
python scripts/segment_model.py \
    --profiles data/customer_profiles.csv \
    --clusters results/clusters.json \
    --classifier decision_tree \
    --output results/segment_model.json
```

### Step 5 — Persona Interpretation
```bash
python scripts/persona_interpret.py \
    --clusters results/clusters.json \
    --profiles data/customer_profiles.csv \
    --output results/personas.json
```

### Step 6 — Dashboard
```bash
python scripts/export_seg_dashboard_json.py \
    --rfm results/rfm_segments.json \
    --clusters results/clusters.json \
    --personas results/personas.json \
    --output dashboard_data.json
```

**Output sequence:**
```
1. [bash_tool] RFM + clustering + persona interpret
2. [web_search] Segment distribution benchmarks + RFM response rates by industry
3. [bash_tool] export_seg_dashboard_json.py → JSON
4. [show_widget] Dashboard: RFM heatmap + cluster radar + revenue waterfall + segment table
5. [text] Persona descriptions (plain language, CMO-ready)
6. [text] NBA + caveats
```

---

## Output Format — Visualization First

Dashboard panels (see `references/seg_dashboard_template.html`):
1. **KPI bar** — n segments, top segment % of revenue, RFM champion %, avg cluster silhouette
2. **RFM heatmap** — R×M matrix with customer count per cell, color = density
3. **Cluster radar** — normalized feature profiles per segment
4. **Revenue waterfall** — % of revenue by segment (Pareto)
5. **Segment table** — Katsov Table 3.4 style: persona, %, revenue %, key metrics
6. **Loyalty-monetary grid** — 2×2 with customer counts
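As an illustration of the data behind panel 2, the sketch below counts customers per R×M cell. The input shape (dicts with `R` and `M` scores from 1 to 5) and the row/column orientation are assumptions; the actual JSON consumed by the dashboard template may be laid out differently.

```python
def rm_heatmap_counts(scored):
    """Return a 5x5 grid of customer counts: rows = R score, columns = M score.

    Index 0 corresponds to score 1, index 4 to score 5.
    """
    grid = [[0] * 5 for _ in range(5)]
    for row in scored:
        grid[row["R"] - 1][row["M"] - 1] += 1
    return grid

# Hypothetical scored customers
scored = [{"R": 5, "M": 5}, {"R": 5, "M": 4}, {"R": 1, "M": 1}, {"R": 5, "M": 5}]
grid = rm_heatmap_counts(scored)
```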

---

## Marketer Insights Layer (MANDATORY)

### Search before benchmarking
```
web_search: "customer segmentation RFM champion response rate benchmark [industry] [year]"
web_search: "behavioral clustering segment revenue distribution [retail/ecommerce] [year]"
web_search: "customer persona marketing ROI improvement segmentation [year]"
```

### Translate Metrics to Business Language

| Technical metric | Business meaning |
|---|---|
| Cluster silhouette > 0.5 | "Segments are well separated: customers sit much closer to their own segment than to the next nearest one" |
| RFM score 13–15 = Champions | "These customers buy often, recently, and spend most — prioritize retention" |
| RFM score 3–6 = Hibernating | "At risk of permanent churn — winback campaign window closing" |
| Loyalty-Monetary: Switchers | "Heavy category spend but buying competitors — highest acquisition ROI" |
| Segment 1 = 20% customers, 40% revenue | "This is a whale segment: each customer here brings in about 2× the average revenue, so churn in this group costs disproportionately more" |

### NBA — Next Best Actions

Always produce 5–6 specific actions:

- **Champions (RFM 13–15):** "Activate in referral program + loyalty upgrade + early access to new products. Do NOT over-promote — risk anchoring them to discounts"
- **Hibernating (RFM 3–6):** "Winback sequence: 3 emails (week 1: value recap, week 3: best offer, week 5: 'miss you' + deep discount). After week 7, suppress from paid media"
- **Switchers (low loyalty, high spend):** "Trial offer campaign — they clearly have the budget. Use look-alike model trained on recent converters from this cell"
- **Behavioral segments:** "Build separate propensity models per segment — Katsov §3.5.5: 'Customers in one segment churn because of low quality, another because of high prices'"
- **Segment model deployment:** "Deploy classifier as real-time scoring API — assign segment at login, trigger personalization engine (CRM, homepage, email subject)"
- **Revisit twice a year:** "Segment membership shifts seasonally. Rerun clustering in Q1 and Q3, and validate that cluster centers haven't drifted >20% from baseline"
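The drift check in the last action is not defined precisely in the source. One possible reading, sketched below, measures how far each baseline centroid has moved to its nearest new centroid, relative to the baseline centroid's length. Both that definition and the function names are assumptions; on z-scored features, where centroids can sit near the origin, an absolute distance may be the better measure.

```python
import math

def centroid_drift(baseline, current):
    """Relative shift of each baseline centroid to its nearest current centroid.

    Matching by nearest centroid avoids relying on cluster label order,
    which is arbitrary between clustering runs.
    """
    drifts = []
    for b in baseline:
        nearest = min(current, key=lambda c: math.dist(b, c))
        norm = math.hypot(*b) or 1.0  # guard against a centroid at the origin
        drifts.append(math.dist(b, nearest) / norm)
    return drifts

def needs_recluster(baseline, current, threshold=0.20):
    """True if any centroid moved more than the threshold (20% by default)."""
    return any(d > threshold for d in centroid_drift(baseline, current))

# Hypothetical centroids from a baseline run and a later rerun
baseline = [(1.0, 2.0), (4.0, 4.0)]
rerun = [(4.1, 4.0), (1.0, 2.0)]  # same clusters, different label order
stale = needs_recluster(baseline, rerun)
```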

### Connect to Business Objectives (Katsov §3.2)

| Segment | Priority objective | Action |
|---|---|---|
| Champions | Retention | Loyalty program, VIP events, referral |
| At-risk loyal | Retention | Winback before churn, save offer |
| Switchers | Acquisition | Trial offer, look-alike paid media |
| Bargain hunters | Maximization (careful) | Uplift model first — avoid over-discounting |
| Casual buyers | Maximization | Category expansion, upsell |

---

## Key Caveats

- **Behavioral features only:** Never include spend/revenue in clustering features — you'll recreate RFM, not behavior (Katsov §3.5.5).
- **K selection:** Use elbow curve + silhouette score. K=4–7 is interpretable for most retail cases. K>10 is rarely actionable.
- **EM vs K-means:** EM/GMM gives soft cluster membership (probabilities) — better for overlapping segments. K-means forces hard assignment — better for CRM tagging.
- **Segment stability:** Revalidate cluster membership 6 months post-initial run — customer behavior drifts.
- **Causality warning:** Segmentation reveals correlation, not causality. "Convenience Seekers buy electronics" ≠ "electronics drives their segment membership."
- **Privacy:** In regulated markets (GDPR/LFPDPPP MX), behavioral profiling may require consent. Anonymize cluster labels in external systems.
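For the K-selection caveat, a minimal pure-Python version of the mean silhouette score is sketched below. It assumes Euclidean distance and at least two clusters; the sample points and labels are hypothetical.

```python
import math

def silhouette_score(points, labels):
    """Mean silhouette over all points; points in singleton clusters contribute 0."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # s(i) = 0 by convention for a singleton cluster
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        total += (b - a) / max(a, b)
    return total / len(points)

points = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.15), (0.9, 0.8), (0.8, 0.9), (0.85, 0.85)]
good = silhouette_score(points, [0, 0, 0, 1, 1, 1])  # labels follow the two groups
poor = silhouette_score(points, [0, 1, 0, 1, 0, 1])  # labels cut across the groups
```

To choose K, compute the score for each candidate K and weigh it against the elbow curve and how interpretable the resulting segments are.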

---

## Integration with Agency Growth OS

| Skill | Handoff |
|---|---|
| `response-uplift-modeling` | Segment labels → separate uplift models per segment |
| `customer-lifetime-value` | Segment × LTV → resource allocation per segment |
| `crm-journey-architect` | Segment → dedicated CRM journey per persona |
| `audience-segmentation-brief` | This skill produces the brief input for media activation |
| `measurement-incrementality` | Segment × lift test → incrementality by persona |

---

## Reference Files

- `references/katsov_seg_excerpts.md` — §2.5.2 + §3.5.3 + §3.5.5 equations + Table 3.4
- `references/feature_engineering_guide.md` — What to include/exclude from clustering features
- `references/industry_benchmarks_seg.md` — Segment revenue distribution benchmarks
- `references/seg_dashboard_template.html` — HTML dashboard; inject `SKILL_DATA_JSON`
