---
name: market-basket-analysis
description: >
  Mines frequent itemsets and association rules (Apriori, FP-Growth) to generate
  cross-selling recommendations, optimize physical store or digital shelf layout via
  the Quadratic Assignment Problem (QAP), and segment shopping journeys by purchase
  sequence. Use when asked to: find "frequently bought together" patterns, generate
  cross-sell rules, optimize aisle or category adjacency, measure product affinity,
  build basket-size uplift programs, or analyze shopping journey sequences. Also trigger
  for: "market basket", "association rules", "cross-sell", "product affinity", "lift
  matrix", "store layout", "shelf optimization", "category adjacency", "basket analysis",
  "apriori", "fp-growth", or any transaction-level data analysis. Always renders inline
  HTML dashboard + marketer NBA and industry benchmarks.
metadata:
  author: Axiom Nexar / Polanyi
  version: 1.0.0
  reference: Katsov §5.11.2 (association rules) + §6.9.1 (store layout QAP)
---

# Market Basket Analysis

Three capabilities in one skill: **association rule mining** → **QAP store/shelf layout
optimization** → **shopping journey segmentation**. Always outputs dashboard first.

---

## Plain Language: What This Does

```
ASSOCIATION RULES:   "Customers who buy pasta + wine also buy garlic (84% of the time)"
                     → Cross-sell widget on product page, bundle promotion, email trigger

STORE LAYOUT (QAP):  "Bakery and Dairy have lift=1.3 → place them adjacent"
                     → Physical store planogram, digital shelf ordering, category management

JOURNEY SEGMENTS:    "30% of baskets follow: bread→milk→eggs (quick trip)"
                     → Different CRM/loyalty messaging per journey type
```

**Core insight (Katsov §6.9.1):**
Lift > 1 between two categories = they appear together more than chance predicts.
QAP maximizes the sum of (lift × proximity) across all category pairs.

---

## Core Equations (Katsov §5.11.2 + §6.9.1)

### Association Rule Metrics (eq. 5.155–5.158)
```python
# Support — how often does this pattern appear?  (eq. 5.155–5.156)
support(X) = |{t ∈ T : X ⊆ t}| / |T|
support(X→Y) = support(X ∪ Y)

# Confidence — given X, how likely is Y?  (eq. 5.157)
confidence(X→Y) = support(X ∪ Y) / support(X) = P(Y|X)

# Lift — is this better than chance?  (eq. 6.118)
lift(X→Y) = support(X ∪ Y) / (support(X) × support(Y))
           = confidence(X→Y) / support(Y)
# lift > 1 → positive affinity   lift < 1 → negative   lift = 1 → independent

# Expected revenue from rule (eq. 5.158):
revenue(X→Y) = support(X→Y) × Σ price(i) for i in Y
```
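As a rough illustration, the metrics above can be computed directly from a list of baskets. The sketch below uses a made-up five-basket dataset, a made-up price, and hypothetical helper names (`support`, `confidence`, `lift`, `expected_revenue`); it is not the code in `association_rules.py`.

```python
from typing import Dict, FrozenSet, Iterable, List

# Toy transactions (hypothetical data, for illustration only).
transactions: List[FrozenSet[str]] = [
    frozenset({"pasta", "wine", "garlic"}),
    frozenset({"pasta", "wine"}),
    frozenset({"pasta", "garlic"}),
    frozenset({"wine", "cheese"}),
    frozenset({"bread", "milk"}),
]


def support(itemset: Iterable[str], txns: List[FrozenSet[str]]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    items = frozenset(itemset)
    return sum(1 for t in txns if items <= t) / len(txns)


def confidence(x: Iterable[str], y: Iterable[str], txns: List[FrozenSet[str]]) -> float:
    """P(Y | X) = support(X ∪ Y) / support(X)."""
    return support(frozenset(x) | frozenset(y), txns) / support(x, txns)


def lift(x: Iterable[str], y: Iterable[str], txns: List[FrozenSet[str]]) -> float:
    """confidence(X→Y) / support(Y); 1.0 means X and Y are independent."""
    return confidence(x, y, txns) / support(y, txns)


def expected_revenue(x: Iterable[str], y: Iterable[str],
                     txns: List[FrozenSet[str]], price: Dict[str, float]) -> float:
    """support(X→Y) multiplied by the summed price of the consequent items."""
    return support(frozenset(x) | frozenset(y), txns) * sum(price[i] for i in y)


x, y = ["pasta"], ["wine"]
print(round(support(x + y, transactions), 3))     # 0.4
print(round(confidence(x, y, transactions), 3))   # 0.667
print(round(lift(x, y, transactions), 3))         # 1.111
print(round(expected_revenue(x, y, transactions, {"wine": 12.0}), 2))  # 4.8
```

Itemsets are passed as lists rather than bare strings, because `frozenset("pasta")` would split the string into characters.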

### QAP Store Layout (eq. 6.119–6.121)
```python
# Lift matrix (eq. 6.119): L[i,j] = λ(category_i, category_j)
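# Distance matrix (eq. 6.120): D[i,j] = closeness of location i and location j
#   (binary adjacency: 1 if adjacent, 0 otherwise, so larger D means closer)
#   If D holds true distances instead (e.g. Euclidean), minimize the objective
#   or convert to a proximity score first, e.g. 1 / (1 + distance)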

# Objective — maximize co-purchase value × proximity (eq. 6.121):
max_π  Σ_i Σ_j  λ[i,j] × D[π(i), π(j)]

# π(x) = y means: category x assigned to location y
# NP-hard → use brute force (n≤8), simulated annealing, or OR-Tools
```

### Katsov Example 6.8 — 6 Categories × 2×3 Grid
```
Lift matrix L (6.122):  highest affinities → Frozen↔Drinks (1.5), Bakery↔Dairy (1.3)
Distance matrix D (6.123): binary adjacency in 2×3 grid
Optimal layout: place Frozen next to Drinks, Bakery near Dairy
Full brute force: 6! = 720 permutations evaluated
```
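A brute-force search over a 2×3 grid might look like the sketch below. Only the two lift values quoted above (Frozen↔Drinks 1.5, Bakery↔Dairy 1.3) come from this document; the `Produce` and `Snacks` categories, the 0.8 value, and the default of 1.0 for every other pair are placeholders, not Katsov's matrix (6.122). The score counts each adjacent pair once rather than twice, which does not change the best layout.

```python
from itertools import permutations

categories = ["Bakery", "Dairy", "Drinks", "Frozen", "Produce", "Snacks"]

# Symmetric lift values. Unlisted pairs default to 1.0 (independence).
pair_lift = {
    ("Frozen", "Drinks"): 1.5,   # quoted in the example above
    ("Bakery", "Dairy"): 1.3,    # quoted in the example above
    ("Produce", "Snacks"): 0.8,  # placeholder value, for illustration only
}


def lift(a: str, b: str) -> float:
    return pair_lift.get((a, b), pair_lift.get((b, a), 1.0))


ROWS, COLS = 2, 3
locations = [(r, c) for r in range(ROWS) for c in range(COLS)]


def adjacent(p, q) -> bool:
    """Two grid cells are adjacent when they share an edge."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1]) == 1


def layout_score(assignment) -> float:
    """assignment[k] is the category placed at locations[k]."""
    total = 0.0
    for i in range(len(locations)):
        for j in range(i + 1, len(locations)):
            if adjacent(locations[i], locations[j]):
                total += lift(assignment[i], assignment[j])
    return total


all_layouts = list(permutations(categories))  # 6! = 720 candidates
best = max(all_layouts, key=layout_score)

for r in range(ROWS):
    print([best[r * COLS + c] for c in range(COLS)])
print(round(layout_score(best), 2))  # 7.8
```

Several layouts tie for the best score, so the printed grid is one optimum among many; every optimum keeps Frozen next to Drinks and Bakery next to Dairy.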

---

## Algorithm Selection

| Task | Method | Script |
|---|---|---|
| Generate rules from transactions | Association rules (Katsov §5.11.2) | `association_rules.py` |
| Fast mining, large catalogs | FP-Growth | `fp_growth.py` |
| Classic textbook approach | Apriori | `apriori_optimized.py` |
| Optimize store/shelf category order | QAP (Katsov §6.9.1) | `store_layout_qap.py` |
| Segment basket journey types | Hierarchical clustering | `journey_segmentation.py` |

**When to use Apriori vs FP-Growth:**
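- Apriori: ≤ 50K transactions (rule of thumb); simple level-wise search that is easy to explain, but slower because it rescans the transactions for every itemset size
- FP-Growth: > 50K transactions; memory-efficient FP-tree with no candidate generation, often an order of magnitude or more faster (the speed-up depends on the data and the support threshold)
- Given the same thresholds, both produce identical frequent itemsets and rules; prefer FP-Growth at scale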
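To make the trade-off concrete, here is a minimal pure-Python sketch of Apriori's level-wise search. It is an illustrative toy with a hypothetical `apriori` function and made-up baskets, not the contents of `apriori_optimized.py`. Note that each level triggers another full pass over the transactions, which is the cost FP-Growth avoids.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for every frequent itemset."""
    txns = [frozenset(t) for t in transactions]
    n = len(txns)

    def frequent(candidates):
        # One full scan of the transactions per candidate level.
        out = {}
        for c in candidates:
            s = sum(1 for t in txns if c <= t) / n
            if s >= min_support:
                out[c] = s
        return out

    level = frequent({frozenset([i]) for t in txns for i in t})
    result = dict(level)
    k = 2
    while level:
        prev = set(level)
        # Join step: unions of frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        level = frequent(candidates)
        result.update(level)
        k += 1
    return result


# Made-up baskets for illustration.
baskets = [
    {"bread", "milk"},
    {"bread", "milk", "eggs"},
    {"bread", "eggs"},
    {"milk", "eggs"},
    {"bread", "milk", "eggs"},
]
itemsets = apriori(baskets, min_support=0.5)
for items, s in sorted(itemsets.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(items), round(s, 2))
```

With these baskets the three single items and the three pairs pass the 0.5 threshold, while the triple (support 0.4) is dropped.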

---

## Workflow

### Step 1 — Mine Association Rules
```bash
python scripts/fp_growth.py \
    --transactions data/transactions.csv \
    --min-support 0.01 \
    --min-confidence 0.2 \
    --min-lift 1.5 \
    --output results/rules.json
```

### Step 2 — Build Lift Matrix (for QAP)
```bash
python scripts/association_rules.py \
    --transactions data/transactions.csv \
    --mode lift-matrix \
    --categories data/category_map.csv \
    --output results/lift_matrix.json
```
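Conceptually, this step rolls items up to categories and then computes pairwise lift. The sketch below shows one way that could work on in-memory data; the function name `category_lift_matrix`, the sample items, and the convention of setting the diagonal to 1.0 are assumptions, and the real `association_rules.py` may differ in input format and output schema.

```python
from itertools import combinations


def category_lift_matrix(transactions, category_of):
    """Return (categories, matrix) with matrix[i][j] = lift between two categories."""
    # Map each basket of items to the set of categories it touches.
    baskets = [{category_of[item] for item in t if item in category_of}
               for t in transactions]
    n = len(baskets)
    cats = sorted({c for b in baskets for c in b})
    idx = {c: k for k, c in enumerate(cats)}

    single = [0] * len(cats)
    pair = [[0] * len(cats) for _ in cats]
    for b in baskets:
        for c in b:
            single[idx[c]] += 1
        for a, c in combinations(sorted(b), 2):
            pair[idx[a]][idx[c]] += 1
            pair[idx[c]][idx[a]] += 1

    # Diagonal is set to 1.0 by convention (an assumption of this sketch).
    matrix = [[1.0] * len(cats) for _ in cats]
    for i in range(len(cats)):
        for j in range(len(cats)):
            if i != j:
                matrix[i][j] = (pair[i][j] / n) / ((single[i] / n) * (single[j] / n))
    return cats, matrix


# Made-up items and category map for illustration.
category_of = {
    "baguette": "Bakery", "croissant": "Bakery",
    "milk": "Dairy", "butter": "Dairy",
    "cola": "Drinks", "pizza": "Frozen",
}
transactions = [
    ["baguette", "milk"],
    ["croissant", "butter"],
    ["cola", "pizza"],
    ["baguette", "cola"],
]
cats, L = category_lift_matrix(transactions, category_of)
print(cats)
for row in L:
    print([round(v, 2) for v in row])
```

Pairs that never co-occur get a lift of 0.0, which the layout step treats as a reason to keep those categories apart.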

### Step 3 — QAP Layout Optimization
```bash
python scripts/store_layout_qap.py \
    --lift-matrix results/lift_matrix.json \
    --floor-plan "2x3" \
    --method annealing \
    --output results/optimal_layout.json
```
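For readers curious what the `annealing` method does in principle, the following is a generic simulated-annealing sketch for the layout objective: swap two categories, keep the swap if the score improves, and otherwise keep it with a probability that shrinks as the temperature cools. The function names, the geometric cooling schedule, and the lift values are assumptions for illustration; `store_layout_qap.py` may be implemented differently.

```python
import math
import random


def grid_adjacency(rows, cols):
    """Binary proximity matrix: 1 when two grid cells share an edge."""
    cells = [(r, c) for r in range(rows) for c in range(cols)]
    n = len(cells)
    return [[1 if abs(cells[i][0] - cells[j][0]) + abs(cells[i][1] - cells[j][1]) == 1 else 0
             for j in range(n)] for i in range(n)]


def score(perm, lift, prox):
    """perm[i] is the location of category i; sums over ordered pairs i != j."""
    n = len(perm)
    return sum(lift[i][j] * prox[perm[i]][perm[j]]
               for i in range(n) for j in range(n) if i != j)


def anneal(lift, prox, steps=5000, t_start=1.0, t_end=0.01, seed=0):
    rng = random.Random(seed)
    n = len(lift)
    perm = list(range(n))
    rng.shuffle(perm)
    current = score(perm, lift, prox)
    best_perm, best = perm[:], current
    for step in range(steps):
        # Geometric cooling from t_start down towards t_end.
        temp = t_start * (t_end / t_start) ** (step / steps)
        i, j = rng.sample(range(n), 2)
        perm[i], perm[j] = perm[j], perm[i]
        candidate = score(perm, lift, prox)
        accept = candidate >= current or rng.random() < math.exp((candidate - current) / temp)
        if accept:
            current = candidate
            if current > best:
                best_perm, best = perm[:], current
        else:
            perm[i], perm[j] = perm[j], perm[i]  # undo the swap
    return best_perm, best


# Hypothetical 6-category lift matrix: 1.0 everywhere except two boosted pairs.
n = 6
lift = [[1.0] * n for _ in range(n)]
lift[0][1] = lift[1][0] = 1.3
lift[2][3] = lift[3][2] = 1.5
prox = grid_adjacency(2, 3)

best_perm, best_score = anneal(lift, prox)
print(best_perm, round(best_score, 2))
```

Annealing is a heuristic, so the result is not guaranteed to be optimal; for small grids it is worth cross-checking against brute force.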

### Step 4 — Journey Segmentation (optional)
```bash
python scripts/journey_segmentation.py \
    --transactions data/transactions.csv \
    --category-map data/category_map.csv \
    --n-clusters 5 \
    --output results/journey_segments.json
```
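One possible reading of this step is agglomerative clustering of baskets by the categories they contain. The sketch below uses Jaccard distance with average linkage on made-up baskets; it ignores purchase order, scales poorly beyond small samples, and uses hypothetical names (`jaccard_distance`, `agglomerative`), so treat it as an illustration rather than a description of `journey_segmentation.py`.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """1 - |A ∩ B| / |A ∪ B|; 0.0 for two empty sets."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def agglomerative(baskets, n_clusters):
    """Average-linkage clustering; returns clusters as lists of basket indices."""
    sets = [frozenset(b) for b in baskets]
    clusters = [[i] for i in range(len(sets))]

    def linkage(c1, c2):
        # Mean pairwise distance between the members of two clusters.
        return sum(jaccard_distance(sets[i], sets[j]) for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters and merge them.
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # safe because a < b
    return clusters


# Made-up category baskets for illustration.
baskets = [
    {"Bakery", "Dairy"},
    {"Bakery", "Dairy", "Eggs"},
    {"Frozen", "Drinks"},
    {"Frozen", "Drinks", "Snacks"},
]
segments = agglomerative(baskets, n_clusters=2)
print(segments)  # [[0, 1], [2, 3]]
```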

### Step 5 — Export Dashboard
```bash
python scripts/export_mba_dashboard_json.py \
    --rules results/rules.json \
    --layout results/optimal_layout.json \
    --output dashboard_data.json
```

**Output sequence:**
```
1. [bash_tool] Mine rules + build lift matrix + run QAP
2. [web_search] Cross-sell lift benchmarks + basket size uplift by industry
3. [bash_tool] export_mba_dashboard_json.py → JSON
4. [show_widget] Dashboard: top rules + lift heatmap + layout grid + journey segments
5. [text] CMO/marketer insights + NBA (Next Best Actions)
6. [text] Caveats: causality vs correlation, seasonality, data volume requirements
```

---

## Output Format — Visualization First

Dashboard panels (see `references/mba_dashboard_template.html`):
1. **KPI bar** — total rules found, top rule lift, avg basket size, % transactions with cross-sell opportunity
2. **Top rules table** — antecedent → consequent, support, confidence, lift, expected revenue
3. **Lift heatmap** — category × category matrix with color intensity = lift value
4. **QAP optimal layout** — grid visualization with category labels + adjacency arrows
5. **Lift distribution** — histogram of all rule lifts (identify high-value rules vs noise)
6. **Journey segments** — pie/bar of basket type distribution

---

## Marketer Insights Layer (MANDATORY)

### Search before benchmarking
```
web_search: "cross-sell association rules lift retail benchmark [year]"
web_search: "market basket analysis basket size uplift [industry] [year]"
web_search: "store layout optimization sales lift grocery [year]"
```

### Translate Metrics to Business Language

| Technical metric | Business meaning |
|---|---|
| support = 0.05 | "1 in 20 baskets contains this product combination" |
| confidence = 0.72 | "72% of baskets containing A also contain B; a strong signal only if lift is also above 1" |
| lift = 2.5 | "A and B appear together 2.5× more often than random chance" |
| lift = 0.8 | "Negative affinity — don't co-locate, don't bundle" |
| revenue(rule) = $4.50 | "Averaged over all baskets, this rule's consequent items bring in an expected $4.50 per basket" |
| n_rules_lift>2 | "These are your strongest-affinity bundling/promotion opportunities" |

### NBA — Next Best Actions for Marketers/CMOs

Always produce 5–6 specific actions:

- **Cross-sell widget:** "Deploy top-5 rules (lift>2) as 'Frequently bought together' widget on PDP — target: +8–15% basket size"
- **QAP adjacency:** "Implement QAP layout in next planogram review — place top-lift category pairs adjacent; Frozen↔Drinks (1.5) is Katsov's canonical example"
- **Email trigger:** "When customer buys antecedent product, trigger cross-sell email within 24h featuring consequent product at discount; use an uplift model to target only persuadable customers (those who would not buy the consequent without the nudge)"
- **Bundle pricing:** "High-confidence rules (conf>0.7, lift>2) → candidates for bundle pricing via two_part_tariff or price_segmentation skill"
- **Lift threshold for promotion:** "Only promote rules with lift>1.5 and support>0.02 — lower lift = noise, lower support = too rare to be worth the campaign cost"
- **Seasonality retraining:** "Retrain rules quarterly: seasonal FMCG demand can shift lift substantially between Q1 and Q3 (quantify the shift on your own data). A beer+sunscreen rule in summer can disappear in winter"
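The promotion thresholds in the list above amount to a simple filter over the mined rules. A minimal sketch, assuming each rule is a dict with `support`, `confidence`, and `lift` keys (the actual schema of `results/rules.json` is not specified here, and the sample rules are made up):

```python
from typing import Dict, List

# Hypothetical rule records; the real rules.json schema may differ.
rules: List[Dict] = [
    {"antecedent": ["pasta"], "consequent": ["wine"], "support": 0.04, "confidence": 0.55, "lift": 2.3},
    {"antecedent": ["bread"], "consequent": ["milk"], "support": 0.12, "confidence": 0.40, "lift": 1.1},
    {"antecedent": ["caviar"], "consequent": ["blini"], "support": 0.001, "confidence": 0.80, "lift": 9.0},
    {"antecedent": ["chips"], "consequent": ["salsa"], "support": 0.03, "confidence": 0.35, "lift": 1.8},
]


def shortlist(rules: List[Dict], min_lift: float = 1.5, min_support: float = 0.02) -> List[Dict]:
    """Keep rules above both thresholds, strongest lift first."""
    kept = [r for r in rules if r["lift"] > min_lift and r["support"] > min_support]
    return sorted(kept, key=lambda r: r["lift"], reverse=True)


promotable = shortlist(rules)
for r in promotable:
    print(r["antecedent"], "->", r["consequent"], r["lift"])
```

The caviar rule shows why the support floor matters: its lift is very high, but it fires in too few baskets to justify a campaign.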

### Connect to Business Objectives

| MBA output | Objective | Activation |
|---|---|---|
| Top rules (lift>2) | Maximization — basket size | Cross-sell placement, bundle promotions |
| Negative-affinity pairs (lift<1) | Retention: reduce friction | Separate in-store to avoid confusion |
| QAP layout | Maximization — category revenue | Planogram reset, digital shelf reset |
| Journey segments | Acquisition + Retention | Personalized CRM per segment |

---

## Key Caveats

- **Correlation ≠ causation:** Beer+diapers is famous but may reflect demographics, not causality. Don't over-interpret low-lift rules.
- **Support threshold matters:** Too low → thousands of noisy rules. Too high → misses rare but valuable cross-sells. Default min_support=0.01 is a starting point.
- **QAP is NP-hard:** Brute force works for n≤8. Use simulated annealing (default) or OR-Tools for n>8.
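- **Asymmetry of rules:** confidence(pasta→wine) ≠ confidence(wine→pasta) in general, whereas lift is the same in both directions. Check confidence both ways before deciding which product should trigger the recommendation.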
- **Digital vs physical:** QAP optimizes physical adjacency. For digital shelves (eCommerce), replace distance matrix with scroll/click distance or recommendation slot position.
- **Seasonality:** Retrain at minimum quarterly — holiday seasons shift association patterns dramatically.
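For the digital-shelf caveat above, one way to stand in for physical adjacency is a proximity score that decays with the gap between recommendation slots. The function below is a hypothetical sketch; the name `slot_proximity`, the `1 / (1 + decay × gap)` form, and the `decay` parameter are assumptions that should be calibrated against real scroll or click data.

```python
from typing import List


def slot_proximity(n_slots: int, decay: float = 1.0) -> List[List[float]]:
    """Proximity between list positions: 1 / (1 + decay * gap), 0.0 on the diagonal."""
    return [
        [0.0 if i == j else 1.0 / (1.0 + decay * abs(i - j)) for j in range(n_slots)]
        for i in range(n_slots)
    ]


prox = slot_proximity(4)
for row in prox:
    print([round(v, 2) for v in row])
```

Because larger values mean closer slots, this matrix can be used in the same maximization objective as the binary adjacency matrix.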

---

## Integration with Agency Growth OS

| Skill | Handoff |
|---|---|
| `recommender-systems` | Top association rules → non-personalized recommendation baseline (§5.11) |
| `price-demand-optimization` | High-lift pairs → bundling candidates (§6.5.4) |
| `response-uplift-modeling` | Cross-sell rules → uplift model: which customers to show the rec |
| `crm-journey-architect` | Journey segments → CRM sequence per basket type |
| `creative-supply-planner` | Top rules → define cross-sell asset requirements |

---

## Reference Files

- `references/katsov_mba_excerpts.md` — Eq. 5.155–5.158 + 6.117–6.123 + Example 6.8
- `references/algorithm_notes.md` — Apriori vs FP-Growth comparison + QAP heuristics
- `references/industry_benchmarks_mba.md` — Cross-sell lift benchmarks (SEARCH-FIRST)
- `references/mba_dashboard_template.html` — HTML dashboard; inject `SKILL_DATA_JSON`
