---
name: recommender-systems
description: >
  Builds personalized product recommendation engines using content-based filtering,
  collaborative filtering (user/item neighborhood), latent factor models (SVD, SVD++, NMF),
  hybrid blending/stacking, contextual pre-filtering, and non-personalized association rules.
  Use when asked to: build a recommendation engine, personalize product suggestions, solve
  cold-start problem, rank products by affinity, improve upsell/cross-sell, compute similarity
  between items or users, evaluate recommender quality (RMSE, NDCG@K, precision@K), or combine
  multiple models. Also trigger when someone says "recommendation engine", "collaborative
  filtering", "content-based filtering", "SVD", "latent factors", "cold-start problem",
  "similar products", "frequently bought together", "next best product", "blending models",
  "NDCG", "precision@K", or pastes a user-item rating matrix. Always renders inline HTML
  dashboard + marketer NBA and industry benchmarks.
metadata:
  author: Axiom Nexar / Polanyi
  version: 1.0.0
  reference: Katsov, Introduction to Algorithmic Marketing, Ch. 5 §§5.3–5.12
---

# Recommender Systems

Personalized recommendation engine builder. Six algorithm families + hybrid blending +
evaluation suite. **Always outputs three layers:** technical model → evaluation metrics →
marketer-ready insights + NBA.

---

## How Recommendations Work — Plain Language

A recommender system answers one question: **"What should this specific customer see next?"**

Three approaches, in order of sophistication:

```
CONTENT-BASED:  "You liked X → show things similar to X"
                Uses: item descriptions, categories, attributes
                Good for: new items, cold-start users

COLLABORATIVE:  "People like you bought Y → you'll probably like Y too"
                Uses: the full rating/purchase matrix
                Good for: established users, serendipitous discovery

LATENT FACTORS: "You secretly love 'indie thriller sci-fi' even if you never said so"
                Uses: hidden patterns learned from all interactions simultaneously
                Good for: scale, accuracy (core of the Netflix Prize-winning ensembles)
```

**Hybrid** combines all three. **Contextual** adds time, location, device. **Association rules**
("people who buy diapers also buy beer") work without any profile.
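
To make the collaborative idea concrete, here is a minimal sketch of item-item cosine similarity over a toy rating dictionary. The data and the helper names (`item_vectors`, `cosine`, `most_similar`) are hypothetical illustrations, not the contents of `item_based_cf.py`.

```python
from math import sqrt

# Hypothetical toy data: user -> {item: rating}. Not from the skill's datasets.
ratings = {
    "ann": {"tent": 5, "stove": 4},
    "bob": {"tent": 4, "stove": 5, "lamp": 1},
    "cat": {"lamp": 5, "novel": 4},
}

def item_vectors(ratings):
    """Invert user->item ratings into item->user rating vectors."""
    vectors = {}
    for user, items in ratings.items():
        for item, r in items.items():
            vectors.setdefault(item, {})[user] = r
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(a[u] * b[u] for u in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def most_similar(item, ratings, top_n=2):
    """Rank all other items by cosine similarity to `item`."""
    vectors = item_vectors(ratings)
    scores = [(other, cosine(vectors[item], vectors[other]))
              for other in vectors if other != item]
    return sorted(scores, key=lambda s: s[1], reverse=True)[:top_n]

similar_to_tent = most_similar("tent", ratings)
```

Items rated alike by the same users end up close together, so "stove" ranks first for "tent" here because the same two users rated both highly.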

---

## Algorithm Selection

| Scenario | Method | Script |
|---|---|---|
| New items or users (cold-start) | Content-Based KNN | `content_based_knn.py` |
| Established users, small catalog | User-Based CF | `user_based_cf.py` |
| Large catalog, scalable | Item-Based CF | `item_based_cf.py` |
| Best single-model accuracy, Netflix-style | SVD / SVD++ | `svd_basic.py` / `svdpp.py` |
| Interpretable, non-negative factors | NMF | `nmf.py` |
| Maximum accuracy (production) | Hybrid Blending | `hybrid_blending.py` |
| Day/time/location context | Contextual Pre-Filter | `contextual_prefilter.py` |
| No profile needed | Association Rules | `non_personalized_association.py` |
| Evaluate any model | Top-K Evaluator | `evaluate_top_k.py` |

→ Read `references/algorithm_selection.md` for detailed decision tree.

---

## Core Equations (Katsov Ch. 5)

### Latent Factor Model (eq. 5.92–5.96)
```
# Rating prediction from latent factors:
r̂_ui = p_u · q_i^T = Σ_{s=1}^{k}  p_us * q_is

# SVD decomposition (eq. 5.95–5.96):
R = UΣV^T  →  R̂ = U_k Σ_k V_k^T

# Full model with biases:
r̂_ui = μ + b_i + b_u + p_u · q_i^T

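# Gradient descent updates, where e = r_ui − r̂_ui is the prediction error,
# α the learning rate, and λ the regularization strength: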
b_u ← b_u + α(e - λ·b_u)
b_i ← b_i + α(e - λ·b_i)
p_u ← p_u + α(e·q_i - λ·p_u)
q_i ← q_i + α(e·p_u - λ·q_i)
```
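
The update rules above can be sketched as a small stochastic gradient descent loop. This is an illustrative sketch under stated assumptions, not the code in `svd_basic.py`: the function name `train_biased_mf`, the hyperparameter defaults, and the toy ratings are all hypothetical.

```python
import random
from math import sqrt

def train_biased_mf(ratings, n_factors=2, alpha=0.01, lam=0.05, epochs=200, seed=0):
    """SGD for r_hat_ui = mu + b_u + b_i + p_u . q_i.

    ratings: list of (user, item, rating) triples.
    Returns a predict(user, item) function bound to the learned parameters.
    """
    rng = random.Random(seed)
    mu = sum(r for _, _, r in ratings) / len(ratings)
    users = sorted({u for u, _, _ in ratings})
    items = sorted({i for _, i, _ in ratings})
    b_u = {u: 0.0 for u in users}
    b_i = {i: 0.0 for i in items}
    p = {u: [rng.gauss(0, 0.1) for _ in range(n_factors)] for u in users}
    q = {i: [rng.gauss(0, 0.1) for _ in range(n_factors)] for i in items}

    def predict(u, i):
        return mu + b_u[u] + b_i[i] + sum(pf * qf for pf, qf in zip(p[u], q[i]))

    for _ in range(epochs):
        for u, i, r in ratings:
            e = r - predict(u, i)                  # prediction error
            b_u[u] += alpha * (e - lam * b_u[u])
            b_i[i] += alpha * (e - lam * b_i[i])
            for s in range(n_factors):
                p_us, q_is = p[u][s], q[i][s]      # use pre-update values for both
                p[u][s] += alpha * (e * q_is - lam * p_us)
                q[i][s] += alpha * (e * p_us - lam * q_is)
    return predict

# Hypothetical toy ratings (user, item, rating); not from data/ratings.csv.
toy = [("u1", "a", 5), ("u1", "b", 3), ("u2", "a", 4),
       ("u2", "c", 1), ("u3", "b", 2), ("u3", "c", 1)]
predict = train_biased_mf(toy)

mu = sum(r for _, _, r in toy) / len(toy)
baseline_rmse = sqrt(sum((r - mu) ** 2 for _, _, r in toy) / len(toy))
train_rmse = sqrt(sum((r - predict(u, i)) ** 2 for u, i, r in toy) / len(toy))
```

Comparing `train_rmse` with `baseline_rmse` (always predicting the global mean μ) is a quick sanity check that the updates are moving in the right direction. It says nothing about generalization; that requires a held-out split.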

### SVD++ with Implicit Feedback (eq. 5.117–5.119)
```
r̂_ui = μ + b_i + b_u + (p_u + |I_u|^{-1/2} Σ_{j∈I_u} y_j) · q_i^T
```
**Plain language:** not just what you rated, but which items you interacted with at all.
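
A minimal sketch of the SVD++ prediction rule, assuming the parameters have already been learned. The function name `svdpp_predict` and all numbers are hypothetical; `y` maps each item j to its implicit-feedback factor vector y_j, and `interacted` is the set I_u.

```python
from math import sqrt

def svdpp_predict(mu, b_u, b_i, p_u, q_i, y, interacted):
    """r_hat_ui = mu + b_i + b_u + (p_u + |I_u|^(-1/2) * sum_{j in I_u} y_j) . q_i"""
    k = len(p_u)
    implicit = [0.0] * k
    if interacted:
        scale = 1.0 / sqrt(len(interacted))
        for j in interacted:
            for s in range(k):
                implicit[s] += scale * y[j][s]
    # The user's effective factor vector: explicit part plus implicit part.
    user_vec = [p_u[s] + implicit[s] for s in range(k)]
    return mu + b_i + b_u + sum(user_vec[s] * q_i[s] for s in range(k))

# Hypothetical learned parameters for one user-item pair.
y = {"a": [1.0, 0.0], "b": [1.0, 0.0], "c": [0.0, 1.0], "d": [0.0, 1.0]}
with_history = svdpp_predict(3.0, 0.5, -0.25, [1.0, 0.0], [0.5, 2.0], y, {"a", "b", "c", "d"})
no_history = svdpp_predict(3.0, 0.5, -0.25, [1.0, 0.0], [0.5, 2.0], y, set())
```

With an empty `interacted` set the formula reduces to the biased latent factor model, which is why the two predictions differ only by the implicit term.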

### Blending / Stacking (eq. 5.122–5.125)
```
# Linear blend of q model outputs (eq. 5.124):
r̂_ui = Σ_{k=1}^{q}  w_k · r̂_ui^{(k)}

# Ridge regression for weights (eq. 5.125):
w = (X^T X + λI)^{-1} X^T y
```
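
The ridge solution for blend weights can be sketched in plain Python by solving the normal equations directly. This is an assumption-laden illustration, not the implementation in `hybrid_blending.py`: the names `ridge_blend_weights` and `solve` and the toy predictions are hypothetical.

```python
def solve(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[r]] for r, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (M[r][n] - sum(M[r][c] * w[c] for c in range(r + 1, n))) / M[r][r]
    return w

def ridge_blend_weights(preds, y, lam=0.1):
    """w = (X^T X + lam*I)^(-1) X^T y.

    preds: one row per rating, each row = [model_1_pred, ..., model_q_pred].
    y: the true ratings for the same rows.
    """
    q = len(preds[0])
    A = [[sum(row[a] * row[c] for row in preds) + (lam if a == c else 0.0)
          for c in range(q)] for a in range(q)]
    b = [sum(row[a] * target for row, target in zip(preds, y)) for a in range(q)]
    return solve(A, b)

# Hypothetical held-out predictions from two models; truth is 0.7*m1 + 0.3*m2.
val_preds = [[4.0, 2.0], [3.0, 5.0], [1.0, 2.0], [5.0, 4.0]]
val_truth = [3.4, 3.6, 1.3, 4.7]
weights = ridge_blend_weights(val_preds, val_truth, lam=1e-6)
blended = [sum(w * m for w, m in zip(weights, row)) for row in val_preds]
```

The weights should be fitted on predictions for ratings that none of the base models saw during training; otherwise the blend tends to overweight whichever model overfits most.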

### Quality Metrics (eq. 5.5–5.11)
```
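# T = test set, e_uj = r_uj − r̂_uj; Y_u(K) = top-K list for user u; I_u = items relevant to u
RMSE = √(1/|T| Σ e_uj²)                          [eq. 5.6]
precision@K = |Y_u(K) ∩ I_u| / K                 [eq. 5.8]
recall@K    = |Y_u(K) ∩ I_u| / |I_u|             [eq. 5.9]
# In DCG, i is the rank position and r_ui the relevance of the item shown at that rank:
DCG@K       = Σ_{i=1}^{K} (2^{r_ui}-1)/log₂(i+1) [eq. 5.10–5.11]
NDCG@K      = DCG@K / IDCG@K                     (IDCG@K = DCG@K of the ideal ordering)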
```
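
The metrics above translate almost line for line into code. The sketch below is an illustration with hypothetical function names and toy inputs; it is not the code in `evaluate_top_k.py`.

```python
from math import log2, sqrt

def rmse(pairs):
    """pairs: iterable of (actual, predicted) ratings over the test set T."""
    pairs = list(pairs)
    return sqrt(sum((a - p) ** 2 for a, p in pairs) / len(pairs))

def precision_at_k(ranked, relevant, k):
    """Share of the top-K list that is relevant."""
    return len(set(ranked[:k]) & relevant) / k

def recall_at_k(ranked, relevant, k):
    """Share of the relevant items that made it into the top-K list."""
    return len(set(ranked[:k]) & relevant) / len(relevant) if relevant else 0.0

def dcg_at_k(ranked, relevance, k):
    """relevance: item -> graded relevance (items not listed count as 0)."""
    return sum((2 ** relevance.get(item, 0) - 1) / log2(pos + 1)
               for pos, item in enumerate(ranked[:k], start=1))

def ndcg_at_k(ranked, relevance, k):
    """DCG normalized by the DCG of the ideal (relevance-sorted) ordering."""
    ideal = sorted(relevance, key=relevance.get, reverse=True)
    idcg = dcg_at_k(ideal, relevance, k)
    return dcg_at_k(ranked, relevance, k) / idcg if idcg else 0.0

# Hypothetical example: one user's ranked list and graded relevance.
ranked = ["a", "b", "c", "d"]
relevance = {"a": 3, "c": 1, "e": 2}
relevant = set(relevance)

p4 = precision_at_k(ranked, relevant, 4)
r4 = recall_at_k(ranked, relevant, 4)
dcg4 = dcg_at_k(ranked, relevance, 4)
ndcg4 = ndcg_at_k(ranked, relevance, 4)
```

These are per-user values; a full evaluation would average them over all test users.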

---

## Workflow

### Step 1 — Load & Inspect Data
```bash
python scripts/evaluate_top_k.py --inspect --data data/ratings.csv
```
Output: sparsity %, user/item count, rating distribution, cold-start profile.
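
As a rough sketch of what such an inspection computes, the snippet below derives the same summary from an in-memory CSV. The column names (`user_id`, `item_id`, `rating`), the function name `inspect_ratings`, and the sample rows are assumptions for illustration; the actual schema expected by `evaluate_top_k.py` may differ.

```python
import csv
import io
from collections import Counter

def inspect_ratings(rows, cold_start_threshold=5):
    """Summarize a ratings table given as dicts with user_id, item_id, rating."""
    rows = list(rows)
    users = Counter(r["user_id"] for r in rows)
    items = Counter(r["item_id"] for r in rows)
    cells = len(users) * len(items)
    known = len({(r["user_id"], r["item_id"]) for r in rows})
    density = known / cells
    return {
        "n_users": len(users),
        "n_items": len(items),
        "density_pct": 100 * density,           # share of user-item cells with a rating
        "sparsity_pct": 100 * (1 - density),    # share of cells that are empty
        "rating_distribution": dict(Counter(r["rating"] for r in rows)),
        "cold_start_users": sorted(u for u, n in users.items()
                                   if n < cold_start_threshold),
    }

# Hypothetical sample standing in for data/ratings.csv.
sample_csv = io.StringIO(
    "user_id,item_id,rating\n"
    "u1,a,5\nu1,b,3\nu2,a,4\nu3,c,1\n"
)
report = inspect_ratings(csv.DictReader(sample_csv), cold_start_threshold=2)
```

Reporting density and sparsity side by side avoids the common mix-up between "percent of ratings known" and "percent missing".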

### Step 2 — Choose & Train Model
```bash
# Content-based (cold-start safe):
python scripts/content_based_knn.py --items data/items.csv --k 20

# Collaborative filtering:
python scripts/item_based_cf.py --ratings data/ratings.csv --k 30

# Latent factors (best accuracy):
python scripts/svd_basic.py --ratings data/ratings.csv --factors 50 --epochs 20

# SVD++ (adds implicit feedback):
python scripts/svdpp.py --ratings data/ratings.csv --factors 50 --epochs 20
```

### Step 3 — Evaluate
```bash
python scripts/evaluate_top_k.py \
    --predictions results/predictions.csv \
    --test data/test_ratings.csv \
    --k 10 \
    --output results/eval_metrics.json
```

### Step 4 — Hybrid Blend (optional, +5–15% improvement)
```bash
# Note: evaluate the fitted blend on a split not used to fit the weights, to avoid leakage.
python scripts/hybrid_blending.py \
    --model-outputs results/svd_pred.csv,results/item_cf_pred.csv \
    --test data/test_ratings.csv \
    --output results/blend_weights.json
```

### Step 5 — Export Dashboard + Insights
```bash
python scripts/export_rec_dashboard_json.py \
    --eval results/eval_metrics.json \
    --model-type svd \
    --output dashboard_data.json
```

**Output sequence:**
```
1. [bash_tool] Train + evaluate model
2. [web_search] Industry benchmarks for sector/category (precision@K, CTR lift, revenue uplift)
3. [bash_tool] export_rec_dashboard_json.py → JSON
4. [show_widget] HTML dashboard: metrics + model comparison + hit-rate (HR) leaderboard
5. [text] CMO/marketer insights: what this means + NBA (Next Best Actions)
6. [text] Caveats: cold-start, popularity bias, filter bubble risk
```

---

## Scripts Reference

| Script | Key Inputs | Key Output |
|---|---|---|
| `content_based_knn.py` | items.csv (features), ratings.csv | Per-user top-K recs + similarity matrix |
| `user_based_cf.py` | ratings matrix | Per-user recs via Pearson/cosine similarity |
| `item_based_cf.py` | ratings matrix | Per-user recs via item-item similarity |
| `svd_basic.py` | ratings matrix, k, epochs | P/Q factor matrices + predictions |
| `svdpp.py` | ratings matrix, k, epochs | SVD++ predictions (implicit feedback) |
| `nmf.py` | ratings matrix, k | Non-negative factor matrices |
| `hybrid_blending.py` | multiple prediction CSVs | Blend weights + blended predictions |
| `contextual_prefilter.py` | ratings + context CSV | Context-sliced rating matrix |
| `non_personalized_association.py` | transactions CSV | Association rules (support, confidence, lift) |
| `evaluate_top_k.py` | predictions + test ratings | RMSE, precision@K, recall@K, NDCG@K, coverage |
| `export_rec_dashboard_json.py` | eval JSON | Dashboard-ready JSON |
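
The support, confidence, and lift outputs listed for `non_personalized_association.py` can be illustrated with pairwise rules over a handful of baskets. This is a hedged sketch with a hypothetical function name (`pair_rules`) and toy transactions, not the script itself.

```python
from collections import Counter
from itertools import combinations

def pair_rules(transactions, min_support=0.0):
    """Compute support, confidence, and lift for all item pairs (both directions)."""
    n = len(transactions)
    item_count = Counter()
    pair_count = Counter()
    for basket in transactions:
        items = sorted(set(basket))
        item_count.update(items)
        pair_count.update(combinations(items, 2))
    rules = []
    for (a, b), n_ab in pair_count.items():
        support = n_ab / n                      # P(a and b)
        if support < min_support:
            continue
        for x, y in ((a, b), (b, a)):
            confidence = n_ab / item_count[x]   # P(y | x)
            lift = confidence / (item_count[y] / n)  # P(y | x) / P(y)
            rules.append({"if": x, "then": y, "support": support,
                          "confidence": confidence, "lift": lift})
    return sorted(rules, key=lambda r: r["lift"], reverse=True)

# Hypothetical toy baskets.
transactions = [
    ["diapers", "beer", "milk"],
    ["diapers", "beer"],
    ["milk", "bread"],
    ["diapers", "milk"],
    ["bread"],
]
rules = pair_rules(transactions)
```

A lift above 1 means the pair co-occurs more often than independence would predict; lift below 1 means the items tend to substitute for or avoid each other.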

---

## Output Format — Visualization First

**Primary output: inline HTML dashboard. Always render before any text.**

Dashboard panels (see `references/rec_dashboard_template.html`):
1. **KPI bar** — RMSE, precision@10, NDCG@10, catalog coverage, model type
2. **Algorithm leaderboard** — bar chart comparing models if multiple evaluated
3. **Precision-Recall curve** — varying K from 1→50
4. **Rating distribution** — original vs predicted
5. **Association rules table** — top rules by lift (for non-personalized)
6. **Business impact estimate** — revenue uplift estimate from benchmark

---

## Marketer Insights Layer (MANDATORY for every activation)

**This section must always be produced after the technical output.**

### 1. Web Search for Industry Benchmarks
Always search before responding:
```
web_search: "recommendation engine CTR uplift [industry] benchmark [year]"
web_search: "precision@10 recommender system [ecommerce/streaming/retail] industry average"
```

### 2. Translate Metrics to Business Language

| Technical metric | Business meaning | Typical industry range |
|---|---|---|
| precision@10 | "1 in X recommended products gets clicked/bought" | 5–30% (ecommerce) |
| RMSE < 0.9 | "Rating predictions are typically within ~1 star of truth" | Netflix Prize winner: ≈0.857 (1–5 star scale) |
| NDCG@10 > 0.6 | "The right products appear near the top of the list" | Good = 0.6–0.8 |
| coverage > 30% | "30%+ of catalog gets recommended to someone" | Prevents long-tail waste |
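| HR (hit rate) | "% of sessions where at least 1 rec was relevant" | No standard public figure; the often-cited Amazon ~35% is a share of purchases attributed to recommendations, not a hit rate |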

### 3. NBA — Next Best Actions for Marketers/CMOs

Always produce 4–6 specific, actionable recommendations:

**Template (adapt to context):**
- **Placement:** "Deploy these recommendations in [cart page / email / home page] for highest conversion lift"
- **Cold-start:** "For new users, use content-based recs until they accumulate 5+ interactions, then switch to collaborative filtering (switching hybrid, eq. 5.121)"
- **Segmentation:** "Run separate models per [segment / category] — model accuracy is higher for homogeneous populations"
- **Testing:** "A/B test: control=current merchandising rules, treatment=this model. Measure GMV uplift over 4 weeks minimum"
- **Refresh:** "Retrain weekly; retrain sooner if CTR drops >15% below baseline, which signals user drift"
- **Diversity:** "Cap any single brand/category at 30% of recommendation list to avoid filter bubble and improve catalog coverage"
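
The diversity cap in the template can be sketched as a greedy re-rank over an already scored list. The function name `cap_by_category`, the item IDs, and the category map are hypothetical; the 30% default mirrors the template above.

```python
from math import floor

def cap_by_category(ranked_items, category_of, list_size=10, max_share=0.30):
    """Walk the list in score order, skipping items whose category has
    already filled its quota of the final list."""
    quota = max(1, floor(list_size * max_share + 1e-9))
    counts = {}
    result = []
    for item in ranked_items:
        cat = category_of[item]
        if counts.get(cat, 0) < quota:
            result.append(item)
            counts[cat] = counts.get(cat, 0) + 1
        if len(result) == list_size:
            break
    return result

# Hypothetical scored list (best first) dominated by one category.
ranked_items = ["s1", "s2", "s3", "s4", "t1", "s5", "l1", "t2"]
category_of = {"s1": "shoes", "s2": "shoes", "s3": "shoes", "s4": "shoes",
               "s5": "shoes", "t1": "tents", "t2": "tents", "l1": "lamps"}
capped = cap_by_category(ranked_items, category_of, list_size=5, max_share=0.4)
```

If the candidate pool lacks enough items from other categories, the result can be shorter than `list_size`; a production version would need a backfill rule for that case.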

### 4. Connect to Business Objectives

Always link output to one of: acquisition, maximization, retention (Katsov §3.2):
- **Acquisition:** content-based recs on landing pages → introduce new users to catalog
- **Maximization:** item-based CF on PDP (Product Detail Page) → cross-sell / upsell
- **Retention:** personalized email recommendations → reduce churn, increase frequency

---

## Key Caveats

- **Popularity bias:** Most CF models over-recommend popular items. Counteract with diversity re-ranking.
- **Filter bubble:** Pure collaborative filtering creates echo chambers. Mix with serendipity score (§5.3.4).
- **Cold-start:** New items/users have no interaction history → fall back to content-based (eq. 5.121 switching).
- **Implicit feedback:** Purchases ≠ preferences. A customer may have returned the item. SVD++ treats interactions as a supplementary signal alongside explicit ratings rather than as ratings themselves (eq. 5.117).
- **Sparsity threshold:** CF degrades when fewer than 1% of user-item ratings are known (density < 1%, i.e. sparsity > 99%). Use NMF or content-based when density falls below 0.5%.
- **Offline RMSE is not business ROI.** Always tie precision@K to revenue per recommendation slot via an A/B test.
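
The cold-start fallback can be sketched as a simple switching rule. This is an assumption-based illustration in the spirit of the switching hybrid the caveats reference, not a transcription of eq. 5.121: the function name `recommend`, the two model callables, and the threshold of 5 interactions (taken from the NBA template) are placeholders.

```python
def recommend(user_id, interaction_counts, content_model, cf_model, min_interactions=5):
    """Switching hybrid: content-based for cold users, CF once history exists.

    content_model / cf_model: hypothetical callables mapping user_id -> ranked item list.
    Returns (source_label, recommendations).
    """
    n = interaction_counts.get(user_id, 0)
    if n >= min_interactions:
        return "collaborative", cf_model(user_id)
    return "content_based", content_model(user_id)

# Hypothetical stand-ins for trained models.
interaction_counts = {"new_user": 2, "regular": 9}
content_model = lambda user_id: ["c1", "c2"]
cf_model = lambda user_id: ["x1", "x2"]

cold = recommend("new_user", interaction_counts, content_model, cf_model)
warm = recommend("regular", interaction_counts, content_model, cf_model)
unknown = recommend("never_seen", interaction_counts, content_model, cf_model)
```

Returning the source label alongside the list makes it easy to segment A/B results by which model actually served the recommendation.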

---

## Integration with Agency Growth OS

| Skill | Handoff |
|---|---|
| `audience-segmentation-brief` | Persona segments → separate rec models per persona |
| `response-uplift-modeling` | Recommendation click → treatment for uplift model |
| `crm-journey-architect` | Top-K recs → product slots in CRM email templates |
| `measurement-incrementality` | A/B test on rec engine → revenue uplift proof |
| `creative-supply-planner` | Top recommended items → prioritize creative production |

---

## Reference Files

- `references/algorithm_selection.md` — Full decision tree: sparsity × cold-start × scale
- `references/katsov_rec_excerpts.md` — Key equations §5.3–5.12 + Netflix Prize context
- `references/industry_benchmarks.md` — Sector benchmarks (search before using; may be stale)
- `references/rec_dashboard_template.html` — HTML dashboard; inject `SKILL_DATA_JSON`