---
name: bulk-data-hygiene
description: Clean and normalize your CRM data in bulk — fix phone formats, fill missing emails, standardize country codes, and flag unowned deals. Unlike board-diagnosis (which reports), this skill acts. Use when someone says "clean my CRM", "fix my data", "normalize my phone numbers", "my emails are wrong", "fill in missing owners", "clean up before my kickoff call", or "fix the data on my board".
argument-hint: "[optional: board name or ID]"
user-invocable: true
allowed-tools: [Read, AskUserQuestion, mcp__monday__get_user_context, mcp__monday__list_workspaces, mcp__monday__search, mcp__monday__get_board_info, mcp__monday__get_column_type_info, mcp__monday__get_board_items_page, mcp__monday__board_insights, mcp__monday__change_item_column_values, mcp__monday__create_doc, mcp__monday__create_update, mcp__monday__list_users_and_teams, mcp__monday__all_monday_api]
---

# Bulk Data Hygiene

Audits a CRM board for data gaps, lets the user pick which gaps to fix, then runs a bounded write loop to fix them. The "do" sibling to `board-diagnosis`'s "report." Designed for the recurring "my CRM is messy" pain — Sidekick's #1 negative-sentiment JTBD.

Flow: **Trigger → Audit → User picks gaps → Plan writes → Execute (bounded, idempotent) → Summary**.

## Input
- Optional: board name/ID via argument.
- Optional: pre-declared mode (Default / Silent / Proactive).

## Output
- **α (default):** Bounded write loop on the chosen board. Normalizes phone/country/email; can also batch-set owner / source / last-touch / stage when the user explicitly picks those gap types and signals are clean. A `Hygiene fix — <Mon DD>` doc lists every change with `Generated by Claude` footer + `<!-- claude-skill-id: bulk-data-hygiene -->`.
- **β (opt-in):** A "before/after" snapshot doc (preserves the prior state for ~30 days, in case a pass over-corrected). User is asked at Step 4.
- **Proactive extension:** Out of scope — bulk-data-hygiene is *already* a write skill; proactive doesn't change its behavior. Default mode + batched confirm is the only write surface.

## Knowledge
- Gap taxonomy + audit rules (§ Step 3).
- Normalization recipes (§ Step 5).
- Shared artifact conventions (§ Shared patterns).

## Tools (MCP)
- `get_user_context`, `list_workspaces`, `search`, `get_board_info`, `get_column_type_info`, `get_board_items_page`, `board_insights` — Gather.
- `list_users_and_teams` — enumerate users for bulk-assign owner step.
- `change_item_column_values` — the core write. Loop with bounded concurrency.
- `create_doc` / `create_update` — publish summary + before/after snapshot.
- `all_monday_api` — escape hatch for column types not exposed by `change_item_column_values`. For `country` columns, use: `change_multiple_column_values` with `column_values: {"<country_col_id>": {"countryCode": "<ISO2>", "countryName": "<name>"}}`. Do not freeform the mutation type — only this shape is supported for country columns.

## Cross-skill handoffs
- **From board-diagnosis:** the diagnosis report ends with *"run `/monday-crm:bulk-data-hygiene` to fix"* — this skill is the natural handoff target.
- **From morning-briefing:** if morning brief flags >N hygiene issues, suggest this skill.
- **To morning-briefing / forecast-dashboard:** clean data feeds them — point user there once hygiene is done.

---

## Step 0: Connector check

**Goal:** Fail fast if the user has no monday MCP connection — no partial writes, no guessing.

1. Try `mcp__monday__get_user_context`.
2. If the tool is missing, returns an auth-style error, or the user has no accessible workspace: print *"I don't see the monday connector active on this session. Install it from https://monday.com/mcp, then run this skill again."* and stop.
3. If it works, cache `user.id` + `user.name` for artifact metadata.

---

## Step 1: Detect mode

- **Default** — confirm before each batch (not each item) of writes.
- **Silent** — skip batch confirms; still publishes the summary doc.
- **Proactive** — same as Default for this skill (proactive doesn't make sense — every write is intentional). If user pre-declared Proactive, treat it as Default and note in chat.

Hard safety rail regardless of mode: **no deletes, no amount-column writes, no cross-workspace moves, no normalization on columns the user didn't pick**. Stage / owner / source / last-touch edits ARE allowed when the user explicitly selected that gap type at Step 4 — bulk-data-hygiene is the canonical place for board-wide column rewrites that other skills point users at.

---

## Step 2: Resolve board

**Goal:** Land on a single `boardId` before reading any data.

1. If the user passed a board via argument: numeric → use as `boardId` directly; string → `mcp__monday__search` with `searchType: "BOARD"`. One match → use it; multiple → `AskUserQuestion`.
2. If no argument: `get_user_context` → scan `relevantBoards` + `favorites` for names matching `deals|opportunities|pipeline|leads|sales|contacts|accounts`. One candidate → use it; multiple → `AskUserQuestion`; zero → `list_workspaces` → `search`. Still zero → stop with *"I don't see a CRM-shaped board in your workspaces. Tell me the board name or ID, or run `/monday-crm:workspace-builder` to create one."*

---

## Step 3: Audit — surface the gaps (Gather + Synthesize)

**Goal:** Compute a gap report grouped by fix-type, with a concrete count per type.

Pull all items via `get_board_items_page` (paginated up to 5000, then sample with disclosure). For each item, check:

| Gap type | Audit rule | Fixable here? |
|---|---|---|
| Missing email | `email` column empty AND contact name present AND company resolvable (see note below) | ✅ (heuristic guess from company domain — flagged with `Source = guessed`) |
| Missing phone | `phone` column empty | ❌ (no source — flag only) |
| Phone format | Phone present but no country code, or contains letters / >2 separators | ✅ (E.164 normalization) |
| Country code missing | `country` column empty AND phone has parseable country code OR email domain has CCTLD | ✅ |
| Email case | Mixed-case (`John.Doe@Acme.COM`) | ✅ (lowercase) |
| Stale last-touch | `date` last-touch >90d AND stage active | ✅ (set to "today" only if user picks; never overwrites a more-recent date) |
| Owner missing | `people` owner empty AND stage active | ✅ (bulk-assign within a user-picked group/scope) |
| Stage drift | items in non-canonical groups; user explicitly wants normalization | ✅ (bulk-set within user-picked scope) |
| Amount missing | `numbers` value empty AND stage past "qualifying" | ❌ (flag only — forecast integrity rail) |
| Source missing | `status` source column empty | ✅ (bulk-set within user-picked scope) |

**"Company resolvable" definition for email guessing:** look for (a) a `text` or `email` column containing a domain (`@` or `.`), (b) a `board_relation` column linking to an accounts board, or (c) an item name matching `First Last @ CompanyName`. If none found, mark the item as not guessable.

Resolve columns by **type**, not English name (same rule as morning-briefing). Skip gap types where the matching column doesn't exist.

Produce the audit summary in chat:

```markdown
## Audit — <board name>

**Items scanned:** N
**Gaps found:**
- ✅ Fixable (column normalization): <N> phone formats · <N> mixed-case emails · <N> missing country codes · <N> guessable missing emails
- ✅ Fixable (bulk-set within picked scope): <N> missing owners · <N> stale last-touch · <N> missing sources · <N> stage drift
- ⚠️ Flag-only: <N> missing phones (no source) · <N> missing amounts (forecast integrity rail)

Pick which fixable gaps to run, or "all". Flag-only items list in the summary doc — no writes.
```

---

## Step 4: User picks (Synthesize)

First, ask whether to create a before/after snapshot: *"Want a snapshot doc so you can see exactly what changed and roll back individual cells if something looks off? (yes / no)"*

Then `AskUserQuestion`, multiSelect:

- "Phone formats (N items)"
- "Mixed-case emails (N items)"
- "Country codes from phone/email (N items)"
- "Guess missing emails from company domain (N items) — *heuristic, may miss*"
- "Bulk-assign owner (N items) — *requires picking a scope and a target owner*"
- "Backfill stale last-touch (N items) — *will only fill empty or older-than-90d cells*"
- "Bulk-set source (N items) — *requires picking a scope and a source value*"
- "Normalize stage drift (N items) — *requires picking a scope and target stage*"
- "All column-normalization fixes" (the first four — bulk-set ones still need scope)

**Scope picker for bulk-set options:** if the user picked multiple bulk-set options (owner / source / stage / last-touch), ask once: *"Should the same scope (group / filter) apply to all of them, or do you want to set each separately?"* One question beats four. If "same for all" → ask scope once and apply it across all bulk-set options. If "separately" → run a scope sub-prompt per option.

For bulk-set options, use `list_users_and_teams` to present available owners/teams when the target owner needs to be selected.

If user picks "Guess missing emails", surface a **strong warning**: *"Email guessing uses a `<first>.<last>@<companydomain>` pattern. Accuracy varies widely — works best for B2B companies with standard email formats, unreliable for personal domains or non-Western name conventions. Only recommended for Leads boards where invalid sends are tolerable. Continue?"*

Hard rule: **no fix runs without an explicit Step 4 selection** (and a scope-picker confirm for bulk-set options). Empty selection → stop with summary-only.

---

## Step 5: Plan + execute (Publish)

**Goal:** Bounded, idempotent loop. Every write is reversible.

For each picked gap type, build a write plan: list of `(itemId, columnId, oldValue, newValue, reason)`.

### Normalization recipes

- **Phone format → E.164:**
  - Strip non-digit/`+` characters.
  - If no leading `+` AND country column is populated → prepend dial code per country.
  - If no leading `+` AND country empty AND phone starts with a known country prefix (e.g., `1` for US, `44` for UK) → prepend `+`.
  - Edge case: phone is too short (<7 digits) or too long (>15 digits) → skip with reason "unparseable".
- **Email lowercase:** `value.toLowerCase()`. Only write if changed.
- **Country code from phone:** parse leading dial code → ISO-2 country code → write to country column using `all_monday_api` with `{"countryCode": "<ISO2>", "countryName": "<name>"}` shape.
- **Country code from email CCTLD:** if email ends in `.uk` / `.de` / `.fr` etc. and country empty → infer.
- **Email guess:** for items with name + resolvable company domain, try `<firstname>.<lastname>@<domain>` lowercase. **Add a `Source = guessed` status flag** on the item so the user can filter and verify later. If a source/status column is missing, prompt the user once to add one — never write the email without a verifiability flag.
- **Bulk-set owner / source / stage:** within the user-confirmed scope (group / status filter from Step 4), apply the chosen target value via `change_item_column_values`. Never extend beyond the picked scope.
- **Backfill stale last-touch:** set to today's date only on items where last-touch is (a) empty, or (b) older than 90 days AND user explicitly picked the "stale last-touch" gap type. Never overwrite a more-recent date.

### Bounded execution

1. Cap total writes at **500 per run**. Above that, prompt: *"<N> writes planned, capped at 500 per run. Run first 500 now and re-trigger for the rest, or narrow your selection?"*
2. Execute via `change_item_column_values`, batched by item (one call per item updates multiple columns at once). Concurrency: **5 in flight max** to avoid rate limits.
3. Confirm before write (Default mode) — single batched plan:
   - *"Plan: <N> phone normalizations · <M> email lowercase · <K> country fills · <O> owner assignments (scope: <group>) · <S> source backfills (scope: <filter>) · <T> stale-last-touch fills · <D> stage drift fixes (scope: <group> → <target>). Proceed? (yes / show 3 examples per fix-type / no)."*
4. After each batch of 50 writes, print progress: `Updated 50/500 cells. Continuing...`

### Idempotency

Compute `oldValue == newValue` per cell — never write a no-op (avoids audit-log noise).

### Failure handling

- **429 rate limit:** backoff 3x (1s, 2s, 4s). On third fail, halt the loop and emit a partial summary with "<X>/<N> done; <Y> failed; re-run to continue."
- **Per-item permission error:** skip that item, log to summary, continue.
- **Per-column write error (e.g., column type changed mid-run):** skip that column on that item, log, continue.
- **User halts mid-run:** keep writes done so far, emit partial summary doc. Print the **last item ID written** so the user can pick up from there.

---

## Step 6: Publish summary doc (Publish)

`create_doc` (idempotent — search same-day docs by title + skill-id comment, update in place if found), title `Hygiene fix — <Mon DD>`. Body:

```markdown
# Hygiene fix — <board name> · <Mon DD>

<!-- claude-skill-id: bulk-data-hygiene -->

## Writes executed (<N>)
| Item | Column | Old → New | Scope / reason |
|---|---|---|---|
| ... | ... | ... | ... |

## Skipped (<N>)
| Item | Column | Reason |
|---|---|---|

## Flag-only (no writes)
- N items missing phone (no source — manual entry required)
- N items missing amount (forecast integrity rail — review manually)

## Undo
Every write here is reversible. The "Old" column above shows the prior value. To revert: find the row in monday and copy the Old value back in manually.

---
Generated by Claude · <ISO timestamp>
```

Optional β: `create_update` on a "snapshot" item with the pre-fix state of every touched item, for 30-day rollback window. Only runs if user opted in at Step 4.

---

## Step 7: Close the loop

One-line chat summary: `Cleaned <N> cells across <M> items on <board>. <K> skipped. Doc: <url>.`

If any failures: `<N> failed (rate limits / permissions). Re-run to retry — already-fixed cells are idempotent.`

---

## Shared patterns

- **No prefix on titles.** Findability via `Generated by Claude · <ISO timestamp>` footer + `<!-- claude-skill-id: bulk-data-hygiene -->` in summary/snapshot doc bodies.
- **Idempotency at the cell level** — never write `oldValue == newValue`.
- **Safety rail.** No deletes, no amount-column writes, no cross-workspace moves, no normalization on columns the user didn't pick. Owner / source / stage / last-touch edits require an explicit Step 4 selection AND a scope-picker confirm.
- **Bounded run** — 500 writes max per execution. Anything bigger gets split.
- **Concurrency cap** — 5 in flight max.
- **Type-based column resolution** for non-English boards.
- **Confirm in batches, not per-item.**

---

## Error handling reference

| Failure | Behavior |
|---|---|
| Connector missing | Step 0 stops; print install link. |
| No board found | Step 2 stops; suggest `workspace-builder`. |
| Empty selection at Step 4 | Skip writes; publish summary doc with audit only. |
| 0 fixable gaps | Print "Board's already clean — N flag-only items in summary doc." |
| 429 rate limit | Backoff 3x; halt on third with partial summary. |
| Per-item permission error | Skip, log, continue. |
| Column type changed mid-run | Skip that column on that item, continue. |
| User halt | Keep writes done; emit partial summary with last-written item ID. |
| Email-guess accuracy concern | Strong warning at Step 4; `Source = guessed` flag on every guessed write. |

---

## Completion criteria

- Step 0 (connector check) passed.
- User explicitly selected at least one gap type at Step 4 before any write.
- For bulk-set gap types (owner / source / stage / last-touch), a scope-picker confirm ran before write.
- Every write was non-no-op (idempotency).
- Total writes ≤ 500 (or user explicitly approved a multi-run plan).
- Summary doc lists every write (item, column, old→new, scope/reason) and every skip (with reason).
- Summary doc body carries `<!-- claude-skill-id: bulk-data-hygiene -->` + `Generated by Claude` footer.
- Safety rail held: no deletes, no amount-column writes, no cross-workspace moves, no writes outside the user-selected gap types or picked scope.
- Guessed emails carry a `Source = guessed` flag (or were withheld if no flag column existed).
