---
name: sanitize
description: >-
  Redact secrets and customer-identifying data from support notes, logs, and
  configs (HCL/JSON/YAML/.env) before they are shared with an external AI model
  or pasted outside the local machine. Use this BEFORE sending logs, stack
  traces, config files, or support-case text to any external service. Swaps
  sensitive values (tokens, AWS/Vault creds, private/public IPs, hostnames,
  emails, customer names) for stable typed placeholders like <PRIVATE_IP_1>,
  <HOST_1>, <REDACTED_VAULT_TOKEN_1> so debugging context survives.
---

# sanitize — local support-text sanitizer

A local-first, zero-dependency CLI that strips secrets and customer-identifying
data out of support notes, logs, and configs, replacing them with **stable,
typed placeholders** instead of a blanket `<REDACTED>`. Topology and
relationships are preserved (within a run, the same IP always maps to the same
placeholder), so the text stays useful for debugging while the sensitive values are gone.

```text
10.42.1.15 failed to connect to 10.42.1.16 on port 4647
        ↓
<PRIVATE_IP_1> failed to connect to <PRIVATE_IP_2> on port 4647
```
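
The placeholder behavior above can be sketched in a few lines of Python. This is a minimal illustration, not the bundled implementation. The IPv4 regex, the `PlaceholderMap` class, and the `<PUBLIC_IP_n>` label are assumptions. Each (type, value) pair gets a number the first time it is seen, and every later sighting reuses that number, which is what keeps topology readable.

```python
import ipaddress
import re
from collections import defaultdict

# Simplified IPv4 candidate matcher (an assumption, not the tool's real pattern).
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


class PlaceholderMap:
    """Give each distinct (type, value) pair a stable, numbered placeholder."""

    def __init__(self) -> None:
        self._assigned: dict[tuple[str, str], str] = {}
        self._counters: defaultdict[str, int] = defaultdict(int)

    def placeholder(self, kind: str, value: str) -> str:
        key = (kind, value)
        if key not in self._assigned:
            # First time this value is seen: take the next number for its type.
            self._counters[kind] += 1
            self._assigned[key] = f"<{kind}_{self._counters[kind]}>"
        return self._assigned[key]


def redact_ips(text: str, mapping: PlaceholderMap) -> str:
    """Replace IPv4 addresses, typed as PRIVATE_IP or PUBLIC_IP."""

    def replace(match: re.Match[str]) -> str:
        raw = match.group(0)
        try:
            addr = ipaddress.ip_address(raw)
        except ValueError:
            return raw  # Looks like an IP (e.g. 999.1.1.1) but is not one.
        kind = "PRIVATE_IP" if addr.is_private else "PUBLIC_IP"
        return mapping.placeholder(kind, raw)

    return IPV4_RE.sub(replace, text)


if __name__ == "__main__":
    m = PlaceholderMap()
    print(redact_ips("10.42.1.15 failed to connect to 10.42.1.16 on port 4647", m))
    print(redact_ips("retrying 10.42.1.15", m))  # reuses <PRIVATE_IP_1>
```

Sharing one `PlaceholderMap` across every chunk of input is what makes the mapping stable for the whole run.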

## When to use this skill

Reach for this **proactively, before** any of the following:

- Pasting or sending logs, stack traces, config files (HCL/JSON/YAML/`.env`),
  or support-case notes to an external AI model or any service off this machine.
- Quoting customer-provided artifacts (Nomad/Consul/Vault logs, allocation
  output, agent configs) into a chat, ticket, or PR that leaves the local box.

Also use it on **explicit request** — e.g. "sanitize this", "redact this log",
"is this safe to paste?".

When you are about to include externally sourced text in a message that leaves
the local environment, sanitize it first and use the sanitized version. If you
are unsure whether the content is safe, run it with `--profile strict --report`
and show the report to the user.

## How to run it

The sanitizer is bundled inside this skill (pure Python 3.11+, no install). Call
the wrapper with an absolute path so it works from any working directory:

```bash
python3 "<SKILL_DIR>/run.py" INPUT [options]
```

`<SKILL_DIR>` is the directory containing this SKILL.md. Sanitized text goes to
**stdout** and reports go to **stderr**, so redirecting stdout captures only the
sanitized text.

Common invocations:

```bash
# Sanitize a file to stdout
python3 "<SKILL_DIR>/run.py" alloc.log

# Pipe text in (e.g. content you were about to paste)
printf '%s' "$TEXT" | python3 "<SKILL_DIR>/run.py" --report

# Choose a profile and name the customer to redact
python3 "<SKILL_DIR>/run.py" case-notes.txt --profile case-summary --customer "Acme Corp"

# Write the clean text to a file; report prints to the terminal
python3 "<SKILL_DIR>/run.py" notes.txt --report > clean.txt
```
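
When the sanitizer is driven from a script rather than a shell, the same stdout/stderr split applies. Below is a minimal Python sketch. It assumes, as the `printf` example above suggests, that `run.py` reads stdin when no INPUT is given. `build_command` and `sanitize` are hypothetical helper names, and `sys.executable` stands in for `python3`.

```python
import subprocess
import sys
from pathlib import Path


def build_command(
    skill_dir: str,
    profile: str = "infra-safe",
    customers: tuple[str, ...] = (),
    report: bool = True,
) -> list[str]:
    """Assemble argv for run.py; no INPUT argument, so text comes from stdin (assumed)."""
    cmd = [sys.executable, str(Path(skill_dir) / "run.py"), "--profile", profile]
    for name in customers:
        cmd += ["--customer", name]  # --customer is repeatable
    if report:
        cmd.append("--report")
    return cmd


def sanitize(text: str, skill_dir: str, **options) -> tuple[str, str]:
    """Run the wrapper and return (sanitized_text, report)."""
    result = subprocess.run(
        build_command(skill_dir, **options),
        input=text,
        capture_output=True,  # stdout = sanitized text, stderr = report
        text=True,
        check=True,  # raise CalledProcessError if the sanitizer exits non-zero
    )
    return result.stdout, result.stderr
```

For content you are unsure about, call `sanitize(...)` with `profile="strict"`, pass only the first element of the result downstream, and show the second (the report) to the user.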

### Picking a profile

| Profile | Use when… | Notably preserves |
| --- | --- | --- |
| `strict` | Unsure the content is safe. Redacts the most (incl. URLs, UUIDs, phone numbers). | Ports, timestamps, error text. |
| `infra-safe` *(default)* | You want infra/debugging context intact. | Timestamps, ports, protocols, status codes, **cloud regions**, generic service names. |
| `case-summary` | Pasting a case narrative into an AI assistant. | **Product names** (Nomad, Consul, Vault, Terraform…), versions, the technical narrative. |

For infrastructure debugging (e.g. Nomad/Consul/Vault logs and configs), the
default `infra-safe` profile is usually the right choice: it keeps the technical
signal while removing secrets, IPs, hostnames, and customer identity.

### Useful flags

| Flag | Purpose |
| --- | --- |
| `-o, --output FILE` | Write sanitized text to a file instead of stdout. |
| `--profile NAME` | `strict`, `infra-safe` (default), or `case-summary`. |
| `--report` | Count summary of what was redacted → stderr. |
| `--json-report` | Machine-readable summary → stderr. |
| `--customer NAME` | Literal org/customer name to redact as `<ORG_n>` (repeatable). |
| `--allow TERM` | Term to never treat as a hostname (repeatable). |
| `--config FILE` | Path to a `.sanitizer.yml` (if omitted, one is auto-discovered up the tree). |
| `--no-config` | Skip `.sanitizer.yml` auto-discovery. |
| `--debug` | Include original values in the report (local only; off by default). |
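
`--customer` and `--allow` pull in opposite directions: one forces a literal name to be redacted, the other exempts a term from hostname detection. The rough Python sketch below shows that interplay. The hostname regex, the case-insensitive matching, and the function name are all assumptions rather than the tool's actual rules. The loose regex also shows why over-redaction happens (a file name such as `run.py` looks like a hostname to it) and why an allowlist is useful.

```python
import re

# Hypothetical hostname heuristic: dot-separated labels ending in letters.
HOST_RE = re.compile(r"\b(?:[a-z0-9-]+\.)+[a-z]{2,}\b", re.IGNORECASE)


def redact_customers_and_hosts(text: str, customers: list[str], allow: list[str]) -> str:
    # 1. Customer names are literal strings, so escape any regex metacharacters.
    #    dict.fromkeys drops duplicate names while keeping their order.
    for index, name in enumerate(dict.fromkeys(customers), start=1):
        placeholder = f"<ORG_{index}>"
        text = re.sub(re.escape(name), lambda _m, p=placeholder: p, text, flags=re.IGNORECASE)

    # 2. Hostname candidates are replaced unless allowlisted (case-insensitively).
    allowed = {term.lower() for term in allow}
    host_ids: dict[str, str] = {}

    def replace_host(match: re.Match[str]) -> str:
        host = match.group(0)
        key = host.lower()
        if key in allowed:
            return host
        if key not in host_ids:
            host_ids[key] = f"<HOST_{len(host_ids) + 1}>"
        return host_ids[key]

    return HOST_RE.sub(replace_host, text)
```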

## After sanitizing

1. Use the **sanitized** text downstream, never the original.
2. If you ran `--report`, briefly tell the user what was redacted (counts by
   type), so they can confirm nothing important was lost or missed.
3. **State the caveat plainly:** this is a regex/heuristic tool, not a guarantee.
   It can miss novel secret formats and can occasionally over-redact. The user
   must review the output before sharing it externally.
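
To make the caveat concrete: a pattern-based detector only recognizes formats someone wrote a rule for. The Python sketch below assumes a rule keyed on the `hvs.` and `hvb.` prefixes that current Vault releases use for service and batch tokens. The regex and the minimum length are illustrative, not the bundled tool's rule. A legacy `s.`-prefixed token from an older Vault release passes straight through, which is exactly the kind of miss a human review should catch.

```python
import re

# Illustrative rule: modern Vault service ("hvs.") and batch ("hvb.") token prefixes.
VAULT_TOKEN_RE = re.compile(r"\bhv[sb]\.[A-Za-z0-9_-]{20,}")


def find_vault_tokens(text: str) -> list[str]:
    """Return every substring this rule would redact as a Vault token."""
    return VAULT_TOKEN_RE.findall(text)


if __name__ == "__main__":
    print(find_vault_tokens("X-Vault-Token: hvs.CAESIAbCdEfGhIjKlMnOpQrStUv"))  # caught
    print(find_vault_tokens("token=s.AbCdEfGhIjKlMnOpQrStUvWx"))  # legacy format: missed
```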

## Configuration

A `.sanitizer.yml` in the repo, or anywhere up the tree from the input, is
loaded automatically. It can add `customer_names`, an `allowlist` of terms that
should never be treated as hostnames, and custom `extra_patterns`. See
`examples/sanitizer.yml.example`.
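
The "up the tree" discovery can be pictured as a walk from the input's directory toward the filesystem root that stops at the first `.sanitizer.yml`. Below is a minimal Python sketch under that assumption. The real tool may also stop at a repository root or consult the working directory, and `find_config` is a hypothetical name.

```python
from __future__ import annotations

from pathlib import Path

CONFIG_NAME = ".sanitizer.yml"


def find_config(input_path: str | Path) -> Path | None:
    """Return the nearest .sanitizer.yml at or above the input, or None."""
    start = Path(input_path).resolve()
    directory = start if start.is_dir() else start.parent
    # Check the starting directory first, then each parent up to the root.
    for candidate_dir in (directory, *directory.parents):
        candidate = candidate_dir / CONFIG_NAME
        if candidate.is_file():
            return candidate
    return None
```

In this sketch the nearest file wins outright. Whether the real tool merges configs found at several levels is not specified here.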

## Examples / reference

The `examples/` folder beside this file holds realistic unsanitized inputs to
try: `nomad-client.log`, `nomad-scheduler.log`, `consul-agent.hcl`,
`vault-config.hcl`, and `case-notes.txt`. Running one is a quick way to verify
the skill works:

```bash
python3 "<SKILL_DIR>/run.py" "<SKILL_DIR>/examples/nomad-client.log" --report
```