---
name: build-list
description: "This skill should be used when the user asks to \"build a prospect list\", \"find prospects\", \"gather leads\", \"explore targets\", or wants to build a prospect list. Collects prospect candidates via web search based on BUSINESS.md and SALES_STRATEGY.md and registers them in the DB."
argument-hint: "<project-id> [target-count=30]"
allowed-tools:
  - Bash
  - Read
  - Agent
  - WebSearch
  - WebFetch
  - mcp__plugin_leadace_api__add_prospects
  - mcp__plugin_leadace_api__check_prospect_dedup
  - mcp__plugin_leadace_api__get_outbound_targets
  - mcp__plugin_leadace_api__get_document
  - mcp__plugin_leadace_api__save_document
  - mcp__plugin_leadace_api__get_master_document
  - mcp__plugin_leadace_api__get_project_settings
---

# Build List - Prospect List Building

A skill that collects prospect candidates via web search based on the information in BUSINESS.md and SALES_STRATEGY.md, retrieves contact information, and registers them in the database.

**Phase Structure:**
- **Phase 1 (Candidate Collection):** Find prospect candidates broadly via web search (name, official URL, overview)
- **Phase 1.5 (Pre-dedup filter):** Call `check_prospect_dedup` with the candidates' domains and drop any the server would reject; this saves the per-candidate cost of Phase 1.7 (signal WebSearch) and Phase 2 (contact-retrieval sub-agents) on already-known orgs
- **Phase 1.7 (Signal Collection):** Pull a recent-signal slice for each surviving candidate (press release / funding / hiring) so `/outbound` has fresh hooks
- **Phase 2 (Contact + Keyperson Retrieval):** Use sub-agents to explore each candidate's official site, retrieve email / form URL, AND surface at least one keyperson (job title + name)
- **Phase 3 (Registration):** Register the results via `add_prospects`, report the outcome, and update `search_notes`

**Before starting:** `Read` `${CLAUDE_PLUGIN_ROOT}/references/workspace-conventions.md` and follow the cross-cutting conventions there (data storage, MCP error handling, document writes, output discipline).

## Phase 1: Candidate Collection

### 1. Setup

- Project ID: `$0` (required)
- Target count: `$1` (default: 30. Approximate is fine -- "around N" is sufficient)

Load the following documents via MCP:

Call `mcp__plugin_leadace_api__get_document` with `projectId: "$0"` and `slug: "business"`.
Call `mcp__plugin_leadace_api__get_document` with `projectId: "$0"` and `slug: "sales_strategy"`.
Call `mcp__plugin_leadace_api__get_master_document` with `slug: "tpl_industries"` and keep the
returned vocabulary list — every prospect's `industry` field MUST be set to one of those exact strings.

Call `mcp__plugin_leadace_api__get_project_settings` with `projectId: "$0"` and capture:

- **`outboundChannels`** (subset of `email | form | sns_twitter | sns_linkedin`): the channels this
  project is allowed to use for outbound. Phase 2 contact retrieval should focus on the enabled
  channels — e.g. if only `email` is enabled, don't spend sub-agent effort discovering form URLs
  or SNS handles. A candidate with no contact channel matching the allowlist will be skipped at
  /outbound, so deprioritize discovering them. An empty `outboundChannels` array means the
  project has paused outbound entirely — stop and inform the user instead of building a list
  that can never be reached.
- **`targetCountries`** (array of ISO 3166-1 alpha-2 codes): when non-empty, restrict discovery
  to organizations in these countries — bias search queries with regional qualifiers, prefer
  country-specific portals, and drop candidates whose inferred country falls outside the set.
  When empty, don't constrain discovery by country — collect per the project's target market;
  recipient-country eligibility is enforced server-side at outbound time.
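
The two settings above gate everything downstream. A minimal sketch of how they might be applied, assuming the settings response is a plain dict with `outboundChannels` and `targetCountries` keys and each candidate is a dict with an optional `country` key. The helper names are hypothetical, and keeping candidates whose country is unknown is an assumption, not something the API specifies:

```python
def outbound_is_paused(settings: dict) -> bool:
    """An empty (or missing) outboundChannels array means outbound is paused."""
    return not settings.get("outboundChannels")


def filter_by_target_countries(candidates: list[dict], settings: dict) -> list[dict]:
    """Drop candidates whose inferred country falls outside targetCountries.

    When targetCountries is empty, no country constraint is applied.
    Assumption: candidates with no inferred country are kept, since an
    unknown country is not known to be outside the set.
    """
    targets = {code.upper() for code in settings.get("targetCountries") or []}
    if not targets:
        return list(candidates)
    kept = []
    for candidate in candidates:
        country = (candidate.get("country") or "").upper()
        if not country or country in targets:
            kept.append(candidate)
    return kept
```

If `outbound_is_paused(settings)` is true, stop and inform the user before any search is run.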

If either project document is not found, guide the user to run `/leadace`.

### 2. Review Search Notes

Do NOT pre-fetch the registered-prospect list. Server-side dedup in
`add_prospects` (Phase 3) is the single source of truth: it returns
structured `skippedDetails` with reasons (`email_duplicate`,
`form_url_duplicate`, `already_in_project`, `do_not_contact`,
`duplicate_in_batch`, `plan_limit`) so this skill can adapt mid-flight
without an O(N) identifier dump.

Call `mcp__plugin_leadace_api__get_document` with `projectId: "$0"` and `slug: "search_notes"`. If found, use its content. It contains knowledge from previous explorations:
- **Exhausted keywords** (do not repeat — they already returned heavy duplicates)
- **Coverage matrix** (industry × region × company-size cells already covered)
- Useful information source sites (not yet fully explored)
- Directions to try next time

Use this to continue exploration from where the last session left off. If
`search_notes` is missing, treat every cell of the matrix as unexplored.

### 3. Search Strategy

Based on the "Search Keywords" and "Target" sections of SALES_STRATEGY.md, formulate multiple search queries.

**Pick from unexplored cells of the coverage matrix first.** Each query
should belong to a single (industry × region × size) cell, e.g.
`B2B SaaS × Pacific Northwest × Series A`. Cells already marked exhausted
in `search_notes` should not be retried unless the user explicitly asks.

Avoid every keyword listed under `## Exhausted Keywords` in `search_notes`
(those previously returned ≥ 70% duplicates). Pick a synonym or different
angle instead.

Types of search queries (choose appropriate ones based on target type):
- Search by target industry + region
- Member lists of industry associations, federations
- Prospect collection from industry media and news sites
- Exhibitor lists from trade shows and events
- Client case studies from competitors
- Target exploration on job sites
- Directories or public databases of schools and corporations

### 4. Web Search Execution

Combine WebSearch and `fetch_url.py` (Jina Reader + Claude Haiku) to broadly collect prospect candidates.

**Use `fetch_url.py` for page retrieval (do not use WebFetch):**
```bash
python3 ${CLAUDE_PLUGIN_ROOT}/scripts/fetch_url.py --url "https://example.com" --prompt "Extract company list" --timeout 15
```
The script has timeout control, so it won't hang on unresponsive sites. It also handles SPA sites.

**Fallback when `fetch_url.py` is unavailable**: if the invocation fails (either `python3` or the `claude` CLI is missing from PATH, or any execution error), fall back to `WebFetch` for the rest of the run. WebFetch is blocked by some corporate B2B WAFs (typically 403) — when that happens, skip the candidate and continue with the others rather than retrying.

This phase focuses on **discovering candidates**. Contact information (email, form, etc.) is collected in Phase 2, so only gather the following here:

**Required (skip the candidate if missing):**
- Name (company name, school name, organization name, etc.)
- Business overview (what the organization does; 1-2 sentences summarized from the official site)
- Official site URL

**If available:**
- Industry or field
- Department or branch name (school name for school corporations, target department for large companies)
- Country (ISO 3166-1 alpha-2, e.g., "US", "JP", "GB")
- Email addresses or SNS accounts found incidentally during search (no need to look for these intentionally)
- Organization name: the legal entity name if it differs from the prospect name (e.g., a school corporation that operates multiple schools)

Skip any prospect for which the official site URL and business overview cannot be obtained.

**Search tips:**
- A single query finds limited prospects, so vary the angles broadly
- Use portal sites and listing pages to find many candidates at once
- Stop searching once the target count (`$1`, default 30) is reached. Deduplication rejections don't count (count only newly registered ones)
- No need to deep-dive individual official sites in this phase -- focus on securing a quantity of candidates

**Duplicate-rate response (threshold-driven):**

The duplicate signal comes from two places: Phase 1.5's `check_prospect_dedup`
decisions (most candidates are caught here, before signals / contact
retrieval) and Phase 3's `add_prospects.skippedDetails` (the safety net
that catches anything that slipped past 1.5). Combine both when judging a
batch — but **exclude `plan_limit`** from the tally (it is a budget hit,
not an angle-exhaustion signal; treating it as exhaustion would mark a
perfectly good keyword as dead just because the user hit their plan cap
mid-cycle).

- **< 30% skip rate** — healthy. Continue with the same angle.
- **30–70% skip rate** — the angle is fading. Deep-dive within the same
  target *first* before pivoting:
  - Look beyond top results to page 2, 3, and beyond
  - Add regional qualifiers (e.g., "SaaS companies" → "SaaS companies
    Portland", "SaaS companies Austin")
  - Use synonyms / related terms (e.g., "consulting firm" → "advisory
    firm", "management consultancy")
  - Follow industry-specific portal sites and directories
  - Search for "competitors" / "similar services" of already-registered
    prospects to find new ones organically
- **≥ 70% skip rate** — the angle is **exhausted**. Stop deep-diving on
  this keyword / cell, record it under `## Exhausted Keywords` (step 9),
  mark the corresponding coverage-matrix cell as `exhausted`, and pivot to
  a different (industry × region × size) cell for the next pass.

The 70% rule is a hard pivot threshold, not advisory — repeating an
exhausted angle just spends quota on duplicates.
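
The thresholds above can be sketched as a small classifier. This is an illustration, not part of the plugin: the function name is hypothetical, and removing `plan_limit` entries from the denominator as well as the numerator is an assumption about how "exclude from the tally" should be read.

```python
EXCLUDED_REASONS = {"plan_limit"}  # budget hit, not an angle-exhaustion signal


def classify_angle(attempted: int, skip_reasons: list[str]) -> str:
    """Classify a search angle from combined Phase 1.5 + Phase 3 skip reasons.

    attempted: number of candidates tried for this angle.
    skip_reasons: one reason string per skipped candidate.
    Returns "healthy" (< 30%), "fading" (30-70%), or "exhausted" (>= 70%).
    """
    plan_limited = sum(1 for reason in skip_reasons if reason in EXCLUDED_REASONS)
    dedup_skips = len(skip_reasons) - plan_limited
    # Assumption: plan_limit entries are dropped from the denominator too.
    considered = attempted - plan_limited
    if considered <= 0:
        return "healthy"  # nothing left to judge
    # Integer comparisons avoid floating-point edge cases at exactly 30% / 70%.
    if dedup_skips * 10 >= considered * 7:
        return "exhausted"
    if dedup_skips * 10 >= considered * 3:
        return "fading"
    return "healthy"
```

For example, 14 `already_in_project` skips out of 18 attempts is about 78%, so the angle is classified as exhausted.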

### 5. Priority and Match Reason Assessment

For each prospect, assign a match reason (why they're appropriate as a target, including their challenges and needs) and priority (1-5) based on SALES_STRATEGY.md criteria:
- 1: Top priority (perfectly matches target, needs are clear)
- 2: High priority (broadly matches target)
- 3: Standard (within target range)
- 4: Marginal (only partially meets criteria)
- 5: Under consideration (indirect possibility)

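**Factor in email retrieval ease:** If any of the following signals are found during exploration, raise the priority by one level (i.e. lower the number by 1, to a minimum of 1) among candidates of equal match quality, because prospects with a reachable email have a higher outbound success rate: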
- Has press releases on press release distribution sites (high rate of PR contact email inclusion)
- Listed in startup DB or industry directory (more public information available)
- Email explicitly shown on official site (e.g., info@) discovered during exploration

**Note on email types:** Both named individual addresses (`first.last@co.com`) and generic addresses (`info@`, `contact@`, `sales@`, `support@`, `pr@`) are valid outreach targets. Named addresses generally have higher reply rates and deserve slightly higher priority, but **generic addresses must not be excluded** — for many companies they are the only reachable channel.

## Phase 1.5: Pre-dedup Filter

Before paying for Phase 1.7's per-candidate WebSearch and Phase 2's
per-candidate sub-agent contact retrieval, drop candidates the server
would reject anyway. The dedup decision needs only `organizationDomain`,
which is already known at the end of Phase 1, so running this gate first
saves both downstream costs.

Call `mcp__plugin_leadace_api__check_prospect_dedup` with:
- `projectId`: "$0"
- `candidates`: array of `{ organizationDomain, email?, contactFormUrl? }` —
  one entry per Phase 1 candidate. `organizationDomain` is the apex domain
  derived from the candidate's `website_url` (strip `www.` and path).
  Include `email` / `contactFormUrl` if Phase 1 happened to surface them
  (rare but possible).

The response is a `decisions` array in the same order as the input. Drop
any candidate whose `kind === 'skip'`. Tally the skip reasons (`reason ∈
already_in_project | email_duplicate | form_url_duplicate | do_not_contact
| duplicate_in_batch`) and feed that tally into step 9 (`## Exhausted
Keywords`) — the same threshold rule applies (≥ 70% skip in the batch =
exhausted angle, switch keywords for the next pass).

**If most candidates are dropped here**, the search angle is exhausted; do
not push through Phase 1.7 / Phase 2 with a near-empty list. Either
(a) re-run Phase 1 with a different keyword / region / size cell from the
coverage matrix, or (b) accept the smaller batch and continue. Phase 3's
`add_prospects` re-runs the same dedup as a safety net, so passing through
a few skip-marked candidates is harmless but wastes Phase 1.7 / Phase 2
effort.
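
A minimal sketch of the two mechanical steps in this phase: deriving `organizationDomain` from `website_url`, and dropping skipped candidates while tallying reasons. The function names are hypothetical. The domain helper does only what is described above (strip `www.` and path); it does not consult the Public Suffix List, so other subdomains are left as they are.

```python
from urllib.parse import urlparse


def organization_domain(website_url: str) -> str:
    """Strip scheme, port, path, and a leading `www.` from a URL."""
    url = website_url if "://" in website_url else "https://" + website_url
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def drop_skipped(candidates: list[dict], decisions: list[dict]) -> tuple[list[dict], dict]:
    """Pair candidates with decisions (same order) and drop `kind == "skip"`.

    Returns the surviving candidates and a {reason: count} tally of skips.
    """
    if len(candidates) != len(decisions):
        raise ValueError("decisions must align one-to-one with candidates")
    survivors, tally = [], {}
    for candidate, decision in zip(candidates, decisions):
        if decision.get("kind") == "skip":
            reason = decision.get("reason", "unknown")
            tally[reason] = tally.get(reason, 0) + 1
        else:
            survivors.append(candidate)
    return survivors, tally
```

The tally is what gets carried into the duplicate-rate judgment and the search notes.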

## Phase 1.7: Signal Collection

For each **surviving (post-Phase-1.5)** candidate, run **one** WebSearch
query of the form
`"<organization name>" press release OR funding OR hiring 2025..2026` (or
your equivalent for the prospect's region / language). Skim the top
results for any of:

- A press release dated within the last 6 months
- A funding round announcement
- A hiring spike, role expansion, or new department launch
- A product launch, partnership, or named-customer announcement

When something concrete surfaces, append a `## Recent Signals` section to
the candidate's `overview` of the form:

```
## Recent Signals
- 2026-03-12: Announced Series B led by Acme Ventures (TechCrunch)
- 2026-02-04: Hiring 5 senior backend engineers (LinkedIn)
```

Bullet date + 1 sentence + source. Do not invent signals — if nothing
relevant turns up, leave the section out. `/outbound` reads `## Recent
Signals` and decides whether to open with a signal-aware hook; absent
section means no signal mention.

This is **one query per prospect**, not deep research. The SaaS-side daily
batch (B §4.2-B) refines signals over time; the goal here is to seed the
field at registration time.
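
The section format can be produced mechanically. A small sketch, assuming signals are held as `(date, sentence, source)` tuples with ISO `YYYY-MM-DD` dates; the function name and the newest-first ordering are illustrative choices:

```python
def append_recent_signals(overview: str, signals: list[tuple[str, str, str]]) -> str:
    """Append a `## Recent Signals` section to an overview.

    Each signal is (date, one-sentence description, source). With no signals
    the overview is returned unchanged, so the section is simply left out.
    """
    if not signals:
        return overview
    # ISO dates sort lexicographically, so reverse order puts the newest first.
    lines = [
        f"- {date}: {sentence} ({source})"
        for date, sentence, source in sorted(signals, reverse=True)
    ]
    return overview.rstrip() + "\n\n## Recent Signals\n" + "\n".join(lines)
```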

## Phase 2: Contact + Keyperson Retrieval

### 6. Contact Retrieval via Sub-agents

Split the **post-Phase-1.5 candidate list** (only the `kind === 'fresh'`
entries; Phase 1.7 may have enriched their `overview` with signals) into
**batches of 5** and launch a sub-agent for each batch to retrieve contact
information.
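
Splitting into batches of 5 is plain list chunking. A short sketch (the helper name is hypothetical):

```python
def batches_of(items: list, size: int = 5) -> list[list]:
    """Split items into consecutive batches; the last batch may be shorter."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [items[i:i + size] for i in range(0, len(items), size)]
```

With 12 surviving candidates this yields three batches of 5, 5, and 2, so three sub-agents are launched.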

Include the following in each sub-agent's prompt:
- List of assigned candidates (name, organization_name, website_url, overview, industry, department, country, match_reason, priority)
- Retrieve the contact enrichment procedure via `mcp__plugin_leadace_api__get_master_document` with `slug: "tpl_enrich_contacts"` and follow its procedure
- Explore each candidate's official site to retrieve email addresses and contact form URLs
- **Keyperson lookup is required**, not optional. Search the official site's
  team / leadership / about pages, then LinkedIn public results
  (`site:linkedin.com/in "<organization name>" <target role>`), then the
  press release page. Capture at least one (`contactName`, `department`)
  pair per candidate when any public source mentions one. If absolutely
  nothing surfaces, leave both null and note it.
- Use `python3 ${CLAUDE_PLUGIN_ROOT}/scripts/fetch_url.py --url <URL> --prompt <instructions>` for page retrieval (do not use WebFetch). If `fetch_url.py` cannot run (either `python3` or the `claude` CLI is missing from PATH), fall back to WebFetch and skip any candidate the WAF blocks (403)
- After completion, return the results as a JSON array

Sub-agent allowed-tools: `Bash`, `WebSearch`, `WebFetch`, `Read`, `mcp__plugin_leadace_api__get_master_document`

Each object in the JSON array returned by the sub-agent includes the Phase 1 information (name, organization_name, overview, website_url, industry, department, country, match_reason, priority) plus the retrieved contacts (email, contact_form_url, form_type, sns_accounts, contact_name).

### 6b. Re-search for Candidates Without Contact Info (only when applicable)

If Phase 2 results include candidates whose `email` and `contact_form_url` are both null, try to supplement contact info from **sources other than the official site**.

For each such candidate, run WebSearch queries such as:
- `"{company name}" email address`
- `"{company name}" contact`

Information may be found from industry directories, press release distribution sites, event speaker information, etc. If found, update the candidate's data.

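**Limit:** Re-search up to a **maximum of 10 candidates** without contact info. Leave the rest as they are: those with only SNS accounts can still be registered (they will be skipped during outbound unless an SNS channel is enabled), while candidates with no contact channel at all are rejected by `add_prospects` (see step 7).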

## Phase 3: Registration

### 7. Database Registration

Call `mcp__plugin_leadace_api__add_prospects` with:
- `projectId`: "$0"
- `prospects`: array of prospect objects

**Field mapping for the MCP tool:**

For each prospect, construct the object as follows:
- `organizationDomain`: **Extract the apex domain from website_url** (e.g., `https://www.example.com/about` -> `example.com`). Strip `www.` prefix and path. Used for dedup.
- `organizationName`: the legal entity name (or `name` if not separately available)
- `organizationWebsiteUrl`: the organization's official website URL
- `name`: prospect name (company name, school name, department, etc.)
- `contactName`: contact person name (optional)
- `department`: department within the organization (optional)
- `overview`: business overview (1-2 sentences). If Phase 1.7 surfaced
  any signals, append the `## Recent Signals` section after the overview
  text within the same field.
- `industry`: **must be one of the strings from `tpl_industries`** (the
  vocabulary you fetched in step 1). Free-form industry strings break the
  `/evaluate` aggregator and the timing-aware ordering. If none fit, use
  `Other`.
- `country`: ISO 3166-1 alpha-2 (e.g. `US`, `CA`, `JP`). Optional in the
  payload — when omitted the server falls back to TLD inference of the
  organization domain. Set this when you have stronger evidence than the
  TLD (LLM-derived from page content, address footer, etc.) and pass
  `countrySource: 'ai_inferred'`. LeadAce currently only sends to `US`,
  `CA`, and `JP` recipients; prospects from other countries register fine
  but the send paths block them at outreach time. If the strategy already
  identified a US-, CA-, or JP-only target audience, prefer those.
- `countrySource`: optional, one of `manual` (operator confirmed) or
  `ai_inferred`. Skip this field when leaving `country` blank.
- `websiteUrl`: the specific page URL for this prospect
- `email`: email address (optional*)
- `contactFormUrl`: contact form URL (optional*)
- `formType`: one of `google_forms`, `native_html`, `wordpress_cf7`, `iframe_embed`, `with_captcha` (optional)
- `snsAccounts`: `{ x?, linkedin?, instagram?, facebook? }` (optional*)
- `matchReason`: why this prospect is a good target
- `priority`: 1-5 (default 3)
- `hypothesis`: per-prospect targeting hypothesis as a structured object (optional but recommended). Built from the assembled `overview` + any `## Recent Signals` + `matchReason` + SALES_STRATEGY context. Read by the inquiry-landing chat snapshot to ground answers about the visiting org. Shape:
  - `hypothesizedPain`: 1–3 short pain hypotheses, one sentence each (e.g. `["Manual lead routing slows reps", "No central buyer-signal aggregation"]`)
  - `valueMapping`: 1–3 bullets of how our offering addresses those pains (same order as `hypothesizedPain` when paired)
  - `timingSignals`: 1–3 concrete reasons NOW is a good moment, drawn from `## Recent Signals` (e.g. `["Series B announced 12d ago", "2 SDR roles open since 18d"]`). Omit when no signals surfaced — do not invent.
  - `targetDepartment` / `targetRolePattern`: optional. Department / role pattern most likely to buy (e.g. `"Sales Operations"`, `"Director of Sales Ops"`).
  - `bestChannel` / `bestKeyperson`: optional. Skip when unclear; do NOT guess.

  Keep each bullet to one short sentence. Skip fields when public info is too thin to fill them honestly. A partial hypothesis is fine; an invented one harms the chat AI's credibility.

\* **At least one of `email`, `contactFormUrl`, or `snsAccounts` is required.** Prospects with no contact channel are rejected.
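
The mapping above can be sketched as a function from the snake_case records returned by the sub-agents to the camelCase payload. This is an illustration under assumptions: the function name and the `organization_domain` input key are hypothetical, and the same URL is used for both `organizationWebsiteUrl` and `websiteUrl` because the sub-agent result only carries one `website_url`.

```python
def to_prospect_payload(candidate: dict, industries: set[str]) -> dict:
    """Map a candidate record to the add_prospects field names listed above."""
    sns = candidate.get("sns_accounts") or {}
    if not (candidate.get("email") or candidate.get("contact_form_url") or sns):
        raise ValueError("at least one of email, contactFormUrl, or snsAccounts is required")

    industry = candidate.get("industry")
    payload = {
        "organizationDomain": candidate["organization_domain"],
        "organizationName": candidate.get("organization_name") or candidate["name"],
        # Assumption: one URL serves both fields when no more specific page is known.
        "organizationWebsiteUrl": candidate["website_url"],
        "websiteUrl": candidate["website_url"],
        "name": candidate["name"],
        "overview": candidate["overview"],
        # Anything outside the tpl_industries vocabulary falls back to "Other".
        "industry": industry if industry in industries else "Other",
        "matchReason": candidate["match_reason"],
        "priority": candidate.get("priority", 3),
    }
    optional = {
        "contactName": "contact_name",
        "department": "department",
        "country": "country",
        "email": "email",
        "contactFormUrl": "contact_form_url",
        "formType": "form_type",
    }
    # Optional fields are included only when a value was actually found.
    for api_key, source_key in optional.items():
        if candidate.get(source_key):
            payload[api_key] = candidate[source_key]
    if sns:
        payload["snsAccounts"] = sns
    return payload
```

Raising on a missing contact channel locally mirrors the server-side rejection and keeps unreachable records out of the batch.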

The server automatically deduplicates by email, contact form URL, and
organization domain within the project. Inspect `skippedDetails` after the
call: each entry is `{name, reason}` with `reason ∈ email_duplicate |
form_url_duplicate | already_in_project | do_not_contact |
duplicate_in_batch | plan_limit`. If the same `reason` clusters tightly
(e.g. ≥ 50% of skips are `email_duplicate` from one industry), record the
keyword in `## Exhausted Keywords` and switch angles for the next pass.

**Difference between organizations and prospects:**
- `organizations` = **Legal entity** unit (apex domain is PK)
- `prospects` = **Prospect** unit (specific target within an organization)

- Small company: `organizationName` = `name` (1:1, `department` is null)
- School corporation operating multiple schools: `organizationName` = "Katayanagi Gakuen School Corporation", `name` = "Nihon Kogakuin College" (1:many possible)
- Department within a large company: `name` = "ABC Corp.", `department` = "Sales Planning Dept."

### 8. Results Report

After DB registration, check the reachable count:

Call `mcp__plugin_leadace_api__get_outbound_targets` with `projectId: "$0"` and `limit: 1` to get the `total` and `byChannel` summary.

Report the following:
- Number of newly registered prospects / target count
- **Reachable breakdown** (among newly registered: N with email, N with form, N SNS-only, N without contacts)
- Breakdown by priority
- Number rejected as duplicates (if many, briefly describe how the search angle was changed)
- Total reachable prospects remaining in the project (from the `total` field)
- Guide the user to run `/outbound` as the next step
- Append a single low-key dashboard line at the end: `Dashboard: https://app.leadace.ai/prospects` — purely informational, do not push the user to open it
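
The reachable breakdown for the newly registered prospects can be computed locally from the submitted payloads. A sketch, assuming each prospect is a dict using the `add_prospects` field names; the function name is hypothetical, and a prospect with both email and form is counted under both (an illustrative choice):

```python
def reachable_breakdown(prospects: list[dict]) -> dict:
    """Count newly registered prospects by available contact channel."""
    counts = {"email": 0, "form": 0, "sns_only": 0, "none": 0}
    for prospect in prospects:
        has_email = bool(prospect.get("email"))
        has_form = bool(prospect.get("contactFormUrl"))
        has_sns = bool(prospect.get("snsAccounts"))
        if has_email:
            counts["email"] += 1
        if has_form:
            counts["form"] += 1
        if has_sns and not (has_email or has_form):
            counts["sns_only"] += 1
        if not (has_email or has_form or has_sns):
            counts["none"] += 1
    return counts
```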

### 9. Update Search Notes

Save search notes via `mcp__plugin_leadace_api__save_document` with `projectId: "$0"`, `slug: "search_notes"`. Record information useful for the next exploration in the following structure:

```markdown
# Search Notes
Last updated: YYYY-MM-DD

## Coverage Matrix
Track which (industry × region × company-size) cells have been covered
this run. Cells where the combined dedup-skip rate (Phase 1.5 + Phase 3,
excluding `plan_limit`) is ≥ 70% are marked `exhausted`. New runs should
pick from `unexplored` cells first.

| Industry | Region | Size | Status | Notes |
|---|---|---|---|---|
| B2B SaaS | US-West | Series A | covered | 12 added, 0 dups |
| B2B SaaS | US-West | Series B | exhausted | 14 dups / 18 attempts |
| HealthTech | US-Northeast | bootstrapped | unexplored | next run |

## Exhausted Keywords
Keywords whose combined dedup-skip rate (Phase 1.5 + Phase 3, excluding
`plan_limit`) was ≥ 70% this run. **Do not re-use without a fresh angle**
(different region, different size band, different role seniority). Each
entry: `keyword — reason — date`.

- "B2B SaaS Series B" — 14/18 returned `already_in_project` — 2026-05-06

## Useful Sources
- (Portal sites or listing page URLs that haven't been fully explored yet)

## Directions to Try Next Time
- (Search methods not attempted this time, regions or angles not yet explored)

## Notes
- (Areas where prospects were found unexpectedly, insights for next time)
```

If the previous version (from step 2) has a `## Hints from evaluate` section, preserve its content and carry it over to the end of the new document (to preserve response pattern info added by evaluate).

If the previous version already has `## Coverage Matrix` / `## Exhausted
Keywords` sections, **merge into them** — don't overwrite. Only mark a
cell `exhausted` when this run's data confirms it; old `exhausted` entries
should be re-tested if the user asks for a sweep across previously-skipped
cells.
