---
name: customer-discovery
description: >
  Discover all customers of a given company by scanning websites, case studies, review sites,
  press, social media, job postings, and more. Use when you need competitive intelligence on
  who a company sells to.
---

# Customer Discovery

Find all customers of a company by scanning multiple public data sources. Produces a deduplicated report with confidence scoring.

## Quick Start

```
Find all customers of Datadog
```

```
Who are Notion's customers? Use deep mode.
```

## Inputs

| Input | Required | Default | Description |
|-------|----------|---------|-------------|
| Company name | Yes | — | The company to research |
| Website URL | No | Auto-detected | The company's website URL |
| Depth | No | standard | `quick`, `standard`, or `deep` |

## Procedure

### Step 1: Gather Inputs

Ask the user for:
1. **Company name** (required)
2. **Company website URL** (optional — if not provided, WebSearch for it)
3. **Depth tier** — present these options, default to Standard:
   - **Quick** (~2-3 min): Website logos, case studies, G2 reviews, press search
   - **Standard** (~5-8 min): Quick + blog posts, Wayback Machine, LinkedIn, Twitter, Reddit, HN, job postings, YouTube
   - **Deep** (~10-15 min): Standard + SEC filings, podcasts, GitHub, integration directories, BuiltWith, Crunchbase

### Step 2: Create Output Directory

```bash
mkdir -p customer-discovery-[company-slug]
```

### Step 3: Run Sources for Selected Tier

Collect all results into a running list. For each customer found, record:
- **name**: Company name
- **confidence**: high / medium / low
- **source_type**: e.g., "logo_wall", "case_study", "g2_review", "press", "job_posting"
- **evidence_url**: URL where the evidence was found
- **notes**: Brief description of the evidence

#### Quick Sources

**1. Website logo wall**

Run the scrape_website_logos.py script:
```bash
python3 skills/capabilities/customer-discovery/scripts/scrape_website_logos.py \
  --url "[company-url]" --output json
```

Parse the JSON output and add each result to the customer list.

**2. Case studies page**

Use WebFetch on the company's case studies page (try `/case-studies`, `/customers`, `/resources/case-studies`). Extract customer names from page headings and content.

**3. G2/Capterra reviews**

If the `review-site-scraper` skill is available, use it to find reviewer companies:
```bash
python3 skills/capabilities/review-site-scraper/scripts/scrape_reviews.py \
  --platform g2 --url "[g2-product-url]" --max-reviews 50 --output json
```

First, WebSearch for the company's G2 page: `site:g2.com "[company]"`. Extract reviewer company names from review author info.

**4. Web search for press**

WebSearch these queries and extract customer mentions from results:
- `"[company]" customer OR "case study" OR partnership`
- `"[company]" "we use" OR "switched to" OR "chose"`

#### Standard Sources (in addition to Quick)

**5. Company blog posts**

WebSearch: `site:[company-domain] customer OR "case study" OR partnership OR "customer story"`

**6. Wayback Machine logos**

Run the scrape_wayback_logos.py script:
```bash
python3 skills/capabilities/customer-discovery/scripts/scrape_wayback_logos.py \
  --url "[company-url]" --output json
```

Logos marked `still_present: false` are especially interesting — they indicate former customers.

**7. Founder/exec LinkedIn posts**

WebSearch: `site:linkedin.com "[company]" customer OR "excited to announce" OR "welcome"`

**8. Twitter/X mentions**

WebSearch: `site:twitter.com "[company]" "we use" OR "just switched to" OR "loving"`

**9. Reddit/HN mentions**

WebSearch these queries:
- `site:reddit.com "we use [company]" OR "[company] customer"`
- `site:news.ycombinator.com "[company]" customer OR user`

**10. Job postings**

WebSearch: `"experience with [company]" site:linkedin.com/jobs OR site:greenhouse.io OR site:lever.co`

Companies requiring experience with the product are likely customers.

**11. YouTube testimonials**

WebSearch: `site:youtube.com "[company]" customer OR testimonial OR review`

#### Deep Sources (in addition to Standard)

**12. SEC filings**

WebSearch: `site:sec.gov "[company]"` — Look for mentions in 10-K and 10-Q filings.

**13. Podcast transcripts**

WebSearch: `"[company]" podcast customer OR transcript OR interview`

**14. GitHub usage signals**

WebSearch: `site:github.com "[company-package-name]"` in dependency files, package.json, requirements.txt, etc.

**15. Integration directories**

WebFetch marketplace pages where the company lists integrations:
- Salesforce AppExchange
- Zapier integrations page
- Slack App Directory
- Any marketplace relevant to the company

**16. BuiltWith detection**

```bash
python3 skills/capabilities/customer-discovery/scripts/search_builtwith.py \
  --technology "[company-slug]" --max-results 50 --output json
```

**17. Crunchbase**

WebSearch: `site:crunchbase.com "[company]" customers OR partners`

### Step 4: Deduplicate Results

Merge results by company name using fuzzy matching:
- Normalize: lowercase, strip suffixes (Inc, Corp, LLC, Ltd, Co., GmbH)
- Treat "Acme Inc" = "Acme" = "ACME Corp" = "acme.com" as the same company
- When merging, keep the highest confidence level and all evidence URLs

### Step 5: Assign Confidence

Apply these rules:

**High confidence:**
- Logo on current website (from scrape_website_logos.py with confidence "high")
- Published case study or customer story
- Direct quote or testimonial on the company's site
- Official partnership page listing

**Medium confidence:**
- G2/Capterra review (reviewer's company)
- Press article mentioning customer relationship
- Job posting requiring experience with the product
- YouTube testimonial or video review
- Logo found only in Wayback Machine (was on site, now removed)

**Low confidence:**
- Single social media mention (tweet, Reddit post)
- Indirect reference ("heard good things about X")
- BuiltWith detection only (technology on site doesn't mean they're a paying customer)
- HN discussion mention

### Step 6: Generate Report

Create two output files:

**`customer-discovery-[company]/report.md`:**

```markdown
# Customer Discovery: [Company Name]

**Date:** YYYY-MM-DD
**Depth:** quick | standard | deep
**Total customers found:** N

## High Confidence (N)

| Customer | Source | Evidence |
|----------|--------|----------|
| Shopify | Case study | [link] |
| ... | ... | ... |

## Medium Confidence (N)

| Customer | Source | Evidence |
|----------|--------|----------|
| ... | ... | ... |

## Low Confidence (N)

| Customer | Source | Evidence |
|----------|--------|----------|
| ... | ... | ... |

## Sources Scanned

- Website logo wall: [url] — N customers found
- G2 reviews: N reviews analyzed — N companies identified
- Wayback Machine: N snapshots checked — N logos found (N removed)
- Web search: N queries — N mentions
- ...

## Methodology

This report was generated using the customer-discovery skill, which scans
public data sources to identify companies that use [Company Name]. Confidence
levels reflect the strength and directness of the evidence found.
```

**`customer-discovery-[company]/customers.csv`:**

CSV with columns: `company_name,confidence,source_type,evidence_url,notes`

Write the CSV using a code block or Python script.

## Scripts Reference

| Script | Purpose | Key flags |
|--------|---------|-----------|
| `scrape_website_logos.py` | Extract logos from current website | `--url`, `--output json\|summary` |
| `scrape_wayback_logos.py` | Find historical logos via Wayback Machine | `--url`, `--paths`, `--output json\|summary` |
| `search_builtwith.py` | BuiltWith technology detection (deep mode) | `--technology`, `--max-results`, `--output json\|summary` |

All scripts require `requests`: `pip3 install requests`

External skill scripts (use if available):
- `skills/capabilities/review-site-scraper/scripts/scrape_reviews.py` — G2/Capterra/Trustpilot reviews (requires Apify token)
- `skills/capabilities/linkedin-post-research/scripts/search_posts.py` — LinkedIn post search (requires Apify token)

## Cost

- **Quick / Standard:** Free (uses WebSearch + free APIs like Wayback Machine CDX)
- **Deep:** Mostly free. BuiltWith paid API is optional (`--api-key` flag); free scraping is used by default.
- External skills (review-site-scraper, linkedin-post-research) may require paid API tokens.
