`. Check both.
6. **Firecrawl** each person's LinkedIn URL (LinkedIn is JS-gated; Jina Reader usually returns thin content).

**Output:** `team/{person-slug}.md` with the appropriate `role_class` (see `schema/person.md` for the full list of 9 recognized values plus disambiguation tips). All people from the firm's site go in `team/`, regardless of `role_class`: the file location stays uniform, and the `role_class` field carries the meaning.

**Note on terminology mapping.** Firms use idiosyncratic labels: Calm/Storm calls its founder-mentor ecosystem "Supporting Partners," Sequoia calls it "Scouts," and Bessemer has "Operating Advisors." Pick the closest matching `role_class` from the schema; preserve the firm's literal label in `deck_role_label` and `title`. Don't spawn new `role_class` values for one-off labels.

### CP2 — People in the deck NOT found on the firm's site

**Goal:** resolve, via search, people named in the deck who didn't match anywhere in CP1's expanded discovery (any sub-page, any role).

If you've already broadened CP1 to cover all the role-bearing sub-pages above, CP2 will be smaller than it used to be: many "advisors" turn out to live on the firm's `/support` or `/lpac` page and get caught in CP1. CP2 is now genuinely "people the firm doesn't list publicly," usually external advisors, occasional collaborators, or people the deck mentions by reputation.

**Cascade:**

1. **Tavily search** with a query like `"{Name}" "{Firm Name}" advisor` or `"{Name}" "{Title from deck}"`
2. **Jina Reader** the top result for non-LinkedIn pages (personal site, news article, board-of-directors page)
3. **Firecrawl** for LinkedIn profiles (Jina Reader is thin on LinkedIn)
4. **OpenGraph.io** on the resolved profile URL for headshot
5. If no high-confidence match, write the file with `status: flagged` and `confidence: low`; never skip silently

**Output:** `team/{person-slug}.md` with `role_class: advisor` (if the deck's role label suggests a formal advisory/governance role) or `external` (if no clear category). The role-label hint from the deck is preserved in `deck_role_label`.

### CP3 — Portfolio companies in the deck

**Goal:** for each company referenced in the deck, gather profile metadata plus brand assets.

**Principle: match the asset role to the render context.** A single "the logo" doesn't exist: every company has multiple brand assets at different aspect ratios for different uses. Fetching just one and forcing it everywhere produces bad layouts (a horizontal wordmark squeezed into a square chip becomes invisibly tiny text; a square favicon stretched to a wide header looks pixelated). The skill captures **three asset roles per company by default**, then the rendering layer picks whichever fits the slot:

| Role | Aspect ratio | Use case | Fetch helper |
|---|---|---|---|
| `trademark` (or `wordmark` + `appIcon`) | wide / horizontal | inline header, wordmark display, hero ribbon | `scripts/logo-hunt.sh` + `scripts/brandfetch.ts` |
| `favicon` | square (1:1) | small chip, tile, list-row icon, OS app icon | `scripts/favicon-hunt.sh` |
| `og:image` (URL only, no download by default) | 1.91:1 social card | portfolio detail-page hero, deck banner, social share preview | `scripts/og-image-hunt.sh` |

The CP3 fetch loop should run all three for every portfolio company: they're cheap, they hit different paths on the company site, and they cover the most common rendering contexts a downstream deck/site will need.

**Cascade:**

1. **Jina Reader** the firm's `/portfolio` (or `/companies`, `/investments`) → markdown
2. If structured fields are needed, **Firecrawl** with a `{name, website, sector, stage, description}[]` schema
3. **Cross-reference** with logos visually present in the deck PDF (often the deck has 6–12 portfolio logos on a "selected investments" slide)
4. **Tavily search** to fill in any company named in the deck but not on the firm's portfolio page
5. **OpenGraph.io** on each company homepage for favicon + `og:image` + description
6. **Crunchbase / LinkedIn** discovery via Tavily (`"{Co} crunchbase"`, `"{Co} linkedin"`); fetch via Firecrawl (both are JS-gated)

**Trademark / wordmark cascade.** SVG quality matters. Rasters cause three recurring problems in slide / card layouts: opaque backgrounds that clash with surfaces, low resolution that pixelates when scaled, and inconsistent margin-to-glyph ratios across brands, so logos in same-size containers look visually uneven. Always try harder for an SVG before settling for a raster. Run the cascade in this order, stopping at the first SVG (or the first usable raster on the final tier). Record `asset_strategy` in the company's frontmatter.

1. **Inline `

` in the nav to `.svg` and try.
3. **Brand / press kit pages**: fetch `/brand`, `/press`, `/media`, `/kit`, `/brand-assets`, `/about/press`, `/company/brand`, `/legal/brand`. These pages often link explicitly to a downloadable SVG. Jina Reader the page; grep for `.svg`.
4. **Brandfetch API** (if `BRANDFETCH_API_KEY` is set): `https://api.brandfetch.io/v2/brands/{domain}` returns brand assets keyed by format. Prefer `format=svg`. Free tier: 1k req/mo. See `scripts/brandfetch.ts` (when added).
5. **Tavily site-search across SVG repos**: query `"{Co} logo svg" site:worldvectorlogo.com OR site:seeklogo.com OR site:vectorlogo.zone OR site:wikimedia.org OR site:upload.wikimedia.org`. These are SVG-first directories; a hit is usually clean.
6. **Google Custom Search with `fileType=svg`** (last resort, costs money): only if you have `GOOGLE_CSE_KEY` + `GOOGLE_CSE_CX` in `~/.secrets`. It returns the top URLs; download each and validate that it is actually an SVG.
7. **Raster fallback + background strip**: if all SVG paths fail, fetch the best raster (Brandfetch PNG > OpenGraph.io og:image > favicon), then run `scripts/bg-strip.sh` to remove the background. Set `logo_bg_stripped: true` and `asset_strategy: site-raster-stripped` (or whichever tier applies).

**SVG validation**: after fetching anything claiming to be SVG: file must contain ` 200 bytes (filters out 1×1 tracking pixels). Strip embedded `