---
name: kai-seo-audit
description: One-click technical SEO audit of a website. Runs the full technical SEO audit SOP (crawlability, indexation, Core Web Vitals, schema markup, internal linking, mobile UX, and content quality). Outputs a prioritized fix list. Use when "SEO audit", "technical SEO", "site audit", "crawl issues", "indexation problems", "why aren't we ranking", "SEO health check", or any request to diagnose SEO issues on a website.
---

Run a technical SEO audit using the harness SOPs and checklists. Produces a prioritized fix list.

## Non-Negotiable: Kai Data Provenance

Before writing any finding, load `E:\Dev2\kai-cmo-harness-work\harness\references\audit-data-provenance.md`.

Declare the audit mode:

- `sales_external` for public-only or prospect audits.
- `onboarding_connected` when GSC, GA4, GBP, crawl exports, or SEO platform data are connected.
- `internal_demo` when values are placeholders.

Do not publish rankings, traffic, clicks, CTR, Core Web Vitals, PageSpeed, indexed-page counts, backlinks, Domain Rating, AI Overview visibility, schema validity, or local pack placement without source, retrieval date, and artifact/API note. Missing private data becomes a `Data needed` item, not an estimate.

Run before handoff:

```bash
python scripts/quality_gates/audit_provenance_lint.py workspace/seo-audit --audit-dir
```

## Phase 0.5: Source-Backed Data Acquisition

Before writing the SEO audit, run the source-backed Kai collector. The collector is shared by all Kai workflows, not audit-only; this SEO audit must consume its `audit-data.json` alias. Existing audit automations may keep using `python -m scripts.audit.collect`; non-audit SEO workflows should prefer `python -m kai.source_data.collect` and read `kai-data.json`.

```bash
python -m scripts.audit.collect \
  --url "" \
  --mode sales_external \
  --workflow seo-audit \
  --out workspace/seo-audit \
  --pagespeed --dataforseo --seo-provider auto \
  --keywords "," --location ""
```

Use `--mode onboarding_connected` only when GSC/GA4/GBP or SEO platform exports are connected:

```bash
python -m scripts.audit.collect \
  --url "" \
  --mode onboarding_connected \
  --workflow seo-audit \
  --out workspace/seo-audit \
  --pagespeed --places --dataforseo --seo-provider auto \
  --gsc --ga4 \
  --keywords "," --location "" \
  --date-from "" --date-to ""
```

Add `--third-party-sources serpapi,brightlocal,similarweb,builtwith,wappalyzer,bing-webmaster` when the SEO audit needs licensed vendor or non-Google search data. Treat API vendor values as `third_party_estimate`; treat supplied exports as `user_provided`.

The audit must read SEO metrics from `workspace/seo-audit/audit-data.json`; non-audit SEO workflows can read the identical `workspace/seo-audit/kai-data.json`. If a ranking, traffic, backlink, review, PageSpeed, Core Web Vitals, schema, GSC, GA4, or local pack metric is missing there, write it as a data gap rather than estimating it.

## Phase 0: Load Product Context

Check if `MARKETING.md` exists in the **project root** (same directory as CLAUDE.md, README.md, package.json).

**If it exists:** Read it and skip product discovery questions. It has the product name, ICP, value prop, monetization, brand voice, current channels, and competitive landscape.

**If it does NOT exist:** Auto-explore the codebase to create it in the **project root** (next to CLAUDE.md). Do NOT ask the user what the product is. Read CLAUDE.md, README.md, PROJECT.md, package.json, landing pages, and any project files. Search for email/ad/analytics config. Then create `MARKETING.md` using the template from `/kai-email-system`. Present the draft to the user for confirmation.

---

## Phase 1: Site Input

Read from `MARKETING.md`. Only ask about things not covered there:

1. **URL**: what site are we auditing?
2. **Scope**: full site or specific sections?
3. **Known issues**: anything already flagged?
4. **Access**: do we have Search Console / analytics access?
5. **Priority**: what matters most? (rankings, traffic, indexation, speed)
6. **Audit mode**: `sales_external`, `onboarding_connected`, or `internal_demo`
7. **Data sources available**: public crawl, PageSpeed Insights, DataForSEO, Ahrefs/Semrush/Moz, GSC, GA4, GBP, Screaming Frog/Sitebulb export

## Phase 2: Audit Execution

Load these before starting:

- `E:\Dev2\kai-cmo-harness-work\knowledge\checklists\technical-seo-audit-sop.md`
- `E:\Dev2\kai-cmo-harness-work\knowledge\checklists\technical-seo-checklist.md`
- `E:\Dev2\kai-cmo-harness-work\knowledge\checklists\seo-checklist.md`

### Audit Layers (run in order)

**Layer 1: Crawlability & Indexation**

- robots.txt: blocking important pages?
- XML sitemap: exists, submitted, up to date?
- Canonical tags: correct, consistent?
- Noindex/nofollow: any unintended blocks?
- HTTP status codes: 404s, redirect chains, 5xx errors?
- Pagination: rel=next/prev or infinite scroll handling?

**Layer 2: Technical Performance**

- Core Web Vitals (LCP, INP, CLS; INP replaced FID as a Core Web Vital in March 2024)
- Mobile-friendliness
- Page speed (server response time, render-blocking resources)
- HTTPS: mixed content, certificate issues?
- Structured data / schema markup: present, valid?

**Layer 3: On-Page SEO**

- Title tags: unique, keyword-included, under 60 chars?
- Meta descriptions: unique, compelling, under 155 chars?
- H1 tags: one per page, keyword-relevant?
- Image alt text: descriptive, keyword-relevant?
- Internal linking: orphan pages, shallow link depth?
- URL structure: clean, descriptive, flat hierarchy?

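The mechanical parts of the Layer 3 checklist (title present and under 60 characters, meta description present and under 155 characters, exactly one H1) can be sketched with the Python standard library. This is a minimal sketch under assumptions, not part of the harness: `OnPageParser` and `check_on_page` are hypothetical names, the input is the HTML of one page already fetched as a string, and "under" is read as strictly less than. Uniqueness across pages and keyword relevance are not covered, and every result still needs the source metadata required elsewhere in this document.

```python
from html.parser import HTMLParser

TITLE_MAX = 60   # "under 60 chars" from the Layer 3 checklist
META_MAX = 155   # "under 155 chars" from the Layer 3 checklist


class OnPageParser(HTMLParser):
    """Collects the title text, meta description, and H1 count from one page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta_description = None  # None means the tag was never seen
        self.h1_count = 0
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "h1":
            self.h1_count += 1
        elif tag == "meta" and (attr_map.get("name") or "").lower() == "description":
            self.meta_description = attr_map.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Title text can arrive in several chunks, so accumulate it.
        if self._in_title:
            self.title += data


def check_on_page(html):
    """Return a dict mapping each check name to True (pass) or False (fail)."""
    parser = OnPageParser()
    parser.feed(html)
    title = parser.title.strip()
    meta = (parser.meta_description or "").strip()
    return {
        "title_present": bool(title),
        "title_under_60": bool(title) and len(title) < TITLE_MAX,
        "meta_description_present": bool(meta),
        "meta_description_under_155": bool(meta) and len(meta) < META_MAX,
        "single_h1": parser.h1_count == 1,
    }
```

A failing key maps naturally to a row in the fix list; a page that could not be fetched should become a data gap rather than a failed check.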
**Layer 4: Content Quality**

- Thin content pages (under 300 words)
- Duplicate content (internal and external)
- Keyword cannibalization (multiple pages targeting same keyword)
- Content freshness: last-updated dates
- E-E-A-T signals: author bios, citations, credentials

**Layer 5: Off-Page Signals**

- Backlink profile overview (if data available)
- Brand mentions without links
- Local SEO (if applicable): GBP, NAP consistency

Use the browse/gstack skill to actually crawl pages if available. Otherwise, work from what the user provides or what you can check directly.

Every check must carry source metadata:

```yaml
source_tier: connected | public_observed | user_provided | inferred | missing_data
source_name: ""
source_url: ""
retrieved_at: ""
evidence_artifact: ""
confidence: high | medium | low
score_eligible: true | false
```

Do not include `inferred` or `missing_data` items in the health score.

## Phase 3: Prioritized Fix List

Score each finding:

| Priority | Impact | Effort | Examples |
|----------|--------|--------|----------|
| **P0** | High impact, easy fix | < 1 hour | Missing title tags, broken canonical, noindex on important pages |
| **P1** | High impact, moderate effort | 1 day | CWV failures, redirect chains, thin content |
| **P2** | Medium impact | 1 week | Schema markup, internal linking optimization |
| **P3** | Low impact / nice-to-have | Ongoing | Alt text gaps, URL cleanup |

## Phase 4: Output

```markdown
# SEO Audit Report: [site.com]

Audit Mode: [sales_external/onboarding_connected/internal_demo]

## Health Score: [X]/100

## Critical Issues (P0)
| Issue | Pages Affected | Fix |
|-------|---------------|-----|
| ... | ... | ... |

## High Priority (P1)
| Issue | Pages Affected | Fix |
|-------|---------------|-----|

## Medium Priority (P2)
...

## Low Priority (P3)
...

## Technical Checklist Results
- [ ] robots.txt: [PASS/FAIL - detail]
- [ ] XML sitemap: [PASS/FAIL]
- [ ] Canonical tags: [PASS/FAIL]
- [ ] Core Web Vitals: [PASS/FAIL - LCP: Xs, INP: Xms, CLS: X]
- [ ] Mobile: [PASS/FAIL]
- [ ] HTTPS: [PASS/FAIL]
- [ ] Schema: [PASS/FAIL]
- [ ] Title tags: [PASS/FAIL]
- [ ] Internal linking: [PASS/FAIL]
...

## Recommendations
[Top 5 actions ordered by impact-to-effort ratio]

## Data Sources
[Source inventory with retrieved_at and artifacts]

## Data Gaps
[Missing access or exports that limit confidence]
```

Save to `workspace/seo-audit/[domain].md`.
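
The `Health Score` line in the report depends on the rule that `inferred` and `missing_data` items stay out of the score. The sketch below shows one way that filter could work. It rests on a loud assumption: this document does not define the scoring formula, so the sketch uses the share of eligible checks that pass, scaled to 0-100. The function name `health_score` and the `name` and `passed` keys are hypothetical; `source_tier` and `score_eligible` follow the source metadata fields listed under Phase 2.

```python
# Tiers that this document excludes from the health score.
EXCLUDED_TIERS = {"inferred", "missing_data"}


def health_score(checks):
    """Return (score, data_gaps) for a list of check dicts.

    ASSUMPTION: score = passing eligible checks / eligible checks * 100.
    The source document does not specify the formula.
    Score is None when no check is eligible, so nothing gets estimated.
    """
    eligible = [
        c for c in checks
        if c.get("score_eligible", False) and c["source_tier"] not in EXCLUDED_TIERS
    ]
    # Checks without data are reported as gaps instead of being scored.
    data_gaps = [c["name"] for c in checks if c["source_tier"] == "missing_data"]
    if not eligible:
        return None, data_gaps
    passed = sum(1 for c in eligible if c["passed"])
    return round(100 * passed / len(eligible)), data_gaps


if __name__ == "__main__":
    example = [
        {"name": "robots.txt", "passed": True,
         "source_tier": "public_observed", "score_eligible": True},
        {"name": "Core Web Vitals", "passed": False,
         "source_tier": "missing_data", "score_eligible": False},
    ]
    print(health_score(example))
```

Returning `None` instead of `0` when nothing is eligible keeps an audit with no usable data from looking like a failing site; the gap list feeds the `Data Gaps` section of the report.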