---
name: "site-generation"
description: "End-to-end AI website generation pipeline. Claude Opus 4.7 emits Bolt-style envelopes (multi-file, plan-first) that customize Vite+React+Tailwind templates from pre-researched business data. Pre-research via APIs, media acquisition, brand extraction, visual inspection via GPT-4o, R2 upload (per-file content-type by extension), D1 status updates. Supports all business types: SaaS, portfolio, non-profit, restaurant, salon, medical, legal, retail, tech."
metadata:
  version: "2.0.0"
  updated: "2026-04-30"
  effort: "xhigh"
  model: "claude-opus-4-7"
  context: "fork"
license: "Rutgers"
compatibility:
  claude-code: ">=2.0.0"
  agentskills: ">=1.0.0"
submodules:
  - research-pipeline.md
  - media-acquisition.md
  - build-prompts.md
  - quality-gates.md
  - domain-features.md
  - template-system.md
  - local-seo.md
  - bolt-artifact-protocol.md
  - blog-import.mjs
  - validate-assets.mjs
  - build-breaking-rules.md
---

# 15 -- Site Generation

Submodules: research-pipeline.md (API-driven business research, scraping, enrichment), media-acquisition.md (image/video/logo sourcing across 17 engines incl. Flux 1.1 Pro Ultra, Ideogram 3.0, Recraft V3, GPT Image 1.5, Sora; Pexels-first/AI-fallback, pHash dedup), build-prompts.md (master prompt + enhancement phases), quality-gates.md (Lighthouse CI v0.15+, axe-core/playwright v4.11+ WCAG 2.2 AA, source-parity diff, 3-tier visual regression, console-error gate, Recommendations Loop), domain-features.md (category-specific features for 18+ business types), template-system.md (Vite+React+Tailwind+shadcn/ui starter, customization patterns), local-seo.md (citation building, GBP sync, review generation, trust badges, local conversion tracking), bolt-artifact-protocol.md (XML envelope spec: ordered file/shell actions, PLAN.md-first, runtime parser+executor, anti-patterns, ~$6/site at 80K output tokens), blog-import.mjs (RSS-first crawl + Squarespace JSON fallback → strip CMS residue → GPT-4o-mini typed-block restructure → pHash dedup → src/data/blog-posts.ts emission), validate-assets.mjs (post-build R2/dist gate: 13 mandatory files + every img/link/script/source ref resolves OR matches external host allowlist).

## Dual-Template Architecture (***TWO REPOS, NEVER CONFUSE***)

| Template | Repo | Use Case | Stack |
|----------|------|----------|-------|
| **Local Business** | `megabytespace/template.projectsites.dev` | Restaurant, salon, medical, legal, fitness, contractor, retail, etc. | Vite+React+Tailwind+shadcn/ui, 15 local components, CSS var brand slots, conversion tracking |
| **SaaS** | `megabytespace/saas-starter` | SaaS products, APIs, dev tools, platforms | Hono+D1+Clerk+Stripe+Inngest+Resend on CF Workers, ESLint+Prettier |

**Template selection logic:** Container entrypoint checks `_form_data.json.category`. If category ∈ {restaurant,cafe,salon,spa,medical,dental,legal,fitness,automotive,construction,photography,real_estate,education,financial,retail,pet_services,wedding,church,nonprofit,government} → clone `template.projectsites.dev`. If category ∈ {saas,api,platform,devtool,marketplace} → clone `saas-starter`.
Unknown → default to local business.

**Auto-sync workflow (***EVERY PROMPT***):** After ANY change to skills 05-15, evaluate: "Does this improve the template?" If yes → push to the appropriate template repo in the same prompt. Changes to: design patterns/components/CSS → `template.projectsites.dev` | API patterns/auth/billing/middleware → `saas-starter` | both → push to both. Template repos must always reflect current best practices from skills.

## Philosophy

A perfect website CANNOT be created with a single LLM call. It requires a Principal SE-level prompt that orchestrates research→build→inspect→fix loops. The system front-loads ALL research and assets BEFORE Claude Code touches code, then gives it one comprehensive prompt with everything pre-digested. Claude Code starts from a pre-installed template and customizes it; it never generates from scratch.

**Generation protocol (***Bolt-style XML ENVELOPE, see `bolt-artifact-protocol.md`***):** The model emits ONE XML envelope containing an ordered sequence of file and shell actions. The first action is ALWAYS `PLAN.md` (route tree, design-token diff, media count, file count, validators), so the plan is auditable post-build rather than a chat artifact. This replaces the legacy single-inline-HTML output (Llama 3.1 70B → 16K-token monolith), which fundamentally couldn't produce the multi-page, media-rich, blog-bearing sites the platform promises. The runtime parser (`projectsites.dev/.../services/artifact_parser.ts`) validates first-action-is-PLAN.md + required-files set + npm…build shell action; failures re-prompt the model with the error list (max 2 retries).
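The parser checks and the retry budget described above can be sketched roughly as follows. This is a minimal illustration, not the contents of `artifact_parser.ts`: the `Action` shape, the sample `REQUIRED_FILES` list, the regex used to recognize an npm build command, and the synchronous `emit` callback are all assumptions made for the example (a real model call would be asynchronous).

```typescript
// Hypothetical action shape; the real parser's types are not shown in this document.
type Action =
  | { type: "file"; filePath: string; content: string }
  | { type: "shell"; command: string };

// Illustrative list only; the actual required-files set lives in the parser.
const REQUIRED_FILES: readonly string[] = ["PLAN.md", "index.html", "package.json"];

// Returns a list of human-readable errors; an empty list means the artifact passed.
function validateArtifact(
  actions: readonly Action[],
  requiredFiles: readonly string[] = REQUIRED_FILES,
): string[] {
  const errors: string[] = [];

  // Check 1: the first action must be a file action that writes PLAN.md.
  const first = actions[0];
  if (!first || first.type !== "file" || first.filePath !== "PLAN.md") {
    errors.push("first action must be a file action writing PLAN.md");
  }

  // Check 2: every required file must be emitted somewhere in the envelope.
  const emitted = new Set<string>();
  for (const action of actions) {
    if (action.type === "file") emitted.add(action.filePath);
  }
  for (const file of requiredFiles) {
    if (!emitted.has(file)) errors.push(`missing required file: ${file}`);
  }

  // Check 3: at least one shell action must run an npm build.
  // Assumption: any command containing "npm" followed later by "build" counts.
  const hasBuild = actions.some(
    (action) => action.type === "shell" && /\bnpm\b.*\bbuild\b/.test(action.command),
  );
  if (!hasBuild) errors.push("missing npm build shell action");

  return errors;
}

// One initial attempt plus up to maxRetries re-prompts carrying the error list.
function generateWithRetries(
  emit: (previousErrors: string[]) => Action[],
  maxRetries = 2,
): Action[] {
  let errors: string[] = [];
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const actions = emit(errors);
    errors = validateArtifact(actions);
    if (errors.length === 0) return actions;
  }
  throw new Error(`artifact invalid after ${maxRetries} retries: ${errors.join("; ")}`);
}
```

Returning the full error list, rather than failing on the first problem, lets a single re-prompt address every violation at once, which matters when only two retries are allowed.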
Executor (`artifact_executor.ts`) supports two modes: `r2-files` uploads each servable file to `sites/{slug}/{version}/{path}` with content-type by extension (skips source-only files like `src/`, `package.json`, `vite.config.*`, and shell actions); `container` runs `git clone template → npm install → vite build → upload dist/` inside Cloudflare Containers (gated behind `ContainerModeNotProvisionedError` until the container is provisioned). Choose `container` when the artifact emits source code (the default per the new generator prompt) and `r2-files` when emitting pre-built static HTML/CSS/JS.

**Quality bar:** Stripe/Linear/Vercel-level polish. Every site must be so good the business owner prefers it over their original. We don't copy; we take any website and make it dramatically better. Information-dense sites get condensed into gorgeous, well-organized modern designs with MORE useful information in FEWER, better-designed pages.

## Pipeline Overview

```
Phase 0: Pre-Research + Media Acquisition (***RUNS IN ALL BUILD MODES***)
  → Google Places, website scraping, social verification, brand extraction, media discovery
  → Download ALL images from original website (logo, photos, blog images)
  → Stock photos via Unsplash/Pexels/Pixabay APIs
  → AI-generated originals via GPT Image 1.5 / Stability AI
  → YouTube/Pexels video embeds
  → Output: _research.json, _scraped_content.json, _assets/ folder with all images
  → HARD GATE: <10 images in assets/ = build NOT complete

Phase 1: Claude Opus 4.7 Bolt-Artifact Emission (Worker OR Container)
  → Reads all _ prefixed context files
  → Emits ONE envelope with ordered file/shell tags
  → First action ALWAYS PLAN.md (route tree, design tokens, media+file counts, validators)
  → Customizes pre-installed Vite+React+Tailwind+shadcn/ui template via file actions
  → Builds 1:N-page site MATCHING source sitemap (every URL recreated, max 1000); never caps at 4–8: content-thin source = condensed (≥4 pages floor for new builds), content-rich source = full mirror
  → Clean URL slugs (never copy CMS garbage like -1 suffixes)
  → Runtime parser validates first-action-PLAN.md + REQUIRED_FILES + npm…build shell action; failures re-prompt (max 2 retries)
  → Executor branches: r2-files (Worker uploads pre-built static files to R2 with content-type per extension) OR container (clone template → npm install → vite build → upload dist/)

Phase 2: Post-Build Verification (Worker)
  → Screenshot via microlink.io → GPT-4o vision scoring → D1 status update → email notification
```

***CRITICAL: In manual/prompt-based builds (no container), Phase 0 runs INLINE as the first step of Claude Code's work. The agent MUST: (1) WebFetch original site pages and extract all image URLs, (2) curl/download images to public/, (3) search for stock photos and download them, (4) generate AI images if API keys are available. ALL of this happens BEFORE writing any React code. A text-only site is a failed build. See build-prompts.md "Media Acquisition" section.***

## Single-Prompt Architecture

The container (or Worker, in r2-files mode) receives ONE prompt that encompasses all build phases. The prompt references `~/.agentskills/15-site-generation/` for methodology and instructs the model to emit a single Bolt-style XML envelope (see `bolt-artifact-protocol.md`).
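For the `r2-files` executor branch in the pipeline above, the per-file decision (skip source-only paths, otherwise build the `sites/{slug}/{version}/{path}` key and pick a content type by extension) might look like the sketch below. The function names, the extension table, and the `application/octet-stream` fallback are assumptions for illustration. The skip list covers only the three examples named in the text, and the actual upload call is omitted.

```typescript
// Hypothetical extension-to-content-type table; extend as the site needs.
const CONTENT_TYPES: Record<string, string> = {
  html: "text/html; charset=utf-8",
  css: "text/css; charset=utf-8",
  js: "text/javascript; charset=utf-8",
  json: "application/json",
  xml: "application/xml",
  txt: "text/plain; charset=utf-8",
  svg: "image/svg+xml",
  png: "image/png",
  jpg: "image/jpeg",
  jpeg: "image/jpeg",
  webp: "image/webp",
  avif: "image/avif",
};

// Source-only paths named in the text: src/, package.json, vite.config.*
// The real executor may skip more than these three patterns.
function isSourceOnly(path: string): boolean {
  return (
    path.startsWith("src/") ||
    path === "package.json" ||
    /^vite\.config\.[^/]+$/.test(path)
  );
}

// Returns the R2 object key and content type, or null when the file is not servable.
function planUpload(
  slug: string,
  version: string,
  path: string,
): { key: string; contentType: string } | null {
  if (isSourceOnly(path)) return null;
  const dot = path.lastIndexOf(".");
  const ext = dot >= 0 ? path.slice(dot + 1).toLowerCase() : "";
  // Assumed fallback for unknown extensions.
  const contentType = CONTENT_TYPES[ext] ?? "application/octet-stream";
  return { key: `sites/${slug}/${version}/${path}`, contentType };
}
```

Keeping the decision in a pure function separate from the upload makes the skip rules and content types easy to test without touching R2.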
Context files written to the build directory before the model runs: `_research.json`→business profile+hours+phone+address+reviews+geo (Google Places+Workers AI) | `_brand.json`→colors+fonts+personality+logo URL+color_source (brand extraction) | `_citations.json`→APA 7th ed bibliography keyed by refId for every quantitative claim (rules/citations.md) | `_scraped_content.json`→all pages from existing website by URL (scraper) | `_assets.json`→image manifest with metadata (discovery pipeline) | `_image_profiles.json`→GPT-4o analysis per image: quality+placement+colors (profiling) | `_videos.json`→YouTube/Pexels embed URLs+metadata | `_places.json`→Google Places enrichment: photos+reviews+rating | `_form_data.json`→user-submitted form data from /create | `_domain_features.json`→category-specific feature requirements (template cache)

## Build Rules (NON-NEGOTIABLE)

**Images:** USE ALL images in assets/. Never use external URLs (hotlinking blocked). Hero: assets/hero-*. Gallery: full-width slider with ALL images. Service cards: relevant images. No image in assets/ left unused. ***Minimum image count scales with sitemap:*** `max(30, original_image_count × 1.4, page_count × 6_home_or_4_sub)`, so a 4-page rebuild ⇒ 30-50 images, 50-page ⇒ 200+, 500-page ⇒ 2000+; never cap at 4-8-page-site numbers when the source has more. ***Per-page floor: home ≥6 images, every sub-page ≥4***. ***Minimum 5 AI-generated DALL-E 3 originals per site*** (hero backgrounds, service illustrations, textures, narrative scenes; Brian's stated preference: use DALL-E heavily for ultra-real photography AND creative narrative imagery). The site must feel media-rich from first scroll, with no sparse pages. All images are processed through the optimization pipeline (skill 12 image-optimization.md): WebP+AVIF at 320/640/1280/1920w, blur placeholders, dominant color extraction. Use the image component, never a raw `<img>` with a PNG/JPG src.
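The minimum-image formula above can be worked through in code. This sketch reads `page_count × 6_home_or_4_sub` as "6 for the home page plus 4 for each remaining page", an interpretation that matches the per-page floor and the 50-page ⇒ 200+ and 500-page ⇒ 2000+ examples. The function and constant names are hypothetical.

```typescript
const SITE_FLOOR = 30;    // absolute minimum per site
const HOME_FLOOR = 6;     // home page needs at least 6 images
const SUBPAGE_FLOOR = 4;  // every sub-page needs at least 4 images

// Minimum number of images a build must ship, given what the source site had.
function minImageCount(originalImageCount: number, pageCount: number): number {
  const pages = Math.max(1, pageCount);
  // Sum of per-page floors: one home page plus (pages - 1) sub-pages.
  const perPageFloor = HOME_FLOOR + (pages - 1) * SUBPAGE_FLOOR;
  // 1.4x the source image count, computed with integers to avoid float drift.
  const scaledOriginal = Math.ceil((originalImageCount * 14) / 10);
  return Math.max(SITE_FLOOR, scaledOriginal, perPageFloor);
}
```

Under this reading, a 4-page rebuild with few source images lands on the site floor of 30, a 50-page site needs 6 + 49 × 4 = 202, and a 500-page site needs 6 + 499 × 4 = 2002. A 4-page site whose source had 100 images needs 140, because the 1.4× term dominates.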
**Dedupe via 301, not deletion alone:** when md5-hashed twins exist, keep the canonical file, delete the twin, and emit `{deleted-url:canonical}` to a Worker redirect map (`build-image-redirects.mjs` reference). Source-code refs use FULL canonical filenames (`pseg-feb-2023-1.jpg`), never short aliases (`pseg-1.jpg`); the alias breaks the moment dedupe runs.

**Video (***EXTRACT ALONGSIDE IMAGES, NEVER STRIP***):** Walker captures `