---
name: design-craft
description: Produce unique designs that actually work — for websites, web apps, mobile apps (iOS/Android/visionOS), dashboards, marketing sites, brand identities, slides, posters, motion, or generative media. Use when the user wants to design, redesign, build, ship, fix, improve, refactor, polish, audit, critique, harden, brand, theme, animate, layout, type, color, accessibility, responsive, dark mode, glass, brutalist, minimalist, editorial, bento, dashboard, hero, landing, or component work — even if the user does not explicitly say "design". Also triggers on requests for premium feel, taste-driven UI, unique brand differentiation, extracting tokens from inspiration, building a design system, generating brand kits, evaluating screenshots or live URLs, producing a DESIGN.md spec, or making an existing design less generic. Do NOT use for pure backend, CLI, or non-visual work.
license: MIT
compatibility: "Platform-agnostic. Requires Python 3.9+ for bundled scripts. Optional but recommended: vision-capable LLM access for the design-evaluator subagent, screenshot capture (browser automation or pplx-tool screenshot_page), and image generation for image-first workflows. Depends on `subagent-prompt-foundry` for the parent↔subagent communication protocol."
allowed-tools: "Bash(python:*) Bash(node:*) WebFetch"
metadata:
  author: design-craft
  version: "0.1.0"
  goal: unique-designs-that-actually-work
  depends_on: subagent-prompt-foundry
  tags: [design, ui, ux, visual-identity, design-systems, brand, motion, accessibility, deterministic-evaluation, design-md]
---

# Design Craft

The end goal is **unique designs that actually work**. Two requirements, in equal weight.

- **Unique** — escape the statistical attractor of training-data defaults. Find unexpected pairings (industry × aesthetic × motion × type) that no AI peer would have produced for the same brief.
- **Actually work** — measured objectively against the audience and the job-to-be-done. Vision evaluation, contrast math, screenshot diffs, journey tests, structured scoring. Not vibes.

This skill is **not closed-minded**. Every "ban" in the design laws is a *prior*, not a verdict. A side-stripe border, glassmorphism, gradient text, even Inter — any of these can be the right answer for a specific audience and JTBD when the evaluator confirms it. Defaults start at lower prior probability so they need stronger evidence. The evaluator decides on the basis of measurement.

## Initial response (when no specific question is given)

> I'm ready to design something unique that actually works. Tell me what you're building, who uses it, where, and what it should make them feel. I'll search far for an unexpected pairing, build it, then verify objectively before shipping a validated DESIGN.md.

Wait for the brief before generating.

## Dependency

This skill depends on `subagent-prompt-foundry` for the parent↔subagent communication protocol. The design-evaluator subagent in `subagents/design-evaluator.md` follows the foundry's contract:

- Parent → subagent: `task_summary`, `goal`, `constraints`, `artifacts`, `allowed_tools`, `required_output`, `success_criteria`, `exclusions`.
- Subagent → parent: `task_understanding`, `assumptions`, `blocking_questions`, `output`, `evidence`, `uncertainty`, `confidence`, `next_step`.

The handoff schema is bundled at `assets/parent-subagent-message-template.json`. Install the foundry first via `npx skills add srinitude/subagent-prompt-foundry` if it is not already present, then this skill activates the design-evaluator using that protocol.
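
As a rough illustration of the contract above, the sketch below checks that a message carries every field the protocol names. The field names come from the two lists above; the value types, the sample request, and the helper `missing_fields` are assumptions made for illustration, and the bundled `assets/parent-subagent-message-template.json` remains the source of truth.

```python
# Field names are taken from the contract listed above.
PARENT_FIELDS = (
    "task_summary", "goal", "constraints", "artifacts",
    "allowed_tools", "required_output", "success_criteria", "exclusions",
)
SUBAGENT_FIELDS = (
    "task_understanding", "assumptions", "blocking_questions", "output",
    "evidence", "uncertainty", "confidence", "next_step",
)


def missing_fields(message, required):
    """Return the required field names absent from a message, in contract order."""
    return [name for name in required if name not in message]


# Hypothetical parent-to-subagent request; the values are illustrative only.
request = {
    "task_summary": "Evaluate the checkout redesign against the shape brief.",
    "goal": "PASS/WARN/FAIL per evaluation axis",
    "constraints": ["WCAG AA", "mid-range Android"],
    "artifacts": ["screens/checkout.png"],
    "allowed_tools": ["screenshot", "contrast"],
    "required_output": "JSON conforming to assets/evaluation.schema.json",
    "success_criteria": ["every axis scored", "every fit-hypothesis claim tested"],
    "exclusions": ["do not rewrite the artifact"],
}

print(missing_fields(request, PARENT_FIELDS))  # []
```

The same helper can be pointed at a subagent reply with `SUBAGENT_FIELDS` before the parent trusts its `output`.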

## Reference index — load only what the task triggers

This file is the only one always loaded. All others are loaded on demand.

| Reference file | Load when |
| --- | --- |
| `references/creative-thinking.md` | Starting any new design, ideating, breaking out of a default, brainstorming, or feeling stuck. **Always consulted before generating.** |
| `references/style-catalog.md` | Choosing or evaluating an aesthetic. 50+ named styles with traits, anti-traits, and lineage. |
| `references/industry-catalog.md` | Establishing audience context. 60+ industries with default conventions, their provenance, and audience constraints. |
| `references/cliche-pairings.md` | Identifying the obvious industry × style pair before deliberately rejecting or earning it. |
| `references/unexpected-pairings.md` | Constructing a pairing that is novel for the industry but coherent with the audience and JTBD. Worked examples included. |
| `references/research-and-inspiration.md` | Competitive analysis, inspiration boards, cross-pollination from other categories, design improvement against real apps. |
| `references/design-laws.md` | Color, typography, layout, motion, copy decisions. Defaults of last resort, not absolute rules. |
| `references/animation.md` | Motion, transition, micro-interaction, gesture, drag, animation, springs, perceived performance. |
| `references/components.md` | Building or reviewing component atoms — buttons, popovers, modals, tooltips, drawers, lists, inputs, toasts. |
| `references/mobile-hig.md` | iOS, iPadOS, visionOS, Android, or universal mobile interfaces. |
| `references/accessibility.md` | WCAG, contrast, keyboard nav, focus, screen readers, reduced motion, Dynamic Type. |
| `references/performance.md` | GPU acceleration, layout shift, scroll smoothness, blur scoping, animation budgets. |
| `references/user-journeys.md` | User flows, onboarding, error recovery, journey testing, e2e validation. |
| `references/design-system-extraction.md` | Reverse-engineering tokens from a public site, building tokens.json/tokens.css. |
| `references/brand-kit.md` | Generating brand identity systems — logos, marks, identity decks, palettes, presentation boards. |
| `references/generative-media.md` | Generating images, video, character continuity, cinematography, AI media direction. |
| `references/design-md-spec.md` | Producing a DESIGN.md, validating one, or diffing two versions. **Always loaded at SHIP.** |
| `references/output-discipline.md` | Long outputs at risk of truncation, code generation, ensuring no placeholder shortcuts. |
| `subagents/design-evaluator.md` | The evaluator's system prompt — used by the evaluation pipeline. **Always loaded during VERIFY.** |

**Scripts (run as needed):**

| Script | Purpose |
| --- | --- |
| `scripts/evaluate.py` | Objectively score a candidate (URL, file, image) using the design-evaluator subagent. Emits structured JSON. |
| `scripts/generate_design_md.py` | Emit a syntactically valid DESIGN.md from a structured brief. |
| `scripts/validate_design_md.py` | Validate any DESIGN.md against the spec. JSON output, non-zero exit on errors. |
| `scripts/contrast.py` | Compute WCAG AA/AAA contrast ratios for any color pair or batch. |
| `scripts/extract_palette.py` | Extract a normalized palette + WCAG report from a screenshot or token JSON. |
| `scripts/uniqueness_search.py` | Given an industry, audience, and JTBD, propose unexpected style pairings filtered for category cliché. |

**Assets:**

| File | Purpose |
| --- | --- |
| `assets/DESIGN.template.md` | Spec-compliant DESIGN.md scaffold. |
| `assets/brief.schema.json` | JSON schema for a structured shape brief. |
| `assets/evaluation.schema.json` | JSON schema the evaluator returns. |
| `assets/parent-subagent-message-template.json` | The subagent-prompt-foundry handoff schema. |

## The seven-phase pipeline

Every non-trivial design task runs in this order. Each phase produces an artifact the next phase consumes.

```text
1. THINK     →  Creative-thinking frameworks. Output: 3-sentence note + at least 3 negated defaults.
2. RESEARCH  →  Map industry, audience, JTBD, obvious pairing space. Output: cliché register + audience constraints.
3. SHAPE     →  Forge an unexpected-but-coherent pairing. Output: written brief with register, anti-references, fit hypothesis.
4. BUILD     →  Implement against design laws + chosen aesthetic + domain references. Output: artifact (code, image, mockup).
5. VERIFY    →  Run the deterministic evaluator. Output: structured findings JSON with PASS/WARN/FAIL per axis.
6. ITERATE   →  Address every FAIL, surface every WARN. Re-VERIFY until PASS or one explicit user override per WARN.
7. SHIP      →  Polish, generate DESIGN.md, validate it, hand off with the evaluator JSON attached.
```

Skip a phase only when the user explicitly says so, and record that in the SHIP block.
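
One way to keep the ordering and the skip rule honest is to model them explicitly. This is a minimal sketch, not part of the bundled scripts: `Phase` and `plan_run` are hypothetical names, and the note format for the SHIP block is an assumption.

```python
from enum import IntEnum


class Phase(IntEnum):
    """The seven phases, numbered in pipeline order."""
    THINK = 1
    RESEARCH = 2
    SHAPE = 3
    BUILD = 4
    VERIFY = 5
    ITERATE = 6
    SHIP = 7


def plan_run(user_skips):
    """Return (phases to run in order, skip notes for the SHIP block).

    user_skips maps a Phase to the reason the user gave for skipping it.
    A phase is only skipped when it appears in this mapping.
    """
    to_run = [phase for phase in Phase if phase not in user_skips]
    notes = [
        f"Skipped {phase.name}: {reason}"
        for phase, reason in sorted(user_skips.items())
    ]
    return to_run, notes


phases, ship_notes = plan_run({Phase.RESEARCH: "user supplied their own research"})
print([p.name for p in phases])
print(ship_notes)
```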

---

## Phase 1 — THINK

Before any visual generation, run creative thinking. The most common failure mode is jumping to defaults: centered hero on dark background, three equal cards, purple-blue AI gradient, Inter at every size. Defaults are statistical attractors, not decisions.

Open `references/creative-thinking.md`. The minimum protocol — run on every task:

1. **Negate three defaults.** Name each. Negate each. Decide whether the negation is coherent for the audience.
2. **Reframe.** Restate the user's request once with one constraint changed (objective, formalism, granularity, agent, timescale, or direction).
3. **Cross-pollinate.** Name one structurally distant reference (different category, era, or medium) whose mechanism could transfer.
4. **Run the category-reflex check.** First-order: would a stranger guess your palette and theme from the category alone? Second-order: would they guess the aesthetic family from category + anti-references? If yes to either, return to step 1.

Record inline as a 3-sentence note. **Hold contradictions before resolving.** Density AND breathability. Restraint AND personality. Novelty AND feasibility. These tensions usually produce better synthesis than choosing a side.

---

## Phase 2 — RESEARCH

Map the space before forging the pairing. Three artifacts to produce — write them inline, briefly.

### 2.1 Identify the industry context

Open `references/industry-catalog.md`. Find the entry that matches. Record:

- The industry's **default visual conventions** (palette, type, motion, density).
- The **provenance** of those defaults — where they come from historically (fintech navy = banking print history; observability dark blue = Splunk/Datadog precedent; healthcare teal = ADA hospital signage).
- The **audience constraints** that are real (regulatory, cognitive, environmental, device) versus the conventions that are arbitrary historical accidents.

### 2.2 Identify the obvious pairing

Open `references/cliche-pairings.md`. Name the obvious industry × aesthetic pair every competitor uses. **You are not banning it.** You are surfacing it so the next phase can either deliberately reject it (most cases) or deliberately earn it (rare cases where audience evidence supports the convention).

### 2.3 Identify what the audience actually needs

Write down, briefly:

- **Physical needs.** High contrast outdoors, large type for older users, density for traders.
- **Emotional needs.** Confidence for medical decisions, energy for entertainment, calm for mental health.
- **Environment.** Dim room, sunlight, multi-monitor, subway commute.
- **Devices.** Mid-range Android, iPad Pro, 5K Studio Display.
- **Cultural references.** Bloomberg terminal for traders, magazine spreads for fashion buyers, video games for Gen-Z.

These are the real fitness function. They decide PASS/FAIL in VERIFY.

---

## Phase 3 — SHAPE

Forge an unexpected pairing that actually works. Open `references/unexpected-pairings.md` for the construction method.

### 3.1 Generate candidate pairings

Run the search:

```bash
python scripts/uniqueness_search.py \
  --industry <industry> \
  --audience-constraints assets/audience.json \
  --jtbd "<one-sentence job to be done>" \
  --num-candidates 5
```

The script:

1. Loads the cliché set for that industry from `references/cliche-pairings.md`.
2. Returns 3–5 pairings drawn from `references/style-catalog.md` that are NOT the cliché.
3. Filters for at least one mechanism that transfers to the audience constraints.
4. Pulls each candidate from a structurally distant source (different industry, era, medium, or culture).
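
The filtering described in these steps can be sketched roughly as follows. This is not the code of `scripts/uniqueness_search.py`; the function name, the in-memory catalog shape (`style`, `source`, `mechanisms`), and the sample entries are all assumptions chosen to show the three filters: drop the cliché, require a structurally distant source, and require at least one mechanism that transfers to the audience constraints.

```python
def propose_pairings(industry, cliche_styles, catalog, audience_constraints, limit=5):
    """Return up to `limit` non-cliche styles with a transferable mechanism."""
    picks = []
    for entry in catalog:
        if entry["style"] in cliche_styles:
            continue  # the obvious pairing is surfaced elsewhere, not proposed here
        if entry["source"] == industry:
            continue  # candidates must come from a structurally distant source
        transfers = sorted(set(entry["mechanisms"]) & set(audience_constraints))
        if not transfers:
            continue  # nothing in this style addresses the audience constraints
        picks.append(
            {"style": entry["style"], "source": entry["source"], "transfers": transfers}
        )
    return picks[:limit]


# Hypothetical catalog entries, for illustration only.
catalog = [
    {"style": "navy-and-gold corporate", "source": "fintech", "mechanisms": ["trust"]},
    {"style": "transit-map wayfinding", "source": "public transport",
     "mechanisms": ["glanceability", "high contrast"]},
    {"style": "field-guide editorial", "source": "natural history publishing",
     "mechanisms": ["dense reference"]},
    {"style": "arcade neon", "source": "video games", "mechanisms": ["energy"]},
]

result = propose_pairings(
    "fintech",
    {"navy-and-gold corporate"},
    catalog,
    {"glanceability", "dense reference"},
)
for pick in result:
    print(pick["style"], "via", pick["transfers"])
```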

### 3.2 Pick one candidate and predict its fit

For the chosen pairing, write a **fit hypothesis**: 2–3 sentences predicting how it satisfies each audience constraint from 2.3. The evaluator will test these claims in VERIFY.

### 3.3 Write the one-paragraph shape brief

Cover: who uses it, where, in what light, in what mood, doing what. The paragraph must be specific enough to force the light-or-dark-mode decision. "SRE glancing at incident severity on a 27-inch monitor at 2am in a dim room" forces dark. "Observability dashboard" does not. Add detail until the brief decides.

Also include:

- **Anti-references.** What this design is explicitly NOT. ("Not Linear-style dark blue + Inter. Not Splunk-style green-on-black.")
- **Register.** Brand (design IS the product) or product (design SERVES the product).
- **The single most important downstream rule.** One sentence the implementer keeps on a sticky note.

### 3.4 Confirm before building (when stakes warrant)

For substantial work, restate the brief and the fit hypothesis, then wait for confirmation. Quick edits skip this gate.

---

## Phase 4 — BUILD

Generate against three layers, in order. None override the others; together they form the artifact.

### 4.1 Shared design laws

Open `references/design-laws.md`. These are defaults of last resort — guidance the audience evidence has not specifically displaced. Apply universally unless the SHAPE brief gave a documented reason to override (and document the override).

### 4.2 Aesthetic register

Apply the rules of the unexpected pairing chosen in SHAPE. Open `references/style-catalog.md` for the full vocabulary of any named style. Do not silently mix two registers.

### 4.3 Domain references

Load only what applies to the artifact:

- Animation, gesture, drag, motion → `references/animation.md`.
- Component atoms → `references/components.md`.
- iOS/iPadOS/visionOS/Android → `references/mobile-hig.md`.
- Accessibility, contrast, keyboard, screen readers, reduced motion → `references/accessibility.md`.
- Performance → `references/performance.md`.
- User flows, onboarding, journey validation → `references/user-journeys.md`.
- Reverse-engineering tokens from a real site → `references/design-system-extraction.md`.
- Brand identity boards → `references/brand-kit.md`.
- Generative media (images, video, characters, cinematography) → `references/generative-media.md`.

### 4.4 Image-first when visuals matter

When the request is mainly visual (heroes, landing pages, redesigns, brand kits), generate reference images first when image generation is available, deeply analyze them (text, type scale, spacing, palette, components), then implement. See `references/generative-media.md` for the full image-first protocol.

---

## Phase 5 — VERIFY (deterministic, not vibes)

This converts taste into evidence. Default presumption: **suboptimal until evidence proves otherwise** — including for the bans listed in the design laws.

### 5.1 Load the design-evaluator

Open `subagents/design-evaluator.md`. This is the system prompt the evaluator runs under, whether called as a subagent (via the foundry protocol) or as a script.

### 5.2 Run the evaluator

```bash
python scripts/evaluate.py \
  --target <url-or-file-or-image-path> \
  --brief assets/brief.json \
  --register <brand|product|image|video> \
  --output evaluation.json
```

The script:

1. **Captures evidence.** Calls a screenshot tool for URLs (browser automation, `pplx-tool screenshot_page`, or any equivalent), ffmpeg keyframes for videos, native files for images, source-of-truth JSON for tokens. Falls back to provided artifacts when no capture tool is available.
2. **Computes objective measurements.** WCAG contrast across declared component pairs (`scripts/contrast.py`), tap-target sizes from rendered DOM, motion timing budgets, text overflow detection, layout-shift detection on scroll, image diff against reference frames.
3. **Invokes a vision-capable LLM** with the design-evaluator system prompt to score the seven evaluation axes and to test each fit-hypothesis claim from SHAPE 3.2. Communication uses the subagent-prompt-foundry parent↔subagent contract.
4. **Emits structured JSON** conforming to `assets/evaluation.schema.json`.

If a screenshot tool or vision LLM is unavailable, the script degrades gracefully: still computes objective measurements, runs structural code audits, and labels vision-required findings as `info` rather than `pass`/`fail`. The output records what was skipped so the user can decide whether to override.
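
The degradation rule might look something like the sketch below. The finding shape (`id`, `status`, `requires_vision`) and the function name `degrade` are assumptions for illustration; the actual output format is whatever `assets/evaluation.schema.json` defines.

```python
def degrade(findings, vision_available):
    """Relabel vision-required findings as 'info' when no vision tooling exists.

    Returns the adjusted findings plus the ids that were skipped, so the
    user can see what was not actually evaluated.
    """
    if vision_available:
        return {"findings": list(findings), "skipped": []}

    adjusted, skipped = [], []
    for finding in findings:
        if finding["requires_vision"]:
            adjusted.append({**finding, "status": "info"})
            skipped.append(finding["id"])
        else:
            adjusted.append(finding)  # objective measurements keep their status
    return {"findings": adjusted, "skipped": skipped}


# Hypothetical findings, for illustration only.
raw = [
    {"id": "contrast-header", "status": "fail", "requires_vision": False},
    {"id": "uniqueness-family", "status": "pass", "requires_vision": True},
]
report = degrade(raw, vision_available=False)
print(report["skipped"])  # ['uniqueness-family']
```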

### 5.3 The seven evaluation axes

| Axis | Question |
| --- | --- |
| **1. Audience fit** | Do the audience constraints from RESEARCH 2.3 hold against the artifact? Each constraint is a separate sub-finding. |
| **2. JTBD fit** | Does the artifact let the user accomplish the job-to-be-done in fewer steps and with less friction than the cliché baseline? |
| **3. Uniqueness** | Could a stranger guess the palette and theme from the industry alone? Could they guess the aesthetic family from industry + anti-references? Both should be no. |
| **4. Craft** | Color (OKLCH discipline, contrast, no #000/#fff unless justified), type (hierarchy, line length, tabular figures), layout (rhythm, alignment, mobile collapse), motion (purpose, easing, duration), copy (specific, no clichés). |
| **5. Accessibility** | WCAG AA contrast, focus visibility, keyboard reachability, reduced-motion respect, Dynamic Type, alt text. |
| **6. Performance** | GPU-only animation, no `h-screen`, blur scoped to fixed/sticky, no main-thread scroll listeners, image dimensions declared, lazy below fold. |
| **7. Output discipline** | No placeholder code, no truncation, every requested deliverable present, every claimed token actually exists in the artifact. |

### 5.4 The objective override

A "ban" in `references/design-laws.md` is a prior, not a verdict. If a candidate violates a ban but the evaluator returns:

- **PASS on Audience fit** for every constraint, AND
- **PASS on JTBD fit**, AND
- **PASS on Craft, Accessibility, and Performance**, AND
- **Uniqueness improves** versus the cliché baseline, AND
- **Fit-hypothesis claims from SHAPE 3.2 are upheld by vision evidence**

…then the ban does not apply for this artifact. Document the override in the SHIP block with the evaluator JSON attached. The override is per-artifact, never global.
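
Read as a predicate, the override is a conjunction of five conditions. The sketch below assumes a hypothetical evaluation shape (`axes`, `uniqueness_score`, `baseline_uniqueness_score`, `fit_hypothesis_upheld`); the real field names live in `assets/evaluation.schema.json` and may differ.

```python
def ban_override_applies(evaluation):
    """True only when every override condition holds for this artifact."""
    axes = evaluation["axes"]
    audience = axes["audience_fit"]          # one status per audience constraint
    claims = evaluation["fit_hypothesis_upheld"]
    return (
        bool(audience) and all(status == "PASS" for status in audience.values())
        and axes["jtbd_fit"] == "PASS"
        and all(axes[name] == "PASS" for name in ("craft", "accessibility", "performance"))
        and evaluation["uniqueness_score"] > evaluation["baseline_uniqueness_score"]
        and bool(claims) and all(claims)     # an empty hypothesis upholds nothing
    )


# Hypothetical evaluator result, for illustration only.
evaluation = {
    "axes": {
        "audience_fit": {"dim room": "PASS", "glanceable severity": "PASS"},
        "jtbd_fit": "PASS",
        "craft": "PASS",
        "accessibility": "PASS",
        "performance": "PASS",
    },
    "uniqueness_score": 0.8,
    "baseline_uniqueness_score": 0.3,
    "fit_hypothesis_upheld": [True, True],
}
print(ban_override_applies(evaluation))  # True
```

A single WARN on any audience constraint is enough to keep the ban in force.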

### 5.5 Cliché overrides

The reverse also holds. If RESEARCH 2.3 surfaces audience constraints that the cliché actually satisfies (regulated banking products where navy + gold lowers cognitive load on older users, e-ink readers where pure white background is correct, OLED battery-savings where pure black is correct), and the evaluator confirms the cliché beats every unexpected candidate on Audience and JTBD fit, ship the cliché. Document why. The goal is unique designs **that actually work**, not unique-at-all-costs.

---

## Phase 6 — ITERATE

Address every FAIL. Surface every WARN. Re-run VERIFY until either:

- Every axis is PASS, or
- The user explicitly overrides each remaining WARN with a recorded reason. (Not a generic "ship it" — one reason per WARN.)

Each iteration is small and scoped. Do not rewrite from scratch in response to a single WARN.
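
The stopping condition can be expressed as a small check. This is a sketch under assumptions: `ready_to_ship` is a hypothetical helper, and the mappings from axis name to status and from axis name to override reason are illustrative shapes rather than the evaluator's actual output.

```python
def ready_to_ship(statuses, warn_overrides):
    """Return True when iteration may stop.

    statuses maps an axis name to "PASS", "WARN", or "FAIL".
    warn_overrides maps an axis name to the user's recorded reason.
    """
    for axis, status in statuses.items():
        if status == "FAIL":
            return False  # a FAIL can never be overridden, only fixed
        if status == "WARN" and not warn_overrides.get(axis, "").strip():
            return False  # each WARN needs its own non-empty reason
    return True


statuses = {"audience_fit": "PASS", "craft": "WARN", "performance": "PASS"}
print(ready_to_ship(statuses, {}))
print(ready_to_ship(statuses, {"craft": "Client requires the legacy logo lockup."}))
```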

---

## Phase 7 — SHIP

Polish, package, generate DESIGN.md, validate it, hand off.

### 7.1 Polish pass

Apply final-pass corrections from `references/design-laws.md`: button alignment, optical centering of icons, consistent baseline alignment of side-by-side cards, sensible `letter-spacing`, `text-wrap: balance` on orphans, no dead `#` links, branded favicon, proper meta tags.

### 7.2 Generate DESIGN.md

Every shipped interface or system gets a `DESIGN.md`. Open `references/design-md-spec.md` for the format.

```bash
python scripts/generate_design_md.py \
  --name "Heritage" \
  --register brand \
  --brief assets/brief.json \
  --evaluation evaluation.json \
  --output DESIGN.md
```

Sections in spec order: Overview → Colors → Typography → Layout → Elevation & Depth → Shapes → Components → Do's and Don'ts. The Overview embeds the SHAPE brief, the unexpected-pairing rationale, and the fit hypothesis. The Do's and Don'ts embed any documented overrides from VERIFY 5.4 and 5.5 so the next agent reading the DESIGN.md inherits the reasoning.

### 7.3 Validate DESIGN.md

```bash
python scripts/validate_design_md.py DESIGN.md
```

Checks: YAML front matter parses; `name` non-empty; `colors.primary` exists; every color is a valid sRGB hex; typography entries include `fontFamily` and `fontSize`; dimensions use `px`/`em`/`rem`; token references resolve; section order matches the spec; no duplicate section headings; component property names within the allowed set; contrast ratios on declared component pairs reported.

If errors: fix and re-run before ending the task. If warnings: surface to the user, allow the file to ship.
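
Three of the listed checks (a required `colors.primary`, hex colors, and dimension units) could be approximated as below. This is an illustrative sketch, not the code of `scripts/validate_design_md.py`: the helper names are hypothetical, and restricting hex to six digits and dimensions to non-negative numbers are assumptions, since the accepted forms are defined in `references/design-md-spec.md`.

```python
import re

HEX_COLOR = re.compile(r"#[0-9a-fA-F]{6}")            # assumes 6-digit hex only
DIMENSION = re.compile(r"\d+(?:\.\d+)?(?:px|em|rem)")  # assumes non-negative values


def check_tokens(colors, dimensions):
    """Return a list of human-readable errors for the given token maps."""
    errors = []
    if "primary" not in colors:
        errors.append("colors.primary is missing")
    for name, value in colors.items():
        if not HEX_COLOR.fullmatch(value):
            errors.append(f"colors.{name}: {value!r} is not a 6-digit sRGB hex")
    for name, value in dimensions.items():
        if not DIMENSION.fullmatch(value):
            errors.append(f"{name}: {value!r} must use px, em, or rem")
    return errors


def exit_code(errors):
    """Non-zero when any error was found, matching the documented behaviour."""
    return 1 if errors else 0


errors = check_tokens(
    {"primary": "#1f3a5f", "accent": "oklch(0.7 0.1 250)"},
    {"spacing.md": "16px", "radius.sm": "4pt"},
)
for message in errors:
    print(message)
print("exit", exit_code(errors))
```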

### 7.4 Hand-off block

End the response with:

- The DESIGN.md path.
- The evaluator JSON summary (per-axis status, override reasons).
- A two-line summary: register, unexpected pairing, and the single most important downstream rule.

---

## Universal output rules

- Use markdown tables for any Before/After review. Never bullet lists with "Before:"/"After:" on separate lines.
- No em dashes in the work itself unless the spec or copy already uses them.
- Default away from **Inter, Roboto, Helvetica, Open Sans, Arial**. Use Geist, Outfit, Cabinet Grotesk, Satoshi, Public Sans, Geist Mono, or platform-native (SF Pro, Roboto-on-Android) as context demands. Inter is permitted when the evaluator confirms it satisfies Uniqueness and Craft for the specific brief.
- Default away from **#000000** and **#ffffff**. Tint neutrals toward the brand hue (OKLCH chroma 0.005–0.01). Pure black/white is permitted when the evaluator confirms an audience constraint (e-ink, OLED savings, print fidelity).
- Default to animating only **`transform`** and **`opacity`**. Layout-property animation requires evaluator evidence that no GPU alternative exists.
- Default to **`min-h-[100dvh]`** over `h-screen`.
- Default to scoping `backdrop-blur` to fixed/sticky elements only.
- No emojis in code, markup, alt text, or generated UI text. Use SVG icons (Phosphor, Radix, Lucide as last resort) or clean primitives.
- No generic placeholder names ("John Doe", "Acme Corp", "Lorem Ipsum"). Use believable, organic data.
- No AI copywriting clichés ("Elevate", "Seamless", "Unleash", "Next-Gen", "Game-changer", "Delve", "Tapestry"). Write plain, specific language.
- Every screen has an obvious back/dismiss path.
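
The neutral-tinting rule in the list above can be made concrete with a small helper that emits CSS `oklch()` values. This is a sketch under assumptions: it reads the chroma range as OKLCH chroma, and `tinted_neutral`, the default chroma of 0.008, and the sample hue of 250 are illustrative choices, not values from the design laws.

```python
def tinted_neutral(lightness, brand_hue, chroma=0.008):
    """Return a CSS oklch() string for a neutral nudged toward the brand hue."""
    if not 0.0 <= lightness <= 1.0:
        raise ValueError("lightness must be between 0 and 1")
    if not 0.005 <= chroma <= 0.01:
        raise ValueError("chroma must stay within the 0.005 to 0.01 tint range")
    return f"oklch({lightness:.2f} {chroma:.3f} {brand_hue % 360:g})"


# A near-white surface and a near-black text color sharing one hue.
surface = tinted_neutral(0.98, 250)
ink = tinted_neutral(0.15, 250)
print(surface)  # oklch(0.98 0.008 250)
print(ink)      # oklch(0.15 0.008 250)
```

Keeping the chroma this low means the tint reads as neutral in isolation and only shows as a family resemblance next to the brand color.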

## Pre-output text checklist (every visual artifact)

- Text wraps cleanly; no broken words or mid-word line breaks.
- No text overflow or truncation in titles, buttons, or headers.
- Foreground/background contrast meets WCAG AA on every declared pair (especially dark text on dark headers — extremely common evaluator failure).
- Numbers use tabular figures in data-heavy contexts.

The evaluator checks these automatically when given a screenshot. If you produce visual output without running the evaluator, run this checklist by eye before sending.
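
When the evaluator is not run, the contrast item at least can be checked numerically rather than by eye. The sketch below follows the WCAG 2.x relative-luminance and contrast-ratio definitions; it is not the bundled `scripts/contrast.py`, and the function names are hypothetical. It assumes six-digit hex input.

```python
def _linear(channel_8bit):
    """Convert an 8-bit sRGB channel to its linear-light value (WCAG 2.x)."""
    c = channel_8bit / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color):
    """Relative luminance of a 6-digit hex color such as '#1f3a5f'."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linear(r) + 0.7152 * _linear(g) + 0.0722 * _linear(b)


def contrast_ratio(foreground, background):
    """WCAG contrast ratio, from 1.0 (identical) up to 21.0 (black on white)."""
    lum_a = relative_luminance(foreground)
    lum_b = relative_luminance(background)
    lighter, darker = max(lum_a, lum_b), min(lum_a, lum_b)
    return (lighter + 0.05) / (darker + 0.05)


def meets_aa(foreground, background, large_text=False):
    """AA requires 4.5:1 for normal text and 3:1 for large text."""
    return contrast_ratio(foreground, background) >= (3.0 if large_text else 4.5)


print(round(contrast_ratio("#000000", "#ffffff"), 2))  # 21.0
print(meets_aa("#767676", "#ffffff"))                   # True
print(meets_aa("#2b2b2b", "#1a1a1a"))                   # False: dark text on a dark header
```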

## Closing principle

Beauty is leverage, but only when it solves the user's actual problem. The aggregate of invisible correctness creates interfaces people love without knowing why. The aggregate of measured correctness creates interfaces that work whether or not they are loved. This skill aims for both: unique enough that no peer would have produced the same artifact, evidence-backed enough that you can prove it.
