---
name: harshjudge
description: AI-native E2E testing orchestration for Claude Code. Use when creating, running, or managing end-to-end test scenarios with visual evidence capture. Activates for tasks involving E2E tests, browser automation testing, test scenario creation, test execution with screenshots, or checking test status.
---

# HarshJudge E2E Testing

AI-native E2E testing with MCP tools and visual evidence capture.

## Core Principles

1. **Evidence First**: Screenshot before and after every action
2. **Fail Fast**: Stop on error, report with context
3. **Complete Runs**: Always call `completeRun`, even on failure
4. **Step Isolation**: Each step executes in its own spawned agent for token efficiency
5. **Knowledge Accumulation**: Learnings go to `prd.md`, not scenarios

## Step-Based Execution

HarshJudge uses a **step-based agent pattern** for token-efficient test execution:

```
Main Agent                    Step Agents (spawned per step)
    │
    ├─ startRun(scenarioSlug)
    │      ↓
    │  Returns: runId, steps[]
    │
    ├─► Spawn Agent: Step 01 ──────────────────────► Execute actions
    │      │                                              │
    │      │ ◄─────────────────────────────────── Return: { status, evidencePaths }
    │      │
    │   completeStep(runId, "01", status)
    │      │
    ├─► Spawn Agent: Step 02 ──────────────────────► Execute actions
    │      │                                              │
    │      │ ◄─────────────────────────────────── Return: { status, evidencePaths }
    │      │
    │   completeStep(runId, "02", status)
    │      │
    │   ... (repeat for each step)
    │
    └─ completeRun(runId, finalStatus)
```

**Benefits:**
- Each step agent has an isolated context (no token accumulation)
- Large outputs (screenshots, logs) are saved to files, not returned to the main agent
- The main agent receives only concise summaries
- Token usage is optimized automatically, with no manual management
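The loop in the diagram above can be sketched as plain Python. The function names are hypothetical stand-ins for the MCP tools (`startRun`, `completeStep`, `completeRun`) and for spawning a step agent; real calls go through the MCP client, and the stub return values here are illustrative only.

```python
def start_run(scenario_slug):
    # Stand-in for the startRun MCP tool: returns a runId and the ordered step list.
    return {"runId": "run-001", "steps": ["01", "02"]}

def run_step_agent(run_id, step_id):
    # Stand-in for a spawned step agent: returns only a concise JSON summary,
    # never the full evidence content.
    return {"status": "pass", "evidencePaths": [f"step-{step_id}/evidence/after.png"]}

def complete_step(run_id, step_id, status):
    pass  # Stand-in for the completeStep MCP tool.

def complete_run(run_id, final_status):
    return final_status  # Stand-in for the completeRun MCP tool.

def orchestrate(scenario_slug):
    run = start_run(scenario_slug)
    final_status = "pass"
    for step_id in run["steps"]:
        summary = run_step_agent(run["runId"], step_id)
        complete_step(run["runId"], step_id, summary["status"])
        if summary["status"] == "fail":
            final_status = "fail"
            break  # Fail Fast: stop on the first failing step.
    # Complete Runs: completeRun is called even when a step failed.
    return complete_run(run["runId"], final_status)
```

Note that `completeRun` sits outside the loop, so it runs whether the loop finishes or breaks on a failure.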

## Workflows

| Intent | Reference | Key Tools |
|--------|-----------|-----------|
| Initialize project | [references/setup.md](references/setup.md) | `initProject` |
| Create scenario | [references/create.md](references/create.md) | `createScenario` |
| Run scenario | [references/run.md](references/run.md) | `startRun`, `completeStep`, `completeRun` |
| Fix failed test | [references/iterate.md](references/iterate.md) | `getStatus`, `createScenario` |
| Check status | [references/status.md](references/status.md) | `getStatus` |

## Project Structure

```
.harshJudge/
  config.yaml              # Project configuration
  prd.md                   # Product requirements (from assets/prd.md template)
  scenarios/{slug}/
    meta.yaml              # Scenario definition + run statistics
    steps/                 # Individual step files
      01-step-slug.md      # Step 01 details
      02-step-slug.md      # Step 02 details
      ...
    runs/{runId}/          # Run history
      result.json          # Run result with per-step data
      step-01/evidence/    # Step 01 evidence
      step-02/evidence/    # Step 02 evidence
      ...
  snapshots/               # Inspection tool outputs (token-saving pattern)
```

## Quick Reference

### HarshJudge MCP Tools

| Tool | Purpose |
|------|---------|
| `initProject` | Initialize project (spawns dashboard) |
| `createScenario` | Create/update scenario with step files |
| `toggleStar` | Toggle/set scenario starred status |
| `startRun` | Start test run, returns step list |
| `recordEvidence` | Capture evidence for a step |
| `completeStep` | Complete a step, get next step ID |
| `completeRun` | Finalize run with status |
| `getStatus` | Check project or scenario status |
| `openDashboard` / `closeDashboard` | Manage dashboard server |

### Playwright MCP Tools

| Tool | Purpose |
|------|---------|
| `browser_navigate` | Navigate to URL |
| `browser_snapshot` | Get accessibility tree (use before click/type) |
| `browser_click` | Click element using ref |
| `browser_type` | Type into input using ref |
| `browser_take_screenshot` | Capture screenshot for evidence |
| `browser_console_messages` | Get console logs |
| `browser_network_requests` | Get network activity |
| `browser_wait_for` | Wait for text/condition |

## Step Agent Prompt Template

When spawning an agent for each step:

```
Execute step {stepId} of scenario {scenarioSlug}:

## Step Content
{content from steps/{stepId}-{slug}.md}

## Project Context
Base URL: {from config.yaml}
Auth: {from prd.md if needed}

## Previous Step
Status: {pass|fail|first step}

## Your Task
1. Execute the actions using Playwright MCP tools
2. Use browser_snapshot before clicking to get element refs
3. Capture before/after screenshots using browser_take_screenshot
4. Record evidence using recordEvidence with step={stepNumber}

Return ONLY a JSON object:
{
  "status": "pass" | "fail",
  "evidencePaths": ["path1.png", "path2.png"],
  "error": null | "error message"
}

DO NOT return full evidence content. DO NOT explain your work.
```
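Because the step agent must return only the JSON object above, the main agent can validate the summary before acting on it. A sketch (the required shape — `status`, `evidencePaths`, `error` — comes from the template; the helper itself is hypothetical):

```python
import json

def parse_step_result(raw):
    """Validate a step agent's JSON summary before trusting its status."""
    result = json.loads(raw)
    if result.get("status") not in ("pass", "fail"):
        raise ValueError(f"unexpected status: {result.get('status')!r}")
    if not isinstance(result.get("evidencePaths"), list):
        raise ValueError("evidencePaths must be a list of file paths")
    return result

summary = parse_step_result(
    '{"status": "pass", "evidencePaths": ["before.png", "after.png"], "error": null}'
)
```

A malformed or over-verbose agent reply fails `json.loads` or the shape checks, which surfaces the problem immediately instead of silently marking the step passed.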

## Error Handling

On ANY error:
1. **STOP** - Do not proceed to the next step
2. **Report** - Include the tool, its params, the error, and a suggested resolution
3. **Check prd.md** - Is this a known pattern?
4. **Do NOT retry** - Unless the user instructs otherwise
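The report in step 2 can be a small structured object so every failure surfaces the same fields. A sketch (the shape and helper name are assumptions, not part of the HarshJudge API):

```python
def report_error(tool, params, error):
    # Fail-fast report: surface what was called and what broke, then stop.
    return {
        "tool": tool,
        "params": params,
        "error": str(error),
        "resolution": "Check prd.md for known patterns; await user instruction before retrying.",
    }

report = report_error("browser_click", {"ref": "e5"}, ValueError("element not found"))
```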
