---
name: dyad:deflake-e2e
description: Identify and fix flaky E2E tests by running them repeatedly and investigating failures.
---

# Deflake E2E Tests

Identify and fix flaky E2E tests by running them repeatedly and investigating failures.

## Arguments

- `$ARGUMENTS`: (Optional) Specific E2E test file(s) to deflake (e.g., `main.spec.ts` or `e2e-tests/main.spec.ts`). If not provided, will prompt to deflake the entire test suite.

## Instructions

1. **Check if specific tests are provided:**

   If `$ARGUMENTS` is empty or not provided, ask the user:

   > "No specific tests provided. Do you want to deflake the entire E2E test suite? This can take a very long time as each test will be run 10 times."

   Wait for user confirmation before proceeding. If they decline, ask them to provide specific test files.

2. **Install dependencies:**

   ```
   npm install
   ```

3. **Build the app binary:**

   ```
   npm run build
   ```

   **IMPORTANT:** This step is required before running E2E tests. E2E tests run against the built binary. If you make any changes to application code (anything outside of `e2e-tests/`), you MUST re-run `npm run build` before running E2E tests again, otherwise you'll be testing the old version.

4. **Run tests repeatedly to detect flakiness:**

   For each test file, run it 10 times:

   ```
   PLAYWRIGHT_RETRIES=0 PLAYWRIGHT_HTML_OPEN=never npm run e2e -- e2e-tests/<testfile>.spec.ts --repeat-each=10
   ```

   **IMPORTANT:** `PLAYWRIGHT_RETRIES=0` is required to disable automatic retries. Without it, CI environments (where `CI=true`) default to 2 retries, causing flaky tests to pass on retry and be incorrectly skipped as "not flaky."

   Notes:
   - If `$ARGUMENTS` is provided without the `e2e-tests/` prefix, add it
   - If `$ARGUMENTS` is provided without the `.spec.ts` suffix, add it
   - A test is considered **flaky** if it fails at least once out of 10 runs

5. **For each flaky test, investigate with debug logs:**

   Run the failing test with Playwright browser debugging enabled:

   ```
   DEBUG=pw:browser PLAYWRIGHT_RETRIES=0 PLAYWRIGHT_HTML_OPEN=never npm run e2e -- e2e-tests/<testfile>.spec.ts
   ```

   Analyze the debug output to understand:
   - Timing issues (race conditions, elements not ready)
   - Animation/transition interference
   - Network timing variability
   - State leaking between tests
   - Snapshot comparison differences

6. **Fix the flaky test:**

   Common fixes following Playwright best practices:
   - Use `await expect(locator).toBeVisible()` before interacting with elements
   - Use `await page.waitForLoadState('networkidle')` for network-dependent tests
   - Use stable selectors (data-testid, role, text) instead of fragile CSS selectors
   - Add explicit waits for animations: `await page.waitForTimeout(300)` (use sparingly)
   - Use `await expect(locator).toHaveScreenshot()` options like `maxDiffPixelRatio` for visual tests
   - Ensure proper test isolation (clean state before/after tests)

   **IMPORTANT:** Do NOT change any application code. Assume the application code is correct. Only modify test files and snapshot baselines.

7. **Update snapshot baselines if needed:**

   If the flakiness is due to legitimate visual differences:

   ```
   PLAYWRIGHT_RETRIES=0 PLAYWRIGHT_HTML_OPEN=never npm run e2e -- e2e-tests/<testfile>.spec.ts --update-snapshots
   ```

8. **Verify the fix:**

   Re-run the test 10 times to confirm it's no longer flaky:

   ```
   PLAYWRIGHT_RETRIES=0 PLAYWRIGHT_HTML_OPEN=never npm run e2e -- e2e-tests/<testfile>.spec.ts --repeat-each=10
   ```

   The test should pass all 10 runs consistently.

9. **Summarize results:**

   Report to the user:
   - Which tests were identified as flaky
   - What was causing the flakiness
   - What fixes were applied
   - Verification results (all 10 runs passing)
   - Any tests that could not be fixed and need further investigation
