---
name: forge-tests
description: Unit + integration + e2e discipline. Real DB over mocks, deterministic data, no shared state, descriptive failure messages, no skipped tests committed, no sleeps, parallel-safe. Contains test scaffolding, fixture factories, CI-runnable real-Postgres pattern. Use whenever you are writing tests or reviewing a test PR.
license: MIT
---

# forge-tests

You are writing tests that will run in CI for the lifetime of the project. Default agent-written tests mock the database, set state in a `beforeAll`, share fixtures across files, and use generic assertion messages. They pass for the wrong reasons and fail in inscrutable ways. This skill exists to stop that.

The mental model: a test is a contract. It states what should be true. When it fails, it should tell you exactly what was wrong without you having to read the test code.

## Quick reference (the things you must never ship)

1. `test.skip(...)` or `xit(...)` committed without an issue link.
2. `setTimeout(..., 100)` or `time.sleep(0.5)` inside a test (flake source).
3. `expect(true).toBe(true)` tautological assertion.
4. Mocked database in an integration test.
5. Shared module-level mutable variable across tests.
6. Test that depends on test execution order.
7. Test name `it("works")` or `it("test1")`.
8. `beforeAll` that mutates state which later tests depend on.
9. Reading `Date.now()` or `Math.random()` inside a test without injecting.
10. CI test suite that runs longer than 10 minutes total.

## Hard rules

### What to test

**1. Unit tests for pure functions and isolated logic. Integration tests for everything else.** Mocking a database to "unit test" a handler is a category error.

**2. Test behavior, not implementation.** A test that breaks every time you refactor is a test that tests the wrong thing.

**3. Test the boundary, not the internals.** For a service: HTTP in, HTTP out. Internals can change.

**4. Public API gets the most tests. Private helpers get few.** Helpers are covered transitively by public-API tests.
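
A minimal sketch of rules 2 and 4, assuming a hypothetical `cartTotalCents` function with a private `lineTotalCents` helper (neither comes from a real codebase). The check goes through the public function only, so the helper can be renamed or inlined without touching a test.

```ts
// Hypothetical module under test. Only `cartTotalCents` is public.
type LineItem = { quantity: number; unit_price_cents: number };

// Private helper: free to be renamed, inlined, or replaced in a refactor.
function lineTotalCents(item: LineItem): number {
  return item.quantity * item.unit_price_cents;
}

function cartTotalCents(items: LineItem[]): number {
  return items.reduce((sum, item) => sum + lineTotalCents(item), 0);
}

// BAD - spying on `lineTotalCents` and asserting its call count pins the implementation.
// GOOD - assert on what the caller observes.
const total = cartTotalCents([
  { quantity: 2, unit_price_cents: 1500 },
  { quantity: 1, unit_price_cents: 999 },
]);
if (total !== 3999) {
  throw new Error(`expected cart total of 3999 cents, got ${total}`);
}
```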

### Real dependencies vs mocks

**5. Use a real database in integration tests, not a mock.** Spin one up in CI; a Postgres container typically starts within a few seconds.

```yaml
# .github/workflows/test.yml fragment
jobs:
  test:
    services:
      postgres:
        image: postgres:17-alpine
        env:
          POSTGRES_USER: test
          POSTGRES_PASSWORD: test
          POSTGRES_DB: test
        ports: ["5432:5432"]
        options: >-
          --health-cmd pg_isready
          --health-interval 5s
          --health-timeout 3s
          --health-retries 5
    steps:
      - uses: actions/checkout@SHA  # placeholder: pin to a full commit SHA
      - run: npm ci
      - run: npm run migrate
        env: { DATABASE_URL: postgres://test:test@localhost:5432/test }
      - run: npm test
        env: { DATABASE_URL: postgres://test:test@localhost:5432/test }
```

**6. Mock external HTTP services with recorded fixtures, not hand-written stubs.** Use a tool such as `msw`, `nock`, `vcr.py`, or `httprecord`: record real responses once, then replay them.
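
Roughly what record-and-replay looks like as a mechanism, in a hand-rolled sketch. `RecordedExchange`, `createReplayer`, and the example URL are hypothetical names for illustration, not the API of any tool named above. The property worth copying is that an unrecorded request fails loudly instead of quietly returning a stub.

```ts
// One recorded request/response pair, captured once from the real service.
type RecordedExchange = {
  method: string;
  url: string;
  status: number;
  body: unknown;
};

// Returns a lookup that replays recorded responses and throws on anything unrecorded.
function createReplayer(fixtures: RecordedExchange[]) {
  const byKey = new Map<string, RecordedExchange>();
  for (const f of fixtures) {
    byKey.set(`${f.method.toUpperCase()} ${f.url}`, f);
  }
  return (method: string, url: string): RecordedExchange => {
    const key = `${method.toUpperCase()} ${url}`;
    const hit = byKey.get(key);
    if (hit === undefined) {
      throw new Error(`no recorded fixture for ${key}; record it against the real service first`);
    }
    return hit;
  };
}

// Hypothetical fixture; in practice this would be loaded from a file written by the recorder.
const replay = createReplayer([
  { method: "GET", url: "https://payments.example.com/v1/rates/USD", status: 200, body: { rate: 1 } },
]);
```

In a real suite, prefer the recording tool's own replay mode; the sketch only shows why recorded fixtures catch drift that hand-written stubs hide.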

**7. Mock time, randomness, and external state at the seam.** Inject a clock parameter; do not call `Date.now()` directly in the function under test.

```ts
// BAD - hard to test
function isExpired(token: { expires_at: number }) {
  return token.expires_at < Date.now();
}

// GOOD - inject the clock
function isExpired(token: { expires_at: number }, now = Date.now()): boolean {
  return token.expires_at < now;
}

// in tests
expect(isExpired({ expires_at: 100 }, 200)).toBe(true);
```

**8. Network and disk are not mocked at unit level - they are not present.** Unit tests do not touch the network. Integration tests do, with real (or recorded) responses.

### Isolation

**9. Every test runs in isolation. No shared state.** No module-level mutable variables, no test-order dependencies.

**10. `beforeEach` to set up, `afterEach` to clean up. `beforeAll` only for read-only setup.** The most common test-suite bug is a `beforeAll` that mutates state.

```ts
// reference: per-test DB reset
beforeEach(async () => {
  await query("TRUNCATE orders, customers, audit_events RESTART IDENTITY CASCADE");
  resetIdempotency();
});
afterAll(async () => {
  await pool.end();
});
```

**11. Tests run in parallel without breaking.** If the suite only passes when run serially (for example with Jest's `--runInBand`), you have isolation bugs.
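
One common way to keep a TRUNCATE-per-test reset parallel-safe is to give each worker its own database. A minimal sketch, assuming the runner exposes a worker id (Vitest sets `VITEST_POOL_ID`, Jest sets `JEST_WORKER_ID`), that the base URL ends with the database name, and that each per-worker database has already been created and migrated. `databaseUrlForWorker` is a hypothetical helper, not part of either runner.

```ts
// Derives a per-worker database URL by suffixing the database name,
// so ".../test" becomes ".../test_w3" for worker "3".
function databaseUrlForWorker(baseUrl: string, workerId: string): string {
  const q = baseUrl.indexOf("?");
  const base = q === -1 ? baseUrl : baseUrl.slice(0, q);
  const query = q === -1 ? "" : baseUrl.slice(q); // keeps "?sslmode=..." intact
  return `${base}_w${workerId}${query}`;
}

// Usage in a setup file (assumption: Vitest):
//   process.env.DATABASE_URL = databaseUrlForWorker(
//     process.env.DATABASE_URL!, process.env.VITEST_POOL_ID ?? "1");
const example = databaseUrlForWorker("postgres://test:test@localhost:5432/test", "3");
// example === "postgres://test:test@localhost:5432/test_w3"
```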

**12. Each test gets a fresh fixture, not a reused one.** Factory function:

```ts
async function seedCustomer(overrides: Partial<{ email: string; name: string }> = {}) {
  const id = uuidv7();
  await query(
    "INSERT INTO customers (id, email, name) VALUES ($1, $2, $3)",
    [id, overrides.email ?? `t${id.slice(0, 8)}@example.com`, overrides.name ?? "Test"],
  );
  return id;
}
```

### Determinism

**13. No flaky tests. Ever.** A test that fails 1% of the time becomes the test that is muted and eventually deleted. Treat flakes as P0.

**14. No reliance on wall clock or random seeds.** Inject the clock, pin the seed.
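
A minimal sketch of pinning the seed. `mulberry32` is a small, well-known 32-bit PRNG written out inline (not suitable for cryptography), and `pick` is a hypothetical function under test that takes the generator as a parameter, the same way rule 7 injects the clock.

```ts
// mulberry32: a small deterministic 32-bit PRNG. Same seed, same sequence.
function mulberry32(seed: number): () => number {
  let a = seed | 0;
  return () => {
    a = (a + 0x6d2b79f5) | 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296; // in [0, 1)
  };
}

// Hypothetical function under test: the random source is a parameter.
function pick<T>(items: readonly T[], rng: () => number = Math.random): T {
  if (items.length === 0) throw new Error("pick: items must not be empty");
  return items[Math.floor(rng() * items.length)] as T;
}

// In tests: pin the seed so every run draws the same value.
const first = pick(["ann", "bob", "cy"], mulberry32(42));
const again = pick(["ann", "bob", "cy"], mulberry32(42));
// first === again on every run and every machine
```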

**15. No reliance on time-of-day or geography.** A test that fails on weekends is broken.
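
A sketch of rule 15, assuming a hypothetical `isWeekend` helper. It takes the current time as an argument and reads it with UTC getters, so the result depends on neither the day CI happens to run nor the machine's time zone.

```ts
// Hypothetical function under test: `now` is injected, never read from the wall clock.
function isWeekend(now: Date): boolean {
  const day = now.getUTCDay(); // 0 = Sunday ... 6 = Saturday
  return day === 0 || day === 6;
}

// Pinned instants: 2024-01-06 was a Saturday, 2024-01-08 a Monday.
const saturday = new Date("2024-01-06T12:00:00Z");
const monday = new Date("2024-01-08T12:00:00Z");
if (!isWeekend(saturday)) throw new Error("expected 2024-01-06 to count as a weekend");
if (isWeekend(monday)) throw new Error("expected 2024-01-08 to count as a weekday");
```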

**16. Sleeps in tests are forbidden.**

```ts
// BAD - heuristic, flaky
await sendEmail(user);
await sleep(100);
const email = await checkInbox();

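// GOOD - poll with a bounded timeout (or go event-driven, or use fake timers)
async function eventually<T>(fn: () => Promise<T | null>, timeoutMs = 5000): Promise<T> {
  const start = Date.now();
  while (Date.now() - start < timeoutMs) {
    const result = await fn();
    // Compare against null explicitly so falsy values such as 0 or "" still count as ready.
    if (result !== null) return result;
    // Short poll interval, not a guess at how long the work takes.
    await new Promise((r) => setTimeout(r, 50));
  }
  throw new Error(`eventually: condition not met within ${timeoutMs}ms`);
}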

const email = await eventually(() => checkInbox());
```

### Assertions

**17. One concept per test.** Multiple assertions are fine as long as they all check the same concept.

**18. Assertion messages name what was expected, not just what was wrong.** Most modern frameworks generate this automatically; check that yours does.
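
Where the framework's default message is not enough (a bare status-code mismatch, for example), a small helper can carry the missing context. A sketch, assuming a hypothetical `expectStatus` helper and a simplified response shape:

```ts
// Simplified stand-in for whatever response object the suite works with.
type ObservedResponse = { status: number; body: unknown };

// Throws with a message that names the expectation, the actual value, and the response body.
function expectStatus(res: ObservedResponse, expected: number): void {
  if (res.status !== expected) {
    throw new Error(
      `expected HTTP ${expected} but got ${res.status}; response body: ${JSON.stringify(res.body)}`,
    );
  }
}
```

A failure then reads as a diagnosis (which status, and what the server said) rather than `expected 201, received 422`.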

**19. Snapshot tests for stable structured output only.** Never for UI without strong reviewer culture - snapshots get rubber-stamped.

**20. No `expect(true).toBe(true)` placeholders.** A test without a real assertion is worse than no test.

### Names

**21. Test names are sentences. They describe behavior.**

```ts
// BAD
it("user creation", () => { ... });

// GOOD
it("creates a user with a hashed password", () => { ... });
```

**22. `describe` blocks group by surface or behavior.**

```ts
describe("POST /v1/orders", () => {
  it("creates an order and returns 201 with Location header", async () => { ... });
  it("rejects malformed JSON with 400 invalid_json", async () => { ... });
  it("returns 422 customer_not_found for unknown customer", async () => { ... });
});
```

### Performance

**23. Unit tests run in milliseconds. Integration tests in seconds. e2e in minutes.** If unit tests take a minute, they are not unit tests.

**24. CI test runtime budget is firm.** Above 10 minutes total and developers stop running tests locally.

**25. Parallelize at the runner level.** Vitest, Jest, and Go's `testing` package support it out of the box; pytest needs the `pytest-xdist` plugin.

### Coverage

**26. Coverage targets are a smell. Behavior is the metric.** 95% coverage with 0 integration tests is worse than 60% with real behavior tests.

**27. Read the coverage report as a map of "code I have not thought about."**

### Lifecycle

**28. `.skip` and `xit` are committed only with an issue link and owner.** Otherwise they rot.
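
One way to enforce this mechanically is a line-based check. A sketch, assuming an issue link is any `http(s)` URL on the same line or the line directly above the skip; `findUnlinkedSkips` is a hypothetical helper and is not taken from any existing tool or script.

```ts
// Matches it.skip( / test.skip( / describe.skip( and the xit( / xdescribe( aliases.
const SKIP_PATTERN = /\b(?:it|test|describe)\.skip\(|\bx(?:it|describe)\(/;
// Assumption: any URL counts as an issue link.
const ISSUE_LINK = /https?:\/\/\S+/;

// Returns the 1-based line numbers of skipped tests that have no issue link nearby.
function findUnlinkedSkips(source: string): number[] {
  const lines = source.split("\n");
  const offenders: number[] = [];
  lines.forEach((line, i) => {
    if (!SKIP_PATTERN.test(line)) return;
    const previous = lines[i - 1] ?? "";
    if (!ISSUE_LINK.test(line) && !ISSUE_LINK.test(previous)) {
      offenders.push(i + 1);
    }
  });
  return offenders;
}
```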

**29. Quarantine flakes by moving them out of the main suite, not by deleting silently.**

## Common AI-output patterns to reject

| Pattern | Why wrong | Fix |
| --- | --- | --- |
| `vi.mock("../db")` for integration test | Tests the mock, not the SQL | Real DB, TRUNCATE per test |
| `expect(true).toBe(true)` | Tautology | Real assertion on real output |
| `sleep(500)` then check | Race / flake | `eventually(...)` with timeout |
| `Date.now()` in code under test | Can't pin time | Inject `now` param |
| `it("test")` or `it("works")` | Useless on failure | Describe the behavior |
| Module-level `let counter = 0` | Cross-test state | Fixture factory per test |
| `beforeAll` writes data | Order coupling | `beforeEach` per-test reset |
| `jest.useFakeTimers()` inside an integration test | Mixes layers | Fake timers for unit only |
| One test asserting 10 unrelated things | Hard to debug failure | Split: one concept per test |
| 100% coverage with no integration tests | Lies about safety | Behavioral tests beat coverage |

## Worked example: integration test against real Postgres

```ts
// test/orders.test.ts
import { afterAll, beforeEach, describe, expect, it } from "vitest";
import { uuidv7 } from "uuidv7";
import { app } from "../src/server.js";  // built without listening
import { pool, query } from "../src/db.js";
import { resetIdempotency } from "../src/idempotency.js";

async function seedCustomer(): Promise<string> {
  const id = uuidv7();
  await query(
    "INSERT INTO customers (id, email, name) VALUES ($1, $2, $3)",
    [id, `t${id.slice(0, 8)}@example.com`, "Test Customer"],
  );
  return id;
}

beforeEach(async () => {
  await query("TRUNCATE orders, customers, audit_events RESTART IDENTITY CASCADE");
  resetIdempotency();
});

afterAll(async () => {
  await pool.end();
});

describe("POST /v1/orders", () => {
  it("creates an order and returns 201 with a Location header", async () => {
    const customer_id = await seedCustomer();
    const res = await app.request("/v1/orders", {
      method: "POST",
      headers: { "content-type": "application/json" },
      body: JSON.stringify({
        customer_id,
        currency: "USD",
        items: [{ sku: "SKU-1", quantity: 2, unit_price_cents: 1500 }],
      }),
    });
    expect(res.status).toBe(201);
    expect(res.headers.get("location")).toMatch(/^\/v1\/orders\//);
    const body = await res.json();
    expect(body.data.total_cents).toBe(3000);
    expect(body.data.status).toBe("pending");
  });

  it("rejects malformed JSON with 400 invalid_json", async () => {
    const res = await app.request("/v1/orders", {
      method: "POST",
      headers: { "content-type": "application/json" },
      body: "{not valid",
    });
    expect(res.status).toBe(400);
    const body = await res.json();
    expect(body.error.code).toBe("invalid_json");
  });

  it("replays cached response on Idempotency-Key collision", async () => {
    const customer_id = await seedCustomer();
    const headers = {
      "content-type": "application/json",
      "idempotency-key": "test-key-001",
    };
    const body = JSON.stringify({
      customer_id,
      currency: "USD",
      items: [{ sku: "SKU-1", quantity: 1, unit_price_cents: 999 }],
    });

    const first = await app.request("/v1/orders", { method: "POST", headers, body });
    expect(first.status).toBe(201);
    const firstBody = await first.json();

    const second = await app.request("/v1/orders", { method: "POST", headers, body });
    expect(second.status).toBe(201);
    expect(second.headers.get("x-idempotent-replay")).toBe("true");
    const secondBody = await second.json();
    expect(secondBody.data.id).toBe(firstBody.data.id);
  });
});
```

What this shows: real DB, TRUNCATE per test (rule 10), factory function for fixtures (rule 12), descriptive test names (rule 21), one concept per test (rule 17), realistic assertion patterns.

## Workflow

1. **Write the test name first.** "It does X when Y" - if you cannot phrase it, the concept is unclear.
2. **Write the assertion before the setup.** Knowing what you check tells you what setup matters.
3. **Use a fresh fixture per test.** Factory functions, not shared objects.
4. **Run the test, watch it fail for the expected reason.** If it passes without your code, the test is wrong.
5. **Write the code to make it pass.**
6. **Add edge cases.** Null, empty, max size, concurrent, permission-denied.

## Verification

```bash
bash skills/testing/forge-tests/verify/check_tests.sh path/to/test/file
```

The script flags `.skip` / `xit` / `.only` without an issue link, sleeps in tests, and tautological assertions.

## When to skip this skill

- Throwaway scripts.
- Generated code.
- POCs explicitly labeled as not-for-production.

## Related skills

- [`forge-api-design`](../../backend/forge-api-design/SKILL.md) - canonical error shape, status codes (asserted on in tests).
- [`forge-validation`](../../backend/forge-validation/SKILL.md) - the validation errors tests assert against.
- [`forge-error-handling`](../../backend/forge-error-handling/SKILL.md) - the catch sites tests cover.