---
name: regenerate-flow-discovery
description: Rebuild a connector's generated schemas directory and `test.flow.yaml` bindings via `flowctl raw discover`, then refresh pytest snapshots. Destructive — deletes the generated dir and rewrites the test config. Use after adding or removing a stream.
argument-hint: "[connector-name]"
allowed-tools: Bash Read Write Edit Glob
---

Rebuild `source-$1`'s generated schemas and `test.flow.yaml` bindings from scratch, then refresh the pytest snapshots so they reflect the new stream set.

## Laws

These apply in every phase. Re-read them before each phase boundary.

1. Activate the connector's Python venv before invoking `flowctl`. A fresh Bash invocation will not have it active; the connector subprocess will fail with `ModuleNotFoundError: No module named '<package>'`. Use `source "$(poetry env info --path)/bin/activate"` from the connector dir.
2. When emptying `bindings:` in `test.flow.yaml`, use `Write` (full file rewrite), **not** `Edit`. A partial Edit can leave orphaned `- resource:` entries dangling under an empty list, which produces invalid YAML that's hard to spot.
3. After discover, audit `git diff --stat`. If the diff includes schema changes unrelated to the new stream (uniform per-file deltas across many existing schemas), recommend the user split the schema sweep into its own commit so the new-stream PR diff stays scoped.
4. **`flowctl raw discover` is always free** — exercises only the connector's spec/discovery path, doesn't burn provider API budget. No consent needed to run it.
5. **`flowctl preview` and the pytest snapshot tests that drive it do hit the provider's API.** If the upstream `add-stream` skill confirmed every required endpoint allows > 20 requests/hour, run them freely. Otherwise: ask before each run, and use `disable: true` on unrelated bindings in `test.flow.yaml` to cut the request count to just what's needed.
6. Pipe long test output to a file and Read it (per top-level `CLAUDE.md`). Don't `tail` / `head` pytest output — important lines often live in the middle.
7. Open the session with a TODO list. One task per phase below.

## Phase 1 — Wipe

```bash
rm -rf <connector>/acmeCo/
```

## Phase 2 — Reset Bindings

Open `<connector>/test.flow.yaml`. Replace the entire file with a minimal shape:

```yaml
captures:
  acmeCo/source-<connector>:
    endpoint:
      local:
        command:
          - python
          - -m
          - source_<connector_pkg>
        config: config.yaml
    bindings: []
```

Drop the `import:` line for the generated directory — it points to a file we just deleted. Use `Write` (full file rewrite), not `Edit`, to avoid orphaned binding entries.

Preserve whatever endpoint configuration the original `test.flow.yaml` had — read it first if you're unsure of the shape.

## Phase 3 — Discover

Run from the connector dir:

```bash
poetry run flowctl raw discover --source test.flow.yaml
```

Expect output like `Wrote N specifications under file://…/test.flow.yaml.`

`test.flow.yaml` will be rewritten with one binding per discovered stream. The generated schema directory will be repopulated. Schemas will reflect today's CDK output, not whatever was committed previously.

## Phase 4 — Diff Audit

```bash
git status --short
git diff --stat
```

What you expect: the new stream's schema file appears (untracked), `test.flow.yaml` adds one binding, the generated `flow.yaml` adds one collection entry.

What you might _also_ see: uniform per-file deltas across many existing schema files. That's a CDK schema sweep — the committed schemas predate some CDK feature (a new reduce block, a new `_meta` field, etc.) and today's discover normalizes them. Recommend the user split the sweep into its own commit _before_ the new-stream commit, so the PR diff for the actual stream addition stays scoped.

If the new schema file is missing despite a successful discover, check that `flow.yaml` references it correctly and that nothing in the working tree was cleaned up (e.g. via `git clean -f`) between Phase 4 and now.

## Phase 5 — Snapshot Refresh

```bash
poetry run pytest tests/ > /tmp/discover-pytest-1.log 2>&1
```

Read `/tmp/discover-pytest-1.log` to see what failed. Expect at least the discover snapshot test (`test_discover`, or whatever the connector calls it) to fail because of the new stream slotted into a new alphabetical position.

Then refresh:

```bash
poetry run pytest tests/ --insta=update > /tmp/discover-pytest-update.log 2>&1
```

Re-run `poetry run pytest` to confirm all tests now pass and the snapshot file(s) were updated.

Verify post-refresh:

- The new stream appears in the discover snapshot.
- The capture snapshot may or may not contain the new stream. Snapshot tests that drive `flowctl preview` are typically capped at a fixed document count (often 50); a low-traffic stream can lose the race for those 50 slots. That matches existing comments in many connector tests for similarly-quiet streams (e.g. event streams) — don't panic, just note it.

## Output

Report to the user:

1. What changed in `git diff --stat`.
2. Whether the new stream's schema file is present.
3. Whether the discover and capture snapshots now contain the new stream.
4. Any CDK schema sweep observed, with a recommendation to split it into a separate commit.
