---
name: sap-gui-skill-scaffold
description: |
  Author a new transaction-specific SAP skill from multiple natural-language
  scenarios. Runs /sap-gui-probe for each scenario, then merges the resulting
  probe folders into one coherent skill folder: SKILL.md with mode dispatch,
  one references/sap_<name>_<mode>.vbs per probe, parameter tokens derived
  by cross-probe diff (values that vary across probes become %%TOKEN%%;
  values that stay constant bake in), popup-branch guards at every step where
  any probe observed a wnd[1] popup. Output is a ready-to-test draft.
  Prerequisites: active SAP GUI session (use /sap-login first).
argument-hint: "<new-skill-name> --goal \"<one-line goal>\"   |   <name> --scenario \"...\" --scenario \"...\"   |   <name> --manifest <path>"
---

# SAP GUI Skill Scaffold

You author a new transaction-specific SAP skill from a small set of
natural-language scenarios. Each scenario is probed via /sap-gui-probe; the
resulting probe folders are merged by cross-probe diff to identify parameters
(values that vary) vs. constants (values that stay the same). The output is
a scaffolded skill folder under `{work_dir}\skill_scaffolds\<name>_<ts>\`.

Task: $ARGUMENTS

---

## Shared Resources

| File | Purpose |
|---|---|
| `<SAP_DEV_CORE_SHARED_DIR>/rules/skill_operating_rules.md` | Mandatory operating rules |
| `<SAP_DEV_CORE_SHARED_DIR>/rules/settings_lookup.md` | Settings model — merge per-key on `.value` (env var → `settings.local.json` → `userconfig.json` → `settings.json`); non-per-connection writes go to `userconfig.json` |
| `<SAP_DEV_CORE_SHARED_DIR>/rules/language_independence_rules.md` | GUI-scripting language independence — generated VBS templates MUST follow these rules (identify by component ID + DDIC field name, status-bar checks via `MessageType` codes, VKey instead of menu-text, no branching on `.Text`/`.Tooltip`/window titles) |
| `<SAP_DEV_CORE_SHARED_DIR>/rules/abap_code_quality_rules.md` | ABAP code-quality rules — when scaffolding skills that emit or paste ABAP source (deploy / codegen / fix templates), the generated SKILL.md must reference this file and the templated VBS must not embed literal MESSAGE strings or other quality anti-patterns |
| `<SKILL_DIR>/references/scenario_catalog.tsv` | Known-stuck-point catalog consulted by **Step 0.9** (`--goal` mode). Tab-separated; keyed by `txn` + `object_type`; each row maps a trap to a `scenario_type` + `applies` gate + `probe_hint`. Read by Claude (Read tool), not VBS. |
| `<SKILL_DIR>/references/run_mode_test.ps1` | **Step 5.5** test-runner — substitutes a generated mode VBS's tokens, runs it via 32-bit cscript, captures end-state (MessageType / popup / screen) into `result.json`. |
| `<SKILL_DIR>/references/verify_create_object.ps1` | **Step 5.5** create-mode verifier — routes by object type (DDIC → `sap_se11_post_activate_verify.ps1`; PROGRAM/FM/CLASS → RFC). Returns ACTIVE / INACTIVE / MISSING. Run under 32-bit PowerShell. |

---

## Step 0 — Resolve work directory + scaffold folder

**Resolve `work_dir` via the env-aware helper** — do NOT take `work_dir` from a direct `settings.json` read (that ignores the `SAPDEV_AI_WORK_DIR` env var and `userconfig.json`). Use the `WORK_DIR=` value printed by:

```bash
powershell -NoProfile -ExecutionPolicy Bypass -Command ". '<SAP_DEV_CORE_SHARED_DIR>\scripts\sap_settings_lib.ps1'; . '<SAP_DEV_CORE_SHARED_DIR>\scripts\sap_connection_lib.ps1'; Write-Output ('WORK_DIR=' + (Get-SapWorkDir))"
```

The settings note below still applies to the OTHER keys.

**Settings reads/writes follow `shared/rules/settings_lookup.md`**: merge per-key on the `.value` field (env var → `settings.local.json` → `userconfig.json` → `settings.json`); non-per-connection writes go to `userconfig.json`. Sap-dev-core's `settings.json` sits 2 levels up from `<SKILL_DIR>`.
`work_dir` still comes from the helper above, never from a direct read of that file. Its default is `C:\sap_dev_work`.

Derive:
- `{WORK_TEMP}`       = `{work_dir}\temp`
- `{TS}`              = current timestamp `yyyyMMdd-HHmmss`
- Parse new skill name + scenarios from `$ARGUMENTS` (see Step 1).
- `{SCAFFOLD_FOLDER}` = `{work_dir}\skill_scaffolds\<new-skill-name>_<TS>`

```powershell
New-Item -Path '{WORK_TEMP}'       -ItemType Directory -Force | Out-Null
New-Item -Path '{SCAFFOLD_FOLDER}' -ItemType Directory -Force | Out-Null
```

Use `New-Item -Force` rather than `cmd /c … mkdir` — Windows' built-in
`mkdir` silently no-ops when the parent path is missing under some
cmd.exe extension states, leaving `{SCAFFOLD_FOLDER}` uncreated and
Step 0.5's `-StateFile` write below raising `DirectoryNotFoundException`.
`New-Item -Force` is reliable cross-shell and creates intermediate
directories.
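
The derivation above is plain string assembly. As a minimal sketch in Python (illustrative only; the skill itself performs this step in PowerShell, and the function name `scaffold_folder` is hypothetical):

```python
from datetime import datetime


def scaffold_folder(work_dir: str, skill_name: str, now: datetime) -> str:
    """Build {SCAFFOLD_FOLDER} from work_dir, the new skill name and a timestamp."""
    # {TS} uses the yyyyMMdd-HHmmss layout from the Derive list.
    ts = now.strftime("%Y%m%d-%H%M%S")
    # Windows separators are written out explicitly so the sketch
    # behaves the same on any host OS.
    return f"{work_dir}\\skill_scaffolds\\{skill_name}_{ts}"
```

Passing the clock in as an argument keeps the function deterministic, which makes the folder name easy to check in isolation.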

---

## Step 0.5 — Start logging

```bash
powershell -ExecutionPolicy Bypass -File "<SAP_DEV_CORE_SHARED_DIR>\scripts\sap_log_helper.ps1" -Action start -StateFile "{SCAFFOLD_FOLDER}\sap_gui_skill_scaffold_run.json" -Skill sap-gui-skill-scaffold -ParamsJson "{\"new_skill\":\"<name>\",\"scenario_count\":\"<N>\"}"
```

The probe runs invoked in Step 2 inherit this run as their parent via the
`SAPDEV_PARENT_RUN_ID` env var, so `/sap-log-analyze` can reconstruct the
scaffold → probe call tree.

---

## Step 0.7 — Pre-flight: GUI session + active-session pin

First, confirm at least one SAP GUI session is attached:

```bash
C:/Windows/SysWOW64/cscript.exe //NoLogo "<SAP_DEV_CORE_SHARED_DIR>\scripts\sap_check_gui_login_status.vbs"
```

If status is not `LOGGED_IN`, stop and tell the user to run `/sap-login` first.
Log end Status=FAILED ErrorClass=NO_SESSION.

Second, resolve the **active-session pin** via the connection lib:

```powershell
. '<SAP_DEV_CORE_SHARED_DIR>\scripts\sap_connection_lib.ps1'
$session = Get-SapCurrentSessionPath           -WorkTemp '{WORK_TEMP}'
$profile = Get-SapCurrentConnectionProfile     -WorkTemp '{WORK_TEMP}'
```

- `$session` is the SAP GUI session path for the AI-session's pinned connection (sole-conn fallback applies). Empty when no pin AND multi-conn.
- `$profile` is the full connection profile (or `$null` when nothing is pinned). It carries version fields used below.

**When `$session` is empty**, try auto-pin before refusing. If the broker
registry has exactly one connection block with a non-empty `connection_id`
(other blocks are unregistered shells the broker cannot acquire against
anyway), pin to it automatically; if more than one is registered, refuse:

```powershell
if ([string]::IsNullOrWhiteSpace($session)) {
    # Inspect the registry to count registered (connection_id-bearing) blocks.
    $registry = Get-Content '{work_dir}\runtime\session_registry.json' -Raw -Encoding UTF8 | ConvertFrom-Json
    $registered = @($registry.connections | Where-Object { "$($_.connection_id)" -ne '' })
    if ($registered.Count -eq 1) {
        $aid = Get-SapAiSessionId
        & '<SAP_DEV_CORE_SHARED_DIR>\scripts\sap_session_broker.ps1' `
            -Action pin -AiSessionId $aid -ConnectionId $registered[0].connection_id `
            -PinReason 'scaffold_auto' -WorkTemp '{WORK_TEMP}' | Out-Host
        $session = Get-SapCurrentSessionPath        -WorkTemp '{WORK_TEMP}'
        $profile = Get-SapCurrentConnectionProfile  -WorkTemp '{WORK_TEMP}'
    } else {
        Write-Error "multiple SAP GUI connections detected (registered: $($registered.Count)) and no active session pinned; run /sap-login to pin one. Candidates: $(($registered | ForEach-Object { $_.connection_id }) -join ', ')"
        # Log end Status=FAILED ErrorClass=NO_PIN.
        exit 1
    }
}
```

This handles the common "two connections in saplogon, only one registered
with the broker" case without forcing a full `/sap-login` cycle. When
auto-pin fires, the resulting pin's `pin_reason` is `scaffold_auto` so
the operator can audit it via `broker list` / inspecting the registry.
The refusal message lists candidate connection_ids so the operator can
pin manually via `broker -Action pin -ConnectionId <id>` if needed.

*Phase 4.2 note:* prior versions read `{WORK_TEMP}\sap_active_session.json` for both session_path AND version info. That file is gone. Session path resolution + version info both go through the lib helpers above. Cross-AI-session persistence lives in `connections.json` via `default_target_id`.

The resolved path is `{PINNED_SESSION}`. Its parent connection (everything
up to the final `/ses[N]`) is `{PINNED_CONNECTION}`. `{PINNED_SESSION}`
is the default session for serial-mode probes (Step 2-Serial). The
parallel path (Step 2-Parallel) doesn't use `{PINNED_SESSION}` directly —
the broker allocates fresh sessions there; `{PINNED_CONNECTION}` is kept
in scope for diagnostic logging only.
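
Splitting `{PINNED_SESSION}` into its parent connection is a one-line pattern match. A small Python sketch, assuming session paths always end in `/ses[N]` as described above (`parent_connection` is a hypothetical helper name, not part of the connection lib):

```python
import re


def parent_connection(session_path: str) -> str:
    """Return everything before the final /ses[N] segment of a session path."""
    match = re.fullmatch(r"(.+)/ses\[\d+\]", session_path)
    if match is None:
        raise ValueError(f"not a session path: {session_path!r}")
    return match.group(1)
```

For example, `/app/con[0]/ses[2]` yields `/app/con[0]`; a path without a trailing `/ses[N]` raises instead of returning a guess.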

Also copy version fields from `$profile` into the scaffolder's run state
and into the generated SKILL.md's "Probed against" header:
- `gui_version_raw`, `gui_major`     ← `$profile.gui_version_raw`, `$profile.gui_major`
- `server_release_marker`, `server_release_raw`
- `system_name`, `client`

**System-id consistency check (mandatory).** The pinned connection's
`system_id` MUST match the system the operator is actually working in
on the active GUI window. Mismatch is silent today but routinely
misroutes probes to the wrong system (observed 2026-05: pin landed on
the registered S4D block while operator's active SAP GUI window was
S4H — probes drove S4H but the scaffolder logged everything as S4D,
and the configured `sap_dev_transport_request` for S4D was rejected by
S4H). Implement as:

```powershell
# Re-read the active-GUI status line emitted by Step 0.7's first command.
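# Keep the pipe at the END of the first line: Windows PowerShell 5.1 does not
# accept a continuation line that starts with '|'.
$activeSystem = (& "C:\Windows\SysWOW64\cscript.exe" //NoLogo "<SAP_DEV_CORE_SHARED_DIR>\scripts\sap_check_gui_login_status.vbs" |
    Select-String -Pattern '^SYSTEM:\s*(\S+)') -replace '^.*SYSTEM:\s*',''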
$pinnedSystem = "$($profile.system_name)"   # could also use $profile.system_id
if ($activeSystem -and $pinnedSystem -and ($activeSystem -ne $pinnedSystem)) {
    Write-Error "GUI session is on '$activeSystem' but the pinned connection is '$pinnedSystem'. Run /sap-login to re-pin, or unpin and rerun the scaffolder."
    # Log end Status=FAILED ErrorClass=PIN_SYSTEM_MISMATCH
    exit 1
}
```

This guard is what makes Bug 6 (per-connection TR resolution) actually
correct downstream — if the pin and the GUI agree, then
`sap_dev_transport_request` resolved from the per-profile
`dev_defaults` in `connections.json` is automatically the TR for the
right system. Without this guard, the per-connection mapping is moot.

---

## Step 0.9 — Imagine scenarios (only when `--goal` is supplied)

**When this runs:** only if `$ARGUMENTS` contains `--goal "<one-line goal>"`
AND contains no `--scenario` / `--manifest`. If `--scenario`/`--manifest` are
present, skip this step (the user enumerated scenarios explicitly). If BOTH
`--goal` and `--scenario` appear, prefer the explicit `--scenario` list and note
that `--goal` was ignored. This step builds the same ordered
`{scenario_text, scenario_type}` list Step 1 would otherwise parse from flags,
then hands off to Step 1.

This step takes **no human checkpoint** before probing — by product decision the
scaffolder may brainstorm and probe autonomously once `--goal` is given. You
still ECHO the imagined plan (Step 1's echo block) so the user can interrupt,
but you do not pause for confirmation. (The human gate is the *final acceptance
test* the user runs after Step 5.5, not here.)

**0.9.1 — Parse `--goal`.** Capture the quoted goal. Strip it from the working
`$ARGUMENTS` so Step 1 doesn't reparse it. Keep `--tcd`, `--parallel`,
`--parallel-cap`, `--force-overwrite`, `--no-test`, `--test-budget-min` — they
still apply.

**0.9.2 — Extract transaction + object type.**
- *Transaction:* prefer an explicit `--tcd <TXN>`; else scan the goal for a
  transaction token (e.g. `SE11`, `SE38`); else derive from the new-skill name
  (`sap-se11-domain` → `SE11`). If none can be found, ask the user once for the
  transaction (the only allowed interaction in `--goal` mode).
- *Object type:* scan the goal AND the skill-name suffix with the SAME object
  discriminator table as Step 1 Pass 2 (`-domain`, `-table`, `-structure`,
  `-dataelement`, `-tabletype`, `-program`, `-fm`, `-class`, …). Normalise to an
  UPPERCASE catalog key (`DOMAIN`, `TABLE`, `PROGRAM`, `FM`, `CLASS`, …). If no
  object noun appears, use `*`.
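
The transaction fallback chain in 0.9.2 can be sketched as follows. This is a Python illustration only: the token pattern `TXN_TOKEN` is a hypothetical heuristic (2-4 letters followed by 2-3 digits), not the pattern `/sap-gui-probe` actually uses, and `extract_txn` is an invented name.

```python
import re
from typing import Optional

# Hypothetical heuristic for a transaction token such as SE11, SE38 or MM03.
TXN_TOKEN = re.compile(r"\b[A-Z]{2,4}\d{2,3}[A-Z]?\b")


def extract_txn(goal: str, skill_name: str, tcd: Optional[str] = None) -> Optional[str]:
    """Resolve the transaction: explicit --tcd, then the goal text, then the skill name."""
    if tcd:
        return tcd.upper()
    match = TXN_TOKEN.search(goal)
    if match:
        return match.group(0)
    # sap-se11-domain -> SE11: the segment right after the 'sap' prefix.
    parts = skill_name.split("-")
    if len(parts) >= 2 and parts[0] == "sap":
        candidate = parts[1].upper()
        if TXN_TOKEN.fullmatch(candidate):
            return candidate
    return None  # caller asks the user once, per 0.9.2
```

Returning `None` rather than raising leaves the single permitted user question to the caller.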

**0.9.3 — Consult the scenario catalog.** Read
`<SKILL_DIR>\references\scenario_catalog.tsv` (skip `#`/blank lines; split each
row on TAB into the 8 columns documented in its header). Select rows where `txn`
equals the extracted transaction AND (`object_type` equals the extracted object
type OR is `*`). **Honour the `applies` gate** on each matched row:
- `always` — include unconditionally.
- `object_type in (A,B,C)` — include only if the extracted object type is in the
  list. (This is why the sub-type-popup row fires for STRUCTURE/DATAELEMENT/
  TABLETYPE but not DOMAIN.)
- `requires_existing` — include only when the goal operates on an object that
  must already exist (delete / display / change). For a pure "create" goal, drop
  it.
- `master_lang_differs` — include only if you can plausibly reach that state;
  otherwise drop with a one-line note.

A row whose `value_hint` begins with `(` is a NON-PROBE note (e.g. the
`post-activate-verify` row) — do not emit a probe for it; it reminds you what the
generated skill must assert. Skip it for scenario generation.
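
The row selection and `applies` gate described in 0.9.3 amount to a small filter. The Python sketch below is illustrative: it assumes each catalog row has already been split into a dict keyed by the column names mentioned in this document (`txn`, `object_type`, `applies`, `value_hint`, and so on), and the parameters `needs_existing` and `lang_state_reachable` are hypothetical stand-ins for the judgement calls the step describes.

```python
def row_applies(row, object_type, needs_existing, lang_state_reachable=False):
    """Evaluate one row's `applies` gate against the extracted goal facts."""
    gate = row["applies"].strip()
    if gate == "always":
        return True
    if gate.startswith("object_type in"):
        inner = gate[gate.index("(") + 1 : gate.rindex(")")]
        allowed = {item.strip().upper() for item in inner.split(",")}
        return object_type in allowed
    if gate == "requires_existing":
        return needs_existing
    if gate == "master_lang_differs":
        return lang_state_reachable
    return False  # assumption: unknown gates are dropped, not guessed


def select_rows(rows, txn, object_type, needs_existing, lang_state_reachable=False):
    """Keep rows that match txn + object type, pass their gate, and are real probes."""
    selected = []
    for row in rows:
        if row["txn"] != txn:
            continue
        if row["object_type"] not in (object_type, "*"):
            continue
        if row["value_hint"].startswith("("):
            continue  # NON-PROBE note row: reminder only, no scenario
        if row_applies(row, object_type, needs_existing, lang_state_reachable):
            selected.append(row)
    return selected
```

Catalog order is preserved in the result, so the ordering rules in 0.9.4 can be applied on top of it.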

**0.9.4 — Brainstorm the scenario set, in this fixed order:**
1. **Exactly one happy-path scenario** (`scenario_type=success`) — always first
   (it is the mandatory warmup probe in Step 2). Use the catalog's `*-happy` row
   for this txn+object, or synthesise one from the goal.
2. **One scenario per remaining matched catalog row**, mapped to that row's
   `scenario_type`. These regression-anchored failure modes come BEFORE any
   AI-invented ones because they encode verified traps.
3. **AI gap-fill (optional):** only if after 1–2 you have fewer than 3 scenarios
   AND you can name a concrete, reachable additional failure mode from the
   taxonomy. Do not invent speculative traps the probe can't reach in ≤30 steps.

**0.9.5 — Render each scenario** as the natural-language string `/sap-gui-probe`
parses (`<TXN>: <flow> then exit`):
- Start with `<TXN>: ` so the probe's TXN heuristic locks on.
- Use the catalog `probe_hint` as the flow spine.
- Substitute a concrete test value — from the goal if given, else the row's
  `value_hint`, else a minted sandbox name (`Z<tag>SCAF<NN>`, e.g.
  `ZDOMSCAF001`). `not_found` scenarios use a deliberately nonexistent name
  (`ZNONEXIST_SCAF`).
- Append ` then exit` so the probe returns to Easy Access (bounds the step count).
- The scenario's `scenario_type` is the catalog row's value.
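
The rendering rules above can be sketched in a few lines of Python. Everything here is illustrative: the `<NAME>` placeholder inside `probe_hint` is a hypothetical convention (the catalog's real hint syntax is not specified in this document), the three-digit sequence follows the `ZDOMSCAF001` example, and the function names are invented.

```python
def pick_value(goal_value, value_hint, tag, seq):
    """Choose the test value: goal first, then the row's value_hint, then a minted name."""
    if goal_value:
        return goal_value
    if value_hint:
        return value_hint
    return f"Z{tag}SCAF{seq:03d}"  # e.g. ZDOMSCAF001


def render_scenario(txn, probe_hint, value):
    """Render '<TXN>: <flow> then exit' for /sap-gui-probe."""
    flow = probe_hint.replace("<NAME>", value)  # hypothetical placeholder syntax
    return f"{txn}: {flow} then exit"
```

A `not_found` scenario would pass the deliberately nonexistent name (`ZNONEXIST_SCAF`) as `value`.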

**0.9.6 — Bound the count** at **5 scenarios total (1 success + up to 4
failure-mode)**. If the catalog yields more than 4 failure rows, keep the 4
highest-value (rank `popup_recovery` / `validation_error` from a verified
memory/skill source above generic `not_found`). Always keep ≥ 2 (Step 1's
minimum). If zero rows matched AND you cannot gap-fill even one failure mode,
fall back to the happy path + a generic `not_found` of the same flow. If even 2
is impossible (read-only txn, no not-found path), STOP with: *"--goal produced
only one viable scenario; run /sap-gui-probe directly with: `<the one
scenario>`"* and Log end Status=SKIPPED ErrorClass=INSUFFICIENT_SCENARIOS.

**0.9.7 — Hand off to Step 1.** Materialise the imagined list as the ordered
`{scenario_text, scenario_type}` structure Step 1 builds after flag-parsing, set
`scenarios_imagined = true`, and proceed into Step 1 at the mode-name-derivation
sub-step.

---

## Step 1 — Parse arguments

**Input source.** Scenarios reach Step 1 two ways: (a) the user typed
`--scenario`/`--manifest` (parse per the rules below); or (b) Step 0.9 already
imagined them from `--goal` and set `scenarios_imagined = true` — then the
ordered `{scenario_text, scenario_type}` list already exists, so **skip parse
rules 2, 3, 3b** and go straight to "After parsing" (mode-name derivation).
Rules 1 (skill-name validation), 4 (`--force-overwrite`), 5 (`--tcd`), 6/7
(`--parallel*`), and 8/9 (`--no-test` / `--test-budget-min`) always apply
regardless of source.

`$ARGUMENTS` shape:

```
<new-skill-name> --scenario "<s1>" [--scenario-type <t1>] --scenario "<s2>" [--scenario-type <t2>] ...
<new-skill-name> --manifest <path-to-manifest.txt>
<new-skill-name> --scenario "<s1>" ... --force-overwrite
```

Parse rules:

1. **First positional arg** = the new skill name. Must match `^sap-[a-z0-9-]+$`
   (CLAUDE.md naming convention). Reject otherwise.
2. **`--manifest <path>`**: UTF-8 file, one scenario per line; blank lines and
   `#`-prefixed lines are ignored. Treat each remaining line as if it were
   `--scenario "..."`. A manifest line may carry a leading `<type>:` prefix (e.g.
   `not_found: MM03 display ZNONEXISTENT`) — the prefix sets that scenario's
   `scenario_type` and is stripped from the scenario text. Recognised
   prefixes: `success:`, `not_found:`, `auth_error:`, `popup_recovery:`,
   `validation_error:`. Lines without a prefix default to `success`.
3. **`--scenario "..."`** (repeatable): collect into an ordered list.
3b. **`--scenario-type <type>`** (optional, repeatable, paired): immediately
    after a `--scenario` arg, set that scenario's type. One of
    `success` (default) / `not_found` / `auth_error` / `popup_recovery` /
    `validation_error`. Pairs by ORDER — the Nth `--scenario-type` flag
    binds to the Nth `--scenario`. Unbalanced pairing (more `--scenario-type`
    than `--scenario`, or interleaved out of order) → refuse with a clear
    message before running any probe.
4. **`--force-overwrite`** (optional): if a skill folder with this name
   already exists inside any installed plugin's `skills/` dir, snapshot the
   old folder under `{SCAFFOLD_FOLDER}\.scaffold-overlay\` before generating.
   Without this flag, refuse with a clear message.
5. **Optional `--tcd <TXN>`**: informational tag for the SKILL.md header. If
   omitted, derive from the first scenario via the same TXN-extraction
   heuristic /sap-gui-probe uses.
6. **Optional `--parallel`**: run all scenario probes concurrently, one
   SAP GUI session per probe sub-agent. See Step 2-Parallel below. Without
   this flag, Step 2 runs serially against the pinned session (today's
   behaviour).
7. **Optional `--parallel-cap N`** (default 6): max concurrent probes.
   Capped at 6 because SAP's default `rdisp/max_alt_modes` is 6 sessions
   per connection. If `--parallel` is set and scenario count > cap, run
   in batches of `cap`.
8. **Optional `--no-test`**: skip the autonomous test/fix loop (Step 5.5). The
   draft is emitted and self-reviewed but never exercised against SAP. Use when
   you only want the generated artifacts (e.g. on a system where create/delete
   test fixtures are undesirable).
9. **Optional `--test-budget-min N`** (default 20): global wall-clock budget for
   the Step 5.5 test/fix loop. When exceeded, the loop stops, runs fixture
   cleanup, and reports `Status=ABORTED_BUDGET` with the partial results.
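
Rule 2's manifest format is simple enough to show as code. The following Python sketch is illustrative only (the function name `parse_manifest` is hypothetical); it produces the same ordered `{scenario_text, scenario_type}` list the other input paths build.

```python
RECOGNISED_PREFIXES = (
    "success", "not_found", "auth_error", "popup_recovery", "validation_error",
)


def parse_manifest(text):
    """Turn manifest text into an ordered list of scenario dicts."""
    scenarios = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank and comment lines are ignored
        scenario_type = "success"  # default when no prefix is present
        head, sep, rest = line.partition(":")
        # Only a recognised type counts as a prefix; 'SE11: ...' stays intact.
        if sep and head.strip() in RECOGNISED_PREFIXES:
            scenario_type = head.strip()
            line = rest.strip()
        scenarios.append({"scenario_text": line, "scenario_type": scenario_type})
    return scenarios
```

Checking the prefix against the recognised list matters because scenario text itself usually starts with `<TXN>:`.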

After parsing:
- If scenario count < 2, refuse with: *"scaffolding from one probe is just
  synthesized.vbs -- use that directly via /sap-gui-probe"*. Log end
  Status=SKIPPED.
- Derive a mode name per scenario in three passes:

  **Pass 1 — verb**: scan the scenario for an action verb. First match wins.

  | Keyword regex (case-insensitive) | Verb label |
  |---|---|
  | `\bnot[ _-]?found\b\|missing\|nonexistent` | `not-found` |
  | `\bdisplay\b\|show\|view\b` | `display` |
  | `\bcreate\b\|new\b\|add\b` | `create` |
  | `\bchange\b\|update\|modify\|edit\b` | `change` |
  | `\bdelete\b\|drop\|remove\b` | `delete` |
  | `\bcheck\b\|syntax\b` | `check` |
  | `\bactivate\b` | `activate` |
  | `\bwhere[ _-]?used\b\|usages?\b` | `where-used` |
  | `\bcopy\b\|clone\b` | `copy` |
  | `\brename\b` | `rename` |

  **Pass 2 — object discriminator**: when the verb matches but a SAP repo
  object noun is also present, append it as a suffix. This keeps
  multi-object SE11 / SE38 / SE24 / ... skills from collapsing every
  scenario into one mode. First match wins; case-insensitive.

  | Keyword regex | Suffix |
  |---|---|
  | `\btable\b` | `-table` |
  | `\bview\b` | `-view` |
  | `\bdataelement\b\|\bdata[ _-]?element\b` | `-dataelement` |
  | `\bdomain\b` | `-domain` |
  | `\bstructure\b` | `-structure` |
  | `\btabletype\b\|\btable[ _-]?type\b` | `-tabletype` |
  | `\btypegroup\b\|\btype[ _-]?group\b` | `-typegroup` |
  | `\bsearchhelp\b\|\bsearch[ _-]?help\b` | `-searchhelp` |
  | `\blockobject\b\|\block[ _-]?object\b` | `-lockobject` |
  | `\bprogram\b\|\breport\b\|\bpgm\b` | `-program` |
  | `\bfunction[ _-]?module\b\|\bfm\b` | `-fm` |
  | `\bfunction[ _-]?group\b\|\bfugr\b` | `-fugr` |
  | `\bclass\b` | `-class` |
  | `\binterface\b` | `-interface` |
  | `\bmessage[ _-]?class\b` | `-messageclass` |
  | `\bpackage\b` | `-package` |

  If Pass 1 matches no verb, fall back to `mode_NN` (NN = scenario
  index, 1-based). Two scenarios producing the same final mode label
  (e.g. both `create-domain`) get de-duplicated — both contribute their
  actions to a single merged mode VBS. This is fine when the actual
  flows differ only in payload values (the merge surfaces those as
  parameters); Step 5 self-review only flags a collision warning when
  the merge produced ZERO parameters (then the collapse was a real
  accident, not intentional value-variance).

  **Pass 3 — scenario_type suffix**: when this scenario's
  `scenario_type` ≠ `success`, append `-<scenario_type>` (with
  underscores preserved) to the mode label. So a `display-material`
  scenario with `--scenario-type not_found` becomes mode
  `display-material-not_found`. This keeps happy-path and
  failure-mode probes in SEPARATE generated VBS files — callers of
  the generated skill pick which behaviour they want via the mode
  argument, and the SKILL.md dispatch table documents each.
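
The three passes can be read as a single function. The Python sketch below copies both regex tables row for row and applies "first match wins" in table order; it is an illustration rather than the scaffolder's implementation, and the two-digit padding of `mode_NN` is an assumption.

```python
import re

VERBS = [
    (r"\bnot[ _-]?found\b|missing|nonexistent", "not-found"),
    (r"\bdisplay\b|show|view\b", "display"),
    (r"\bcreate\b|new\b|add\b", "create"),
    (r"\bchange\b|update|modify|edit\b", "change"),
    (r"\bdelete\b|drop|remove\b", "delete"),
    (r"\bcheck\b|syntax\b", "check"),
    (r"\bactivate\b", "activate"),
    (r"\bwhere[ _-]?used\b|usages?\b", "where-used"),
    (r"\bcopy\b|clone\b", "copy"),
    (r"\brename\b", "rename"),
]

OBJECTS = [
    (r"\btable\b", "-table"),
    (r"\bview\b", "-view"),
    (r"\bdataelement\b|\bdata[ _-]?element\b", "-dataelement"),
    (r"\bdomain\b", "-domain"),
    (r"\bstructure\b", "-structure"),
    (r"\btabletype\b|\btable[ _-]?type\b", "-tabletype"),
    (r"\btypegroup\b|\btype[ _-]?group\b", "-typegroup"),
    (r"\bsearchhelp\b|\bsearch[ _-]?help\b", "-searchhelp"),
    (r"\blockobject\b|\block[ _-]?object\b", "-lockobject"),
    (r"\bprogram\b|\breport\b|\bpgm\b", "-program"),
    (r"\bfunction[ _-]?module\b|\bfm\b", "-fm"),
    (r"\bfunction[ _-]?group\b|\bfugr\b", "-fugr"),
    (r"\bclass\b", "-class"),
    (r"\binterface\b", "-interface"),
    (r"\bmessage[ _-]?class\b", "-messageclass"),
    (r"\bpackage\b", "-package"),
]


def first_label(table, text):
    """Return the label of the first table row whose regex matches, else None."""
    for pattern, label in table:
        if re.search(pattern, text, flags=re.IGNORECASE):
            return label
    return None


def derive_mode(scenario, scenario_type="success", index=1):
    """Pass 1 (verb) + Pass 2 (object suffix) + Pass 3 (scenario_type suffix)."""
    verb = first_label(VERBS, scenario)
    if verb is None:
        label = f"mode_{index:02d}"  # assumption: NN is zero-padded to two digits
    else:
        label = verb + (first_label(OBJECTS, scenario) or "")
    if scenario_type != "success":
        label += f"-{scenario_type}"  # underscores preserved
    return label
```

With these tables, `derive_mode("SE11: create domain ZDOMSCAF001 then exit")` gives `create-domain`, and the same flow typed `not_found` gains a `-not_found` suffix. Because matching follows row order rather than position in the text, the order of the table rows is part of the behaviour.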

Echo the parsed plan to the user before Step 2. When Step 0.9 imagined the
scenarios from `--goal`, prefix the block with two lines — the verbatim goal and
the catalog match (e.g. `Imagined 4 scenario(s) (catalog SE11/DOMAIN, 3 trap
row(s) matched)`):

> Scaffolding **<new-skill-name>** from N scenario(s):
> 1. mode=`display` (type=success) -- "<scenario 1 verbatim>"
> 2. mode=`display-not_found` (type=not_found) -- "<scenario 2 verbatim>"
> 3. mode=`delete` (type=success) -- "<scenario 3 verbatim>"
> ...
> Output folder: `<SCAFFOLD_FOLDER>`

---

## Step 2 — Run /sap-gui-probe for each scenario

The execution path branches on `--parallel`:

### 2-Serial — default path (no `--parallel`)

Use the Skill tool to invoke `/sap-gui-probe` with each scenario, in order.
Always append `--auto` to the scenario string -- the scaffolder is
non-interactive; the human authorised this whole run by typing the scenarios.
**If the scenario's `scenario_type` ≠ `success`, also append
`--scenario-type <type>`** so the probe relaxes its abort conditions
appropriately (see /sap-gui-probe SKILL.md Step 2.8 tolerance table).

After each probe, capture the resulting run folder path from the probe's
final report and append to your in-memory probe list:

```
[
  { "scenario": "<s1>", "scenario_type": "success",   "mode": "display",            "folder": "{work_dir}\\probes\\SE37_20260512-200000" },
  { "scenario": "<s2>", "scenario_type": "not_found", "mode": "display-not_found",  "folder": "{work_dir}\\probes\\SE37_20260512-200430" },
  ...
]
```

If any probe ends with status FAILED or ABANDONED, **stop the scaffold here**.
Log end Status=FAILED ErrorClass=PROBE_FAILED ErrorMsg="<scenario index>".
The failed probe's run folder is still on disk for the user to inspect; the
partial probes that succeeded are also kept. Do NOT proceed to merge --
a partial scaffold is worse than no scaffold.

### 2-Parallel — `--parallel` path

Active when `--parallel` is set on the invocation.

Allocation goes through the **SAP GUI Session Broker** —
`shared/scripts/sap_session_broker.ps1`, contract documented in
`shared/rules/sap_session_broker.md`. The broker owns session
discovery / spawn-on-demand / lifecycle / cleanup so the scaffolder
doesn't have to.

**Warmup-first sequencing (mandatory).** The FIRST scenario ALWAYS runs
serially (Step 2-Serial above) BEFORE any parallel fanout begins. The
scaffolder treats scenario 1 as the warmup — by convention it creates
any prerequisites later scenarios depend on (a fresh TR, a new package,
a base function group, etc.). Running it solo first eliminates the
race condition where parallel followers hit SAP before the warmup has
landed its writes (observed in 2026-05 test run: 2 of 3 followers
failed with "package ZCMSKILLSC3 does not exist" because they raced
ahead of scenario 1's package-create step). Implementation:

1. Run scenario 1 via Step 2-Serial above: a single probe with no broker
   acquire (use `{PINNED_SESSION}` directly).
2. On success, record the run folder and continue.
3. On FAILED / ABANDONED, abort the whole scaffold immediately.
4. THEN proceed to the broker-coordinated parallel fanout (Steps 2.0
   onward) for scenarios 2..N.

This applies regardless of whether scenario 1 has explicit prerequisite
syntax — the scaffolder cannot know which writes are prerequisites
without solving a semantics problem, so the conservative rule is "first
scenario always warms up." Operators who DON'T need a warmup (e.g. all
scenarios are read-only) can ignore the cost: one extra serial probe
adds maybe 30s and is always correct.

**2.0 — Pre-flight: gc any stale claims, then discover existing sessions.**

```bash
powershell -ExecutionPolicy Bypass -File "<SAP_DEV_CORE_SHARED_DIR>\scripts\sap_session_broker.ps1" -Action gc -WorkTemp "{WORK_TEMP}"
powershell -ExecutionPolicy Bypass -File "<SAP_DEV_CORE_SHARED_DIR>\scripts\sap_session_broker.ps1" -Action discover -WorkTemp "{WORK_TEMP}"
```

`gc` first so any stale entries from a previous AI session (crashed sub-
agents, abandoned claims past TTL) get cleared before discover registers
the live sessions. Discover output:

```
DISCOVERED: <n> new (total free=<f> user_owned=<u>)
```

If `free < parallel_cap` we don't worry — the broker spawns on demand
inside the per-scenario `acquire` calls. If `free == 0` AND the SAP cap
is 6 already, the very first acquire will get `DENIED: ... cap reached`;
treat that as a hard abort.

**2.1 — Acquire one session per scenario in the batch.** Build descriptors
by acquiring up front (rather than letting each agent acquire its own —
the scaffolder is the orchestrator, so it holds every claim on the
agents' behalf):

```bash
# For each scenario i in this batch:
powershell -ExecutionPolicy Bypass -File "<SAP_DEV_CORE_SHARED_DIR>\scripts\sap_session_broker.ps1" \
    -Action      acquire \
    -TaskId      "scaffold_<runId>_scenario_<i>" \
    -OwnerSkill  "sap-gui-skill-scaffold" \
    -OwnerPid    0 \
    -TtlSeconds  1800 \
    -WorkTemp    "{WORK_TEMP}"
# (no -PinFile: broker auto-resolves the AI session's pinned connection via
# Get-SapAiSessionId + session_registry.json's ai_sessions map.)
```

**`-TtlSeconds 1800` is INTENTIONAL** for the scaffolder. The broker's default TTL is 600s (10 min), which is fine for typical Tier 3 skill runs but tight for parallel scaffolder batches: each sub-agent probe can take 7-9 min, and the scaffolder waits for the SLOWEST probe in the batch before releasing any claim. With the default, the TTL routinely expires on batches that run longer than 10 min wall-clock, and reactive cleanup drops the entries before release runs. The release call then returns `NOT_FOUND: task=... entry was here but dropped by reactive cleanup` instead of `RELEASED:`, which is harmless (release is idempotent) but misleading. Raising the TTL to 1800s (30 min) leaves a generous margin over the longest expected probe runtime.

**No `-PinFile` is passed.** Without a connection resolver, multi-connection
setups (any operator with two or more SAP Logon entries attached) get
`DENIED: ambiguous target: N connections attached and no resolver supplied`
on the very first acquire. The pin resolved in Step 0.7 is that resolver:
the broker looks it up on its own (see the comment in the command above),
so every probe is routed to `{PINNED_CONNECTION}`, the connection the
operator authorised. To name the connection explicitly instead, pass
`-ConnectionPath "{PINNED_CONNECTION}"` (same effect).

**`-OwnerPid 0` is INTENTIONAL** for the scaffolder. Each tool call from the
orchestrator (Claude) spawns a transient `pwsh.exe` process whose PID dies
immediately on return. Passing `-OwnerPid $PID` from inside such a call
would record a dead PID and the broker's reactive sweep would drop the
entry on the next operation. The scaffolder relies on TTL (raised to 30
min via `-TtlSeconds 1800` above) for crash recovery instead — long
enough to outlast any normal parallel batch, short enough that abandoned
scaffolds get cleaned up. **Skill wrappers that run as a single
long-lived `pwsh` process (the typical Tier 3 case) SHOULD pass
`-OwnerPid $PID`** — they benefit from immediate pid_dead detection AND
can leave TTL at the default.

Stdout last line is one of:

```
ACQUIRED: path=/app/con[0]/ses[N] sessionNumber=M reused=<bool>
DENIED:   <reason>     # exit 1 -- typically "cap reached"
ERROR:    <reason>     # exit 2 -- SAP unreachable
```
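
Classifying the broker's final stdout line is a mechanical step. A Python sketch, assuming the three line shapes shown above and assuming the `reused` flag is rendered as `True`/`False` in some casing (`parse_acquire` is a hypothetical helper, not part of the broker):

```python
import re

ACQUIRED_RE = re.compile(
    r"ACQUIRED:\s+path=(\S+)\s+sessionNumber=(\d+)\s+reused=(\S+)"
)


def parse_acquire(stdout):
    """Classify the last non-empty stdout line of an acquire call."""
    lines = [line.strip() for line in stdout.splitlines() if line.strip()]
    if not lines:
        return {"status": "ERROR", "reason": "no output"}
    last = lines[-1]
    match = ACQUIRED_RE.match(last)
    if match:
        return {
            "status": "ACQUIRED",
            "path": match.group(1),
            "session_number": int(match.group(2)),
            "reused": match.group(3).lower() == "true",
        }
    for status in ("DENIED", "ERROR"):
        if last.startswith(status + ":"):
            return {"status": status, "reason": last[len(status) + 1:].strip()}
    # Assumption: anything unrecognised is treated like ERROR.
    return {"status": "ERROR", "reason": f"unrecognised broker output: {last}"}
```

Only the last non-empty line is inspected, so diagnostic output printed earlier by the broker does not affect the result.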

On `DENIED`/`ERROR` for ANY scenario in the batch, **release all already-
acquired claims for this batch** (`release -TaskId scaffold_<runId>_scenario_<j>` for j<i) and abort. The
broker's `release` is idempotent — calling it for an unknown task_id
returns one of:
- `NOT_FOUND: task=... (no matching claim ...)` — never acquired (real bug or duplicate release).
- `NOT_FOUND: task=... (entry was here but dropped by reactive cleanup ...)` — TTL or session_closed swept the entry before release fired. Idempotent; usually means `-TtlSeconds` was too tight for batch wall-clock.

The `task_id` MUST be unique per acquired claim across the whole scaffold
run; the suggested shape `scaffold_<runId>_scenario_<i>` satisfies that.
`runId` is the scaffolder's own log_helper run id (Step 0.5).

The resulting descriptor list:

```
descriptors = [
  { i: 0, scenario: "<text>", mode: "<label>", task_id: "scaffold_xxx_scenario_0", session: "/app/con[0]/ses[N0]" },
  { i: 1, scenario: "<text>", mode: "<label>", task_id: "scaffold_xxx_scenario_1", session: "/app/con[0]/ses[N1]" },
  ...
]
```

**2.2 — Spawn N general-purpose Task sub-agents** in a single tool message.
Each sub-agent receives the prompt below. **Append `--scenario-type <type>` when the
scenario's type ≠ `success`** so the probe relaxes its abort conditions
(see /sap-gui-probe SKILL.md Step 2.8 tolerance table). The literal flag
text appears in the probe args verbatim.

> You are probe runner #i of N for a sap-gui-skill-scaffold run. Your
> assigned SAP GUI session is `<session>`. The orchestrator has already
> acquired this session through the broker; you do NOT need to touch the
> broker. Invoke `/sap-gui-probe` with this argument string verbatim:
>
>     <scenario> --auto --session <session> [--scenario-type <type>]
>
> (Drop the bracketed `--scenario-type` flag entirely when type is
> `success`.)
>
> When the skill finishes successfully, return ONLY the absolute path of
> the resulting run folder as the LAST line of your message (no extra
> prose after it).
>
> Return SUCCESS (= absolute run folder path on the LAST line) whenever
> the probe COMPLETES through to cleanup, regardless of whether the
> observed end state matches the `scenario_type` prediction. The
> scaffolder's merge step classifies based on what the probe actually
> observed — your job is to drive and observe, not to validate the
> hypothesis. **If a `validation_error` scenario silently activates
> (SAP didn't reject the input as you expected), that's STILL a SUCCESS
> from the probe's perspective** — the observed end state goes into the
> merge report and the scaffolder labels it accordingly.
>
> Return `FAILED:<short reason>` ONLY when the probe truly cannot complete:
> action.vbs returns ERROR, max_steps cap (30) hit, session destroyed
> (Shift+F3 / `oSession.Close`), or NOOP-loop detected with no recovery.
> Misclassified scenario-type is NOT a failure.
>
> Do not touch any other SAP GUI session. Do not invoke unrelated skills.

**2.3 — Collect results.** Wait for all N sub-agents to return. Parse each
agent's last non-empty line:
- absolute folder path → success, append to probe list with the matching mode label.
- `FAILED:<reason>` → record the failure for this scenario index.
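
The parsing rule above can be sketched in Python as follows. This is an illustration, not the orchestrator's actual code: `parse_probe_result` is a hypothetical name, and treating an empty reply or an unrecognised last line as a failure is an assumption the text does not spell out.

```python
import ntpath


def parse_probe_result(message):
    """Return ("ok", folder) or ("failed", reason) from a sub-agent reply."""
    lines = [ln.strip() for ln in message.splitlines() if ln.strip()]
    if not lines:
        return ("failed", "empty reply")  # assumption: no output counts as failure
    last = lines[-1]
    if last.startswith("FAILED:"):
        return ("failed", last[len("FAILED:"):].strip())
    # ntpath applies Windows path rules regardless of the host OS.
    if ntpath.isabs(last):
        return ("ok", last)
    return ("failed", f"unparseable last line: {last}")  # assumption
```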

**2.4 — Release EVERY claim acquired in this batch.** Always, even on
failure:

```powershell
# For each descriptor d in this batch:
powershell -ExecutionPolicy Bypass -File "<SAP_DEV_CORE_SHARED_DIR>\scripts\sap_session_broker.ps1" `
    -Action  release `
    -TaskId  "<d.task_id>" `
    -WorkTemp "{WORK_TEMP}"
```

Release drives `/n` on the session (back to Easy Access) and frees the
entry for the next batch. Release is idempotent; calling it for a
task_id that was already released or never acquired returns `NOT_FOUND`
and is harmless.

**2.5 — Failure policy.** If ANY sub-agent returned FAILED, abort the
whole scaffold after releasing all batch claims (Step 2.4 still runs).
Log end Status=FAILED ErrorClass=PROBE_FAILED ErrorMsg="<failed indices>".
Successful probe folders remain on disk.

**2.6 — Batching.** If `scenario_count > parallel_cap`, repeat
Steps 2.1–2.4 in batches of size `parallel_cap`. Each batch acquires
fresh from the broker — the previous batch's releases returned its
sessions to the broker's free pool, so subsequent acquires re-use them
without needing to re-spawn. The broker's reactive cleanup catches
any sessions destroyed by misbehaving sub-agents. For example, when a
runner closes its own session with Shift+F3 (as the CUKY runner did in an
earlier run), the next batch's acquire notices the loss via the
`session_closed` sweep and spawns a fresh replacement transparently.
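
A minimal Python sketch of the batch loop, under stated assumptions: `acquire`, `run_batch` and `release` are hypothetical callables standing in for the broker calls and the sub-agent spawn, and the Step 2.5 abort policy is deliberately left out. The point it illustrates is that every claim acquired in a batch is released even when the batch raises.

```python
def batches(items, parallel_cap):
    """Yield consecutive slices of at most parallel_cap items."""
    if parallel_cap < 1:
        raise ValueError("parallel_cap must be at least 1")
    for start in range(0, len(items), parallel_cap):
        yield items[start:start + parallel_cap]


def run_in_batches(scenarios, parallel_cap, acquire, run_batch, release):
    """Acquire, run, and always release, one batch at a time."""
    results = []
    for batch in batches(scenarios, parallel_cap):
        claims = []
        try:
            for scenario in batch:
                claims.append(acquire(scenario))
            results.extend(run_batch(batch, claims))
        finally:
            # Mirrors Step 2.4: release every claim, even on failure.
            for claim in claims:
                release(claim)
    return results
```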

**Concurrency notes:**
- Each cscript process binds to exactly one session via the `session` field
  in its action JSON. No mid-script switching.
- SAP GUI Scripting's `session.LockSessionUI` is per-session, so concurrent
  probes don't fight each other.
- Each probe writes to its own folder; no shared writeable state.
- Cost: each sub-agent has its own context window — roughly N× the token
  cost of the serial path. Use `--parallel` for time savings on 4+ scenarios.

---

## Step 3 — Cross-probe merge

Once every probe succeeded:

```powershell
$folders = @('<folder1>','<folder2>', ...)
$modes   = @('<mode1>','<mode2>', ...)
& '<SKILL_DIR>\references\merge_probes.ps1' -ProbeFolders $folders -ModeNames $modes -OutputFile '{SCAFFOLD_FOLDER}\_merge_report.json'
```

**Why the call-operator (`& '<script>.ps1' ...`) instead of `powershell -File`?**
The PowerShell host's `-File` argument tokenizer turns a literal `-ProbeFolders "a","b","c"` into a SINGLE string `a,b,c` (one element), and `merge_probes.ps1` then refuses with "need at least 2 probe folders; got 1". Invoking via the call operator from a PowerShell host preserves the real `[string[]]` array binding. Pass `$folders` and `$modes` as arrays built up beforehand.

The script reads every `step_NN_action.json` across all probe folders, groups
by `(verb, target)` touchpoint, classifies each as:

- **constant** -- appears in all probes with identical value -> bakes into VBS
- **parameter** -- appears in all probes with varying value -> becomes a
  `%%TOKEN%%` placeholder (token name derived from the DDIC field tail of
  the target, e.g. `%%MATNR%%` from `wnd[0]/usr/ctxtRMMG1-MATNR`)
- **mode-specific** -- appears in only some probes -> goes only into the
  mode VBS files for the modes that used it

It also collects every popup observed in any probe (read from the
`POPUP WINDOW wnd[1]` marker in each step's `_after.txt` dump). Output is
`_merge_report.json` in the scaffold folder.

Last line of stdout: `MERGE OK: probes=<N> touchpoints=<M> parameters=<P> modeSpecific=<MS> popups=<X>`.
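
To make the three-way classification concrete, here is a small Python sketch. It is not `merge_probes.ps1`; the input shape (a dict of probe name to `(verb, target, value)` tuples) and the verbs other than `SET_OKCD` are assumptions for illustration, and a touchpoint repeated within one probe simply keeps its last value.

```python
from collections import defaultdict


def token_for(target):
    """Derive %%TAIL%% from the DDIC field tail of a component id."""
    tail = target.rsplit("/", 1)[-1].rsplit("-", 1)[-1]
    return f"%%{tail.upper()}%%"


def classify(probes):
    """probes: {probe_name: [(verb, target, value), ...]} (assumed shape)."""
    seen = defaultdict(dict)  # (verb, target) -> {probe_name: value}
    for name, actions in probes.items():
        for verb, target, value in actions:
            seen[(verb, target)][name] = value

    report = {}
    for key, per_probe in seen.items():
        if len(per_probe) < len(probes):
            kind = "mode-specific"   # only some probes touched it
        elif len(set(per_probe.values())) == 1:
            kind = "constant"        # same value everywhere, bake in
        else:
            kind = "parameter"       # value varies, becomes a token
        entry = {"kind": kind, "per_probe_values": per_probe}
        if kind == "parameter":
            entry["token"] = token_for(key[1])
        report[key] = entry
    return report
```

With two MM03 probes that differ only in the material number, the OK-code touchpoint comes out as a constant and the `RMMG1-MATNR` field as `%%MATNR%%`.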

---

## Step 4 — Emit the skill folder

```powershell
& '<SKILL_DIR>\references\emit_skill_folder.ps1' `
    -MergeReport  '{SCAFFOLD_FOLDER}\_merge_report.json' `
    -SkillName    '<new-skill-name>' `
    -OutputDir    '{SCAFFOLD_FOLDER}' `
    -Tcd          '<TXN>' `
    -ServerMarker '<server_release_marker-from-pin-or-empty>' `
    -ForceParam   @('L_DEVCLASS','TRKORR')   # optional; see below
```

`-ForceParam` (optional `[string[]]`) overrides the merge classification by
DDIC field tail. Use this when every probe happened to use the same value
for a field — making the merge classify it as a `constant` — but real
users will want to vary it. Canonical examples:

- `L_DEVCLASS` — package name on the Object Directory popup; classified as
  constant because every test scenario said "in <one-package>", but the
  shipped skill should accept any package.
- `TRKORR` — transport request number on the Workbench Request popup;
  same reasoning.

For each matched touchpoint, the emit step flips `constant` → `parameter`,
derives a token from the tail (`%%L_DEVCLASS%%`, `%%TRKORR%%`), and rebuilds
per-probe values from each probe's recorded action so the argument-hint
and SKILL.md dispatch table list the new parameter alongside the
merge-discovered ones.
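
The override can be pictured with the following Python sketch. It is an illustration under assumptions, not the emit script: the touchpoint dict shape is hypothetical, and the component ids in the usage example are illustrative placeholders.

```python
def apply_force_param(touchpoints, force_tails):
    """Flip constant -> parameter for touchpoints whose DDIC tail is forced.

    touchpoints: list of dicts with "target" and "kind" keys (assumed shape).
    """
    wanted = {tail.upper() for tail in force_tails}
    for tp in touchpoints:
        tail = tp["target"].rsplit("-", 1)[-1].upper()
        if tp["kind"] == "constant" and tail in wanted:
            tp["kind"] = "parameter"
            tp["token"] = f"%%{tail}%%"
    return touchpoints


# Usage: only the forced constant changes; other touchpoints are untouched.
points = [
    {"target": "wnd[1]/usr/ctxtKO007-L_DEVCLASS", "kind": "constant"},
    {"target": "wnd[0]/tbar[0]/okcd", "kind": "constant"},
]
apply_force_param(points, ["L_DEVCLASS", "TRKORR"])
```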

`-ServerMarker` is `$profile.server_release_marker` from the
`Get-SapCurrentConnectionProfile` call in the second sub-step of Step 0
(e.g. `S4HANA_2022`, `ECC6_EHP8`). When non-empty, every emitted mode
VBS is named `sap_<name>_<mode>.<marker>.vbs` so the version-aware
selector (`shared/scripts/sap_select_vbs_variant.ps1`) picks it on
matching systems and falls back to the default `.vbs` on non-matching
ones. When the profile has no marker (RFC system info was never captured
or returned empty), pass an empty string and filenames stay untagged.
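
The naming and fallback behaviour can be sketched in Python as below. This is only an illustration of the rule as described; it is not `sap_select_vbs_variant.ps1`, and both function names are hypothetical.

```python
def mode_vbs_name(name, mode, marker=""):
    """sap_<name>_<mode>.<marker>.vbs when a marker is given, else untagged."""
    stem = f"sap_{name}_{mode}"
    return f"{stem}.{marker}.vbs" if marker else f"{stem}.vbs"


def select_variant(available, name, mode, system_marker):
    """Prefer the file tagged for this system, else the default .vbs."""
    tagged = mode_vbs_name(name, mode, system_marker)
    if system_marker and tagged in available:
        return tagged
    return mode_vbs_name(name, mode)
```

For example, a system reporting `ECC6_EHP8` falls back to the untagged file when only an `S4HANA_2022` variant exists alongside the default.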

The script reads the merge report and writes, into `{SCAFFOLD_FOLDER}`:

- `SKILL.md` -- mode dispatch, derived from `references\skill_md.template`
- `README.md` -- short doc with provenance
- `references\sap_<name>_<mode>.vbs` -- one per distinct mode, derived from
  `references\mode_vbs.template`. Each VBS:
  - Attaches to the active SAP GUI session
  - Replays the probe's actions in order
  - Inserts a popup-branch guard (`If IsPopupOpen(oSess) Then ...`) at every
    step where any probe observed a wnd[1] popup
  - Reads the status bar's `MessageType` at the end (per the language
    independence rules) and exits with ERROR if `E` or `A`
- `_source_probes\INDEX.txt` -- provenance (which probe folder informed which mode)
- `_merge_report.json` -- full provenance for downstream tools

Last line of stdout: `EMIT OK: <SCAFFOLD_FOLDER>`.

---

## Step 5 — Self-review

Read the generated `SKILL.md` and each `references\sap_<name>_<mode>.vbs`.
Surface to the user any obvious gaps:

1. **TODO markers** -- any line containing `TODO (human review)`. The popup
   branch guards always emit a TODO so the human chooses dismiss / accept /
   abort logic.
2. **Language-dependent literals** -- `language_independence_rules.md`
   forbids `.Text =` comparisons, `.Tooltip =` branches, and `InStr` on
   localized text. Grep each generated VBS for these patterns and flag any hits.
3. **Missing parameter validation** -- the generated SKILL.md does not
   validate that the user passed each required parameter; that's left to the
   human author.
4. **Mode collisions** -- if two scenarios produced the same mode label
   (e.g., both `display-table`) AND the merge report's parameter count
   is zero, surface a real collision warning: the scenarios merged
   without producing any distinguishing parameters, which usually means
   they shouldn't have been collapsed. When parameter count is > 0 the
   collapse was the intended outcome (the variance lives in the
   parameter tokens), so DO NOT warn.
5. **CATALOG-CANDIDATE** (only when scenarios came from `--goal`) -- compare
   each probe's `sap_gui_probe_run.json.observed` (`popups_seen`,
   `final_message_type`) against what `scenario_catalog.tsv` predicted for this
   `txn`/`object_type`. If a probe hit a popup `(program, screen)` or an
   `E`/`A` end state the catalog did NOT have a row for, emit a
   `CATALOG-CANDIDATE` finding: a proposed new TSV row (txn, object_type, a
   `popup_recovery`/`validation_error` type, a `probe_hint` derived from the
   observed signature, `source=<this run's report path>`). This is a
   *suggestion* for the human to hand-add — never auto-write the TSV (CLAUDE.md
   Directive 2).

Output a concise findings list; do not modify the generated files.
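
For check 2, a plain grep for `.Text =` would also hit legitimate field assignments, so a slightly narrower heuristic may help. The Python sketch below is one possible approach, not the skill's actual check: it flags `.Text`/`.Tooltip` only on condition lines and flags every `InStr(` call, leaving the final judgement to the reviewer.

```python
import re

# Lines that open a condition in VBScript (heuristic, not a full parser).
CONDITION = re.compile(
    r"^\s*(If|ElseIf|Select\s+Case|Do\s+While|Loop\s+While|While)\b", re.I
)
LOCALIZED = re.compile(r"\.(Text|Tooltip)\b", re.I)
INSTR = re.compile(r"\bInStr\s*\(", re.I)


def flag_language_dependent(vbs_source):
    """Return (line_number, code) pairs worth a human look."""
    findings = []
    for number, line in enumerate(vbs_source.splitlines(), start=1):
        code = line.strip()
        if code.startswith("'"):
            continue  # skip full-line comments
        if INSTR.search(code) or (CONDITION.match(code) and LOCALIZED.search(code)):
            findings.append((number, code))
    return findings
```

Assignments such as `.Text = "%%MATNR%%"` and `MessageType` checks pass through unflagged, which keeps the findings list short enough to review by hand.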

---

## Step 5.5 — Autonomous test / fix loop

**Purpose:** exercise the just-emitted DRAFT skill against the live SAP system,
classify each mode pass/fail, auto-fix the fixable failures by editing the DRAFT
files **in `{SCAFFOLD_FOLDER}` only** (never shared scripts or installed skills),
re-run, and stop when all modes are green or progress stalls. The human's *final
acceptance test* (Step 6 hand-off) is the gate after this — this loop just gets
the draft into a state worth handing over.

**Skip entirely when `--no-test` was passed.** Otherwise run it. Test runs
create/delete real objects, so everything here is namespaced and cleaned up.

All loop artifacts live under `{SCAFFOLD_FOLDER}\_test\`. Before the FIRST
auto-fix edit, copy the draft `SKILL.md` + every `references\sap_*_*.vbs` into
`{SCAFFOLD_FOLDER}\_test\_pre_autofix\` so the human can diff what the loop
changed.

### 5.5a — Generate test cases

For each emitted mode (read the dispatch table in the generated `SKILL.md` and
the `touchpoints[].per_probe_values` in `_merge_report.json`):

- **Default each parameter token** to the canonical probe's recorded value.
  Override object-name tokens (tails like `*_VAL`, `RS38M-PROGRAMM`,
  `RS38L-NAME`) with a freshly-minted **throwaway name**:
  `Z<SLOT><RUNTAG><SEQ>` — `<SLOT>` = 2–3-char class tag for cleanup routing
  (`DM` domain, `DE` dataelement, `TB` table, `ST` structure, `TT` tabletype,
  `SH` searchhelp, `LO` lockobject, `VW` view, `PG` program, `FM` fm, `CL`
  class, `IF` interface); `<RUNTAG>` = `T` + last 5 digits of the scaffolder run
  id (Step 0.5) so parallel runs across AI sessions never collide; `<SEQ>` =
  2-digit per-mode counter.
- **`L_DEVCLASS`** → a throwaway test package or `$TMP` (local). **`TRKORR`** →
  resolve via `/sap-transport-request` (never hard-code — CLAUDE.md Directive 4).
- **Per `scenario_type`:** `success` create → brand-new name (persists; recorded
  for cleanup). `not_found` → a name guaranteed not to exist (`...99` /
  `ZNONEXIST_SCAF`; never created). `validation_error` → the probe's bad input
  but a NEW name. `auth_error` → as-is. `popup_recovery` → new name.
- **Modes that need an existing object** (display / change / delete) create a
  fixture first — run the draft's own `create` mode (dogfoods the skill) or the
  matching workbench skill — using an `F`-infixed name and `is_fixture:true`.

**Write `{SCAFFOLD_FOLDER}\_test\test_plan.json` BEFORE running anything** — the
run tag + every planned name + `created`/`is_fixture` flags. A mid-loop crash
then still leaves an authoritative cleanup list. **Order:** create/fixture modes
first, display/change next, delete last (delete consumes its own fixtures).
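
The `Z<SLOT><RUNTAG><SEQ>` scheme can be sketched in Python as follows. Two things here are assumptions rather than documented behaviour: the run id format (the sketch just takes the last five digit characters it finds) and the rejection of run ids with fewer than five digits. The fixture `F` infix is not modelled.

```python
SLOTS = {
    "domain": "DM", "dataelement": "DE", "table": "TB", "structure": "ST",
    "tabletype": "TT", "searchhelp": "SH", "lockobject": "LO", "view": "VW",
    "program": "PG", "fm": "FM", "class": "CL", "interface": "IF",
}


def run_tag(run_id):
    """'T' + last 5 digits of the scaffolder run id."""
    digits = "".join(ch for ch in run_id if ch.isdigit())
    if len(digits) < 5:
        raise ValueError("run id needs at least 5 digits")  # assumption
    return "T" + digits[-5:]


def throwaway_name(object_class, run_id, seq):
    """Build Z<SLOT><RUNTAG><SEQ> with a 2-digit per-mode counter."""
    if not 0 <= seq <= 99:
        raise ValueError("seq must fit in two digits")
    return f"Z{SLOTS[object_class]}{run_tag(run_id)}{seq:02d}"
```

Because the slot tag sits at a fixed position, the cleanup step can route each name to the right delete mode without consulting anything but `test_plan.json`.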

### 5.5b — Run a mode

Run serially against `{PINNED_SESSION}` (Step 0.7). For each `(mode, iteration)`:

```powershell
powershell -ExecutionPolicy Bypass -File "<SKILL_DIR>\references\run_mode_test.ps1" `
    -SkillFolder "{SCAFFOLD_FOLDER}" -SkillName "<new-skill-name>" -Mode "<mode>" `
    -ParamsJson '{"DOMNAME_VAL":"ZDMT<runtag>01","L_DEVCLASS":"$TMP"}' `
    -SessionPath "{PINNED_SESSION}" `
    -OutputDir "{SCAFFOLD_FOLDER}\_test\<mode>\iter_<N>" -WorkTemp "{WORK_TEMP}"
```

It substitutes the tokens (exactly like the generated wrapper does), runs the
mode VBS via 32-bit cscript, and writes `result.json` (exit_code, stdout_tail,
message_type, popup_left_open, end_screen_*). Read that file.
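
Token substitution itself is simple; a Python sketch of the idea is shown below. It is not `run_mode_test.ps1`: the token pattern and the choice to fail loudly on a token with no supplied value are assumptions made for the illustration.

```python
import re

TOKEN = re.compile(r"%%([A-Z0-9_]+)%%")


def substitute_tokens(template, params):
    """Replace every %%NAME%% in the VBS text with params["NAME"]."""
    missing = sorted({name for name in TOKEN.findall(template) if name not in params})
    if missing:
        # Assumption: an unfilled token is an error, not a silent pass-through.
        raise KeyError("no value for tokens: " + ", ".join(missing))
    # A function replacement avoids backslash escapes in values being interpreted.
    return TOKEN.sub(lambda match: params[match.group(1)], template)
```

Failing on a missing token surfaces a broken parameter map before the VBS ever reaches SAP GUI, which is cheaper to triage than a `findById` error mid-run.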

### 5.5c — Classify pass / fail

For **create** modes, also run the verifier (32-bit PowerShell, NCo lives in the
32-bit GAC):

```powershell
C:\Windows\SysWOW64\WindowsPowerShell\v1.0\powershell.exe -ExecutionPolicy Bypass `
    -File "<SKILL_DIR>\references\verify_create_object.ps1" -ObjectType <TYPE> -ObjectName <NAME>
```

(last line `ACTIVE` / `INACTIVE` / `MISSING` / `ERROR`). Verdict by `scenario_type`:

| scenario_type | PASS when | FAIL when |
|---|---|---|
| `success` (non-create) | `exit_code=0`, no popup left open, `end_screen_*` matches the probe's final `step_NN_after`, MessageType not `E`/`A` | exit 3, popup open, screen mismatch |
| `success` (create) | the above **and** verify == `ACTIVE` | `INACTIVE` (→ enh-category/activate fix) or `MISSING` |
| `not_found` | the not-found message reproduced, no object created | exit 0 with object actually created |
| `validation_error` | SAP rejected (`E`/`A`) **OR silently accepted matching the probe's `observed`** — both PASS-with-note | a NEW error class not in `observed`, or stuck screen |
| `auth_error` | the authz message reproduced | clean success (test user has too many rights — note it) |
| `popup_recovery` | popup dismissed (`exit_code=0`) + (create) `ACTIVE` | popup left open / exit 3 |

On a stuck screen or popup-left-open, invoke `/sap-gui-object-details` FIRST
(structural — gives the wnd[1] program/screen + field ids a fix needs);
`/sap-gui-diagnose` (visual PNG) only when the structural tree is inconclusive.
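
As an illustration of the two `success` rows in the table, here is a Python sketch. The `result` keys are assumptions: in particular `end_screen` stands in for whatever the `end_screen_*` fields of `result.json` actually contain, and `expected_screen` for the probe's final recorded screen.

```python
def verdict_for_success(result, expected_screen, verify=None):
    """Return ("PASS", []) or ("FAIL", reasons) for a success-type scenario.

    verify: last line of the create verifier, or None for non-create modes.
    """
    reasons = []
    if result["exit_code"] != 0:
        reasons.append(f"exit_code={result['exit_code']}")
    if result["popup_left_open"]:
        reasons.append("popup left open")
    if result["end_screen"] != expected_screen:
        reasons.append("screen mismatch")
    if result["message_type"] in ("E", "A"):
        reasons.append(f"MessageType={result['message_type']}")
    if verify is not None and verify != "ACTIVE":
        reasons.append(f"verify={verify}")
    return ("FAIL", reasons) if reasons else ("PASS", [])
```

Collecting every failed condition, rather than stopping at the first, gives the auto-fix step the full symptom list for its lookup.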

### 5.5d — Auto-fix (DRAFT files only)

Auto-fix these classes, then re-run just that mode and re-run Step 5's
language-independence grep on the edited VBS:

| Symptom | Fix | File edited |
|---|---|---|
| popup left open, known program/screen, no guard | insert `If IsPopupOpen(oSess) Then … End If` with the catalog dismiss/fill (SAPLSTRD/100→`L_DEVCLASS`, SAPLSTRD/300→`TRKORR`, worklist→Continue) | mode VBS |
| `findById` ERROR (id drift) | correct the id from object-details | mode VBS |
| missing/wrong `%%TOKEN%%` substitution | fix the param map + dispatch table (or the VBS literal) | SKILL.md (+ VBS) |
| wrong VKey / missing 2nd Enter (e.g. TIMS) | correct `sendVKey` / add Enter from `observed` | mode VBS |
| table/structure `INACTIVE` (enh-category missing) | insert the enhancement-category step before Activate | mode VBS |
| `not_found` name collided with a real object | bump `<SEQ>` | test_plan.json |

**Do NOT auto-fix** (flag for the human, keep looping other modes): genuine SAP
authorization gaps, ambiguous business-logic popups (Yes/No/data-loss with no
catalog entry — leave the existing TODO), object already exists / locked (SM12),
SAP-release layout drift (needs `/sap-gui-record`).

### 5.5e — Iteration control

- **Max 3 iterations per mode** (per-mode, so a hard mode doesn't starve the
  rest). If three distinct single-edit fixes fail to turn a mode green, the
  cause is almost always not auto-fixable; more attempts just thrash the draft.
- **Global budget** `--test-budget-min` (default 20). On exceed →
  `Status=ABORTED_BUDGET`, still clean up + report.
- **Stop a mode** on `FAILED_MAX_ITERS` (3 iters) or `FAILED_NO_PROGRESS` (the
  verdict signature `exit_code|end_screen|message_type|popup` repeats, or a fix
  flips A→B→A).
- **Audit:** append one line per iteration to `{SCAFFOLD_FOLDER}\_test\iterations.jsonl`
  (`ts, mode, iter, verdict, exit_code, end_screen, popup, fix_applied,
  fix_target_file, triage_artifacts[]`).
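
The stop conditions can be sketched in Python as below. This is an illustration under assumptions: the result keys are hypothetical, "repeats" is read as "equals any earlier signature for the same mode" (which also covers the A→B→A flip), and no-progress is checked before the iteration cap.

```python
def signature(result):
    """Build the exit_code|end_screen|message_type|popup verdict signature."""
    keys = ("exit_code", "end_screen", "message_type", "popup")
    return "|".join(str(result[key]) for key in keys)


def stop_reason(signatures, max_iters=3):
    """signatures: failing-iteration signatures for one mode, oldest first."""
    if signatures and signatures[-1] in signatures[:-1]:
        return "FAILED_NO_PROGRESS"  # same failure seen before, or A->B->A
    if len(signatures) >= max_iters:
        return "FAILED_MAX_ITERS"
    return None  # keep iterating
```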

### 5.5f — Fixture cleanup (unconditional)

At loop end (success, max-iters, or budget abort), read `test_plan.json` and
delete every `created:true` object in reverse-creation order, routed by `<SLOT>`
to the matching delete mode via the Skill tool (`/sap-se11` for DDIC, `/sap-se38`
program, `/sap-se37` FM, `/sap-se24` class/interface), using the same TR that
created it. After each delete, run the documented post-delete verify (catalog
table + `TADIR OBJ_NAME=<name>`): clean if both empty; **TADIR orphan** if the
catalog row is gone but the `TADIR` row persists → record the exact SE03
(Transport Organizer Tools) / SE14 `RS_DD_TABDEL` (table) recovery step in the report, never
silently swallow it. If a delete itself fails, record the survivor under a bold
**MANUAL CLEANUP REQUIRED** heading with the precise command and set run-end
`Status=SUCCESS_WITH_DIRTY_FIXTURES` so `/sap-log-analyze` surfaces it.
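
A Python sketch of how the cleanup queue could be derived from the plan is shown below. The `test_plan.json` schema used here (an `objects` list in creation order with `name`, `slot`, `created` and `tr` keys) is hypothetical; only the slot-to-skill routing and the reverse-creation order come from the text.

```python
DDIC_SLOTS = ["DM", "DE", "TB", "ST", "TT", "SH", "LO", "VW"]
DELETE_SKILL = {
    **dict.fromkeys(DDIC_SLOTS, "/sap-se11"),
    "PG": "/sap-se38",
    "FM": "/sap-se37",
    "CL": "/sap-se24",
    "IF": "/sap-se24",
}


def cleanup_queue(plan):
    """Return (skill, object name, transport) tuples, newest object first."""
    created = [obj for obj in plan["objects"] if obj.get("created")]
    return [
        (DELETE_SKILL[obj["slot"]], obj["name"], obj.get("tr"))
        for obj in reversed(created)
    ]
```

Deleting in reverse order means dependants (for example a table using a throwaway data element) go before the objects they reference.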

### Hand-off

Write a report to `sap-dev/temp/testReport/<new-skill-name>_autotest_<YYYYMMDD>.md`
(CLAUDE.md Rule 8 — test reports go under `temp/testReport/`, never
`contributing/`). Sections: (1) SAP system/client/release + run tag; (2) modes
table `mode | scenario_type | iterations | verdict | fixes`; (3) auto-fixes
applied (pointer to `_test\_pre_autofix\`); (4) residual human-only items;
(5) objects created/cleaned/orphaned (+ the MANUAL CLEANUP block if non-empty);
(6) acceptance checklist. The chat hand-off points to this report and states the
user's manual acceptance test is the next gate.

---

## Step 6 — Cleanup and install hint

Best-effort return SAP GUI to Easy Access (in case Step 2 left the session
mid-flow on the last probe's end state):

```bash
echo '{"verb":"SET_OKCD","value":"/n","note":"scaffolder cleanup"}' > "{WORK_TEMP}\scaffold_cleanup.json"
C:/Windows/SysWOW64/cscript.exe //NoLogo "<SAP_DEV_CORE_SHARED_DIR>\..\skills\sap-gui-probe\references\sap_gui_probe_action.vbs" "{WORK_TEMP}\scaffold_cleanup.json"
```

Tell the user how to install the generated skill into a plugin:

> The scaffolded skill is at `{SCAFFOLD_FOLDER}`. To install:
> 1. Copy the folder into a plugin's `skills/` directory:
>    `cp -r {SCAFFOLD_FOLDER} <repo>/sap-dev/plugins/<plugin-name>/skills/<new-skill-name>`
> 2. Register in `<repo>/sap-dev/.claude-plugin/marketplace.json` (add to the
>    plugin's `"skills"` array, increment `metadata.total_skills`).
> 3. Run `node sap-dev/scripts/check-consistency.mjs` to verify.
> 4. Reload the plugin (`/plugin install ...`) and smoke-test the flow for
>    each source probe's scenario.

---

## Final — Log end

On success:
```bash
powershell -ExecutionPolicy Bypass -File "<SAP_DEV_CORE_SHARED_DIR>\scripts\sap_log_helper.ps1" -Action end -StateFile "{SCAFFOLD_FOLDER}\sap_gui_skill_scaffold_run.json" -Status SUCCESS -ExitCode 0
```

Suggested ErrorClass on failure: `PROBE_FAILED`, `MERGE_FAILED`,
`EMIT_FAILED`, `NO_SESSION`, `BAD_SKILL_NAME`, `INSUFFICIENT_SCENARIOS`,
`PIN_SYSTEM_MISMATCH`, `NO_PIN`.

When the Step 5.5 test/fix loop ran, set the end `Status` to reflect its
outcome instead of a bare `SUCCESS`:
- `SUCCESS` — every tested mode passed (or `--no-test` / `--test-budget-min`
  was honoured and the draft emitted cleanly).
- `TEST_FIXED` — one or more modes needed an auto-fix but all ended green.
- `TEST_FAILED_MODES` — one or more modes hit `FAILED_MAX_ITERS` /
  `FAILED_NO_PROGRESS`; the report lists which and why (still exit 0 — the
  draft + report are the deliverable for the human to finish).
- `SUCCESS_WITH_DIRTY_FIXTURES` — modes passed but fixture cleanup left
  residue; the report's **MANUAL CLEANUP REQUIRED** block has the commands.
- `ABORTED_BUDGET` — the test budget was exhausted; partial results reported.

---

## Recipes

**Goal mode — one-line goal, scenarios imagined + auto-tested (recommended):**
```
/sap-gui-skill-scaffold sap-se11-domain --goal "use SE11 to create a domain"
```
Step 0.9 reads `scenario_catalog.tsv` for `SE11`/`DOMAIN`, imagines a happy-path
plus the matched traps (not-found, bad-length validation, delete-leaves-TADIR),
probes each, emits the draft, then Step 5.5 tests + auto-fixes it and writes a
report. Add `--no-test` to stop at the draft, or `--test-budget-min 30` to widen
the test budget.

**Two-mode SE37 skill (display + delete):**
```
/sap-gui-skill-scaffold sap-se37-mini \
  --scenario "SE37: display FM RFC_READ_TABLE then exit" \
  --scenario "SE37: delete FM Z_SANDBOX_FM"
```

**MM03 with three routes via manifest:**
```
/sap-gui-skill-scaffold sap-mm03-display --manifest mm03_routes.txt
```
where `mm03_routes.txt` contains:
```
# happy path
MM03: display material ZHKAMATVer7001 Basic Data 1 then exit
# error path
MM03: display material ZNONEXISTENT (expect not-found error) then exit
# multi-view path
MM03: display material ZHKAMATVer7001 Sales Org 1 + Plant Data then exit
```

**Overwrite an existing scaffold:**
```
/sap-gui-skill-scaffold sap-se37-mini --manifest se37.txt --force-overwrite
```

---

## Edge cases and gotchas

1. **Scenarios must produce reachable end states.** /sap-gui-probe has a
   30-step hard cap. If a scenario can't complete in 30 steps, the probe
   gives up and the whole scaffold aborts. Keep scenarios focused.

2. **Auto mode side-effects.** Step 2 invokes /sap-gui-probe with `--auto`
   which means write actions (Save / Activate / Delete) run without
   confirmation. If a scenario includes a write action that you'd want to
   pause and approve manually, run that probe by itself first with the
   default confirm mode, then feed only the *folder* to a future
   `/sap-gui-skill-scaffold --from-existing-probes` (not implemented today
   -- recorded in the plan as out-of-scope).

3. **Token-name collisions.** Two different fields can have the same DDIC
   tail (`RS38L-NAME` and `RMMG1-NAME` both reduce to `NAME`). The merge
   currently de-duplicates token names with a sort-unique pass; if a collision is
   detected (two distinct targets sharing a token), the emit step falls back
   to `PARAM_NN` for the second one.

4. **De-duplication of probes by mode label.** If two probes get the same
   mode label, their action lists are concatenated into a single mode VBS in
   probe order. This works for "two display variants" but breaks down if the
   action lists overlap in incompatible ways (e.g., both start with `/nSE37`
   but then diverge -- you'll get two `/nSE37` calls in a row). Step 5 flags
   this. The cleanest fix is to give the two scenarios distinct mode labels
   manually in your scenario text (e.g., "display-short" / "display-long").

5. **`POPUP REMINDER` TODOs.** Every popup branch has a default action
   (Continue via `wnd[1]/tbar[0]/btn[0]`) and a TODO comment. This is a
   safe default for informational popups but wrong for "Delete confirmation"
   popups (which need Yes/No). The human must review every popup TODO.
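
The collision fallback from gotcha 3 can be illustrated with a short Python sketch. It is not the emit script: the function name is hypothetical, and numbering the fallback tokens from `PARAM_01` in first-seen order is an assumption.

```python
def assign_tokens(targets):
    """Map each target to a token; later tail collisions get PARAM_NN."""
    owner = {}     # tail -> first target that claimed it
    tokens = {}    # target -> token
    fallback = 0
    for target in targets:
        if target in tokens:
            continue  # same target seen again keeps its token
        tail = target.rsplit("/", 1)[-1].rsplit("-", 1)[-1].upper()
        if tail not in owner:
            owner[tail] = target
            tokens[target] = f"%%{tail}%%"
        else:
            fallback += 1
            tokens[target] = f"%%PARAM_{fallback:02d}%%"
    return tokens
```

A reviewer who sees a `PARAM_NN` token in the draft will usually want to rename it to something meaningful before shipping the skill.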
