---
name: large-scale-refactor
description: >
  Guardrails, protocols, and operating constraints for large-scale, long-running,
  or parallelized AI coding tasks — migrations, codebase-wide refactors, framework
  upgrades, and any task touching 50+ files. Prevents scope creep, context drift,
  silent compounding errors, and emergent behavior outside the defined task boundary.
  Use when refactoring across files, migrating frameworks, upgrading dependencies,
  replacing or renaming patterns throughout a codebase, or any task touching 50+
  files.
license: MIT
author: opensite-ai
version: 1.0.0
tags:
  - refactor
  - migration
  - long-running
  - multi-agent
  - guardrails
  - agentic
platforms:
  claude-code: { context: auto, invoke: automatic }
  codex: { invoke: automatic }
  cursor: { invoke: /large-scale-refactor }
  copilot: { invoke: /large-scale-refactor }
  qoder-quest: { scenario: "Code with Spec", environment: remote }
  factory-droid: { invoke: automatic }
activation_patterns:
  - "refactor * across"
  - "migrate * to"
  - "upgrade * from"
  - "replace all"
  - "update every"
  - "rename * throughout"
  - "convert all"
  - "remove all instances"
  - "batch * across the codebase"
  - "files_touched_estimate >= 50"
---

# large-scale-refactor

> **Core constraint**: This skill activates when any task will touch 50 or more files,
> run for longer than one agent session, or be parallelized across multiple agent
> instances. Once active, **the rules below are non-negotiable**. The agent must not
> deviate from the defined task scope under any circumstances, including when it
> identifies what appears to be a related improvement opportunity.

---

## § 1 — BEFORE ANY WORK BEGINS: The Spec Gate

Large-scale refactors **must** begin with a written spec. No execution without a
reviewed spec. This is not optional.

### 1.1 Required Spec Contents

The agent will produce a spec document (or use the provided one) that includes:

```
TASK SPEC
=========
**Task Name**: [Short identifier, e.g. "js-to-ts-migration"]
**Date**: [ISO date]
**Initiator**: [Human who authorized this]

### What This Task Does
[One paragraph. Be concrete. Example: "Convert all .js files in src/ to .tsx,
adding explicit TypeScript types. No logic changes, no style changes, no dependency
additions beyond @types/* packages."]

### Explicit Scope Boundary
**IN SCOPE** (agent may touch):
- [ ] File types: [e.g., *.js files in src/]
- [ ] Operations: [e.g., type annotations, import extensions, rename]
- [ ] Directories: [e.g., src/, tests/ — NOT node_modules, NOT scripts/]

**OUT OF SCOPE — DO NOT TOUCH** (agent must not modify):
- [ ] CSS/styling systems, color tokens, theme configuration
- [ ] Business logic, algorithms, or data transformation behavior
- [ ] Component structure beyond what is required by the task
- [ ] Package.json dependencies (beyond @types/* for TS migrations)
- [ ] Build configs, CI/CD files, deployment configs
- [ ] Files in: [list specific directories]
- [ ] Anything not explicitly listed in IN SCOPE above

### Task Decomposition
[List of atomic subtasks. Each subtask must be independently reviewable.]
1. Subtask A — affects [N] files in [directory]
2. Subtask B — affects [N] files in [directory]
...

### Acceptance Criteria
- [ ] All in-scope files match target state
- [ ] All tests pass (or failures are pre-existing and documented)
- [ ] No net-new dependencies introduced (beyond spec-allowed)
- [ ] No files outside the scope boundary were modified
- [ ] Each subtask produced a separate, reviewable commit or PR

### Rollback Plan
[How to undo this: feature branch name, git revert strategy, etc.]
```

### 1.2 Spec Review Checkpoint

The agent **must halt** after producing the spec and **wait for human approval**
before writing a single line of code. Platform-specific:

- **Qoder Quest**: Use "Code with Spec" scenario. Click `Run Spec` only after human
  reviews the generated Spec document in the Spec Tab.
- **Claude Code / Codex**: Output the spec, then explicitly state:
  > "⏸ SPEC GATE: Please review and reply 'approved' to begin execution, or provide
  > corrections."
- **Factory Droid**: Emit spec as first artifact. Do not proceed until `approved`
  signal is received.
- **Parallel/batch mode (any platform)**: The spec must be approved before spawning
  any parallel instances. The approved spec is the canonical context injected into
  every parallel agent.

---

## § 2 — SCOPE ENFORCEMENT RULES

These rules apply to every file touch, every decision, every line of generated code.

### 2.1 The One Task Rule

> **The agent has exactly one job: complete the task defined in the spec.
> It does not have a second job.**

If the agent notices something that looks like a bug, an improvement opportunity,
 a missing test, a performance gain, a better architecture, or a style inconsistency
 that is **not** in the spec — the correct response is to **log it and leave it alone.**

The agent will maintain a `OBSERVATIONS.md` file in the task working directory:

```
## Observations (NOT acted upon — logged for human review)

| File | Observation | Severity |
|------|-------------|----------|
| src/Button.tsx | Inline styles could use CSS vars | Low |
| src/api/user.ts | Possible N+1 query in fetchUsers | Medium |
```

Never act on observations. Log them. Move on.

### 2.2 The Substitution Test

Before touching any file, the agent applies this test:

> "If I remove this change from the diff, does the task still fail?"

If the answer is "no, the task still succeeds without this change" — **do not make
the change.** Every change must be strictly necessary to complete the defined task.

### 2.3 No Emergent Systems

The agent is explicitly prohibited from creating any of the following unless they
are explicitly named in the spec:

- New design systems, color systems, or theming engines
- New utility libraries or helper modules
- New abstractions, base classes, or shared components
- New folder structures or directory reorganization
- New build pipeline steps, scripts, or tooling
- New configuration files or environment variables
- New test utilities or test harnesses
- Any new file that did not exist before the task began (exceptions: spec-defined
  output files only, e.g., `.ts` replacements for `.js` files in a TS migration)

If the task genuinely requires a new shared utility to be non-repetitive, the agent
will propose it in `OBSERVATIONS.md` and **halt for human approval** before creating it.

### 2.4 Dependency Lockdown

The agent will not add, remove, or upgrade any dependency unless:
1. It is explicitly listed in the spec as an allowed dependency change, OR
2. It is a `@types/*` package required by a TypeScript migration

Any dependency change not meeting these criteria requires an explicit human checkpoint.

### 2.5 Net-New Code Threshold

If completing any single change requires writing more than **50 lines of net-new
logic** (not counting type annotations, renamed identifiers, or structural
reformatting), the agent has likely lost the thread.

> **50-line rule**: If you are about to write more than 50 lines of net-new logic
> to accomplish a refactoring step, stop. Log the situation in `OBSERVATIONS.md`.
> Halt for human review before proceeding.

This threshold exists because large-scale refactors should primarily *transform*
existing patterns, not *invent* new ones. Exceeding 50 lines of net-new logic almost
always indicates scope creep or a task that requires architectural discussion before
continuing.

---

## § 3 — EXECUTION PROTOCOL

### 3.1 Atomic Subtask Commits

> **Pilot Batch Recommendation**: For any new refactor task, process the first
> batch with only 10–20 files — even if the risk level would normally allow more.
> This surfaces edge cases in the spec, validates the transformation approach, and
> lets you refine patterns before scaling to the full codebase. Do not skip the
> pilot batch.

Each subtask from the spec decomposition must land as its own commit or PR, never
combined. Commit message format:

```
refactor(<subtask-id>): <description>

Task: <task-name>
Files: <N> files in <directory>
Spec: <link or reference>
```

This ensures every batch of changes is independently reviewable and independently
revertable.

### 3.2 File Diff Budget per Session

For any single agent session or parallel agent instance, set a maximum file change
budget. Recommended defaults:

| Risk Level | Max files per session | Review cadence |
|------------|----------------------|----------------|
| Low (type renames, import fixes) | 200 files | End of session |
| Medium (logic-adjacent refactors) | 50 files | Every 25 files |
| High (framework migrations, API changes) | 20 files | Every 10 files |

When a session hits its budget, the agent commits, pushes, and **stops**. A human
reviews before the next session begins.

### 3.3 Parallel Agent Isolation

When running parallel instances (Qoder Remote, Factory Droid batch, Devin playbooks):

1. **Each instance receives the approved spec as its first system message.** This is
   the canonical North Star that overrides any in-session reasoning.
2. **Instances are assigned non-overlapping file sets.** No two instances touch the
   same file. Assign by directory or by explicit file list.
3. **Instances operate read-only with respect to shared abstractions.** No instance
   may create a new shared utility, base class, or module. If one is needed, it is
   flagged and a human creates it before instances consume it.
4. **Instances do not communicate with or observe each other's output.** Context
   contamination between parallel agents is a primary source of emergent behavior.

### 3.4 Drift Detection Checkpoint (every N files)

At the cadence defined in § 3.2, or every 25 files if not specified, the agent
pauses and performs a self-audit:

```
DRIFT CHECK
===========
Files touched so far: [N]
Task: [task-name]

1. Does every changed file appear in the IN SCOPE list? [yes/no + evidence]
2. Did I add any new files not defined in the spec? [yes/no + list if yes]
3. Did I add, remove, or modify any dependency? [yes/no + list if yes]
4. Did I make any change that fails the Substitution Test (§ 2.2)? [yes/no]
5. Did I create any new abstraction, utility, or system? [yes/no]

If any answer above is "yes": HALT. Surface to human. Do not continue.
If all answers are "no": Continue. Attach this log to the commit message.
```

---

## § 4 — HUMAN CHECKPOINTS

Human checkpoints are **hard stops**. The agent does not continue until a human
explicitly clears the checkpoint. These are not suggestions.

### Mandatory Checkpoint Triggers

| Trigger | Action |
|---------|--------|
| Spec gate (§ 1.2) | Halt. Await "approved". |
| Drift check failure (§ 3.4) | Halt. Surface log. Await clearance. |
| Any file outside scope boundary discovered | Halt. Report. Await instruction. |
| Any new dependency required | Halt. Propose. Await approval. |
| Any new shared abstraction required | Halt. Propose in OBSERVATIONS.md. Await approval. |
| Build/tests fail in a way the spec didn't anticipate | Halt. Report failure + root cause. Await instruction. |
| File diff budget reached (§ 3.2) | Stop session. Commit. Await review. |
| Ambiguity about whether a file is in scope | Halt. Ask. Do not assume. |

### Checkpoint Message Format

```
⏸ CHECKPOINT — [checkpoint type]

**Trigger**: [What caused this halt]
**Context**: [Relevant file(s), line(s), or situation]
**Options**:
  A. [First option and its implications]
  B. [Second option and its implications]
  C. Abort task and preserve current state for human review

**Recommendation**: [Agent's recommendation, briefly]

Awaiting instruction. No changes will be made until a response is received.
```

---

## § 5 — OUTPUT AND VERIFICATION REQUIREMENTS

### 5.1 Change Manifest

Upon completion of each subtask, the agent produces a `CHANGE_MANIFEST.md`:

```
CHANGE MANIFEST — <subtask-id>
==============================
Task: <task-name>
Completed: <ISO datetime>
Files modified: <N>
Files created: <N> (should be 0 for pure refactors)
Files deleted: <N>

### Modified Files
| File | Change Type | Lines +/- | Notes |
|------|-------------|-----------|-------|
| src/Button.tsx | Type annotation | +12/-3 | Added Props interface |
...

### Scope Compliance
- [ ] All modified files were in the IN SCOPE list
- [ ] No files were created outside spec-defined outputs
- [ ] No dependencies were added or removed beyond spec-allowed
- [ ] No new abstractions or systems were created
- [ ] All drift checks passed (logs attached)

### Test Results
- Before: [pass/fail count]
- After: [pass/fail count]
- New failures: [list or "none"]
- Pre-existing failures: [list or "none"]
```

### 5.2 Verification Command Sequence

Before marking any subtask complete, the agent runs and records output of:

```bash
# 1. Dependency check — confirm no unauthorized changes
git diff HEAD package.json package-lock.json yarn.lock Cargo.toml Gemfile*

# 2. New file audit — confirm no unauthorized new files
git diff HEAD --name-status | grep "^A"

# 3. Scope boundary check — confirm no out-of-scope files
git diff HEAD --name-only | grep -v -f .refactor-scope-allowlist

# 4. Test suite
[platform test command]

# 5. Lint
[platform lint command]
```

The `.refactor-scope-allowlist` file is created from the IN SCOPE file list in
the spec and used as the filter in step 3. Any file appearing in step 3's output
that is not in the allowlist triggers a checkpoint.

---

## § 6 — CONTEXT PERSISTENCE ACROSS SESSIONS

Long-running tasks frequently span multiple sessions. Context that must survive:

### 6.1 Session Handoff File (`.refactor-session.md`)

The agent writes this file at the end of every session and reads it at the start
of the next:

```
REFACTOR SESSION HANDOFF
========================
Task: <task-name>
Last session: <ISO datetime>
Agent: <model/platform>

### Progress
- Completed subtasks: [list]
- In-progress subtask: [current subtask, % complete]
- Remaining subtasks: [list]

### Files Remaining
[list of files not yet processed in current subtask]

### State to Carry Forward
- [Any decisions made this session that affect future sessions]
- [Any edge cases discovered]
- [Any files that required special handling and why]

### Active Blockers
[Anything waiting on human input]

### Spec Ref
[path or link to approved spec]
```

This file is committed with each session's changes. It is the handoff protocol
that ensures a fresh agent context (new session, different model, different
platform) can resume without drift.

### 6.2 Context Flushing Protocol (Anti-Degradation)

Agent performance degrades measurably as context windows fill with accumulated
file diffs and intermediate reasoning. To counter this, the agent actively flushes
context between batches.

**After completing each batch:**
1. Commit and push all changes.
2. Write the session handoff file (§ 6.1).
3. **Discard from active context**: all file diffs, modified file contents, and
   intermediate reasoning from the completed batch.
4. **Reload into fresh context**: the approved spec + the latest `.refactor-session.md`
   + the next batch file list only.

**Flush immediately if any of these signals appear:**
- Referencing a decision made more than 2 batches ago without first consulting
  `.refactor-session.md`
- Uncertainty about the original task without re-reading the spec
- Making changes that "feel right" based on pattern matching rather than explicit
  spec compliance

---

## § 7 — PLATFORM-SPECIFIC NOTES

### Qoder Quest

- Always use **"Code with Spec"** scenario for tasks triggering this skill.
- Use **Remote** execution environment for tasks touching > 100 files.
- The Spec Tab output IS the spec gate (§ 1.2). Do not click "Run Spec" until
  the human has reviewed the Spec Tab and approved.
- Use the parallel/Worktree environment for subtask isolation. Assign non-overlapping
  directories to each Worktree instance.
- Qoder's built-in "Changed Files Tab" satisfies the change manifest for simple
  tasks; for complex migrations, `CHANGE_MANIFEST.md` is still required.

### Claude Code / Codex

- Prefix long-running refactor sessions with: `@large-scale-refactor [task-name]`
- Claude Code's `context: auto` will load this skill automatically when activation
  patterns match.
- For Codex deepening sessions on this skill, the deepening scope is limited to
  the Verification Command Sequence (§ 5.2) and platform-specific notes. Core
  guardrails (§§ 1–4) are NOT subject to deepening.

### Factory Droid / Devin Playbooks

- The approved spec **must** be injected as the system prompt for every Droid/session
  instance before task delegation.
- Batch/playbook tasks must include the file diff budget (§ 3.2) as a hard stop
  condition in the playbook config.
- Each parallel Droid instance must receive an explicit non-overlapping file list.
  Playbooks that operate on "all files matching pattern X" without per-instance
  assignment are prohibited under this skill.
- Devin-specific: use the embedded IDE's real-time view to monitor for out-of-scope
  file touches during the first 10 files of any new playbook run. Catch emergent
  behavior early.

### GitHub Copilot

- Invoke via `/large-scale-refactor [task-name]` in chat before beginning.
- Copilot's workspace context must be scoped to the IN SCOPE directories only.
  Close or exclude out-of-scope directories from the workspace before starting.

---

## § 8 — QUICK REFERENCE: WHAT TO DO WHEN YOU'RE UNSURE

| Situation | Action |
|-----------|--------|
| "This file might be in scope, I'm not sure" | Ask. Don't touch. |
| "This would be cleaner if I also refactored X" | Log in OBSERVATIONS.md. Don't touch X. |
| "Tests are failing and I know how to fix it" | Check if the fix is in spec. If not: halt, report. |
| "I found a bug while doing this refactor" | Log in OBSERVATIONS.md. Leave the bug alone. |
| "This approach requires a new shared utility" | Halt. Propose. Wait for approval. |
| "I could make this faster/better/cleaner" | That is not the task. Log. Move on. |
| "The spec is ambiguous about this file" | Surface the ambiguity. Await clarification. |
| "I hit the file budget for this session" | Stop. Commit. Push. Report progress. |
