---
name: fleet
description: DAG-based parallel pipeline execution for epics using isolated git worktrees. Use this skill when the user wants to implement an entire epic in parallel, analyze story dependencies with DAG-based ordering, run multiple ticket-implementation agents simultaneously, or merge parallel work back together.
---

# Fleet Skill

**For Claude Code AI Assistant**

This skill enables parallel implementation of epic stories by decomposing them into a dependency DAG (directed acyclic graph), running ticket-implementation agents in isolated git worktrees in topological layer order, and merging the results.

## Purpose

When an epic has multiple stories that can be worked on independently, Fleet analyzes both file-level dependencies and explicit dependency declarations, builds a DAG, sorts it topologically into execution layers, and orchestrates parallel execution across isolated worktrees. This reduces wall-clock time for epic completion, since independent stories run concurrently, while still respecting inter-story dependencies.

## Safety Rules (MANDATORY)

1. **Maximum 5 parallel worktrees** -- never exceed this resource limit
2. **Always analyze before executing** -- never skip the analysis phase
3. **Conflicts require human approval** -- never auto-merge when conflicts are detected
4. **One epic at a time** -- do not run fleet on multiple epics simultaneously

## DAG-Based Execution

### Overview

Stories in an epic can declare dependencies on other stories. Fleet builds a directed acyclic graph (DAG) from these declarations, performs a topological sort, and organizes stories into execution layers. Stories within the same layer run in parallel; layers execute sequentially.

### Declaring Dependencies

Stories declare dependencies via a `depends_on` field in their description or technical notes:

```
depends_on: 3.1, 3.2
```

Or with bracket syntax:

```
depends_on: [3.1, 3.2]
```

This means the story cannot start until stories 3.1 and 3.2 have completed successfully.
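As a rough illustration of how both syntaxes could be recognized, here is a minimal Python sketch. The regular expressions and the `parse_depends_on` helper are assumptions made for this example, not the parser that `fleet-manager.sh` actually uses, and story IDs are assumed to look like `3.1`.

```python
import re

# Hypothetical patterns: a "depends_on:" marker followed by IDs on the same
# line, with or without surrounding brackets.
DEPENDS_ON = re.compile(r"depends_on:[ \t]*\[?([^\]\n]*)\]?")
STORY_ID = re.compile(r"\d+\.\d+")


def parse_depends_on(text: str) -> list[str]:
    """Return the story IDs declared after any depends_on: marker in text."""
    deps: list[str] = []
    for match in DEPENDS_ON.finditer(text):
        for story_id in STORY_ID.findall(match.group(1)):
            if story_id not in deps:  # keep first occurrence, drop repeats
                deps.append(story_id)
    return deps


print(parse_depends_on("depends_on: 3.1, 3.2"))    # ['3.1', '3.2']
print(parse_depends_on("depends_on: [3.1, 3.2]"))  # ['3.1', '3.2']
```

Both forms yield the same list, so later steps do not need to know which syntax the story author chose.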

### How the DAG Is Built

1. **Parse declarations**: The analyze phase scans each story's title, description, and acceptance criteria for `depends_on:` patterns
2. **Build adjacency list**: Each dependency becomes a directed edge from the prerequisite to the dependent story
3. **Topological sort**: Kahn's algorithm assigns each story to an execution layer
4. **Cycle detection**: If the graph contains a cycle, the analyze phase errors with the cycle path and refuses to proceed
5. **Combine with file-overlap analysis**: The DAG from explicit dependencies is merged with the file-overlap grouping. If two stories have overlapping files but no explicit dependency, they are still sequenced.
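The layering step can be sketched with a layered variant of Kahn's algorithm. This is an illustrative Python version under stated assumptions: the function name `topological_layers` and its input shapes are hypothetical, every prerequisite is assumed to be a story in the same epic, and on a cycle it reports the stories that could not be placed rather than the exact cycle path.

```python
def topological_layers(stories: list[str], depends_on: dict[str, list[str]]) -> list[list[str]]:
    """Group stories into execution layers; raise ValueError on a cycle."""
    indegree = {story: 0 for story in stories}
    dependents: dict[str, list[str]] = {story: [] for story in stories}
    for story, prereqs in depends_on.items():
        for prereq in prereqs:
            dependents[prereq].append(story)  # edge: prerequisite -> dependent
            indegree[story] += 1

    layers: list[list[str]] = []
    ready = sorted(s for s in stories if indegree[s] == 0)
    placed = 0
    while ready:
        layers.append(ready)
        placed += len(ready)
        next_ready = []
        for story in ready:
            for dependent in dependents[story]:
                indegree[dependent] -= 1
                if indegree[dependent] == 0:
                    next_ready.append(dependent)
        ready = sorted(next_ready)

    if placed != len(stories):
        # Anything left over is part of a cycle or downstream of one.
        stuck = sorted(s for s in stories if indegree[s] > 0)
        raise ValueError(f"dependency cycle involving stories: {stuck}")
    return layers


stories = ["3.1", "3.2", "3.3", "3.4", "3.5"]
depends_on = {"3.3": ["3.1", "3.2"], "3.5": ["3.3"]}
print(topological_layers(stories, depends_on))
# [['3.1', '3.2', '3.4'], ['3.3'], ['3.5']]
```

Each pass of the loop emits one layer, so a story lands in the earliest layer in which all of its prerequisites have already been placed.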

### Execution Layers

The topological sort assigns stories to layers:

- **Layer 0**: Stories with no dependencies -- these run first, in parallel
- **Layer 1**: Stories whose dependencies are all in layer 0 -- these run after all layer 0 stories complete
- **Layer N**: Stories with at least one dependency in layer N-1 and none in a later layer -- these run after layers 0 through N-1 complete

```
Layer 0:  [3.1]   [3.2]   [3.4]      <- parallel, no deps
             \     /
              \   /
Layer 1:      [3.3]                   <- depends on 3.1, 3.2
                |
Layer 2:      [3.5]                   <- depends on 3.3
```

### Failure Propagation

When a story fails (even after retry), its downstream dependents are marked **BLOCKED**, not FAILED:

- Story 3.1 FAILS -> Story 3.3 (depends on 3.1) is marked BLOCKED
- Story 3.5 (depends on 3.3) is also marked BLOCKED (transitive)
- Blocked stories are never launched -- they are skipped with a note explaining which upstream dependency failed
- The finalize report includes a blocked story count and the full dependency chain showing what caused each block

This distinction matters: FAILED means the story was attempted and did not succeed. BLOCKED means the story was never attempted because a prerequisite failed.
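Transitive blocking amounts to a breadth-first walk over the DAG edges starting from the failed stories. The following Python sketch shows one way to do it; `propagate_blocks`, `block_chain`, and the `dependents` mapping (prerequisite to list of dependent stories) are hypothetical names chosen for illustration.

```python
from collections import deque


def propagate_blocks(failed: set[str], dependents: dict[str, list[str]]) -> dict[str, str]:
    """Map each blocked story to the upstream story that directly blocks it."""
    blocked_by: dict[str, str] = {}
    queue = deque(sorted(failed))
    while queue:
        upstream = queue.popleft()
        for story in dependents.get(upstream, []):
            if story not in blocked_by and story not in failed:
                blocked_by[story] = upstream
                queue.append(story)  # its own dependents are blocked too
    return blocked_by


def block_chain(story: str, blocked_by: dict[str, str]) -> list[str]:
    """Follow blocked_by links back to the story that actually failed."""
    chain = []
    while story in blocked_by:
        story = blocked_by[story]
        chain.append(story)
    return chain


dependents = {"3.3": ["3.5"], "3.5": ["3.6"]}
blocked = propagate_blocks({"3.3"}, dependents)
print(blocked)                      # {'3.5': '3.3', '3.6': '3.5'}
print(block_chain("3.6", blocked))  # ['3.5', '3.3']
```

The chain for 3.6 ends at 3.3, the story that was actually attempted and failed, which is the information a report needs to explain each block.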

### Visualizing the DAG

Use the `dag` subcommand to pretty-print the dependency graph after analysis:

```bash
~/.claude/skills/afx-fleet/fleet-manager.sh dag <epic_id>
```

This outputs:
- Layered view showing which stories run at each level
- Dependency edges between stories
- A text-based graph visualization

### DAG + File-Overlap Integration

The DAG from explicit `depends_on` declarations and the dependency graph from file-overlap prediction are complementary:

- **Explicit dependencies** capture logical ordering (e.g., database migration before API changes)
- **File-overlap dependencies** capture implementation conflicts (e.g., two stories both modifying `routes/index.ts`)
- Both are combined into the final execution plan. A story is placed in the earliest layer that satisfies all its constraints.

## Phase 1: Analysis (`/afx-fleet analyze <epic_id>`)

### Purpose

Read all TO_DO stories for an epic, parse dependency declarations, build a DAG, predict which files each story will modify, combine explicit and file-overlap dependencies, and output a layered parallel execution plan.

### Steps

**Step 1 -- Query Stories and Build DAG**

```bash
~/.claude/skills/afx-fleet/fleet-manager.sh analyze <epic_id>
```

This queries the ticket database, parses `depends_on` declarations from story text, builds a dependency DAG with topological sort, and outputs all TO_DO stories with their descriptions, acceptance criteria, and execution layers. If a cycle is detected, the command errors and no analysis is saved.

**Step 2 -- Predict File Targets**

For each story returned by the analysis:

1. Read the story title, description, and acceptance criteria carefully
2. Examine the current codebase to identify which files will likely be created or modified
3. Consider: source files, test files, configuration files, documentation
4. List the predicted file targets for each story

Present the predictions as a clear per-story list:

```
Story 3.1 - "Add user authentication"
  Predicted files:
    - src/auth/auth-service.ts (new)
    - src/auth/auth-middleware.ts (new)
    - src/routes/index.ts (modify)
    - tests/auth/auth-service.test.ts (new)

Story 3.2 - "Add payment processing"
  Predicted files:
    - src/payments/payment-service.ts (new)
    - src/payments/stripe-client.ts (new)
    - tests/payments/payment-service.test.ts (new)
```

**Step 3 -- Build Combined Dependency Graph**

Merge the DAG from explicit `depends_on` declarations with file-overlap analysis:
- Stories with **no overlapping files** and **no explicit dependencies** are **independent** and run in the same layer
- Stories with **overlapping files** must be **sequenced**: one of them is placed in a later layer than the other
- Stories with **explicit dependencies** wait for their prerequisites regardless of file overlap
- A story that modifies a shared config file (e.g., package.json, build config) creates a dependency with any other story that also modifies it
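One way to turn predicted file overlaps into extra ordering edges is sketched below in Python. The helper `overlap_edges`, its inputs, and the tie-breaking rule are assumptions for illustration: each overlapping pair is ordered by its layer in the explicit-dependency DAG, then by story ID, so the added edges cannot introduce a cycle. The file list for 3.4 is invented for the example.

```python
from itertools import combinations


def overlap_edges(predicted_files: dict[str, set[str]], layer_of: dict[str, int]) -> set[tuple[str, str]]:
    """Return (earlier, later) edges for every pair of stories sharing a file."""
    edges = set()
    for a, b in combinations(sorted(predicted_files), 2):
        if predicted_files[a] & predicted_files[b]:
            # Order the pair by explicit-DAG layer, then by ID (assumed rule).
            first, second = sorted((a, b), key=lambda s: (layer_of[s], s))
            edges.add((first, second))
    return edges


predicted_files = {
    "3.1": {"src/auth/auth-service.ts", "src/routes/index.ts"},
    "3.2": {"src/payments/payment-service.ts"},
    "3.4": {"src/routes/index.ts"},  # hypothetical prediction
}
layer_of = {"3.1": 0, "3.2": 0, "3.4": 0}
print(overlap_edges(predicted_files, layer_of))  # {('3.1', '3.4')}
```

Adding these edges to the explicit ones and re-running the topological sort would push 3.4 into a later layer than 3.1, while 3.2 stays parallel with 3.1.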

**Step 4 -- Output Execution Plan**

Present the plan showing DAG layers and dependencies:

```
=== Fleet Execution Plan for Epic 3 ===

Layer 0 (no dependencies -- runs first):
  - Story 3.1: Add user authentication
  - Story 3.2: Add payment processing
  - Story 3.4: Add email templates

Layer 1 (runs after layer 0 completes):
  - Story 3.3: Add checkout flow (depends on 3.1, 3.2)

Layer 2 (runs after layer 1 completes):
  - Story 3.5: Add order confirmation (depends on 3.3)

Estimated parallel speedup: ~1.7x (5 stories in 3 layers instead of 5 sequential steps, assuming similar story durations)
```

Optionally, show the DAG visualization:

```bash
~/.claude/skills/afx-fleet/fleet-manager.sh dag <epic_id>
```

Ask the user to review and approve the plan before proceeding to execution.

## Phase 2: Execution (`/afx-fleet run <epic_id>`)

### Prerequisites

- Analysis phase must have been completed for this epic
- User must have approved the execution plan
- Git working tree must be clean (no uncommitted changes)

### Steps

**Step 1 -- Validate Preconditions**

```bash
git status --porcelain
```

Ensure no uncommitted changes. If dirty, ask the user to commit or stash.

Verify the analysis has been completed:
```bash
~/.claude/skills/afx-fleet/fleet-manager.sh status
```

**Step 2 -- Execute DAG Layers Sequentially**

For each layer in the execution plan, starting with Layer 0:

1. For each story in the layer, launch an Agent with worktree isolation:
   - Use Claude Code's **Agent tool** with `isolation: "worktree"`
   - The agent prompt should instruct it to follow the ticket-implementation workflow for the specific story
   - Each agent works in its own isolated git worktree, so there are no conflicts during implementation

2. The agent prompt for each story should be:

```
Implement ticket <STORY_ID> following the ticket-implementation skill workflow.

Story details:
- Title: <title>
- Description: <description>
- Acceptance Criteria: <criteria>

Use the ticket-manager to mark progress. Create a feature branch named
fleet/<STORY_ID>-<short-description>. Commit your changes when done.
```

3. **Respect the limit**: Never launch more than 5 agents simultaneously. If a layer has more than 5 stories, split it into sub-groups of at most 5 and run the sub-groups one after another.

4. Monitor progress -- as each agent completes, log its result.
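The sub-group split in step 3 is plain chunking. A minimal Python sketch, with `MAX_PARALLEL` and `batches` as hypothetical names:

```python
MAX_PARALLEL = 5  # mirrors the safety rule: at most 5 parallel worktrees


def batches(layer: list[str], limit: int = MAX_PARALLEL) -> list[list[str]]:
    """Split one layer into consecutive sub-groups of at most limit stories."""
    return [layer[i:i + limit] for i in range(0, len(layer), limit)]


layer = ["4.1", "4.2", "4.3", "4.4", "4.5", "4.6", "4.7"]
print(batches(layer))
# [['4.1', '4.2', '4.3', '4.4', '4.5'], ['4.6', '4.7']]
```

Each sub-group would be launched only after the previous one finishes, so the number of live worktrees never exceeds the limit.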

**Step 3 -- Collect Results and Propagate Failures**

After all agents in a layer complete:
- Record which stories succeeded and which failed
- For failed stories, capture the error context
- **Propagate failures**: Mark all downstream dependents of failed stories as BLOCKED (not FAILED). Use the DAG edges to determine the full set of transitively blocked stories.
- Update the fleet status file

**Step 4 -- Proceed to Next Layer**

Before launching the next layer:
- Merge all successful branches from the current layer into the main branch (following the Phase 3 merge steps)
- Ensure the main branch is stable
- Skip any stories in the next layer that are marked BLOCKED
- Then launch the next layer's non-blocked agents

## Phase 3: Merge (`/afx-fleet merge`)

### Purpose

Merge completed worktree branches back to the main branch, detect conflicts, and generate a summary report.

### Steps

**Step 1 -- List Completed Branches**

Identify all `fleet/*` branches that have been completed by agents.

**Step 2 -- Merge Branches Sequentially**

For each completed branch:

```bash
git merge fleet/<story_id>-<description> --no-ff -m "Merge fleet/<story_id>: <title>"
```

- Use `--no-ff` to preserve branch history
- If the merge succeeds cleanly, continue to the next branch
- If the merge has **conflicts**, STOP and report to the user
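The stop-on-first-conflict loop could be scripted roughly as follows. This Python sketch is an assumption about how one might automate the step: `git_merge` and `merge_in_order` are hypothetical helpers, the branch names are invented, and a non-zero exit from `git merge` is treated as a conflict even though it can also signal other failures. The script must be run from the main branch.

```python
import subprocess
from typing import Callable, Optional


def git_merge(branch: str, title: str) -> bool:
    """Run a no-fast-forward merge; True means git exited cleanly."""
    result = subprocess.run(
        ["git", "merge", "--no-ff", "-m", f"Merge {branch}: {title}", branch],
        capture_output=True,
        text=True,
    )
    return result.returncode == 0


def merge_in_order(
    branches: list[tuple[str, str]],
    merge: Callable[[str, str], bool] = git_merge,
) -> tuple[list[str], Optional[str]]:
    """Merge (branch, title) pairs in order; stop at the first failure.

    Returns the branches merged so far and the branch that stopped the run
    (None if every merge succeeded). Conflicts are never auto-resolved.
    """
    merged: list[str] = []
    for branch, title in branches:
        if not merge(branch, title):
            return merged, branch
        merged.append(branch)
    return merged, None
```

When the second return value is not `None`, the caller would list the conflicting files (for example with `git diff --name-only --diff-filter=U`) and hand control to the user.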

**Step 3 -- Handle Conflicts**

When conflicts are detected:

1. List all conflicting files
2. Show the conflicting stories and their file targets
3. Ask the user how to resolve:
   - Manual resolution (user fixes conflicts)
   - Skip this branch (merge the others first, come back to this one)
   - Abort fleet merge entirely

**NEVER auto-resolve conflicts. Always require human approval.**

After the user resolves conflicts:
```bash
git add <resolved-files>
git commit -m "Resolve fleet merge conflict: <story_id> into main"
```

**Step 4 -- Generate Report**

```bash
~/.claude/skills/afx-fleet/fleet-manager.sh report <epic_id>
```

Present a summary to the user:

```
=== Fleet Run Report for Epic 3 ===

Stories Completed: 3/6
Stories Failed: 1/6
Stories Blocked: 2/6

Per-Story Results:
  3.1 - Add user authentication      [MERGED]        ~45,000 tokens   2m 05s
  3.2 - Add payment processing       [MERGED]        ~62,000 tokens   3m 12s
  3.3 - Add checkout flow            [FAILED - compile error] (retry exhausted)
  3.4 - Add email templates          [MERGED]        ~28,000 tokens   1m 15s
  3.5 - Add order confirmation       [BLOCKED]       (depends on 3.3)
  3.6 - Add shipping notifications   [BLOCKED]       (depends on 3.5)

Merge Results:
  Clean merges: 3
  Conflict merges: 0
  Pending: 0

Cost Summary:
  Total Fleet Tokens:  ~135,000
  Total Wall-Clock:    5m 17s
  Sequential Estimate: 10m 30s
  Speedup Factor:      1.99x

Failed Story Details:
  3.3: Compilation failed in src/checkout/flow.ts
        Attempt 1: [error summary]
        Retry:     [error summary]

Blocked Story Details:
  3.5: Add order confirmation
    blocked by: 3.3 (FAILED)
  3.6: Add shipping notifications
    blocked by: 3.5 <- 3.3 (FAILED)
```

## Workflow Integration

### With ticket-manager

Fleet reads stories from the ticket database and updates their status:
- Stories are marked IN_PROGRESS when agents start
- Stories are marked DONE when agents complete and branches merge successfully
- Failed stories remain IN_PROGRESS with notes about the failure

### With ticket-implementation

Each agent launched by Fleet follows the full ticket-implementation workflow:
- Preparation, analysis, implementation, validation, completion
- All quality gates apply within each worktree
- The only difference is that the branch naming uses `fleet/` prefix

## Retry Logic

When an agent fails during execution, Fleet retries it automatically before giving up.

### Retry Strategy

1. **First failure**: Log the error, then retry the agent once with the same prompt and a fresh worktree
2. **Second failure (retry also fails)**: Mark the story as FAILED with failure details from both attempts. Do NOT retry again.
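3. **Continue regardless**: A failed story never aborts the fleet run. Other stories in the same layer and in later layers proceed normally, except downstream dependents of the failed story, which are marked BLOCKED (see Failure Propagation).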
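The retry-once policy can be sketched as a small wrapper. In this Python example, `run_with_retry` and the `launch` callable are hypothetical stand-ins for however an agent is actually started; a raised exception represents any failure that triggers a retry.

```python
from typing import Callable


def run_with_retry(story_id: str, launch: Callable[[str], None]) -> dict:
    """Run launch(story_id), retrying exactly once on failure."""
    errors: list[str] = []
    for attempt in range(2):  # attempt 0 = first try, attempt 1 = the retry
        try:
            launch(story_id)
            return {"story": story_id, "status": "completed",
                    "retry_count": attempt, "errors": errors}
        except Exception as exc:  # any agent failure counts
            errors.append(str(exc))
    # Both attempts failed: record both errors and do not retry again.
    return {"story": story_id, "status": "failed",
            "retry_count": 1, "errors": errors}
```

The returned dictionary never raises to the caller, which is what lets the rest of the fleet run continue after a failure.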

### Tracking Retries

When retrying a story, update the fleet run JSON:

```bash
~/.claude/skills/afx-fleet/fleet-manager.sh update-story <story_id> running --retry-count 1
```

After the retry fails:

```bash
~/.claude/skills/afx-fleet/fleet-manager.sh update-story <story_id> failed "Retry exhausted: <error summary>" --retry-count 1
```

The `retry_count` field in the active run JSON tracks how many retries have been made (0 = first attempt only, 1 = one retry).

### What Triggers a Retry

- Agent process exits with an error
- Agent completes but the validation step (tests, lint) fails
- Agent times out (worktree is cleaned up before retry)

### What Does NOT Trigger a Retry

- Merge conflicts (these are handled in the merge phase, not during execution)
- User cancellation

## Enhanced File-Overlap Prediction

### Purpose

Accurate file-overlap prediction is critical for assigning stories to layers correctly. Poor predictions lead to merge conflicts that could have been avoided.

### Prediction Sources

When predicting which files a story will modify, scan ALL of the following:

1. **Story title**: Often names the component directly (e.g., "Add user authentication" implies `src/auth/`)
2. **Story description**: May mention specific files, modules, or architectural layers
3. **Acceptance criteria text**: Often contains the most specific technical details about what must change

### Heuristic Patterns

Scan story text (title + description + acceptance criteria) for these patterns that indicate file targets:

| Pattern | Implies |
|---------|---------|
| `src/`, `lib/`, `app/` | Specific source directory |
| `.tsx`, `.ts`, `.js`, `.py` | Specific file type or path |
| `component` | UI component files |
| `route`, `endpoint`, `API` | Routing/API layer files |
| `model`, `schema`, `migration` | Data layer files |
| `test`, `spec` | Test files corresponding to source files |
| `config`, `env`, `.yaml`, `.json` | Configuration files (high overlap risk) |
| `middleware` | Middleware layer files |
| `hook`, `util`, `helper` | Shared utility files (high overlap risk) |

### Module-Name Overlap Detection

If two stories mention the **same component or module name** (even without explicit file paths), flag them as a potential overlap:

- Story A: "Add validation to **checkout** form"
- Story B: "Add discount codes to **checkout** page"
- Result: Flag potential overlap on `checkout` module. Investigate shared files before placing both stories in the same layer.
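A simple way to approximate this check is to intersect the words of each story with a list of known module names. The Python sketch below is only illustrative: `flag_module_overlaps` is a hypothetical helper, and `module_names` is assumed to be a lowercase set supplied by the caller (for example, derived from directory names in the codebase).

```python
import re
from itertools import combinations


def flag_module_overlaps(story_text: dict[str, str], module_names: set[str]) -> dict[tuple[str, str], list[str]]:
    """Return story pairs that mention the same known module name."""
    words = {
        story_id: set(re.findall(r"[a-z0-9_-]+", text.lower()))
        for story_id, text in story_text.items()
    }
    flags = {}
    for a, b in combinations(sorted(story_text), 2):
        shared = words[a] & words[b] & module_names
        if shared:
            flags[(a, b)] = sorted(shared)
    return flags


story_text = {
    "A": "Add validation to checkout form",
    "B": "Add discount codes to checkout page",
    "C": "Add email templates",
}
print(flag_module_overlaps(story_text, {"auth", "checkout", "email"}))
# {('A', 'B'): ['checkout']}
```

Restricting matches to known module names keeps common words such as "add" from flagging every pair; a flagged pair is a prompt to investigate, not proof of a conflict.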

### Implementation

The `analyze` command queries acceptance criteria from the ticket database. Claude Code should:

1. Run `~/.claude/skills/afx-fleet/fleet-manager.sh analyze <epic_id>` to get stories with acceptance criteria
2. For each story, extract file-path patterns and module names from ALL text fields
3. Cross-reference extracted patterns across stories
4. Flag any pair of stories with shared module names or file patterns
5. Use flags to inform layering decisions (flagged pairs should be sequenced, not parallelized)

## Cost Tracking

### Purpose

Track estimated resource usage per agent and across the fleet run to help with planning and cost awareness.

### Per-Agent Tracking

After each agent completes (success or failure), record:

- **Token estimate**: Approximate input + output tokens used by the agent
- **Duration**: Wall-clock time from agent launch to completion (seconds)

Update the fleet run JSON:

```bash
~/.claude/skills/afx-fleet/fleet-manager.sh update-story <story_id> completed --token-estimate 45000 --duration 120
```

### Fleet-Level Metrics

The finalize report includes a cost summary section:

```
=== Cost Summary ===

Per-Story Breakdown:
  3.1 - Add user authentication     ~45,000 tokens   2m 05s
  3.2 - Add payment processing      ~62,000 tokens   3m 12s
  3.3 - Add checkout flow           ~38,000 tokens   1m 48s
  3.4 - Add email templates         ~28,000 tokens   1m 15s
  3.5 - Add order confirmation      FAILED

Total Fleet Tokens:  ~173,000
Total Wall-Clock:    5m 00s  (sum of layer durations: 3m 12s + 1m 48s)
Sequential Estimate: 8m 20s  (sum of all recorded story durations)
Speedup Factor:      1.67x
```

### Speedup Factor Calculation

```
speedup = sum(all_story_durations) / total_wall_clock_time
```

Where `total_wall_clock_time` is the time from fleet run start to completion. Layers run sequentially and stories within a layer run in parallel, so each layer's duration is approximately the maximum story duration in that layer.
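As a worked instance of the formula, the Python sketch below estimates the speedup from per-layer story durations. The function name `speedup` and the durations are hypothetical, and the estimate ignores retries, merge time between layers, and layers split into sub-groups.

```python
def speedup(layers: list[list[float]]) -> float:
    """Estimate speedup from story durations (seconds) grouped by layer."""
    non_empty = [layer for layer in layers if layer]
    if not non_empty:
        raise ValueError("at least one story duration is required")
    sequential = sum(sum(layer) for layer in non_empty)  # one story at a time
    wall_clock = sum(max(layer) for layer in non_empty)  # slowest story per layer
    return sequential / wall_clock


# Illustrative durations: three stories in layer 0, one story in layer 1.
print(round(speedup([[120, 180, 60], [90]]), 2))  # 1.67
```

Here the sequential total is 450 s and the layered wall-clock is 180 + 90 = 270 s, giving 450 / 270, or about 1.67x.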

## Error Handling

### Agent Failure

If an agent fails during execution:
1. Log the failure with context
2. **Retry once automatically** (see Retry Logic above)
3. If retry also fails, mark as FAILED and continue with other agents
4. **Propagate to dependents**: Mark all downstream stories in the DAG as BLOCKED with a note identifying the failed upstream story
5. Report failures and blocks (including retry attempts and dependency chains) in the summary
6. Failed stories can be retried individually later via `/ticket-implementation`

### Merge Failure

If a merge fails:
1. Do not proceed with remaining merges until the user has chosen how to handle the conflict
2. Report the conflict details
3. Wait for user resolution
4. Resume merging after resolution

### Worktree Cleanup

After fleet completion (success or failure):
- All temporary worktrees should be cleaned up
- Only the merged branches on main remain
- Fleet status file is updated with final results

## Example Usage

### Full Epic Parallel Execution

```
User: /afx-fleet analyze 3

Claude: [runs analysis, shows execution plan]

User: Looks good, run it.

Claude: /afx-fleet run 3
[launches agents layer by layer, in parallel within each layer]
[reports progress as agents complete]

Claude: All agents complete. Ready to merge.

User: /afx-fleet merge

Claude: [merges branches, reports results]
```

### Analysis Only

```
User: /afx-fleet analyze 5

Claude: [shows dependency DAG and layered execution plan]
        [user reviews and may adjust]
```

### View Dependency Graph

```
User: /afx-fleet dag 5

Claude: [shows pretty-printed DAG with layers and edges]
```

### Check Status

```
User: /afx-fleet status

Claude: [shows active fleet execution status, including blocked stories]
```
