---
name: code-graft
description: Use when user wants to vendor a slice of a library into their project as inline source — copy only the code actually used, trim the rest, keep the ability to re-sync from upstream later. Triggers on phrases like "vendor this library", "copy this lib into my project", "internalize X", "fork just the parts I use", "stop depending on Y but keep the code I need", or user wants to patch a lib locally without losing future upstream changes. Pairs with surgical-github-extraction (one-off small lifts); code-graft handles meaningful slices with a sync manifest.
---

# Code Graft

## Why

Lib too big for what you use. Or slow to release. Or you patched it and lost the patch on upgrade. Vendor only the slice you need, own it, sync future updates selectively.

## Use when

- User says "vendor this lib" / "internalize X" / "fork just the parts I use".
- Lib too big for use (e.g. 2GB transformers for one tokenizer) or slow to release.
- User want to patch lib locally + survive upgrades.
- User want to drop runtime dep but keep some of its code.
- Lib unmaintained — vendor + own it.

## Skip when

- One small function — use `surgical-github-extraction` instead.
- Lib = runtime engine (Next, React, FastAPI) — keep as real dep.
- User would use ≥80% of lib — just install.
- Lib license forbids redistribution — check first.

## Rules

1. **Manifest tracks every graft.** Project-root `.code-graft.yaml` lists source, SHA, files, trim notes.
2. **Raw files, never clone.** Same as `surgical-github-extraction`.
3. **Pin SHA.** Branch names rot.
4. **Trim hard.** Drop classes/functions/branches user don't use. Grep user code for symbols actually called.
5. **Provenance header per file.** Top comment: source path, SHA, "do not edit upstream — re-sync via code-graft".
6. **Respect license.** Preserve original license header. Add NOTICE entry if Apache 2.0 or similar.
7. **Local mods preserved on sync.** Diff old vs new upstream, apply changes only to files user didn't modify, surface conflicts.

## Workflow — initial graft

1. **Confirm scope.** Which symbols/classes? Source dir? Target dir in user project?
2. **Pin SHA.** `gh api repos/<o>/<r>/commits/<branch> --jq .sha | cut -c1-7`.
3. **Fetch needed files** via raw URLs to tempdir.
4. **Determine slice.** Read user's existing code. Grep for symbols used. Build dependency closure within lib — only files actually reached.
5. **Trim.** Remove unused classes/functions/branches. Strip imports for removed code.
6. **Adapt.** Match project layout. Rewrite relative imports for new path.
7. **Write provenance headers.**
8. **Update `.code-graft.yaml`** with new entry.
9. **Verify.** Run user typecheck/test. Vendored slice should be self-contained.

## Workflow — re-sync

1. **Read manifest entry** for the graft.
2. **Fetch same source paths at new SHA** + at recorded SHA (for diff base).
3. **Diff** old upstream vs new upstream.
4. **Per file:**
   - User didn't modify locally → apply upstream changes.
   - User modified locally → 3-way merge or surface conflict.
5. **Re-trim** based on user's current usage (may have changed since first graft).
6. **Update manifest** with new SHA.

## Manifest format

```yaml
# .code-graft.yaml
grafts:
  - name: hf-tokenizer
    source: huggingface/transformers
    sha: abc1234
    target: src/vendor/tokenizer/
    files:
      - src/transformers/tokenization_utils_base.py → src/vendor/tokenizer/base.py
      - src/transformers/tokenization_utils_fast.py → src/vendor/tokenizer/fast.py
    trimmed:
      - SlowTokenizerBase
      - BatchEncoding.legacy_format
    notes: avoid 2GB runtime dep, only use fast tokenizer
```

## Anti-patterns

- ❌ `git clone` upstream into project tree. Use raw fetches.
- ❌ Vendor without trimming. Defeats the point.
- ❌ Skip manifest. Future-you can't re-sync without one.
- ❌ Blow away user's local mods on re-sync. Always diff first.
- ❌ Pin `main` not SHA.
- ❌ Vendor a lib that is user's runtime engine.
- ❌ Vendor without preserving original license header.

## Quick reference

```
init graft  → write entry in .code-graft.yaml + fetch + trim + provenance headers
re-sync     → read manifest → fetch new SHA → diff vs old SHA → merge → update SHA
pin SHA     → gh api repos/<o>/<r>/commits/<branch> --jq .sha | cut -c1-7
fetch file  → curl -fsSL raw.githubusercontent.com/<o>/<r>/<sha>/<path>
provenance  → # graft: github.com/<o>/<r>@<sha>:<path> — do not edit, re-sync via code-graft
```
