---
name: android-regression-diff-scan
description: Use INSTEAD of git bisect when investigating a regression between two refs (releases, branches, "it worked yesterday"), especially when builds are slow or the bug is hard to reproduce. Hand the full `git diff <good>..<bad>` to a Sonnet sub-agent along with the bug description and let it surface suspect areas. Bisect exists because humans can't reason about thousands of lines at once. LLMs can. No builds and no waiting, so it takes minutes instead of an hour of compiling.
---

# Android Regression Diff Scan

## Why this beats `git bisect` for mobile

`git bisect` exists because humans can't reason about thousands of lines of change at once. The bisect dance (narrow the range, build, test, narrow again) is a workaround for human context limits. LLMs don't share that limit. A sub-agent can read 400K diff lines and spot the suspicious patterns directly.

A motivating example: investigating a regression between two releases with **1,300 commits** and **413,032 lines changed** between them. Bisecting 1,300 commits takes about 11 build-and-test rounds. With 2-minute builds, that is ~22 minutes of pure waiting. With 5-minute builds (typical for a real Android app), it's nearly an hour. A diff scan takes minutes with no builds.
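
The waiting-time figures above follow from simple arithmetic. A minimal sketch, assuming bisect needs roughly ceil(log2(N)) build-and-test rounds for N commits; the function names `bisect_steps` and `bisect_wait_minutes` are illustrative and not part of any tool:

```bash
# Approximate number of bisect rounds for N commits: ceil(log2(N)),
# computed by halving (rounding up) until a single commit remains.
bisect_steps() {
  local n=$1 steps=0
  while [ "$n" -gt 1 ]; do
    n=$(( (n + 1) / 2 ))
    steps=$(( steps + 1 ))
  done
  echo "$steps"
}

# Total minutes spent waiting on builds, given a commit count
# and the minutes one build takes.
bisect_wait_minutes() {
  echo $(( $(bisect_steps "$1") * $2 ))
}

bisect_steps 1300            # prints 11
bisect_wait_minutes 1300 2   # prints 22
bisect_wait_minutes 1300 5   # prints 55
```

The estimate counts build time only; deploying and reproducing the bug at each round adds to it.
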

This is the right tool whenever:

- The bug repros in the bad ref but not the good ref
- Builds are slow (most non-trivial Android projects)
- You don't have a reliable repro script for an automated bisect
- The diff is large enough that human reading would be guesswork

## When to use

- "It worked in release N, broken in release N+1": release tag pair
- "Main works, my feature branch doesn't": branch pair
- "Last week's build is fine, this week's crashes": date-based ref pair
- Any regression where you have a known-good ref and a known-bad ref

## When NOT to use

- The bug reproduces locally and builds are fast (<1 min): `git bisect` with a script is still the right tool
- The diff is small (<500 lines): just read it
- You don't have a known-good ref: use `android-probe-logging` to investigate from symptoms
- The bug is non-deterministic and not in changed code: use `android-crash-repro-loop` to characterize it first

## Pre-flight: detect what your repo supports

```bash
# <good> and <bad> are placeholders for the known-good and known-bad refs.

# 1. The two refs are reachable
git rev-parse <good> <bad>   # both should resolve to a SHA

# 2. The diff size: sanity-check before generating a multi-MB patch
git diff --shortstat <good>..<bad>

# 3. The commit count and span, for a rough sense of investigation scope
git log <good>..<bad> --oneline | wc -l
git log <good>..<bad> --format='%ai' | sort -u | head -1
git log <good>..<bad> --format='%ai' | sort -u | tail -1
```

**If a ref is unreachable**, fetch the relevant remote tags / branches before scanning: `git fetch origin --tags`. Working from a shallow clone (CI artifacts, GitHub Codespaces) often means missing history. Check `git rev-parse --is-shallow-repository` and run `git fetch --unshallow` if it prints `true`.
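
The reachability and shallow-clone checks above can be wrapped into two small helpers. This is a sketch under assumptions: the remote is named `origin`, and the function names `ensure_full_history` and `refs_resolve` are hypothetical, not part of the skill.

```bash
# Fetch tags, and the full history if the current repo is a shallow clone.
# Assumes the remote is named "origin". --unshallow fails on a complete
# repository, hence the check before using it.
ensure_full_history() {
  if [ "$(git rev-parse --is-shallow-repository)" = "true" ]; then
    git fetch --unshallow --tags origin
  else
    git fetch --tags origin
  fi
}

# Succeeds only if every ref passed as an argument resolves to a commit.
refs_resolve() {
  local ref
  for ref in "$@"; do
    git rev-parse --verify --quiet "${ref}^{commit}" >/dev/null || return 1
  done
}
```

A typical call would be `refs_resolve release_8 release_9 || ensure_full_history`, so the fetch only happens when one of the refs is missing.
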

**Diff size guidance:**

| Diff size | Strategy |
|-----------|----------|
| < 500 lines | Just read it; don't bother with sub-agent delegation |
| 500–10K | Single sub-agent pass against the full diff |
| 10K–100K | Single sub-agent, but include `--stat` and `git log --oneline` to give it directory hints |
| 100K+ | Split by directory and run scans in parallel; combine the rankings |
| 1M+ | The bug brief needs to identify a likely subsystem first; don't scan a million lines blind |

**Vendored / generated code in the diff.** Large auto-generated directories (`generated/`, `build/`, vendored deps) waste sub-agent attention. Filter them out:

```bash
git diff <good>..<bad> -- ':!**/generated/**' ':!**/build/**' ':!**/.gradle/**' \
  > /tmp/regression-diff.patch
```

**Monorepo with non-Android changes.** Filter to relevant paths early. Android perf bugs rarely live in iOS or web changes:

```bash
git diff <good>..<bad> -- 'android/' 'shared/' '*.kt' '*.java' '*.xml' \
  > /tmp/regression-diff.patch
```

## Workflow

### 1. Identify the good and bad refs

Be precise:

- Release tags: `release_8`, `release_9`
- Commits: the last commit known to be good, the first commit known to be bad
- Branches: `main` vs `feature/foo`

If unsure which ref is "good," confirm by deploying it and checking the symptom is absent. A wrong baseline ref means a wrong scan.

### 2. Size up the change

```bash
git diff --stat <good>..<bad> | tail -20
git log <good>..<bad> --oneline | wc -l
```

The `--stat` summary tells you which files moved most; high-churn files are the first place to look. The commit count is a sanity check: 50 commits is normal, 1,500 commits means you're investigating a release.

### 3. Capture the artifacts

```bash
git diff <good>..<bad> > /tmp/regression-diff.patch
git log <good>..<bad> --oneline > /tmp/regression-log.txt
git diff --stat <good>..<bad> > /tmp/regression-stat.txt
```

For huge diffs, also produce focused subsets when you have a domain hint:

```bash
# If the bug is in login flow:
git diff <good>..<bad> \
  -- 'app/src/**/login/**' '*/auth/**' > /tmp/regression-diff-auth.patch

# If the bug is UI-only:
git diff <good>..<bad> -- '*.kt' '*.xml' ':!**/test/**' > /tmp/regression-diff-ui.patch
```

### 4. Write the bug brief

The sub-agent's quality depends entirely on the bug description. Capture:

- **Symptom**: what the user sees ("crash", "wrong color", "button doesn't respond")
- **When it appears**: entry point, sequence of actions, conditions (offline, after rotation, on cold start)
- **Evidence**: stack trace if any, log fragment, screenshot description
- **What's the same**: what has *not* changed between good and bad (helps narrow the search)

Save to `/tmp/regression-bug.md`.

### 5. Delegate to a Sonnet sub-agent

Spawn the agent with `model: "sonnet"` and a self-contained prompt. The diff is the input, so never read it in the main thread.

> Read `/tmp/regression-diff.patch`, `/tmp/regression-log.txt`, and `/tmp/regression-bug.md`.
>
> The bug described in `regression-bug.md` was introduced somewhere in this diff. Identify the **top 3–5 most suspect changes** that could explain it. For each, return:
>
> - File and line range (`path/to/File.kt:120-145`)
> - One-sentence reasoning tying the change to the bug symptom
> - Confidence: high / medium / low
>
> Prefer changes that touch: the symptom's surface area (UI for visual bugs, network for connectivity bugs, etc.), feature-flag conditions, error-handling paths, and lifecycle hooks. Skip cosmetic refactors and dependency bumps unless they directly touch the affected code.
>
> Under 250 words total.

### 6. Investigate the surfaced areas

This skill **finds the haystack, not the needle.** Take the top suspects and verify with instrumentation:

- `android-probe-logging`: confirm the suspect code path runs and inspect values
- `android-snapshot-diff`: confirm state actually changes in the suspect flow
- `android-strictmode-probe`: if the bug smells like main-thread work or a leak

If the top 5 suspects all check out clean, refine the bug brief (it probably needs more detail) or run a focused scan against a different file subset.

### 7. Cleanup gate

```bash
rm -f /tmp/regression-diff*.patch /tmp/regression-log.txt /tmp/regression-stat.txt /tmp/regression-bug.md
```

No source touched, so the gate is light. But the patch files can be large, and leaving them around bloats `/tmp` over an investigation session.

## Iteration patterns

**Top suspect doesn't pan out.** Re-prompt the sub-agent with the exclusion: "I checked `path/to/Foo.kt:120-145` and it's not the cause. Re-rank the remaining suspects and add 2 new candidates."

**Diff is too large for one pass.** Split by directory and run scans in parallel against subsets, then combine the rankings:

```bash
git diff <good>..<bad> -- 'app/src/main/java/com/example/feature_a/**' > /tmp/regression-diff-a.patch
git diff <good>..<bad> -- 'app/src/main/java/com/example/feature_b/**' > /tmp/regression-diff-b.patch
```

**No obvious suspects.** The bug may not be in the diff (env / config / data change) or the bug brief is too vague. Don't escalate to bisect; re-investigate the symptom first.
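
The split-by-directory pattern above can be scripted instead of typed once per feature. A minimal sketch, assuming that grouping changed files by a fixed path depth is a good enough proxy for "feature"; the function names and the output file naming are illustrative, not part of the skill:

```bash
# Reduce a list of changed paths (stdin) to the unique directory prefixes
# of the given depth. Files that sit shallower than that depth are dropped,
# so scan those separately if they matter.
dir_prefixes() {
  local depth=$1
  awk -F/ -v d="$depth" 'NF > d {
    p = $1
    for (i = 2; i <= d; i++) p = p "/" $i
    print p
  }' | sort -u
}

# Write one patch per directory prefix touched between two refs and
# print the path of each patch file created.
split_diff_by_dir() {
  local good=$1 bad=$2 depth=${3:-2} dir out
  git diff --name-only "$good..$bad" | dir_prefixes "$depth" |
  while IFS= read -r dir; do
    out="/tmp/regression-diff-$(printf '%s' "$dir" | tr '/' '_').patch"
    git diff "$good..$bad" -- "$dir/" > "$out"
    echo "$out"
  done
}
```

For example, `split_diff_by_dir release_8 release_9 3` would write one patch per three-level directory prefix. The files match the `/tmp/regression-diff*.patch` pattern, so the cleanup gate removes them as well.
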

## Common mistakes

| Mistake | Fix |
|---------|-----|
| Reading the diff inline | Always delegate to a Sonnet sub-agent; the diff is the entire input |
| Letting the sub-agent default to Opus | Pass `model: "sonnet"`; diff scanning is text comprehension, not reasoning |
| Vague bug brief ("it's broken") | Symptom + when + evidence + what's the same; the quality of the brief sets the quality of the suspects |
| Wrong baseline ref | Confirm the "good" ref actually doesn't have the symptom before scanning |
| Falling back to bisect when one suspect doesn't pan out | Re-prompt the sub-agent excluding the dud; bisect is the *last* resort, not the second |
| Skipping `--stat` | The stat tells you which files moved; high-churn files are the first place to look |
| Forgetting the commit log | `git log --oneline` gives the sub-agent commit message context, which is surprisingly useful |
| Leaving `/tmp/regression-*` patch files | They can be huge (100MB+ for big releases); clean up between investigations |