---
name: code-review-simplify
description: Use when reviewing code changes for quality, security, performance, or maintainability issues and identifying simplification opportunities in a single token-efficient pass
argument-hint: "[--tier lightweight|standard|full] [--files ]"
---

> Unified code review + simplification. Single-pass, multi-dimensional scoring, adaptive depth. Integrates with evolve-loop auditor and builder phases.

## Contents

- [Architecture](#architecture): hybrid pipeline+agentic model
- [Single-Pass Flow](#single-pass-flow): read diff once, analyze both dimensions
- [Multi-Dimensional Scoring](#multi-dimensional-scoring): 4 dimensions with numeric scores
- [Adaptive Depth Routing](#adaptive-depth-routing): scale analysis with diff complexity
- [Integration Hooks](#integration-hooks): evolve-loop auditor and builder wiring
- [Simplification Catalog](#simplification-catalog): what to simplify and when
- [Output Schema](#output-schema): structured review+simplify report

## Architecture

Hybrid pipeline+agentic model. Structured passes handle known patterns cheaply; agentic reasoning handles contextual issues that require understanding intent.

```
Input: git diff (changed files)
          │
          ▼
┌─────────────────────────┐
│  PIPELINE LAYER (fast)  │  Deterministic pattern checks
│  ─────────────────────  │
│  1. Complexity scan     │  Cognitive complexity, nesting depth
│  2. Smell detection     │  22-smell catalog from detect-code-smells
│  3. Security scan       │  OWASP patterns, secrets, injection
│  4. Style check         │  Naming, file size, function length
│  5. Duplication check   │  Near-duplicate code blocks
└─────────┬───────────────┘
          │ Structured findings
          ▼
┌─────────────────────────┐
│  AGENTIC LAYER (deep)   │  LLM-powered contextual analysis
│  ─────────────────────  │
│  6. Logic correctness   │  Edge cases, off-by-one, null handling
│  7. Intent alignment    │  Does change match acceptance criteria?
│  8. Cross-file impact   │  Dependency effects, API contract breaks
│  9. Simplification      │  Extract Method, reduce nesting, dedup
│  10. Compound risk      │  Future maintenance cost assessment
└─────────┬───────────────┘
          │ Contextual findings
          ▼
┌─────────────────────────┐
│  SCORING & REPORT       │  Aggregate into multi-dimensional scores
│  ─────────────────────  │
│  Composite score 0.0-1.0│
│  Per-dimension scores   │
│  Simplification actions │
│  Priority-ranked issues │
└─────────────────────────┘
```

**Why hybrid?** Pipeline catches 60-70% of issues at ~5% of the token cost. Agentic layer focuses expensive reasoning on the 30-40% that requires context. Research: Cursor BugBot's biggest quality leap was pipeline→agentic; Anthropic's review dispatches parallel specialist agents.

**Token budget:** Pipeline layer ~2-5K tokens. Agentic layer ~15-40K tokens (scales with diff). Total: ~20-45K per review pass.

## Single-Pass Flow

Read the diff once. Run both review and simplification analysis on the same context. This saves ~40-50% tokens vs. invoking separate review and simplify agents.

### Step 1: LOAD (once)

```bash
# Per-file change summary (a stat overview, not the full patch)
DIFF=$(git diff HEAD~1 --stat)
# Added + deleted lines across all files; s+0 prints 0 when nothing changed
DIFF_LINES=$(git diff HEAD~1 --numstat | awk '{s+=$1+$2} END {print s+0}')
# Paths of the changed files, one per line
CHANGED_FILES=$(git diff HEAD~1 --name-only)
```

### Step 2: PIPELINE (structured checks)

Run deterministic checks on each changed file:

| Check | Tool | Threshold | Finding Type |
|-------|------|-----------|-------------|
| Cognitive complexity | `scripts/complexity-check.sh` | > 15 per function | `complexity` |
| Nesting depth | grep-based | > 4 levels | `complexity` |
| Function length | line count | > 50 lines | `maintainability` |
| File length | line count | > 800 lines | `maintainability` |
| Near-duplicates | content hash | > 6 similar lines | `maintainability` |
| Hardcoded secrets | pattern match | any match | `security` |
| Injection vectors | pattern match | any match | `security` |
| Naming conventions | project config | deviation | `style` |

### Step 3: AGENTIC (contextual analysis)

The LLM analyzes the diff with pipeline findings as context:

| Analysis | What to Check | Score Impact |
|----------|---------------|-------------|
| Logic correctness | Edge cases, boundary conditions, null/undefined, off-by-one | correctness score |
| Intent alignment | Acceptance criteria match, no scope creep, no missing requirements | correctness score |
| Cross-file impact | Breaking API changes, dependency effects, import correctness | correctness + security |
| Simplification opportunities | Extract Method candidates, reducible nesting, inlineable abstractions | maintainability score |
| Performance concerns | N+1 queries, missing indexes, unnecessary allocations, blocking calls | performance score |
| Security review | Auth checks, input validation, error info leakage | security score |

### Step 4: SCORE & REPORT

Aggregate findings into the multi-dimensional scoring output (see below).

## Multi-Dimensional Scoring

Four dimensions, each scored 0.0 to 1.0. Replaces binary PASS/FAIL with actionable numeric scores.

| Dimension | Weight | What It Measures | Score Guide |
|-----------|--------|-----------------|-------------|
| **correctness** | 0.35 | Logic errors, edge cases, intent alignment, test coverage | 1.0 = no issues; 0.7 = minor edge case; 0.3 = logic bug; 0.0 = critical flaw |
| **security** | 0.25 | Injection, auth, secrets, input validation, error leakage | 1.0 = hardened; 0.7 = minor gap; 0.3 = exploitable; 0.0 = critical vuln |
| **performance** | 0.15 | Complexity, N+1, blocking, memory, unnecessary work | 1.0 = optimal; 0.7 = acceptable; 0.3 = slow path; 0.0 = denial-of-service risk |
| **maintainability** | 0.25 | Readability, complexity, duplication, naming, file size | 1.0 = clean; 0.7 = minor smell; 0.3 = high cognitive load; 0.0 = unmaintainable |

**Composite score:** `composite = 0.35*correctness + 0.25*security + 0.15*performance + 0.25*maintainability`

**Verdict mapping:**

| Composite | Verdict | Action |
|-----------|---------|--------|
| >= 0.8 | PASS | Ship immediately |
| >= 0.6 and < 0.8 | WARN | Ship with noted issues; simplification recommended |
| < 0.6 | FAIL | Block shipping; fix required |

**Simplification trigger:** If `maintainability < 0.7`, auto-generate simplification suggestions (see Simplification Catalog).

**Confidence:** Each dimension includes a `confidence` (0.0-1.0). If any dimension's confidence < 0.7, escalate to WARN regardless of score.

## Adaptive Depth Routing

Scale analysis intensity with diff complexity. Small changes get lightweight review; large changes get full multi-agent analysis.

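One way to read the tier table that follows is as a small routing function. The sketch below is hypothetical: `select_tier`, `is_security_sensitive`, and the constant names are illustrative and not part of the skill. Because the table's file ranges overlap at their boundaries (3 files, 10 files), it assumes that 1-3 files is lightweight, 4-9 is standard, and 10 or more is full, and that the higher of the line-based and file-based tiers wins. The glob lists are an approximate translation of the security-sensitive patterns in this section, matched case-insensitively against file names.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

TIERS = ["lightweight", "standard", "full"]

# Assumed translation of the security-sensitive patterns (matched on the file name).
NAME_PATTERNS = ["auth*", "*login*", "*password*", "*token*", "*secret*",
                 "*payment*", "*billing*", "*checkout*", "*eval*", "*grader*"]
# Patterns that only make sense against the whole path.
PATH_PATTERNS = ["*.evolve/evals/*", "agents/*", "skills/*/SKILL.md"]


def is_security_sensitive(path: str) -> bool:
    """Return True if the path matches any security-sensitive pattern."""
    name = PurePosixPath(path).name.lower()
    return (any(fnmatchcase(name, p) for p in NAME_PATTERNS)
            or any(fnmatchcase(path, p) for p in PATH_PATTERNS))


def select_tier(changed_lines: int, changed_files: list[str],
                escalate: bool = False) -> str:
    """Pick a review tier from diff size; `escalate` models risk-based routing."""
    if any(is_security_sensitive(f) for f in changed_files):
        return "full"
    n_files = len(changed_files)
    # Assumed boundary handling: see the lead-in above.
    by_lines = 0 if changed_lines < 50 else 1 if changed_lines <= 200 else 2
    by_files = 0 if n_files <= 3 else 1 if n_files < 10 else 2
    level = max(by_lines, by_files)
    if escalate:  # high churn or high fan-in bumps the review one tier
        level = min(level + 1, len(TIERS) - 1)
    return TIERS[level]
```

Taking the maximum of the two signals errs toward deeper review when line count and file count disagree; a stricter or looser reading of the table would change only the two threshold expressions.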
| Tier | Trigger | Pipeline | Agentic | Token Budget |
|------|---------|----------|---------|-------------|
| **Lightweight** | < 50 changed lines, 1-3 files | Full pipeline checks | Single-pass agentic (no specialists) | ~10-15K |
| **Standard** | 50-200 lines, 3-10 files | Full pipeline checks | Full agentic analysis (all 5 checks) | ~20-35K |
| **Full Review** | > 200 lines, 10+ files, or security-sensitive | Full pipeline checks | Multi-agent specialist panel: correctness + security + performance agents | ~40-80K |

**Security-sensitive detection:** Files matching these patterns auto-escalate to full review:

- `auth*`, `*login*`, `*password*`, `*token*`, `*secret*`
- `*payment*`, `*billing*`, `*checkout*`
- `*eval*`, `*grader*`, `*.evolve/evals/*`
- Any file in `agents/`, `skills/*/SKILL.md`

**Risk-based routing:** Files with high churn (> 5 commits in last 10 cycles) or high fan-in (imported by > 5 other files) get escalated one tier.

## Integration Hooks

### Evolve-Loop Auditor Integration

The auditor invokes this skill as an optional enhancement to its review pass:

```
Auditor Standard Flow:
  1. Read build-report.md
  2. Run code quality checks          ← ENHANCED by pipeline layer
  3. Run security checks              ← ENHANCED by security scan
  4. Run hallucination detection      (unchanged)
  5. Run pipeline integrity checks    (unchanged)
  6. Run eval verification            (unchanged)
  7. Generate verdict                 ← ENHANCED by multi-dimensional scoring
```

**Auditor invocation:** When the auditor encounters code changes (not doc-only or config-only), it can invoke this skill's structured checks to supplement its review. The skill's composite score feeds into the auditor's verdict logic.

**Configuration in evolve-auditor.md:**

```markdown
### Optional Skill Consultation
- **code-review-simplify**: For code changes, invoke `skills/code-review-simplify/SKILL.md` pipeline layer. Use composite score to supplement verdict. If maintainability < 0.7, append simplification suggestions to audit-report.md.
```

### Evolve-Loop Builder Integration

The builder can invoke this skill post-implementation for self-review:

```
Builder Self-Review (after implementation, before reporting):
  1. Run eval graders (existing)
  2. Run code-review-simplify lightweight tier
  3. If maintainability < 0.7: apply simplification suggestions before reporting
  4. Include self-review score in build-report.md
```

**Builder invocation:** After implementing a task and before writing `build-report.md`, the builder runs this skill's lightweight tier on its own changes. If `maintainability < 0.7`, the resulting simplification suggestions are applied inline. This catches issues before the auditor sees them, reducing audit-fix cycles.

### Standalone Invocation

The skill can be invoked directly outside the evolve-loop:

```bash
# Review + simplify changed files
/code-review-simplify [--tier lightweight|standard|full] [--files ]
```

## Simplification Catalog

When `maintainability < 0.7`, generate simplification suggestions from this catalog. Prioritize by impact and confidence.

| Category | Technique | When to Apply | Confidence |
|----------|----------|---------------|-----------|
| **Complexity** | Extract Method | Function > 30 lines or complexity > 10 | High (0.9) |
| **Complexity** | Flatten nesting | Nesting > 3 levels, guard clauses applicable | High (0.9) |
| **Complexity** | Decompose conditional | Complex boolean expressions (> 3 operators) | High (0.85) |
| **Duplication** | Extract shared utility | Near-duplicate blocks (> 6 lines, > 80% similar) | Medium (0.75) |
| **Readability** | Rename for clarity | Ambiguous names (< 3 chars, generic like `data`/`temp`) | Medium (0.7) |
| **Readability** | Replace magic numbers | Hardcoded literals in logic branches | High (0.85) |
| **Abstraction** | Inline over-abstraction | Single-use wrapper with no added value | Medium (0.7) |
| **Abstraction** | Remove dead code | Unreachable branches, unused imports/vars | High (0.9) |

**Constraints:**

- Only suggest localized refactorings (same file or module). LLMs are weak at cross-module architectural refactoring.
- Max 5 simplification suggestions per review. Focus on highest-impact.
- Each suggestion must include before/after code snippets and estimated complexity reduction.
- Never suggest simplification that changes external behavior (pure refactoring only).

## Output Schema

The skill produces a structured report:

```markdown
# Code Review + Simplify Report

## Summary
- **Tier:** lightweight | standard | full review
- **Changed:** X files, Y lines
- **Composite Score:** 0.XX
- **Verdict:** PASS | WARN | FAIL

## Dimension Scores
| Dimension | Score | Confidence | Key Finding |
|-----------|-------|------------|-------------|
| correctness | 0.X | 0.X | |
| security | 0.X | 0.X | |
| performance | 0.X | 0.X | |
| maintainability | 0.X | 0.X | |

## Issues (priority-ranked)
| # | Severity | Dimension | File:Line | Description | Suggestion |
|---|----------|-----------|-----------|-------------|-----------|

## Simplification Suggestions
| # | Technique | File:Line | Before (snippet) | After (snippet) | Impact |
|---|----------|-----------|-------------------|-----------------|--------|

## Pipeline Findings

## Agentic Findings
```

**JSON variant** (for programmatic consumption by auditor/builder):

```json
{
  "tier": "lightweight|standard|full",
  "composite": 0.0,
  "verdict": "PASS|WARN|FAIL",
  "dimensions": {
    "correctness": {"score": 0.0, "confidence": 0.0, "findings": []},
    "security": {"score": 0.0, "confidence": 0.0, "findings": []},
    "performance": {"score": 0.0, "confidence": 0.0, "findings": []},
    "maintainability": {"score": 0.0, "confidence": 0.0, "findings": []}
  },
  "simplifications": [],
  "issueCount": {"critical": 0, "high": 0, "medium": 0, "low": 0}
}
```
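
To illustrate how a consumer such as the auditor might use the JSON variant, the sketch below recomputes the composite from the `dimensions` object and maps it to a verdict, using the weights and thresholds from Multi-Dimensional Scoring. The function names `composite_score`, `verdict_for`, and `needs_simplification` are hypothetical, and the sample scores are made up for illustration. It also assumes that the low-confidence rule downgrades a PASS to WARN and leaves a FAIL unchanged, which the rule as written does not spell out.

```python
import json

# Weights from the Multi-Dimensional Scoring table.
WEIGHTS = {
    "correctness": 0.35,
    "security": 0.25,
    "performance": 0.15,
    "maintainability": 0.25,
}


def composite_score(dimensions: dict) -> float:
    """Weighted sum of the four dimension scores, rounded to tame float noise."""
    total = sum(w * dimensions[name]["score"] for name, w in WEIGHTS.items())
    return round(total, 4)


def verdict_for(dimensions: dict) -> str:
    """Map the composite to PASS/WARN/FAIL, applying the confidence rule."""
    composite = composite_score(dimensions)
    if composite < 0.6:
        return "FAIL"  # assumption: low confidence never softens a FAIL
    low_confidence = any(d["confidence"] < 0.7 for d in dimensions.values())
    if composite >= 0.8 and not low_confidence:
        return "PASS"
    return "WARN"


def needs_simplification(dimensions: dict) -> bool:
    """Simplification trigger: maintainability below 0.7."""
    return dimensions["maintainability"]["score"] < 0.7


# Illustrative report fragment in the shape of the JSON variant.
report = json.loads("""
{
  "dimensions": {
    "correctness": {"score": 0.9, "confidence": 0.9, "findings": []},
    "security": {"score": 1.0, "confidence": 0.8, "findings": []},
    "performance": {"score": 0.7, "confidence": 0.8, "findings": []},
    "maintainability": {"score": 0.6, "confidence": 0.9, "findings": []}
  }
}
""")

dims = report["dimensions"]
summary = (composite_score(dims), verdict_for(dims), needs_simplification(dims))
```

With these sample scores the review still passes on the composite, yet the maintainability trigger fires, which is the case where simplification suggestions would be attached to an otherwise shippable change.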