---
name: task-quality-kpi
description: "Objective task quality evaluation framework using quantitative KPIs. KPIs are automatically calculated by a hook when task files are modified and saved to TASK-XXX--kpi.json. Use when: reading KPI data for task evaluation, understanding quality metrics, deciding whether to iterate or approve based on data."
allowed-tools: Read, Write
---

# Task Quality KPI Framework

## Overview

The **Task Quality KPI Framework** provides **objective, quantitative metrics** for evaluating task implementation quality. 

**Key Architecture**: KPIs are **auto-generated by a hook** - you read the results rather than running scripts yourself.

```
┌─────────────────────────────────────────────────────────────┐
│  HOOK (auto-executes)                                       │
│  Trigger: PostToolUse on TASK-*.md                          │
│  Script: task-kpi-analyzer.py                               │
│  Output: TASK-XXX--kpi.json                                 │
├─────────────────────────────────────────────────────────────┤
│  SKILL / AGENT (reads output)                               │
│  Input: TASK-XXX--kpi.json                                  │
│  Action: Make evaluation decisions                          │
└─────────────────────────────────────────────────────────────┘
```

### Why This Architecture?

| Problem | Solution |
|---------|----------|
| Skills can't execute scripts | Hook auto-runs on file save |
| Subjective review_status | Quantitative 0-10 scores |
| "Looks good to me" | Evidence-based evaluation |
| Binary pass/fail | Graduated quality levels |

## KPI File Location

After any task file modification, find KPI data at:

```
docs/specs/[ID]/tasks/TASK-XXX--kpi.json
```

## KPI Categories

```
┌─────────────────────────────────────────────────────────────┐
│                    OVERALL SCORE (0-10)                     │
├─────────────────────────────────────────────────────────────┤
│  Spec Compliance (30%)                                      │
│  ├── Acceptance Criteria Met (0-10)                         │
│  ├── Requirements Coverage (0-10)                           │
│  └── No Scope Creep (0-10)                                  │
├─────────────────────────────────────────────────────────────┤
│  Code Quality (25%)                                         │
│  ├── Static Analysis (0-10)                                 │
│  ├── Complexity (0-10)                                      │
│  └── Patterns Alignment (0-10)                              │
├─────────────────────────────────────────────────────────────┤
│  Test Coverage (25%)                                        │
│  ├── Unit Tests Present (0-10)                              │
│  ├── Test/Code Ratio (0-10)                                 │
│  └── Coverage Percentage (0-10)                             │
├─────────────────────────────────────────────────────────────┤
│  Contract Fulfillment (20%)                                 │
│  ├── Provides Verified (0-10)                               │
│  └── Expects Satisfied (0-10)                               │
└─────────────────────────────────────────────────────────────┘
```

### Category Weights

| Category | Weight | Why |
|----------|--------|-----|
| Spec Compliance | 30% | Most important - did we build what was asked? |
| Code Quality | 25% | Technical excellence |
| Test Coverage | 25% | Verification and confidence |
| Contract Fulfillment | 20% | Integration with other tasks |

## When to Use

- Reading KPI data for task quality evaluation
- Understanding quality metrics and scoring breakdown
- Deciding whether to iterate or approve based on quantitative data
- Integrating KPI checks into automated loops (`agents_loop.py`)
- Generating evidence-based evaluation reports

## Instructions

### 1. Reading KPI Data (Primary Use)

**DO NOT run scripts** - read the auto-generated file:

```markdown
Read the KPI file:
  docs/specs/001-feature/tasks/TASK-001--kpi.json
```

### 2. Understanding the Data

The KPI file contains:

```json
{
  "task_id": "TASK-001",
  "evaluated_at": "2026-01-15T10:30:00Z",
  "overall_score": 8.2,
  "passed_threshold": true,
  "threshold": 7.5,
  "kpi_scores": [
    {
      "category": "Spec Compliance",
      "weight": 30,
      "score": 8.5,
      "weighted_score": 2.55,
      "metrics": {
        "acceptance_criteria_met": 9.0,
        "requirements_coverage": 8.0,
        "no_scope_creep": 8.5
      },
      "evidence": [
        "Acceptance criteria: 9/10 checked",
        "Requirements coverage: 8/10"
      ]
    }
  ],
  "recommendations": [
    "Code Quality: Moderate improvements possible"
  ],
  "summary": "Score: 8.2/10 - PASSED"
}
```

### 3. Making Decisions

Use `overall_score` and `passed_threshold`:

```
IF passed_threshold == true:
  → Task meets quality standards
  → Approve and proceed

IF passed_threshold == false:
  → Task needs improvement
  → Check recommendations for specific targets
  → Create fix specification
```

## Integration with Workflow

### In Task Review (evaluator-agent)

```markdown
## Review Process

1. Read KPI file: TASK-XXX--kpi.json
2. Extract overall_score and kpi_scores
3. Read task file to validate
4. Generate evaluation report
5. Decision based on passed_threshold
```
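
The JSON shown earlier makes this process mechanical. Below is a minimal sketch of steps 1, 2, and 5, assuming the path and schema from the example KPI file above (the report wording is illustrative, not the evaluator-agent's actual output format):

```python
import json
from pathlib import Path

# Sketch only: path and field names follow the example KPI file shown earlier.
kpi = json.loads(Path("docs/specs/001-feature/tasks/TASK-001--kpi.json").read_text())

report = [f"Overall: {kpi['overall_score']}/10 (threshold {kpi['threshold']})"]
for cat in kpi["kpi_scores"]:
    report.append(f"- {cat['category']}: {cat['score']}/10 (weight {cat['weight']}%)")
report.append("Decision: " + ("APPROVE" if kpi["passed_threshold"] else "REQUEST FIXES"))
print("\n".join(report))
```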

### In agents_loop

```python
import json  # spec_path, task_id, and the state helpers come from the surrounding orchestrator

# Check KPI file exists
kpi_path = spec_path / "tasks" / f"{task_id}--kpi.json"

if kpi_path.exists():
    kpi_data = json.loads(kpi_path.read_text())
    
    if kpi_data["passed_threshold"]:
        # Quality threshold met
        advance_state("update_done")
    else:
        # Need more work
        fix_targets = kpi_data["recommendations"]
        create_fix_task(fix_targets)
        advance_state("fix")
else:
    # KPI not generated yet - task may not be implemented
    log_warning("No KPI data found")
```

### Multi-Iteration Loop

Instead of stopping after a fixed maximum of three retries, iterate until the quality threshold is met:

```
Iteration 1: Score 6.2 → FAILED → Fix: Improve test coverage
Iteration 2: Score 7.1 → FAILED → Fix: Refactor complex functions  
Iteration 3: Score 7.8 → PASSED → Proceed
```

Each iteration updates the KPI file automatically on task save.
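
A minimal sketch of such a loop, assuming hypothetical `run_iteration` and `create_fix_task` callables supplied by the orchestrator (the real `agents_loop.py` may structure this differently):

```python
import json
from pathlib import Path

def iterate_until_passed(kpi_path: Path, run_iteration, create_fix_task,
                         max_iterations: int = 5) -> bool:
    """Implement -> read KPI -> fix, repeating until passed_threshold (with a safety cap)."""
    for i in range(1, max_iterations + 1):
        run_iteration()  # implementation step saves the task file, which triggers the KPI hook
        kpi = json.loads(kpi_path.read_text())
        print(f"Iteration {i}: score {kpi['overall_score']}/10")
        if kpi["passed_threshold"]:
            return True
        create_fix_task(kpi["recommendations"])
    return False
```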

## Threshold Guidelines

| Score | Quality Level | Action |
|-------|---------------|--------|
| 9.0-10.0 | Exceptional | Approve, document best practices |
| 8.0-8.9 | Good | Approve with minor notes |
| 7.0-7.9 | Acceptable | Approve (if threshold 7.5) |
| 6.0-6.9 | Below Standard | Request specific improvements |
| < 6.0 | Poor | Significant rework required |

### Recommended Thresholds

| Project Type | Threshold | Rationale |
|--------------|-----------|-----------|
| Production MVP | 8.0 | High quality required |
| Internal Tool | 7.0 | Good enough |
| Prototype | 6.0 | Functional over perfect |
| Critical System | 8.5 | No compromises |

## Metric Details

### Spec Compliance Metrics

**Acceptance Criteria Met**
- Calculates: `(checked_criteria / total_criteria) * 10`
- Source: Task file checkbox count
- Example: 9/10 checked = 9.0

**Requirements Coverage**
- Calculates: Count of REQ-IDs this task covers
- Source: `traceability-matrix.md`
- Example: 4 requirements covered = 8.0

**No Scope Creep**
- Calculates: `(implemented_files / expected_files) * 10`
- Source: Task "Files to Create" vs actual files
- Penalizes: Missing files or unexpected additions
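
As a rough illustration of how these ratio-based metrics reduce to code (the actual `task-kpi-analyzer.py` hook may compute them differently):

```python
import re
from pathlib import Path

def acceptance_criteria_score(task_md: str) -> float:
    """(checked_criteria / total_criteria) * 10, counted from markdown checkboxes."""
    checked = len(re.findall(r"- \[x\]", task_md, flags=re.IGNORECASE))
    total = checked + len(re.findall(r"- \[ \]", task_md))
    return round(checked / total * 10, 1) if total else 0.0

def scope_score(expected_files: list[str]) -> float:
    """(implemented_files / expected_files) * 10, based on which expected files exist.

    Unexpected additions would need a separate check; this sketch only covers missing files.
    """
    if not expected_files:
        return 10.0
    implemented = sum(Path(f).exists() for f in expected_files)
    return round(implemented / len(expected_files) * 10, 1)
```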

### Code Quality Metrics

**Static Analysis**
- Java: Maven Checkstyle
- TypeScript: ESLint
- Python: ruff
- Score: 10 if passes, 5 if issues found

**Complexity**
- Calculates: Share of functions longer than 50 lines
- Score: `10 - (long_functions_ratio * 5)`
- Penalizes: Large, complex functions
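
For Python sources, the long-function ratio could be measured with the `ast` module, roughly as below (a sketch, not the hook's actual per-language implementation):

```python
import ast

def complexity_score(source: str, max_lines: int = 50) -> float:
    """10 - (long_functions_ratio * 5): penalizes functions longer than max_lines lines."""
    funcs = [n for n in ast.walk(ast.parse(source))
             if isinstance(n, (ast.FunctionDef, ast.AsyncFunctionDef))]
    if not funcs:
        return 10.0
    long_funcs = sum(1 for f in funcs if (f.end_lineno - f.lineno + 1) > max_lines)
    return round(10 - (long_funcs / len(funcs)) * 5, 1)
```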

**Patterns Alignment**
- Checks: Knowledge Graph patterns
- Source: `knowledge-graph.json`
- Validates: Implementation follows project patterns

### Test Coverage Metrics

**Unit Tests Present**
- Calculates: `min(10, test_files * 5)`
- 2 test files = maximum score
- Penalizes: Missing tests

**Test/Code Ratio**
- Calculates: `(test_count / code_count) * 10`
- 1:1 ratio = 10/10
- Ideal: At least 1 test file per code file

**Coverage Percentage**
- Source: Coverage reports (JaCoCo, lcov, etc.)
- Calculates: `coverage_percent / 10`
- 80% coverage = 8.0
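
Put together, the three formulas are simple arithmetic. A sketch, assuming file counts and the coverage percentage have already been gathered from the repository and the coverage report:

```python
def test_coverage_scores(test_files: int, code_files: int, coverage_percent: float) -> dict:
    """Apply the three formulas above; each sub-score is capped at 10."""
    return {
        "unit_tests_present": min(10, test_files * 5),
        "test_code_ratio": min(10.0, round(test_files / code_files * 10, 1)) if code_files else 0.0,
        "coverage_percentage": min(10.0, round(coverage_percent / 10, 1)),
    }

# Example: 2 test files, 3 code files, 80% line coverage
print(test_coverage_scores(2, 3, 80.0))
# {'unit_tests_present': 10, 'test_code_ratio': 6.7, 'coverage_percentage': 8.0}
```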

### Contract Fulfillment Metrics

**Provides Verified**
- Checks: Files exist and export expected symbols
- Source: Task `provides` frontmatter
- Validates: Contract satisfied

**Expects Satisfied**
- Checks: Dependencies provide required files/symbols
- Source: Task `expects` frontmatter
- Validates: Prerequisites met
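
A simplified sketch of the provides check, assuming each `provides` entry names a file and a symbol (the hook's real frontmatter schema and symbol resolution are likely richer than a substring search):

```python
from pathlib import Path

def provides_verified(provides: list[dict]) -> float:
    """0-10: fraction of provided files that exist and mention the expected symbol."""
    if not provides:
        return 10.0
    ok = sum(
        1 for item in provides
        if Path(item["file"]).exists()
        and item.get("symbol", "") in Path(item["file"]).read_text()
    )
    return round(ok / len(provides) * 10, 1)
```

The `expects` check would be the mirror image, run against the files and symbols declared by dependency tasks.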

## When KPI File is Missing

If `TASK-XXX--kpi.json` doesn't exist:

1. **Task was never modified** - Hook runs on file save
2. **Hook failed** - Check Claude Code logs
3. **Task is new** - Save the file first to trigger hook

**DO NOT** try to calculate KPIs manually. The hook runs automatically when:
- Task file is saved (Write tool)
- Task file is edited (Edit tool)

## Best Practices

### 1. Always Check KPI File Exists

Before evaluating:
```markdown
Check if KPI file exists:
  docs/specs/[ID]/tasks/TASK-XXX--kpi.json

If missing:
  - Task may not be implemented yet
  - Ask user to save the task file first
```

### 2. Trust the Metrics

The KPIs are objective. Only override with documented evidence:
- Critical security issue not in metrics
- Logic error not caught by static analysis
- Exceptional quality not measured

### 3. Iterate on Low KPIs

Target specific categories:

```
❌ "Fix code quality issues"
✅ "Improve Code Quality KPI from 5.2 to 7.0:
    - Complexity: Refactor processData() (5→8)
    - Patterns: Add error handling (6→8)"
```

### 4. Track KPI Trends

Monitor quality over time:

```
Sprint 1: Average KPI 6.8
Sprint 2: Average KPI 7.3 (+0.5)
Sprint 3: Average KPI 7.9 (+0.6)
```
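
Since every KPI file follows the `TASK-XXX--kpi.json` naming convention, a sprint average can be computed by globbing over a spec's tasks directory (a sketch; the path is illustrative):

```python
import json
from pathlib import Path

kpi_files = sorted(Path("docs/specs/001-feature/tasks").glob("TASK-*--kpi.json"))
scores = [json.loads(p.read_text())["overall_score"] for p in kpi_files]
if scores:
    print(f"Average KPI across {len(scores)} tasks: {sum(scores) / len(scores):.1f}")
```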

## Troubleshooting

### KPI File Not Generated

**Check:**
1. Hook enabled in `hooks.json`
2. Task file name matches pattern `TASK-*.md`
3. File was actually saved (not just viewed)

### KPI Scores Seem Wrong

**Validate:**
1. Check evidence field for data sources
2. Verify files exist at expected paths
3. Some metrics need build tools (Maven, npm)

### Low Scores Despite Good Code

**Possible causes:**
- Missing test files
- No coverage report generated
- Acceptance criteria not checked
- Lint rules too strict

Fix the root cause, not just the score.

## Examples

### Example 1: Reading KPI Data

```markdown
Read the KPI file to evaluate task quality:
  docs/specs/001-feature/tasks/TASK-042--kpi.json

Based on the data:
- Overall score: 6.8/10 (below threshold)
- Lowest KPI: Test Coverage (5.0/10)
- Recommendation: Add unit tests

Decision: REQUEST FIXES - target Test Coverage improvement
```

### Example 2: Iteration Decision

```markdown
Iteration 1 KPI: Score 6.2 → FAILED
- Spec Compliance: 7.0 ✓
- Code Quality: 5.5 ✗
- Test Coverage: 6.0 ✗

Fix targets:
1. Refactor complex functions (Code Quality)
2. Add test coverage (Test Coverage)

Iteration 2 KPI: Score 7.8 → PASSED ✓
```

### Example 3: agents_loop Integration

```python
import json  # spec_dir, task_id, and advance_state come from the surrounding orchestrator

# In agents_loop, after implementation step
kpi_file = spec_dir / "tasks" / f"{task_id}--kpi.json"

if kpi_file.exists():
    kpi = json.loads(kpi_file.read_text())
    
    if kpi["passed_threshold"]:
        print(f"✅ Task passed quality check: {kpi['overall_score']}/10")
        advance_state("update_done")
    else:
        print(f"❌ Task failed quality check: {kpi['overall_score']}/10")
        print("Recommendations:")
        for rec in kpi["recommendations"]:
            print(f"  - {rec}")
        advance_state("fix")
```

## References

- `evaluator-agent.md` - Agent that uses KPI data for evaluation
- `hooks.json` - Hook configuration for auto-generation
- `task-kpi-analyzer.py` - Hook script (do not execute directly)
- `agents_loop.py` - Orchestrator that reads KPI for decisions
