---
name: inference-value-ledger
last_validated: 2026-06-05
description: >-
  Render the leadership value-prop ledger for the inference effort: the deployable
  wins vs the FlashInfer-TRTLLM + best-tuned-vLLM 0.21/0.22 baseline, grouped
  DONE / IN-PROGRESS / NOT-DONE / CLOSED-NEGATIVE, each row data-backed by a perf-lake
  campaign (live sol_rigor + verdict tier), plus the ranked GRIND FRONTIER of next levers
  (the always-be-grinding performance ratchet). Joins the curated
  perf-tune-report/configs/value-findings.yaml registry with the live campaigns via
  `perftunereport value_view`. Flags any finding whose backing campaign is missing or
  ungrounded. Use when you need to show value / report to leadership / answer "what have
  we uncovered that we can deploy, revalidate, or pursue". Triggers on "value ledger",
  "value prop", "show value", "show the value prop", "leadership view", "what wins do we
  have", "vs flashinfer", "value-findings", "what can we deploy", or any combination of
  "value / wins / findings / leadership / report" with "inference / vllm / flashinfer / perf".
---

# inference-value-ledger

One command to render the always-current leadership value-prop view.

## Run it

```bash
# print to stdout:
perftunereport value_view
# write a doc:
perftunereport value_view --out /path/to/VALUE-LEDGER.generated.md
# JSON envelope (finding + flag counts, e.g. for CI):
perftunereport value_view --json
```

`perftunereport` ships with the profile_and_optimize server. If it is not on PATH,
put the server on PYTHONPATH and invoke it as a module.

## How it works

- **Curated source-of-truth:** `perf-tune-report/configs/value-findings.yaml`
 - one entry per finding (`id`, `title`, `lifecycle`, `baseline`, `win`, `hardware`,
  `deploy_readiness`, `campaign_ids`, **`next_lever`** [+ optional `next_value`]). Edit this
  when a finding's status changes or a new finding lands. This is the ONLY hand-maintained part.
- **`perftunereport value_view`** joins it with the LIVE perf-lake campaigns: each campaign's
  `report_status.json` (`sol_rigor` / `dcgm_grounded`) + `verdict.json` (`tier` /
  `baseline_named`), and renders the grouped table **plus the ranked GRIND FRONTIER**. Win
  numbers are human-verified in the registry. The live columns + flags are read at render time
  so the ledger never silently drifts from the lake.
- **Flags surface** when a campaign is missing locally (S3-only / unpublished), ungrounded
  (`sol_rigor` none/L1), its baseline is not named, or **a finding has no `next_lever`** - fix
  by publishing/tagging the campaign or naming the next lever.

## Keep it honest

Lifecycle is evidence-state, not aspiration: a finding is `done` only when its campaign is
published + grounded, `in_progress` while revalidating, `not_done` for research,
`closed_negative` for banked dead-ends (value = prevented spend). Never mix B200 (TP=8) and
GB300 (TP=4) without the hardware column. Every row names its baseline (the
DRAFT-vs-VERDICT "fair baseline" rule). **Code-under-test provenance match:** a
finding's `source_refs` `delivery`+`commit` MUST match the cited `campaign_ids`' provenance -- citing
an `overlay`/offline-prepped campaign as the benefit of an `infr-patch` is a cross-tier DRAFT defect
(even when the kernels match). `value_view` flags this via `provenance_match_problems()` (delivery/
commit mismatch between `source_refs` and a cited campaign). Typical failure: citing an
overlay-deploy champion's number as a source patch's benefit -- always re-measure on the
patch's own deploy.

## Full-context reporting (no bare numbers)

Per the methodology canon "Every performance number carries its full context (no bare
numbers)" (`docs/METHODOLOGY.md`, "Full-context reporting"): every number this
skill emits (throughput, latency, TPOT/ITL, BW, %SoL, speedup, efficiency, goodput, acceptance
rate, scaling efficiency, thermal/failure rate - whatever it reports) MUST carry its full
measurement-context descriptor, and every comparison MUST be matched on it. A bare number is a
defect - it cannot set a default, ship a config, or appear in a report.
- **Identity:** model (+HF path), hardware (exact ceiling token `GB300`/`B200`), quant, kv-cache dtype.
- **Parallelism:** TP, DP (replicas), PP, EP, parallel_strategy.
- **Serving cfg:** max-num-seqs, max-num-batched-tokens, gpu-memory-utilization, max-model-len, cudagraph_mode/enforce_eager, async_scheduling, prefix-caching.
- **Workload:** dataset, ISL/OSL (or mean in/out tokens), concurrency, num-prompts.
- **Regime:** warm vs cold. Latency vs throughput tier.
- **Stack:** image/vllm commit, bench backend, serving engine.
- **Grounding:** `%SoL` (+ ceiling key from `configs/sol-ceilings.yaml` - never inline a peak), sol_rigor (L1-L4), trials n (mean±std), same-node, baseline named. (If the metric is not roofline-bound - e.g. accuracy/acceptance - omit `%SoL` but keep the rest of the descriptor.)
- **Per-number exact shape (no smoothing):** when reporting more than one number, keep EACH with its own exact shape (ISL/OSL, concurrency, dataset, regime) - never normalize a set to one uniform descriptor that hides per-point variation (e.g. `c=1 @ ISL1024/OSL256` + `c=64 @ ISL4096/OSL512`, NOT one shared "random").

## Asset validation (review + FAIL LOUD)

Every asset this skill emits (value-prop ledger / GRIND FRONTIER view / table) is held to
the asset-validation canon (`docs/METHODOLOGY.md`,
"Asset validation"): the generator **FAILS LOUDLY** on missing/bad data (a finding whose backing campaign is
missing or ungrounded, `unknown`/null where a value is required) -> flag it loudly, never silently
drop or fabricate a row, and the agent **REVIEWS** the ledger for human-sense + 100% accuracy
(every win row is campaign-grounded with live sol_rigor + verdict tier, no orphaned/ungrounded
claim presented as a win) and **rebuilds** it if wrong -- never ships a wrong/confusing ledger
with a caveat.

## PR value proposition (WHY + benefit + applicability)

This skill IS the value-prop ledger -- so when one of its findings graduates into a PR (a
downstream patch PR or any upstream/shareable PR), that PR MUST carry the same value framing,
not just the metric: the **WHY** (blocker removed / capability unlocked), the **measured benefit**
(full descriptor + `%SoL`/`sol_rigor` + a resolving perf-lake / evidence link. Never a hand-waved
"no perf penalty"), and an **applicability matrix** -- model architecture (sparse/MoE vs dense),
size, reasoning-vs-not, quant, parallelism (incl. any TP>1-specific behavior), workload, and
concurrency regime. State what it does NOT apply to. **Keep it tight (no AI-slop):** lead with
SoL%/roofline% and matched numbers. Cut hedging, redundancy, and decorative charts (infra PRs don't embed
images). De-slop checklist: no em-dashes (#1 AI-slop tell), minimal bold, plain punctuation, inline code
out of narrow table cells. Source: `docs/METHODOLOGY.md` "Value proposition" + "De-slop".

## Next lever / BREAKTHROUGH (Grind Mandate)

If this skill emits a measured result, its output MUST end by naming the **next perf lever**,
its **expected unlock** (direction + rough magnitude), and the **gate** that proves/refutes it,
per the Grind Mandate (`docs/METHODOLOGY.md`, "Always be grinding"). A
measured win is the new floor, not the finish -- so **do everything we can to find the next
BREAKTHROUGH**: the highest-EV unlock toward Speed-of-Light (a new champion / kernel / router /
quant / parallelism / spec-decode win, or an unblocked stack), not just the next micro-lever.
Rank the candidate breakthrough levers by value x cost (the GRIND FRONTIER, `perftunereport
value_view`), pursue the top, bank the rest with evidence. Record WHY a refuted lever loses,
update the standing frontier in the active bundle's `HANDOFF.md`. Never conclude
"exhausted/optimal/done" without an explicit next-lever frontier (an empty frontier AND a
documented SoL wall only). Delete this section ONLY if the skill produces no measurements.
