---
name: versify
description: Stage 6b structural validator — validate and import the generated translation from Stage 6a, verify 1:1 alignment with Greek, and prepare for downstream stages. Does not generate translations or align existing witnesses.
allowed-tools: Bash(python3:*), Bash(sqlite3:*), Bash(go:*), Bash(find:*), Bash(wc:*), Bash(ls:*), Bash(bash:*), Bash(git:*), Read, Write, Edit, Grep, Glob
---

# Versify (Stage 6b Structural Validation)

Validate and import the generated translation from Stage 6a.

This skill is the **Stage 6b owner** — the second half of Stage 6. Its input is the generated translation from translation-synthesis (Stage 6a). Its job is:

- Verify structural 1:1 alignment with Greek (should be true by construction)
- Validate coverage and quality metrics
- Import/prepare the verse-aligned edition for downstream use
- Report any issues that need resolution before proceeding

## Important Policy

**This skill has two modes:**

1. **Fast path** (`/versify try-witness`): Attempt to versify a `versification-candidate` witness directly. This is reference-based matching only — no DP alignment, no range alignment, no structural modification. The witness either maps 1:1 to Greek segments as-is, or it fails.
2. **Standard path** (`/versify run`): Validate and import the generated translation from Stage 6a. This is the original behavior.

**This skill does NOT:**

- Generate translations (that's Stage 6a — translation-synthesis)
- Use DP alignment, range alignment, or force-fitting modes
- Try to "fix" structural mismatches in witness translations
- Attempt versification on witnesses NOT classified as `versification-candidate`

The fast path is strictly opt-in via Stage 5 classification. It succeeds or fails cleanly — there is no partial-match fallback.
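To make the "succeeds or fails cleanly" contract concrete, here is a minimal Python sketch of reference-only matching. It follows the matching rules and pass criteria listed under `/versify try-witness` (direct match, parent-prefix match, 100% coverage, no empty segments, segment counts within 10%). It is not the project's implementation: `MatchResult`, `match_reference`, `evaluate_candidate`, and `first_passing_candidate` are hypothetical names, references are assumed to be dot-separated strings, a parsed witness is assumed to be a `{reference: text}` dict, and parent-prefix matching here deliberately looks only one level up.

```python
"""Sketch of the /versify try-witness matching rule (reference matching only).

Assumptions, not the project's code: references are dot-separated strings such
as "1.1" or "1.1.2", and a parsed witness is a dict {reference: text}.
"""
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class MatchResult:
    greek_segment_count: int
    witness_segment_count: int
    matched_segments: int
    empty_segments: int
    coverage: float
    passed: bool
    unmatched: List[str] = field(default_factory=list)
    mapping: Dict[str, str] = field(default_factory=dict)


def match_reference(greek_ref: str, witness: Dict[str, str]) -> Optional[str]:
    """Direct match first, then the immediate parent (witness 1.1 -> Greek 1.1.2)."""
    if greek_ref in witness:
        return witness[greek_ref]
    parent, sep, _ = greek_ref.rpartition(".")
    if sep and parent in witness:
        # Parent-prefix match: the same witness text is reused for each sub-segment.
        return witness[parent]
    return None


def evaluate_candidate(greek_refs: List[str], witness: Dict[str, str]) -> MatchResult:
    mapping: Dict[str, str] = {}
    unmatched: List[str] = []
    for ref in greek_refs:
        text = match_reference(ref, witness)
        if text is None:
            unmatched.append(ref)
        else:
            mapping[ref] = text

    greek_n, witness_n, matched = len(greek_refs), len(witness), len(mapping)
    empty = sum(1 for text in mapping.values() if not text.strip())
    coverage = matched / greek_n if greek_n else 0.0
    # Pass criteria: full coverage, no empty segments, segment counts within 10%.
    ratio_ok = greek_n > 0 and abs(witness_n - greek_n) / greek_n <= 0.10
    passed = ratio_ok and matched == greek_n and empty == 0
    return MatchResult(greek_n, witness_n, matched, empty, coverage, passed, unmatched, mapping)


def first_passing_candidate(
    greek_refs: List[str], candidates: List[Tuple[str, Dict[str, str]]]
) -> Optional[Tuple[str, MatchResult]]:
    """Try candidates in catalog order; return None if none passes (no partial fallback)."""
    for witness_id, witness in candidates:
        result = evaluate_candidate(greek_refs, witness)
        if result.passed:
            return witness_id, result
    return None
```

Because `first_passing_candidate` returns `None` rather than the best partial result, a failed run leaves nothing to import, which mirrors the no-partial-match-fallback policy.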
## Quick Status

Verse-aligned editions: !`nix-shell -p sqlite --run "sqlite3 data/editions.db \"SELECT code FROM editions WHERE code LIKE '%-verse-%' OR urn LIKE '%generated%'\"" 2>/dev/null`

Generated edition segments: !`nix-shell -p sqlite --run "sqlite3 data/editions.db \"SELECT e.code, COUNT(s.id) FROM editions e JOIN segments s ON s.edition_id=e.id WHERE e.urn LIKE '%generated%' GROUP BY e.code\"" 2>/dev/null`

## Commands

- `/versify try-witness [work]` — Attempt to versify a `versification-candidate` witness directly (fast path)
- `/versify run [work]` — Validate and import the generated translation from Stage 6a
- `/versify validate [work]` — Re-validate an already-imported generated translation
- `/versify status [work]` — Summarize validation state, coverage, and blockers

Target: $ARGUMENTS

---

## Owned Responsibilities

### Owns

- Stage 6b structural validation
- 1:1 alignment verification (generated English ↔ Greek)
- Coverage metrics and reporting
- DB import of generated translations
- Per-book isolation safeguards

### Does not own

- Translation generation (Stage 6a — translation-synthesis)
- Witness gathering and ranking (Stage 5 — translation-witness)
- Interlinear generation (Stage 8)
- Reader reliability verification (Stage 9)
- Final ship promotion (Stage 10)

---

## Input Requirements

Before running `/versify run`, these must be complete:

1. **Stage 6a complete**: Generated translation chapter files exist under `$LYCEUM_TEXTS_DIR//versification/`
2. **Chapter files have segments**: Each chapter file contains a `segments` array with `reference` and `translation` fields
3. **Adversarial review done**: Chapter files include a `reviews` array (even if all verdicts are "pass")

If Stage 6a is incomplete, this skill should report the gap and direct the user to `/translation-synthesis run `.

---

## Workflows

## `/versify try-witness`

Attempt to versify a `versification-candidate` witness directly, without generating a new translation. This is the **Stage 6 fast path**.
The orchestrator calls this BEFORE `/translation-synthesis run`. If it succeeds, Stage 6a generation is skipped entirely.

### Prerequisites

- Stage 5 complete: witness catalog exists with role classifications
- Greek edition segments exist in the database
- At least one witness is classified as `versification-candidate`

### Execution steps

1. **Load witness catalog**

   ```bash
   cat $LYCEUM_TEXTS_DIR//sources/english_candidates.json
   ```

   Find all witnesses with `"versification-candidate"` in their `roles` array.

2. **For each candidate** (in catalog order):

   a. **Load witness text** from `witnesses/`

   b. **Parse references** using the witness `format` field (same import logic as Path A in translation-witness)

   c. **Match references to Greek segments**:
      - Direct match: witness reference equals Greek reference exactly
      - Parent-prefix match: witness reference is a parent of a Greek sub-segment (e.g., witness `1.1` matches Greek `1.1.1`, `1.1.2`, `1.1.3`)
      - For parent-prefix matches, the witness text is duplicated across sub-segments (acceptable for display; downstream interlinear generates fresh glosses regardless)

   d. **Compute metrics**:
      - `greek_segment_count`: total Greek segments
      - `witness_segment_count`: total parsed witness segments
      - `matched_segments`: Greek segments with a witness match
      - `coverage`: `matched_segments / greek_segment_count`
      - `empty_segments`: matched segments where witness text is empty/whitespace

   e. **Apply pass criteria**:
      - Coverage MUST be 100% (every Greek segment has a witness match)
      - No empty segments
      - Segment count ratio within 10% (`abs(witness - greek) / greek <= 0.10`)

3. **On success** (first candidate that passes):
   - Import the witness as the versified edition in the DB
   - Edition URN **must** use a distinct versified marker — pattern: `{base}-versified-eng1` (e.g., `perry-versified-eng1`). **Never** reuse the witness URN.
   - Set `generated_edition_urn` in manifest to the versified URN (keep `english_edition_urn` as the witness URN)
   - Set `edition_urn` in `english_versified.json` to match `generated_edition_urn`
   - Edition label indicates the source (e.g., "ASV (versified witness)")
   - Write versification report to `qa/alignment-report.md`
   - Return success with metrics

4. **On failure** (no candidate passes):
   - Write report documenting each candidate and why it failed
   - Return failure — the orchestrator will proceed to Stage 6a generation

### Output

#### Success report

```markdown
# Stage 6 Fast Path: Witness Versification

## Result: SUCCESS

## Witness used
- ID: asv-1901
- Title: American Standard Version (1901)
- Coverage: 100% (7957/7957 segments)
- Empty segments: 0
- Import: versified-witness edition created

## Metrics
- Greek segments: 7957
- Witness segments: 7956 (1 parent-prefix expansion)
- Matched: 7957/7957 (100%)

## Next steps
- Stage 6a (translation-synthesis) SKIPPED
- Proceed to Stage 7 (transliteration) or Stage 8 (interlinear)
```

#### Failure report

```markdown
# Stage 6 Fast Path: Witness Versification

## Result: FAILED — falling back to Stage 6a generation

## Candidates attempted

### asv-1901
- Coverage: 94.2% (7496/7957 segments)
- Reason: coverage below 100% — 461 Greek segments unmatched
- Missing references: 1.1.2, 1.1.3, ... (reference mismatch in sub-segments)

## Next steps
- Proceed to /translation-synthesis run [work]
```

---

## `/versify run`

Validate and import the generated translation.

### Validation checks

1. **Structural alignment**
   - Every Greek segment has a corresponding generated English segment
   - No extra segments in the generated translation
   - References match exactly (same format, same numbering)
2. **Per-book isolation**
   - Segments from different chapters are not mixed
   - Import operates chapter-by-chapter
3. **Coverage metrics**
   - Total segment count matches Greek
   - No empty translations
   - No duplicate references
4. **Review status**
   - All chapters have been adversarially reviewed
   - High-severity issues documented

### Import steps

```bash
# Import generated translation to DB
python3 scripts/generate_translation.py --text --import-only
```

Or via direct SQL for custom workflows:

```sql
-- Check if generated edition exists
SELECT id, urn FROM editions WHERE urn LIKE '%generated%';

-- Verify segment count matches Greek
SELECT
  (SELECT COUNT(*) FROM segments WHERE edition_id = ) as greek_count,
  (SELECT COUNT(*) FROM segments WHERE edition_id = ) as generated_count;
```

### Output

After successful validation and import:

- Generated edition exists in DB with segments
- Edition URN includes "generated" marker
- Edition label indicates it's a generated translation
- `qa/alignment-report.md` updated with validation results

---

## `/versify validate`

Re-validate an already-imported generated translation without re-importing.

### When to use

- After DB changes that might have affected segments
- When verifying integrity before downstream work
- As part of replay/rehab workflows

### Checks

- DB segment count matches expected
- No gaps or duplicates in reference sequence
- Generated edition metadata is correct

---

## `/versify status`

Report current state.

### Report includes

- Generated translation existence (chapter files)
- Import status (DB edition exists/missing)
- Coverage: segments imported vs expected
- Blockers: unresolved issues preventing downstream work
- Next recommended action

---

## Legacy Scripts (Maintenance Only)

These scripts exist for maintaining older texts that used legacy alignment modes. They are **not used for new work**.

| Script | Text | Mode | Notes |
|---|---|---|---|
| `scripts/versify_meditations.py` | Meditations | DP | Legacy: explicit Book 1 map + DP |
| `scripts/versify_homer.py` | Iliad/Odyssey | range | Legacy: range fallback |

**Important**: Do not extend these scripts or use their patterns for new texts.
All new texts go through translation-synthesis (Stage 6a) then versify (Stage 6b).

**Deleted legacy scripts**: `versify_john.py` and `versify_aesop.py` were removed; existing data for these texts uses the fast path (Stage 6 witness versification).

---

## Validation Report Format

The validation report should include:

```markdown
# Stage 6b Validation Report

## Work
- Slug:
- Greek edition:
- Generated edition URN:

## Coverage
- Greek segments: N
- Generated segments: N
- Match: ✓ / ✗

## Structural checks
- [x] 1:1 reference alignment
- [x] No cross-chapter contamination
- [x] No empty translations
- [x] No duplicate references

## Review summary
- Chapters reviewed: N
- Pass verdicts: N
- Fail verdicts: N
- Total issues logged: N

## Import status
- Edition imported: ✓ / ✗
- Edition ID:

## Blockers
- (none) or (list)

## Next steps
- Ready for Stage 8 (interlinear) or Stage 9 (reader-reliability)
```

---

## Important Rules

### Generated translation only

This skill validates and imports **only** the generated translation from Stage 6a. It does not:

- Create alignments from existing PD translations
- Use DP/range/direct-mapping algorithms
- Modify witness translations to fit Greek structure

### Witness versification is reference-matching only

PD witness translations (gathered in Stage 5) remain structurally unmodified:

- They are reference material for adversarial review
- They may be displayed in the reader as alternative translations
- They are **never** structurally modified or forced into 1:1 alignment via DP/range algorithms
- A `versification-candidate` witness may be versified via `/versify try-witness`, but ONLY through direct reference matching — the witness either already maps 1:1, or the attempt fails

### Versified editions MUST use a distinct URN

When the witness fast-path produces a versified edition, it **must** use a distinct URN that differs from the original witness URN.
The pattern is `{base}-versified-eng1` (e.g., `perry-versified-eng1`, not `perry-eng1`). This is critical because:

- The original witness edition must remain importable with its own segment count and structure
- Reusing the witness URN would cause the import to overwrite the witness with the versified edition, losing the original translation
- The `generated_edition_urn` field in `manifest.json` must reflect the versified URN, while `english_edition_urn` stays as the witness URN
- The `edition_urn` field in `english_versified.json` must match `generated_edition_urn`

**Never** set `generated_edition_urn` equal to `english_edition_urn` when a versified edition is produced.

### ⚠️ DO NOT manually set edition URNs in manifest.json

**CRITICAL**: Do NOT manually set `english_edition_urn` or `generated_edition_urn` in manifest.json. The import script (Stage 10) auto-generates these URNs with the "versified" convention that the reader requires:

- The reader checks for `"versified"`, `"verse"`, or `"gen-eng"` in the English edition URN to enable row view
- Manually overriding these fields breaks row view detection
- Users will see broken/incomplete reader layout

**What happens at Stage 10:**

- Import script generates URNs like `.workspace-versified-eng1`
- The "versified" marker activates row view in the reader
- Both URNs are written automatically based on the versification output

**If you manually set URNs:**

- Row view will NOT activate
- The validation in `scripts/import_workspace.go` will warn
- The text will fail reader reliability checks (Stage 9)

**Leave `english_edition_urn` and `generated_edition_urn` blank in manifest.json.**

### Row view requires versified translation

Only versified (generated) translations may render in row view.
This is because:

- Row view requires 1:1 structural correspondence with Greek
- Only the generated translation has this property by construction
- Witness translations have arbitrary structure and cannot be row-aligned

If a text lacks a versified generated translation, row view is unavailable for that text.

### Legacy grandfathering

Pre-2026-03-16 texts with verse-aligned witness editions are grandfathered:

- Meditations (Long, Haines)
- John (Perseus, ASV)
- Hymn 7 (Lucas)
- Alexander (Perrin)

These existing verse-aligned witnesses remain functional. New texts may get a versified witness edition ONLY through the fast path (`/versify try-witness`) when Stage 5 classifies a witness as `versification-candidate` and automated versification succeeds. All other new texts must go through Stage 6a generation.

### Coverage is 100% or blocked

Because the generated translation is structure-preserving by construction:

- Coverage should be 100%
- Any coverage gap indicates a Stage 6a problem (missing chapters, generation failure)
- Partial coverage is a blocker, not an acceptable state

### Biblical versification awareness

When processing NT or OT texts:

- **Ghost verses**: Different editions may include or exclude disputed verses (Matt 17:21, Mark 7:16, John 5:4, Acts 8:37, etc.). The pipeline follows the base Greek edition's numbering. If a verse is absent from the Greek edition, no segment is created for it.
- **Sub-verse notation**: References like `3.16a` and `3.16b` are valid if the base edition uses them.
- **Cross-tradition numbering**: LXX and MT psalm numbering differ. The pipeline uses whichever numbering the source Greek edition provides. Document the tradition in the workspace manifest.
- **Verse divisions splitting clauses**: This is normal and expected. Verse divisions such as Estienne's 1551 New Testament numbering were pragmatic, not linguistic. The pipeline preserves these boundaries because they are the universal reference standard, even when linguistically awkward.
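The URN rules in this Important Rules section reduce to a few invariants that can be checked mechanically. The Python sketch below is hypothetical (`versified_urn`, `row_view_enabled`, and `check_urn_invariants` are not functions from the codebase). It treats blank URNs as valid, since Stage 10 fills them in, and copies the row-view markers from the reader rule described in this section; `manifest` stands for the parsed `manifest.json` and `versified` for the parsed `english_versified.json`.

```python
"""Hypothetical checker for the edition-URN rules (not the project's code)."""
from typing import Any, Dict, List

# Substrings the reader looks for in the English edition URN to enable row view.
ROW_VIEW_MARKERS = ("versified", "verse", "gen-eng")


def versified_urn(base: str) -> str:
    """Build a distinct versified URN, e.g. 'perry' -> 'perry-versified-eng1'."""
    return f"{base}-versified-eng1"


def row_view_enabled(english_urn: str) -> bool:
    return any(marker in english_urn for marker in ROW_VIEW_MARKERS)


def check_urn_invariants(manifest: Dict[str, Any], versified: Dict[str, Any]) -> List[str]:
    """Return the violated rules; an empty list means the URNs are consistent."""
    problems: List[str] = []
    generated = manifest.get("generated_edition_urn") or ""
    english = manifest.get("english_edition_urn") or ""
    if not generated:
        # Blank URNs are allowed: the Stage 10 import script generates them.
        return problems
    if generated == english:
        problems.append("generated_edition_urn must differ from english_edition_urn")
    if versified.get("edition_urn") != generated:
        problems.append("english_versified.json edition_urn must match generated_edition_urn")
    if not row_view_enabled(generated):
        problems.append("generated_edition_urn has no row-view marker")
    return problems
```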
---

## Verification Contract

This skill follows the Stage 6b contract.

### Verify

- Generated translation exists (chapter files present)
- 1:1 structural alignment verified
- Coverage is 100%
- No cross-book leakage
- DB import successful

### Minimum evidence

- `qa/alignment-report.md` with validation results
- Generated edition in DB with correct segment count
- No blockers documented

### Pass criteria

- Every Greek segment has a generated English segment
- Import completed without errors
- Edition URN and label correctly indicate generated status
- Ready for downstream stages

### Failure examples

- Generated translation missing for some chapters
- Reference mismatches between Greek and generated
- Import failed or created incorrect segment count
- Attempting to align a PD witness instead of the generated translation

### Required next steps

After successful Stage 6b:

- Stage 8 (interlinear/morphology) can proceed
- Stage 9 (reader-reliability) for product QA
- Stage 10 (ship gate) when ready for final review

---

## Verification

After completing this stage, run the automated verification script:

```bash
bash scripts/verify_stage_6b.sh "${SLUG}"
```

Exit codes: 0=PASS (advance), 1=FAIL (block), 2=WARN (advance with notes). The orchestrator runs this automatically; when executing manually, check the output for `[FAIL]` or `[WARN]` lines.
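The internals of `scripts/verify_stage_6b.sh` are not documented here. As a rough illustration of what the structural checks in the Verification Contract involve, the Python sketch below compares Greek references against Stage 6a chapter segments, using the `reference` and `translation` fields named under Input Requirements. `validate_generated` and `is_blocking` are hypothetical names, and the chapter-keying convention (a segment belongs to the chapter named by the first dot-separated component of its reference) is an assumption. Any issue counts as a blocker, in line with the "coverage is 100% or blocked" rule.

```python
"""Sketch of the Stage 6b structural checks (assumptions noted in comments)."""
from collections import Counter
from typing import Dict, List


def validate_generated(greek_refs: List[str], chapters: Dict[str, List[dict]]) -> Dict[str, List[str]]:
    """Compare Greek references with generated chapter segments.

    `chapters` maps a chapter id to that chapter file's `segments` array, where
    each segment has `reference` and `translation` keys.
    """
    issues: Dict[str, List[str]] = {
        "missing": [], "extra": [], "empty": [], "duplicates": [], "cross_chapter": [],
    }
    seen: Counter = Counter()
    for chapter_id, segments in chapters.items():
        for seg in segments:
            ref = seg["reference"]
            seen[ref] += 1
            # Assumed convention: a segment's chapter is the first component of its reference.
            if ref.split(".")[0] != chapter_id:
                issues["cross_chapter"].append(ref)
            if not str(seg.get("translation") or "").strip():
                issues["empty"].append(ref)

    greek_set = set(greek_refs)
    issues["missing"] = [ref for ref in greek_refs if ref not in seen]
    issues["extra"] = sorted(ref for ref in seen if ref not in greek_set)
    issues["duplicates"] = sorted(ref for ref, count in seen.items() if count > 1)
    return issues


def is_blocking(issues: Dict[str, List[str]]) -> bool:
    """Any structural issue at all blocks downstream stages."""
    return any(issues.values())
```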
---

## Key Files

| File | Purpose |
|---|---|
| `scripts/generate_translation.py` | Stage 6a generation + optional import |
| `$LYCEUM_TEXTS_DIR//versification/` | Generated translation chapter files |
| `qa/alignment-report.md` | Validation report |
| `.pi/skills/translation-synthesis/SKILL.md` | Stage 6a (generation) |
| `.pi/skills/translation-witness/SKILL.md` | Stage 5 (witness collection) |
| `docs/text-pipeline-master-plan-2026-03-13.md` | Pipeline architecture |

### Legacy scripts (maintenance only)

| File | Purpose |
|---|---|
| `scripts/versify_meditations.py` | Legacy Meditations DP alignment |
| `scripts/versify_homer.py` | Legacy Homer range fallback |
| *(deleted)* | Legacy batch helper |