` is GraphQL-hydrated AFTER networkidle, and the page virtualizes (unmounts scrolled-off articles). XTweet uses `waitForSelector('article[data-testid="tweet"]', …)` instead, plus snapshot-collect across scrolls keyed by tweetId so the main tweet survives virtualization. Thread + media (tweet attach + link card + video poster) are returned as markdown `![](url)` so rich-html renders them as `<img>` and Obsidian / VS Code preview shows them. Three explicit preset rules, `x-tweet-status` (3-seg path), `x-search-live` (`/search`), and `x-profile` (1-seg), route through this flavor; everything else on x.com routes via fallback (also XTweet by host sniff).
- **Default `--min-bytes` is 500; short single tweets need `--min-bytes 200`**. A typical single tweet returns ~400-500 bytes (text + author + metrics); multi-article threads naturally exceed 500. If a known-good single X URL gets `SMELL_REJECTED reject_reason=empty_content`, pass `--min-bytes 200`. A future spec may auto-adjust per `x-tweet-status` kind.

## Mandatory Tail (MANDATORY: `finish` is preferred)

`ascent-research loop` does **NOT** render `report.html`. Before you declare the task done, prefer the single completion protocol:

```bash
ascent-research finish [--bilingual] [--open]
```

`finish` runs `coverage -> synthesize -> audit` and returns non-zero unless all three stages pass. If you need to debug a stage, run the inspection commands individually:

```bash
ascent-research coverage
ascent-research synthesize [--bilingual] [--open]
ascent-research --json audit
```

Rules:

- Always run `finish` after `loop` (or after manual `add` / `write` work) before replying.
- If the user asks for Chinese, bilingual output, or 中英文 output, you MUST run `finish --bilingual`; plain `finish` renders English-only HTML.
- If `finish` fails at `coverage`, do NOT claim the report is complete. Surface the blockers from `report_ready_blockers` and keep working or ask the user what to relax.
- If `finish` fails at `synthesize`, do NOT claim `report.html` exists. Surface the synthesize error and keep working or ask the user what to relax.
- If `finish` fails at `audit`, do NOT claim the session is acceptance-complete. Surface `audit_blockers` and keep working or ask the user what to relax.
- If `--bilingual` reports `bilingual_skipped`, do NOT claim Chinese output is complete. Fix the provider and rerun `finish --bilingual`.
- In the final reply, include the exact `/report.html` path and `audit_status`.

Chinese/bilingual generation:

```bash
# Default translator is Claude when built with provider-claude.
ascent-research finish --bilingual --open
# If Claude auth/API key is unavailable but the binary was built with provider-codex:
ASR_BILINGUAL_PROVIDER=codex ascent-research finish --bilingual --open
```

`--bilingual` produces an English/中文 toggle in `report.html` by injecting Chinese paragraph siblings. Without `--bilingual`, the report intentionally has no Chinese paragraphs.

## Dynamic Fact Topics Require Fact Check

For live, sports, news, current roster, current price, latest version, market, legal, medical, financial, or other time-sensitive factual research, create the session with `--tag fact-check`.

```bash
ascent-research new "" --slug --preset tech --tag fact-check
```

For sports/current-roster tasks, prefer the sports preset and seed at least one roster source URL from an official roster page, Basketball-Reference team season page, or ESPN roster page before synthesis:

```bash
ascent-research new "" --slug --preset sports --tag fact-check
ascent-research add "https://www.nba.com//roster" --slug
ascent-research add "https://www.basketball-reference.com/teams//.html" --slug
```

When `--tag fact-check` is present, the loop must emit `fact_check` actions for concrete person, team, date, number, price, roster, standing, release version, or current-status claims before the final report depends on them. If evidence is stale or conflicting, emit `fact_check` with `outcome:"uncertain"` and continue fetching instead of writing a confident assertion.

Before final synthesis on these sessions, run:

```bash
ascent-research coverage --json
```

Verify `fact_check_required=true` and `fact_checks_total >= 1`. If `report_ready_blockers` contains `fact_checks_total` or `fact_check_invalid_sources`, do NOT run `synthesize`; continue fetching accepted sources and emitting `fact_check` actions.

## GitHub Trust / Fake-Star Audit

For “fake GitHub stars”, repository trust, suspicious star growth, or investor/developer due diligence tasks, start with the deterministic audit hand before asking the LLM to interpret anything. Do not ask the model to decide whether stars are fake from vibes.
```bash
ascent-research github-audit / --depth timeline --sample 500 --out audit.json --html audit.html
ascent-research new "/ GitHub trust audit" --slug --trust --preset github-trust --tag fact-check
ascent-research add-local audit.json --slug --trust
ascent-research loop --trust --provider codex --iterations 8
ascent-research finish --trust --open
```

Rules:

- `github-audit` outputs trust score, risk score, band, confidence, reasons, and evidence; never call a repo “fake” or “real” deterministically.
- If the user asks whether a repo looks trustworthy, show `audit.html` first. It is the deterministic scorecard: trust score, risk score, confidence, metric dashboard, reasons, and evidence gaps.
- Use `--depth repo` for anonymous quick checks; use `stargazers` or `timeline` only when postagent can resolve `$POSTAGENT.GITHUB.TOKEN`.
- The audit JSON is the evidence artifact. Add it with `add-local` before `loop` so the report cites the deterministic signals instead of re-deriving them.
- `--preset github-trust` is for contextual follow-up sources; it does not replace `github-audit` scoring.
- Finish still goes through the normal completion protocol: `coverage -> synthesize -> audit`.

## Mental Model

```
One research project = one session under ~/.actionbook/ascent-research//
  session.toml     metadata (topic, preset, tags, parent slug)
  SCHEMA.md        user-editable loop guidance
  session.md       narrative - Overview / 01·WHY / 02·HOW / ...
                   report spine
  session.jsonl    append-only event log - authoritative machine state
  raw/             every accepted source, one file
  wiki/.md         persistent entity + concept + analysis pages
  diagrams/*.svg   hand-drawn figures, inlined in the HTML report
  report.html      rendered editorial output with wiki TOC + bilingual toggle
```

**Three-layer control flow:**

```
LLM orchestrator (this skill / active-research / custom agent)
   |  [CLI ONLY - observability > terseness]
   v
research CLI ──> route presets ──> postagent (HTTP)
                               ──> actionbook browser (CDP)
                               ──> local file reader (file://)
   |  [infra-enforced smell test on every fetched body]
   v
session.md + session.jsonl + wiki/ + diagrams/ + report.html
```

Everything downstream of the CLI is stateless between turns; the agent addresses state only by `--slug`. The CLI holds the event log and the preset registry.

## Command Surface (all of it)

### Session lifecycle

```
ascent-research new --slug [--preset tech] [--tag t]... [--from ] [--force]
ascent-research list [--tag t] [--tree]
ascent-research show
ascent-research status []
ascent-research resume
ascent-research close []
ascent-research rm [--force]
```

- `new` seeds `SCHEMA.md` with a starter template and sets the session active.
- `--from ` forks a session: the parent's Overview becomes the new Context, and tags are inherited. Wiki does NOT auto-fork (by design).
- `--force` on `new` overwrites an existing slug; on `rm` it skips the dry-run confirmation.
- `list --tree` renders the parent→child hierarchy as ASCII.

### Ingest - online

```
ascent-research add [--slug ] [--timeout ]
                    [--readable | --no-readable]
                    [--min-bytes N] [--on-short-body {reject|warn}]
                    [--frame-id ] [--run-code-args ]
                    [--reseed] [--actionbook-backend {v2-mcp|v1-cli}]
ascent-research batch ... [--slug ] [--concurrency 1..16] [--timeout ]
                    [--readable | --no-readable]
                    [--frame-id ] [--run-code-args ]
                    [--reseed] [--actionbook-backend {v2-mcp|v1-cli}]
ascent-research sources [] [--rejected]
ascent-research route [--rules ] [--preset ] [--prefer browser]
```

- `add` routes via preset (`tech.toml` by default): HN/arXiv/GitHub hit postagent directly, other hosts fall through to the actionbook browser.
- Default per-source timeout is **90 s** since v0.4.0 (the V2 server's inner run-code default is 60 s; the extra 30 s covers edge + transport overhead). Override with `--timeout `.
- `--frame-id` targets a specific iframe for V2 `run-code`. `--run-code-args` injects structured JSON arguments into the script.
- `--reseed` re-probes the V2 catalog even when a wiki entry already exists (the default pre-fetch step skips already-seeded sites).
- `--actionbook-backend v1-cli` forces the legacy subprocess path for this single call (useful when the V2 extension is offline). Otherwise the binary-wide default (env `ACTIONBOOK_BACKEND`, default `v2-mcp`) applies.
- `batch` fetches in parallel workers; each call runs the smell test independently.
- `route` prints the decision without fetching, which is useful for debugging preset rules.
- Smell test fails → `SMELL_REJECTED` with a reason (`too_short`, `wrong_url`, `browser_chrome_error`, etc.). The URL attempt is always logged in jsonl.
- A composite route rule (one rule, N parts merged under the `composite-v1` schema) is a single source; if any part rejects, the source is rejected and `composite_failed_part` in the jsonl event names the failing label.

### Ingest - local (v3)

```
ascent-research add-local [--slug ] [--glob '...']...
                          [--max-file-bytes N] [--max-total-bytes N]
                          [--original-url URL] [--origin-tool TOOL] [--origin-note TEXT]
```

- The path argument can be `file://abs/path`, `/abs/path`, `./rel/path`, `~/rel/path`, or a bare path.
- `--glob` is repeatable; prefix `!` excludes. Default glob matches all files.
- Caps enforced at walk time: default 256 KB per file, 2 MB per walk. Direct `add file:///…` calls get an 8 MB fetch-stage backstop.
- Binary files (null-byte probe) are rejected; only text survives the gate.
- Each accepted file becomes an independent source with a `file://` URL. It follows the same pipeline as online `add`: it goes through the smell test and appears in `sources` and `coverage`.
- `--original-url` / `--origin-tool` / `--origin-note` record fallback provenance in `session.jsonl` when a local file is standing in for a failed online hand.

### GitHub trust audit

```
ascent-research github-audit / [--depth repo|stargazers|timeline] [--sample N] [--out audit.json] [--html audit.html]
ascent-research github-audit https://github.com// --depth timeline --sample 500 --out audit.json --html audit.html
```

- Produces a deterministic trust evidence envelope: repo ratios, sampled stargazer profile signals, timeline burst signals, trust score, risk score/band/confidence/reasons/evidence.
- Default depth is `stargazers`, default sample is 200. Use `repo` when GitHub token auth is unavailable.
- `stargazers` and `timeline` use `postagent` with `Authorization: Bearer $POSTAGENT.GITHUB.TOKEN`; no raw token should appear in stdout/stderr/session files.
- `--out` writes the full JSON envelope for `add-local` ingestion into a follow-up `--preset github-trust` report.
- `--html` writes the deterministic scorecard HTML. This is not the generic research report template and does not call an LLM.

### Autonomous loop (feature: `autoresearch`)

```
ascent-research loop [] --provider {fake|claude|codex}
                     [--iterations N] [--max-actions M]
                     [--dry-run] [--fake-responses 'r1;r2;...']
```

- Default iteration budget is 8, default action budget is 20; both can be raised.
- `fake` provider replays scripted JSON turns; used by tests and manual debug runs.
- `claude` provider uses `cc-sdk` (requires `--features provider-claude` at build time).
- `codex` provider spawns `codex app-server` (requires `--features provider-codex`).
- `opencode-go` provider (new in v0.4.2) hits the [OpenCode Go](https://opencode.ai/zen/go) $10/mo subscription via OpenAI- or Anthropic-compatible HTTP (requires `--features provider-opencode-go`). Required env: `OPENCODE_API_KEY`, `ASR_OPENCODE_MODEL` (pick from the OpenCode Go catalog; no default). Optional: `ASR_OPENCODE_PROTOCOL` (`openai` default | `anthropic`), `ASR_OPENCODE_TEMPERATURE` (default 0.2), `ASR_OPENCODE_MAX_TOKENS` (default 16384), `ASR_OPENCODE_TIMEOUT_MS` (default 120000, clamped [5000, 600000]). Useful when Claude/ChatGPT subscriptions are out of reach (cost / region / payment).
- Loop reads `SCHEMA.md` each turn; user edits via `schema edit` take effect on the next iteration.
- Action types the loop accepts: `write_plan`, `write_overview`, `write_aside`, `write_section`, `write_diagram`, `note_diagram_needed`, `digest_source`, `fact_check`, `add`, `batch`, `write_wiki_page`, `append_wiki_page`, `actionbook_search`, `actionbook_manual`, `actionbook_run_code`.
- New in v0.4.0: the three `actionbook_*` actions are dispatch arms for the V2 MCP backend. Per-iteration caps: `actionbook_search` ≤ 5, `actionbook_manual` ≤ 5, `actionbook_run_code` ≤ 3. Long outputs get a `[…truncated to KB…]` marker. Each emits an `ActionbookCalled` jsonl event.
- Termination reasons: `report_ready`, `iterations_exhausted`, `max_actions_exhausted`, `provider_done`, `provider_unavailable`, `diverged` (same coverage signature 3 turns in a row).

### User-editable loop guidance (v3)

```
ascent-research schema show [--slug ]
ascent-research schema edit [--slug ]   # opens $EDITOR
```

- Starter template has five sections: Goal / Wiki conventions / What to emphasize / What to deprioritize / House style.
- Edits that actually change the body emit a `SchemaUpdated` jsonl event; no-op edits (e.g. `:q`) don't.
- Loop strips HTML comments before injecting, so placeholder hints in the starter don't leak into the prompt.

### Wiki layer (v3)

```
ascent-research wiki list [--slug ]
ascent-research wiki show [--slug ]
ascent-research wiki rm [--slug ] [--force]
ascent-research wiki query "" [--slug ] [--save-as ]
                           [--format prose|comparison|table]
                           [--provider fake|claude|codex]
ascent-research wiki lint [--slug ] [--stale-days N]
```

- Page slug rules: `[a-z0-9_-]{1,64}`.
- Frontmatter fields: `kind` (entity / concept / source-summary / comparison / analysis), `sources` (URL list), `related` (slug list), `updated` (date).
- Cross-links use `[[slug]]`; the renderer resolves existing pages to links and flags broken targets.
- `wiki query` retrieval: token-overlap against page bodies + slug names, plus one-hop BFS along outbound `[[slug]]` links from the top seeds. Top-N default 5, capped at 2×N after BFS.
- `wiki query --save-as ` persists the answer as a `kind: analysis` page with `sources: [wiki:a, wiki:b, ...]` frontmatter citing the retrieved pages.
- `wiki lint` checks: orphans (no inbound link), broken outbound `[[...]]`, stale `updated:` dates, missing cross-refs (two pages share a source but don't `[[ref]]` each other), kind conflicts (slug variants with mismatched `kind:`). Non-blocking; diagnostic only.

### Output / QA

```
ascent-research synthesize [] [--no-render] [--open] [--bilingual] [--pdf] [--pdf-output ]
ascent-research report --format rich-html|brief-md [--open | --no-open] [--stdout] [--output ]
ascent-research series [--open]
ascent-research coverage []
ascent-research diff [] [--unused-only]
```

- `synthesize` is the full path: renders `report.json` + inline-SVG + wiki TOC + sources list + optional bilingual (`--bilingual` calls Claude to inject Chinese paragraph siblings).
- PDF export is opt-in only: `synthesize --pdf` converts the rendered `report.html` to `/report.pdf` through the local backend. It prefers isolated Playwright Chromium/headless_shell and keeps the HTML on the user's machine. If no safe local Chromium is installed, run `npx playwright install chromium` or set `ASR_PDF_CHROME_BIN`. Do not auto-launch the user's desktop Google Chrome; only opt into that fallback with `ASR_PDF_ALLOW_SYSTEM_CHROME=1` if explicitly requested. `--pdf-output ` overrides the output and implies `--pdf`.
- `report --format brief-md` dumps a lean markdown digest, useful for PR descriptions or quick sharing.
- `series ` renders an HTML index for every session carrying that tag.
- `coverage` returns metrics + `report_ready_blockers` (array of human-readable reasons). If `report_ready: true`, the session is done. For `--tag fact-check` sessions, inspect `fact_check_required`, `fact_checks_total`, and `fact_check_invalid_sources`.
- `diff` surfaces two sets: `unused` (accepted but never cited) and `hallucinated` (cited URLs that weren't accepted). `--unused-only` trims to the first set.

### Global flags (apply to every command)

```
--json          machine-readable envelope (ok/data/error/meta)
-v / --verbose  stderr verbosity
--no-color      disable ANSI
--help          clap-generated help; also `ascent-research help`
```

Envelope shape:

```json
{
  "ok": true,
  "command": "ascent-research add",
  "context": {"session": "tokio-v3", "url": "..."},
  "data": {"...": "..."},
  "error": null,
  "meta": {"duration_ms": 1820, "warnings": []}
}
```

On failure, `error.code` is machine-readable; never parse `error.message` for routing decisions.

## Scenario Playbooks

### A. Survey a technology topic from public sources

```bash
RBIN=~/.cargo/bin/ascent-research   # or target/release/ascent-research
$RBIN new "state-space models vs attention 2026" --slug ssm-vs-attn --preset tech
$RBIN batch \
  https://arxiv.org/abs/2111.00396 \
  https://arxiv.org/abs/2312.00752 \
  https://huggingface.co/papers/2111.00396 \
  --concurrency 4
$RBIN loop ssm-vs-attn --provider claude --iterations 10
$RBIN wiki query "what breaks when you scale S6 past 10B params?" \
  --format comparison --save-as s6-scaling
$RBIN synthesize ssm-vs-attn --bilingual --open
```

### B. Deep-dive a Rust library's source tree

```bash
$RBIN new "tokio internals 2026" --slug tokio-v3 --preset tech
$RBIN schema edit   # set "what to emphasize"
$RBIN add-local ~/tokio/tokio/src/runtime/scheduler \
  --glob '**/*.rs' --glob '!**/tests/**' \
  --max-file-bytes 65536 --max-total-bytes 524288
$RBIN add-local ~/tokio/tokio/src/runtime/task \
  --glob '**/*.rs' --glob '!**/tests/**'
$RBIN loop tokio-v3 --provider claude --iterations 12 --max-actions 40
$RBIN wiki query "how does the scheduler balance work across threads?" \
  --save-as scheduler-balancing
$RBIN wiki lint --slug tokio-v3
$RBIN synthesize tokio-v3 --open
```

### C. Paper + companion codebase

```bash
$RBIN new "S4 state space model" --slug s4 --preset tech
$RBIN add https://arxiv.org/abs/2111.00396
$RBIN add https://github.com/HazyResearch/state-spaces
$RBIN add-local ~/state-spaces/src --glob '**/*.py' --max-file-bytes 65536
$RBIN loop s4 --provider claude --iterations 8
$RBIN synthesize s4 --bilingual --open
```

### D. Compare two frameworks with a dedicated analysis page

```bash
$RBIN new "tokio vs async-std scheduling 2026" --slug cmp-tokio-async-std
$RBIN batch https://github.com/tokio-rs/tokio \
  https://github.com/async-rs/async-std
$RBIN loop cmp-tokio-async-std --provider claude --iterations 10
$RBIN wiki query "scheduling strategy: work-stealing vs single-queue" \
  --format comparison --save-as cmp-scheduling
$RBIN synthesize cmp-tokio-async-std --open
```

### E. Fork a session, refocus

```bash
$RBIN new "tokio task system isolation" --slug tokio-tasks --from tokio-v3 \
  --tag rust-deep-dive --tag task-system
$RBIN schema edit --slug tokio-tasks   # narrow the goal
$RBIN loop tokio-tasks --provider claude --iterations 8
```

### F. Resume a stale session

```bash
$RBIN list --tag rust-deep-dive
$RBIN resume tokio-v3
$RBIN status
$RBIN schema edit   # refocus if goal has shifted
$RBIN loop tokio-v3 --provider claude --iterations 6
```

### G. Series index for many sibling sessions

```bash
# Each entry is topic:org/repo, because these projects live under different GitHub orgs.
for pair in axum:tokio-rs/axum actix:actix/actix-web hyper:hyperium/hyper rocket:rwf2/Rocket; do
  topic=${pair%%:*}
  repo=${pair#*:}
  $RBIN new "$topic internals 2026" --slug "${topic}-deep" --tag rust-web
  $RBIN add "https://github.com/$repo"
  $RBIN loop "${topic}-deep" --provider claude --iterations 6
done
$RBIN series rust-web --open   # cross-linked HTML index across all 4
```

### H. Manual curation (no LLM)

```bash
$RBIN new "skim axum routing" --slug axum-skim --preset tech
$RBIN add-local ~/axum/axum/src/routing --glob '**/*.rs'
$RBIN sources axum-skim              # see what was accepted
$RBIN synthesize axum-skim --open    # ingest-list + minimal HTML, no loop
```

### I. Debug a preset rule

```bash
$RBIN route "https://some.obscure.host/foo" --prefer browser
# Prints the chosen executor + command template. Then:
$RBIN add "https://some.obscure.host/foo"   # see if preset matched
$RBIN sources --rejected                    # if smell rejected, why
```

## Loop Contracts (what the autoresearch prompts enforce)

These rules are encoded in `autoresearch/executor.rs` and surfaced to the agent as non-negotiable:

- **First-iteration contract.** A fresh session accepts only `write_plan`. Other actions are rejected with `plan_required`.
- **Every accepted source must be digested.** `sources_unused > 0` is a `report_ready` blocker. The agent cannot skip a URL the user added.
- **Dynamic facts need explicit fact checks.** Sessions tagged `fact-check` require at least one `FactChecked` event. Use `fact_check` for live/sports/news/current roster/current price/latest-version claims and keep working while `fact_checks_total` or `fact_check_invalid_sources` appears in blockers.
- **Wiki-first for durable entities.** Source summaries, recurring concepts, library components → `write_wiki_page`. Numbered sections cite `[[slug]]` pages.
- **Figure-rich contract.** Target ≥ 1 SVG per numbered section. Every `![](diagrams/x.svg)` requires a matching `write_diagram` in the same or an earlier turn; every `write_diagram` should have a body reference. The user prompt nags about unresolved references and orphan SVG files at the top of each turn.
- **`write_section` preserves figures.** If the current section body references `![](diagrams/x.svg)` and your new body omits it, the CLI re-appends the reference automatically, so agents never silently orphan figures even if they try.
- **No plan re-authoring.** The `## Plan` block is pinned at the top of the prompt from iteration 2 onward. Emitting `write_plan` after iteration 1 is wasted tokens unless you're materially revising.
- **SVG safety.** `write_diagram` bodies must start with the SVG root element and must not contain disallowed elements, `on*=` handlers, or `javascript:` URLs. Max 3 `write_diagram` per turn. Violations become `DiagramRejected` events with a reason code.
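The figure-rich contract above amounts to a two-way set comparison between figure references and written diagrams. The following is a minimal sketch of that check, not the logic in `autoresearch/executor.rs`; the function name `check_figures`, the regular expression, and the returned keys are illustrative assumptions.

```python
import re

# Hypothetical pattern: markdown image references of the form ![alt](diagrams/name.svg).
FIGURE_REF = re.compile(r"!\[[^\]]*\]\(diagrams/([^)\s]+\.svg)\)")


def check_figures(body: str, written_diagrams: set[str]) -> dict[str, list[str]]:
    """Compare figure references in a body against the diagrams written so far."""
    referenced = set(FIGURE_REF.findall(body))
    return {
        # Referenced in the text, but no write_diagram produced the file.
        "unresolved": sorted(referenced - written_diagrams),
        # Written by write_diagram, but never referenced in the text.
        "orphans": sorted(written_diagrams - referenced),
    }


body = "Intro\n![](diagrams/flow.svg)\nMore text ![queue](diagrams/queue.svg)"
result = check_figures(body, {"flow.svg", "spare.svg"})
print(result)
```

Both lists being empty corresponds to the state the contract asks for: every reference has a diagram and every diagram has a reference.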
## Output Shape ``` /report.html (sticky top-right) Research report

Bash tooling is faster.
- One-file reads → if `cat foo.rs | head -50` answers the question, don't spin up a session.

## Quality Heuristics

1. Every wiki page cites at least one URL in its frontmatter.
2. Numbered sections cite `[[wiki-slug]]` pages rather than restating the wiki content.
3. Hand-drawn SVG figures, not screenshots or PNGs — readable in any browser, zero external assets.
4. Before the session is called done, `wiki lint` reports 0 orphans and ≤ 3 broken links.
5. `coverage` returns `report_ready: true` with no blockers.
6. `diff --unused-only` is empty: no accepted source went uncited.
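Heuristics 5 and 6 can be checked mechanically from `--json` output. The sketch below assumes that `report_ready`, `report_ready_blockers`, and `unused` sit directly under the envelope's `data` key; that placement, the helper names `payload` and `completion_blockers`, and the sample envelopes are assumptions for illustration, not confirmed CLI output.

```python
import json


def payload(envelope_text: str) -> dict:
    """Parse a --json envelope and return its data, routing failures on error.code."""
    env = json.loads(envelope_text)
    if not env["ok"]:
        # error.code is the machine-readable field; error.message is not parsed.
        raise RuntimeError(env["error"]["code"])
    return env["data"] or {}


def completion_blockers(coverage: dict, diff: dict) -> list[str]:
    """Collect reasons the session should not be called done yet."""
    reasons: list[str] = []
    if not coverage.get("report_ready", False):
        reasons.extend(coverage.get("report_ready_blockers") or ["report_ready is false"])
    unused = diff.get("unused") or []
    if unused:
        reasons.append(f"{len(unused)} accepted source(s) never cited")
    return reasons


# Hand-written sample envelopes standing in for `coverage` and `diff` output.
coverage_json = (
    '{"ok": true, "data": {"report_ready": false, '
    '"report_ready_blockers": ["sources_unused > 0"]}, "error": null}'
)
diff_json = (
    '{"ok": true, "data": {"unused": ["https://example.com/a"], '
    '"hallucinated": []}, "error": null}'
)
blockers = completion_blockers(payload(coverage_json), payload(diff_json))
print(blockers)
```

An empty list from `completion_blockers` matches the state described by heuristics 5 and 6; the `wiki lint` thresholds in heuristic 4 would need their own check, since this document does not show that command's output fields.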