---
name: tracking-content-sources
description: Maintain a durable registry of high-value content sources for the personal LLM wiki, so good sources get revisited instead of forgotten. Two operations — ADD logs a newly discovered source (deduped) to docs/sources.md, and REVISIT checks logged sources for new material since their last_checked date and bumps the date. Distinct from finding-new-wiki-content, which discovers individual items; this skill tracks the sources themselves. Use when the user finds a "gold mine" worth returning to, wants to log a feed, or asks "what's new from my sources".
version: 1.0
created: 2026-05-28
updated: 2026-05-28
---

# Tracking Content Sources

A persistent registry of sources worth coming back to. The recurring problem this solves: the user finds a gold mine (a blog, a researcher, a newsletter), clips a few things, then forgets it exists and never checks it again. This skill makes sources first-class, tracked assets.

It tracks **sources** (a site, feed, author, or publication). It does not discover or rank individual articles — that's `finding-new-wiki-content`. The two compose: this skill's registry is the preferred seed list that discovery reads first.

The registry lives at `docs/sources.md`.

## When to use

- "Add this to my sources / remember this site — it's a gold mine"
- "Log this feed so we check it regularly"
- "What's new from my sources?" / "check my sources" → REVISIT
- "Show me my sources" → read and present `docs/sources.md`

## When NOT to use

- Discovering or ranking individual articles/papers → `finding-new-wiki-content`
- Ingesting files in `raw/` → `claude-wiki-ingest`

## Registry format

`docs/sources.md` is a single markdown table. Columns:

| Source | URL / Feed | Topic Focus | Tier | Quality Notes | Added | Last Checked |
|---|---|---|---|---|---|---|

- **Source** — human name (e.g. "Han Lee — leehanchung.github.io", "Anthropic Engineering Blog").
- **URL / Feed** — canonical landing page or RSS/Atom feed if one exists (prefer the feed for reliable revisits).
- **Topic Focus** — short comma-separated tags describing what it covers (e.g. "RL environments, agent training"). Discovery matches its request against this column.
- **Tier** — `primary` (papers, lab blogs, official docs), `practitioner` (eng blogs, individual researchers, newsletters), or `discussion` (forums, social). Matches the tiers in `finding-new-wiki-content`.
- **Quality Notes** — one line on why it's worth tracking and what to expect.
- **Added** — `YYYY-MM-DD` first logged.
- **Last Checked** — `YYYY-MM-DD` REVISIT last swept it. Equal to Added on first entry.
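
The column definitions above map onto a small record type. The following is a minimal sketch of reading one table line, not part of the skill itself: `SourceRow` and `parse_row` are hypothetical names, and it assumes that no cell contains a literal `|` and that dates are always written as `YYYY-MM-DD`.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

TIERS = {"primary", "practitioner", "discussion"}
COLUMN_COUNT = 7  # Source, URL / Feed, Topic Focus, Tier, Quality Notes, Added, Last Checked


@dataclass
class SourceRow:
    source: str
    url: str
    topic_focus: List[str]
    tier: str
    quality_notes: str
    added: date
    last_checked: date


def parse_row(line: str) -> Optional[SourceRow]:
    """Parse one registry line; return None for header, separator, or prose lines."""
    stripped = line.strip()
    if not stripped.startswith("|"):
        return None
    cells = [cell.strip() for cell in stripped.strip("|").split("|")]
    if len(cells) != COLUMN_COUNT:
        return None
    if cells[0] == "Source" or set(cells[0]) <= {"-", ":"}:
        return None  # header row or the |---| separator row
    source, url, topics, tier, notes, added, last_checked = cells
    tier = tier.strip("`")  # tolerate a tier written as inline code
    if tier not in TIERS:
        raise ValueError(f"unknown tier {tier!r} for {source!r}")
    return SourceRow(
        source=source,
        url=url,
        topic_focus=[t.strip() for t in topics.split(",") if t.strip()],
        tier=tier,
        quality_notes=notes,
        added=date.fromisoformat(added),
        last_checked=date.fromisoformat(last_checked),
    )
```

Splitting Topic Focus into a list of tags keeps matching against a discovery request simple, since each tag can be compared on its own.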

## ADD operation

Log a new source.

1. Read `docs/sources.md`. If it doesn't exist, create it from the template in "Bootstrapping" below.
2. **Dedupe.** If the URL or domain already has a row (or the source is clearly the same one under a different URL), don't add a duplicate; update its Quality Notes or Topic Focus instead and report that it already existed.
3. **Liveness check (required).** Fetch the source and find the date of its most recent post. If that post is **older than ~12 months**, the source is dormant: do **not** add it silently. Report the actual last-published date and ask the user whether to add it anyway (some valuable sources publish rarely but are worth watching). The point is to keep the registry free of sources that REVISIT would sweep indefinitely without finding anything new. Check the *content feed* specifically, not a paper's original date or an unrelated page; for example, check the blog index rather than a `/research` page that may lag.
4. Resolve a feed URL if one is easily found (append `/feed`, `/rss`, `/atom.xml`, or check the page) — feeds make REVISIT far more reliable than scraping a landing page. If none, store the landing page.
5. Classify the tier and write a one-line Quality Note.
6. Append the row with `Added` and `Last Checked` both set to today.
7. Confirm what was added in one line, including the latest-post date you observed.
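
The dedupe, liveness, and feed-resolution steps above can be sketched as pure helpers. This is an illustration under stated assumptions, not the skill's implementation: all function names are hypothetical, "~12 months" is approximated as 365 days, and the code only builds candidate feed URLs. Fetching them still goes through the sanctioned tools.

```python
from datetime import date, timedelta
from typing import List
from urllib.parse import urlsplit, urlunsplit

DORMANCY_DAYS = 365  # assumption: "~12 months" rounded to a fixed day count
FEED_SUFFIXES = ("/feed", "/rss", "/atom.xml")


def dedupe_key(url: str) -> str:
    """Reduce a URL to a comparable host: lowercased, without a leading 'www.'."""
    if "://" not in url:
        url = "https://" + url
    host = urlsplit(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def is_duplicate(url: str, existing_urls: List[str]) -> bool:
    """True when some existing row shares the candidate's host."""
    key = dedupe_key(url)
    return any(dedupe_key(existing) == key for existing in existing_urls)


def is_dormant(latest_post: date, today: date) -> bool:
    """True when the most recent post is older than the dormancy window."""
    return today - latest_post > timedelta(days=DORMANCY_DAYS)


def feed_candidates(landing_url: str) -> List[str]:
    """Build the feed URLs worth trying for a landing page, in order."""
    parts = urlsplit(landing_url)
    base = parts.path.rstrip("/")
    return [
        urlunsplit((parts.scheme, parts.netloc, base + suffix, "", ""))
        for suffix in FEED_SUFFIXES
    ]
```

A host match is a prompt to look closer rather than a verdict: shared platforms host many distinct authors under one domain, so a match on such a host still needs the "clearly the same source" judgment from step 2.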

## REVISIT operation

Check tracked sources for new material since each was last swept.

1. Read `docs/sources.md`. If the user named a topic or specific source, filter to matching rows; otherwise sweep all (or offer to start with the stalest by `Last Checked`).
2. For each source, fetch its current content via the sanctioned tools: `firecrawl_scrape` on the feed or landing page, or a domain-scoped search for recent items (Exa `web_search_exa`, or `firecrawl_search` with a `site:`-style query). Never use curl/wget/requests, and avoid `toolbase-proxy` tools (they fail silently without a separate Toolbase login).
3. Identify items **newer than `Last Checked`** (use publish dates where available; otherwise judge by what's not already in the wiki). Dedupe against `wiki/index.md` and `docs/content-discovery-log.md` so already-seen items don't resurface.
4. Present new items grouped by source, as a numbered list compatible with the discovery skill's pull step:

```
**New from your sources** (since last check)

<Source name> — last checked <date>
  1. **<Title>** — <URL> · <one-line why it's notable>
  2. ...
```

5. After sweeping each source, update its `Last Checked` to today in `docs/sources.md`.
6. **Dormancy retirement.** While sweeping, note each source's latest-post date. If a source's most recent material is **older than ~12 months**, flag it for retirement in the report ("dormant since <date> — retire?") and offer to remove the row. Don't keep sweeping sources that have gone quiet for a year; that's wasted effort and the exact problem the registry is meant to avoid.
7. Offer the handoff: "Want me to pull any of these into `raw/`? (say 'pull N, N')" — the actual fetch-and-save uses `finding-new-wiki-content`'s Step 7 (articles → `raw/Articles/`, papers → `raw/research_papers/`), and logs accepted/rejected to `docs/content-discovery-log.md`.
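
The filtering in steps 1 and 3 can be sketched as follows. `Item`, `new_items`, and `stalest_first` are hypothetical names, and the sketch assumes the fetched items and the set of already-seen URLs (from `wiki/index.md` and `docs/content-discovery-log.md`) have been collected beforehand. One judgment call is labeled in the code: because `Last Checked` has day granularity, items published on that same day are kept and left to the seen-URL dedupe.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional, Set


@dataclass
class Item:
    title: str
    url: str
    published: Optional[date]  # None when the source exposes no publish date


def new_items(items: List[Item], last_checked: date, seen_urls: Set[str]) -> List[Item]:
    """Keep items not already seen and not published before the last sweep."""
    fresh = []
    for item in items:
        if item.url in seen_urls:
            continue  # already in the wiki or the discovery log
        # Assumption: same-day items are kept, since a date alone cannot say
        # whether they appeared before or after the previous sweep.
        if item.published is not None and item.published < last_checked:
            continue
        fresh.append(item)  # undated items fall through to the seen-URL check only
    return fresh


def stalest_first(last_checked_by_source: Dict[str, date]) -> List[str]:
    """Order source names so the longest-unchecked source is swept first."""
    return sorted(last_checked_by_source, key=last_checked_by_source.get)
```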

## Bootstrapping

If `docs/sources.md` is missing, create it with this header, then proceed:

```markdown
# Content Source Registry

High-value sources for the personal LLM wiki. Maintained by the `tracking-content-sources` skill.
Read as the preferred seed list by `finding-new-wiki-content` before it searches the open web.

| Source | URL / Feed | Topic Focus | Tier | Quality Notes | Added | Last Checked |
|---|---|---|---|---|---|---|
```
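
Creating the file only when it is missing can be sketched like this. `ensure_registry` is a hypothetical helper; the header text is copied from the template above, and the parent directory is created on the assumption that `docs/` may not exist yet.

```python
from pathlib import Path

REGISTRY_HEADER = """# Content Source Registry

High-value sources for the personal LLM wiki. Maintained by the `tracking-content-sources` skill.
Read as the preferred seed list by `finding-new-wiki-content` before it searches the open web.

| Source | URL / Feed | Topic Focus | Tier | Quality Notes | Added | Last Checked |
|---|---|---|---|---|---|---|
"""


def ensure_registry(path: Path) -> bool:
    """Write the header if the registry is missing; return True when it was created."""
    if path.exists():
        return False  # never overwrite an existing registry
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(REGISTRY_HEADER, encoding="utf-8")
    return True
```

Returning whether the file was created lets the caller mention the bootstrap in its one-line confirmation.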

## Pitfalls

- **Prefer feeds over landing pages.** A landing page changes layout and is hard to diff; an RSS/Atom feed gives clean, dated items and makes REVISIT reliable.
- **Always bump Last Checked.** A REVISIT that doesn't update the date will re-report the same items next time.
- **Dedupe on ADD.** The registry's value is that it's curated, not bloated — one row per source.
- **Don't let REVISIT silently swallow failures.** If a source's feed is dead or unreachable, say so in the report (it may be a sign the source is defunct and the row should be retired).
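
As a small illustration of the "always bump" rule, the date update can be a pure string operation on one table row. `bump_last_checked` is a hypothetical helper, and it assumes `Last Checked` is the final column and that no cell contains a literal `|`.

```python
from datetime import date


def bump_last_checked(row: str, today: date) -> str:
    """Return the table row with its final cell replaced by today's date."""
    cells = row.strip().strip("|").split("|")
    cells[-1] = f" {today.isoformat()} "
    return "|" + "|".join(cells) + "|"
```

Only the last cell changes, so the `Added` date and the notes survive the rewrite untouched.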

## Related

- `finding-new-wiki-content` — reads this registry as preferred seeds; shares the pull step and the `docs/content-discovery-log.md` decisions log.
- `claude-wiki-ingest` — turns pulled `raw/` files into wiki pages.
