---
name: forge-citation
description: Citation discipline for AI-generated outputs. When to cite, what counts as a verifiable citation, URL + quote + file-path verification, marking inference vs fact, what NOT to cite. Contains paste-ready citation formats for markdown, RAG-with-chunks, and code references. Use whenever an LLM produces a fact-claim that someone might act on - RAG, research, code-with-references, legal/medical/financial outputs.
license: MIT
---

# forge-citation

You are writing or reviewing AI output that makes claims. Default agent outputs either cite nothing (every claim looks equally authoritative) or fabricate citations that look plausible but point to nothing. Both fail the same way: the reader has no path to verify, and trust collapses on the first discovered error. This skill exists to make citations real.

The mental model: **a citation is a contract with the reader.** "I drew this claim from here; you can check if you want." Breaking the contract by citing things that do not exist is worse than not citing at all - it actively misleads.

## Quick reference (the things you must never ship)

1. "Studies show..." or "experts say..." with no named source.
2. A URL in a citation that returns 404.
3. A quoted phrase that does not appear in the cited source.
4. A file path or function name in a citation that does not exist in the codebase.
5. A statistic with no method or sample size cited.
6. Citing the LLM itself ("Claude told me", "GPT-4 says").
7. Citing AI-generated content as a primary source.
8. Mixed prose where confident claims and inferences are presented identically.
9. A long quote with no source link.
10. A reference list at the bottom that no inline claim points to.

## Hard rules

### What to cite

**1. Specific facts, dates, statistics, quotes.** "47 active orders" - if pulled from a DB, cite. "p99 latency is 320ms" - cite the source.

**2. Phrases that imply a source.** "According to," "studies show," "the docs say," "X argues that" - all claim a source exists.

**3. Specific code snippets adapted from a documented source.** Especially when proposing a best practice.

**4. Do NOT need to cite general knowledge.** "JSON is a data interchange format" needs no source.

**5. Do NOT need to cite your own reasoning.** "I think X because Y" is opinion; surface as opinion.

### Citation format

**6. Inline, immediately after the claim.** Footnote `[1]`, link `[text](url)`, or `[source-label]` (for RAG chunks). Pick one and stay consistent within an output.

**7. Include enough to find the source.** Author/site, title, date, URL where applicable.

**8. Date the access for live sources.** `[Postgres 17 docs, accessed 2026-05-22]`. Web pages change; the access date tells future readers which version you saw.
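Stamping the date in code means it cannot be forgotten. Below is a minimal sketch that follows the bracket format in rule 8. The `formatLiveCitation` name and the `LiveSource` shape are hypothetical and do not come from any library.

```ts
// Hypothetical helper for sources that can change after you read them.
interface LiveSource {
  label: string;   // e.g. "Postgres 17 docs"
  url?: string;    // when present, the citation becomes a markdown link
  accessed?: Date; // defaults to the current time
}

function formatLiveCitation(src: LiveSource): string {
  // toISOString() is UTC, so the stamped date does not depend on local timezone.
  const day = (src.accessed ?? new Date()).toISOString().slice(0, 10);
  const text = `${src.label}, accessed ${day}`;
  return src.url ? `[${text}](${src.url})` : `[${text}]`;
}

console.log(formatLiveCitation({
  label: "Postgres 17 docs",
  accessed: new Date("2026-05-22T09:00:00Z"),
}));
// prints "[Postgres 17 docs, accessed 2026-05-22]"
```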

**9. Internal sources cite the artifact precisely.** `src/billing/charger.ts:142`, `users WHERE id = 'abc'`, `Linear ticket OPS-1234`.

### Verifying citations

**10. Before citing a URL, confirm it resolves.** A 404 in a citation is the easiest credibility loss to prevent.

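```ts
// Resolves true only for a 2xx response (after redirects). Some servers
// reject HEAD with 405/501, so retry once with GET before giving up.
async function verifyUrl(url: string, timeoutMs = 5000): Promise<boolean> {
  for (const method of ["HEAD", "GET"] as const) {
    try {
      const res = await fetch(url, {
        method,
        redirect: "follow",
        signal: AbortSignal.timeout(timeoutMs),
      });
      if (method === "GET") await res.body?.cancel(); // body not needed
      if (res.ok) return true;
      if (res.status !== 405 && res.status !== 501) return false;
    } catch {
      // Network error or timeout: fall through to GET once, then give up.
    }
  }
  return false;
}
```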

**11. Before quoting from a source, confirm the quote is in the source.** Word-for-word. AI agents sometimes fabricate quotes that "look right."
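Here is a minimal sketch of the word-for-word check. It assumes the source has already been fetched and reduced to plain text; HTML extraction is out of scope. Only whitespace layout is normalized, because line wrapping differs between renderings. Case and punctuation must match exactly, as rule 21 requires. The function names are illustrative.

```ts
// Collapse whitespace runs (line wraps, tabs, non-breaking spaces) to one space.
function normalizeWhitespace(s: string): string {
  return s.replace(/\s+/g, " ").trim();
}

// True if `quote` occurs word-for-word in `sourceText`. Only whitespace layout
// is ignored; capitalization and punctuation must match exactly.
function quoteAppearsIn(quote: string, sourceText: string): boolean {
  const q = normalizeWhitespace(quote);
  return q.length > 0 && normalizeWhitespace(sourceText).includes(q);
}

const page = "Refunds for digital goods are available\nwithin 14 days.";
console.log(quoteAppearsIn("available within 14 days", page)); // true
console.log(quoteAppearsIn("Available within 14 days", page)); // false: capitalization differs
```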

**12. Before citing a file path or symbol, confirm it exists.** `grep` is cheap.

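```ts
import { existsSync, readFileSync, statSync } from "node:fs";

// True if `path` is an existing file and, when given, 1-based `lineNumber`
// falls within it. A trailing newline does not count as an extra line.
function verifyFileRef(path: string, lineNumber?: number): boolean {
  if (!existsSync(path) || !statSync(path).isFile()) return false;
  if (lineNumber === undefined) return true;
  if (!Number.isInteger(lineNumber) || lineNumber < 1) return false;
  const text = readFileSync(path, "utf-8");
  const lineCount = text.split("\n").length - (text.endsWith("\n") ? 1 : 0);
  return lineNumber <= lineCount;
}
```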
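A path can exist and still be the wrong file for a cited symbol. Below is a sketch of the `grep`-style symbol check. It is a plain-text search similar to `grep -w`, not a parser, so matches inside comments and strings count. All names are hypothetical. The pure function is separated from file I/O so it can be tested without touching the disk.

```ts
import { readFileSync } from "node:fs";

// Escape regex metacharacters so symbols like `$fetch` are matched literally.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// 1-based line numbers where `symbol` appears as a whole identifier in `text`.
// Identifier characters are [A-Za-z0-9_$]; comments and strings still match.
function findSymbolLines(text: string, symbol: string): number[] {
  const re = new RegExp(`(?<![\\w$])${escapeRegExp(symbol)}(?![\\w$])`);
  return text.split("\n").flatMap((line, i) => (re.test(line) ? [i + 1] : []));
}

// File wrapper: an unreadable path (missing file, directory) yields no matches.
function findSymbolInFile(path: string, symbol: string): number[] {
  try {
    return findSymbolLines(readFileSync(path, "utf-8"), symbol);
  } catch {
    return [];
  }
}
```

Suppose you want to cite `src/billing/charger.ts:142` for a hypothetical `chargeCustomer` function. Before citing, confirm that `findSymbolInFile("src/billing/charger.ts", "chargeCustomer")` includes `142`.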

**13. Before citing a date or statistic, confirm against the source.** "Released in 2023" - check whether it was 2023 or 2024.

### Distinguishing certainty

**14. Confidence with the claim, not separately.**

```
Established (multiple sources, primary):
  Plain claim with citation: "Postgres 17 added json_table [link]."

Likely (single source, secondary):
  "[likely] Performance drops noticeably above 100K offset [Cybertec benchmark, 2024]."

Inferred (no direct source, reasoning from facts):
  "[inferred] Therefore the migration plan must split index creation outside the transaction."

Disputed:
  "[disputed; Source A says X, Source B says Y]"
```
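One way to make rule 14 hard to skip is to carry confidence in the data itself, so a claim cannot be rendered without a tag. The sketch below reuses the tag names from the block above. The `Claim` type and the per-tag sourcing checks are illustrative assumptions, not a fixed schema.

```ts
type Confidence = "established" | "likely" | "inferred" | "disputed";

interface Claim {
  text: string;
  confidence: Confidence;
  sources: string[]; // citation labels, e.g. "Cybertec benchmark, 2024"
}

// Renders one claim with its confidence tag and citations inline.
// Throws when a tag's sourcing requirement is not met, so the gap is caught
// before the text ships rather than by a reader.
function renderClaim(c: Claim): string {
  const cites = c.sources.map((s) => ` [${s}]`).join("");
  switch (c.confidence) {
    case "established": // plain claim, but it must be cited
      if (c.sources.length === 0) throw new Error(`uncited established claim: ${c.text}`);
      return `${c.text}${cites}`;
    case "likely": // at least one source
      if (c.sources.length === 0) throw new Error(`uncited likely claim: ${c.text}`);
      return `[likely] ${c.text}${cites}`;
    case "inferred": // reasoning from facts; sources optional
      return `[inferred] ${c.text}${cites}`;
    case "disputed": // both sides must be cited
      if (c.sources.length < 2) throw new Error(`disputed claim needs 2+ sources: ${c.text}`);
      return `[disputed] ${c.text}${cites}`;
  }
}

console.log(renderClaim({
  text: "Postgres 17 added json_table",
  confidence: "established",
  sources: ["docs link"],
}));
// prints "Postgres 17 added json_table [docs link]"
```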

**15. Never present inference as fact.** Use "probably," "I think," "may," and "could" where they apply; hedging is a signal, not a weakness.

**16. Surface when a claim could not be sourced.** "Could not find a primary source; treating as common-knowledge / community lore."

### When citations would mislead

**17. Do not cite when the source contradicts you.** Quoting a source that supports the OPPOSITE claim is a serious error, common in poorly-checked AI output.

**18. Do not cite when you only saw the snippet, not the page.** Either fetch and verify, or hedge ("according to the snippet [link]").

**19. Do not cite AI-generated content as a primary source.** Summarized content of summarized content is unreliable.

**20. Do not cite the LLM itself.** Either restate the actual source, or mark as inference.

### Quoting

**21. Quotes are exact.** Capitalization, punctuation, the works.

**22. `...` for omission inside a quote. `[brackets]` for editorial insertion.** Standard scholarly conventions.
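Omissions and insertions complicate verification. A quote with `...` is really several exact fragments, and they must appear in the source in order. This sketch treats `...`, `…`, and `[bracketed]` insertions as gaps and checks the remaining fragments in sequence. It assumes a plain-text source. It cannot judge whether an editorial insertion is faithful, so that part still needs a human read. The names are hypothetical.

```ts
// Collapse whitespace so line wrapping in the source does not cause false negatives.
const norm = (s: string): string => s.replace(/\s+/g, " ").trim();

// True if every fragment of `quote` appears in `sourceText`, in order.
// Fragments are separated by "..." / "…" omissions and by [bracketed] editorial
// insertions; the insertions themselves are NOT checked.
function elidedQuoteAppearsIn(quote: string, sourceText: string): boolean {
  const source = norm(sourceText);
  const fragments = quote
    .split(/\.\.\.|…|\[[^\]]*\]/)
    .map(norm)
    .filter((f) => f.length > 0);
  if (fragments.length === 0) return false;
  let from = 0;
  for (const frag of fragments) {
    const at = source.indexOf(frag, from);
    if (at === -1) return false;
    from = at + frag.length; // the next fragment must start after this one
  }
  return true;
}
```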

**23. Long quotes (more than 2-3 sentences) get block formatting and a precise locator:** page, section, or anchor, not just the site.

### Numbers and statistics

**24. Cite statistics with source and date.** "47% of users" - says who, measured when, what method?

**25. Be precise about what was measured.** "p99 latency 320ms across 1M requests over 7 days" beats "latency is fast."

**26. Round-numbered statistics from an AI without a source are suspect.** "20% improvement" or "10x faster" without a benchmark citation is often invented.
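Below is a rough lint for rule 26. It assumes citations use this skill's bracket style. It flags any sentence that contains a round percentage (a multiple of 10) or an `Nx` multiplier and has no bracketed citation in the same sentence. Confidence tags such as `[likely]` do not count as citations. This is a heuristic that routes claims to a human, not a verdict. The sentence splitter is naive, and all names are hypothetical.

```ts
// Round percentages ("20%", "50 percent") and multipliers ("10x", "3×").
const ROUND_STAT = /\b\d*0\s?(?:%|percent\b)|\b\d+(?:\.\d+)?\s?[x×](?=\W|$)/i;
const CONFIDENCE_TAGS = new Set(["likely", "inferred", "established"]);

// A sentence is cited if it has a [bracketed] segment that is not a confidence tag.
function hasCitation(sentence: string): boolean {
  const brackets = sentence.match(/\[[^\]]+\]/g) ?? [];
  return brackets.some((b) => !CONFIDENCE_TAGS.has(b.slice(1, -1).toLowerCase()));
}

// Returns sentences with a round statistic and no citation.
function flagUnsourcedRoundStats(text: string): string[] {
  return text
    .split(/(?<=[.!?])\s+/) // naive: "e.g." and similar will split early
    .filter((s) => ROUND_STAT.test(s) && !hasCitation(s));
}
```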

### Code citations

**27. When proposing code from documentation, cite the doc page.**

**28. When citing patterns from a known project, cite the file plus a pinned commit or release tag.** A link to a branch such as `main` drifts as the file changes.

```
// from https://github.com/torvalds/linux/blob/v6.6/kernel/sched/core.c#L1234
```

**29. When pulling from a third-party library, cite the version.** `axios v1.7 docs: ...`. Versions matter.
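Rule 29 is easier to follow when the version comes from what is actually installed rather than from memory. This is a minimal sketch. It assumes a Node project whose dependency sits directly under `node_modules/<pkg>`; nested or workspace layouts are not handled. Both function names are hypothetical. The label it produces is plain text, not a verified docs link.

```ts
import { readFileSync } from "node:fs";
import { join } from "node:path";

// Installed version of `pkg`, read from node_modules/<pkg>/package.json under
// `projectRoot`. Returns undefined if the package is not installed there.
function installedVersion(pkg: string, projectRoot: string = process.cwd()): string | undefined {
  try {
    const manifestPath = join(projectRoot, "node_modules", pkg, "package.json");
    const manifest = JSON.parse(readFileSync(manifestPath, "utf-8")) as { version?: unknown };
    return typeof manifest.version === "string" ? manifest.version : undefined;
  } catch {
    return undefined; // missing package or unreadable/invalid manifest
  }
}

// Citation label "<pkg> v<version> docs"; says so plainly when the version is unknown.
function formatDocsLabel(pkg: string, version: string | undefined): string {
  return version ? `${pkg} v${version} docs` : `${pkg} docs (version unknown; check before citing)`;
}

console.log(formatDocsLabel("axios", installedVersion("axios")));
```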

### RAG-specific

**30. Every claim in a RAG output cites the retrieved chunks. Inline.**

```
Refunds for digital goods are available within 14 days [chunk-04].
Physical goods follow a different policy [chunk-07].
```

**31. Retrieved chunks exposed to the user.** Not just an answer; answer + source passages. See [`forge-rag`](../../llm/forge-rag/SKILL.md) rule 19.

**32. Citations pointing to chunks NOT in the retrieval set are hallucinations.** Verify before sending.
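The sketch below applies rules 30 and 32 mechanically. It assumes the `[chunk-NN]` label style shown above, and that the retrieval step reports the IDs it actually passed to the model. It treats each non-empty line as one claim, which is cruder than real claim segmentation. The names are illustrative.

```ts
// Extracts cited chunk IDs such as "chunk-04" from an answer, deduplicated.
function citedChunkIds(answer: string): string[] {
  const ids = [...answer.matchAll(/\[(chunk-\d+)\]/g)].map((m) => m[1]);
  return [...new Set(ids)];
}

interface CitationReport {
  hallucinated: string[]; // cited but never retrieved: block the answer
  uncitedLines: string[]; // non-empty lines with no chunk citation (rule 30)
}

function checkRagCitations(answer: string, retrievedIds: Iterable<string>): CitationReport {
  const retrieved = new Set(retrievedIds);
  return {
    hallucinated: citedChunkIds(answer).filter((id) => !retrieved.has(id)),
    uncitedLines: answer
      .split("\n")
      .map((l) => l.trim())
      .filter((l) => l.length > 0 && !/\[chunk-\d+\]/.test(l)),
  };
}
```

An answer should ship only when both lists are empty.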

### Legal / medical / financial outputs

**33. High-stakes outputs require source attribution to a degree low-stakes do not.** Every factual claim has a source. Inferences are explicitly inferred.

**34. Surface limits of the source.** "This is general guidance; consult a [lawyer/doctor/accountant] for your specific case."

## Citation formats (paste-ready)

### Markdown inline

```
The Postgres 17 release added the `json_table` function for SQL-standard
JSON access ([release notes](https://www.postgresql.org/docs/17/release-17.html)).
```

### Numbered footnotes

```
The user count is 47 [1].

[1] Internal database query against `users` table, executed 2026-05-22.
```
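Numbered footnotes fail in two mirror-image ways. One is a reference list nothing points to (quick-reference item 10); the other is an inline `[n]` with no entry. Both can be caught mechanically. This sketch assumes definitions are lines starting with `[n] `, as in the example above. It does not skip code spans, so `arr[0]` would count as a reference. The names are hypothetical.

```ts
interface FootnoteReport {
  undefinedRefs: number[]; // cited inline, but no "[n] ..." definition line
  unusedDefs: number[];    // defined, but never cited inline
}

function checkFootnotes(markdown: string): FootnoteReport {
  const defs = new Set<number>();
  const refs = new Set<number>();
  for (const line of markdown.split("\n")) {
    const def = /^\[(\d+)\]\s/.exec(line);
    if (def) {
      defs.add(Number(def[1]));
      continue; // a definition line is not an inline citation of itself
    }
    for (const m of line.matchAll(/\[(\d+)\]/g)) refs.add(Number(m[1]));
  }
  return {
    undefinedRefs: [...refs].filter((n) => !defs.has(n)).sort((a, b) => a - b),
    unusedDefs: [...defs].filter((n) => !refs.has(n)).sort((a, b) => a - b),
  };
}
```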

### RAG with chunk references

```
Refunds for digital goods are available within 14 days [chunk: policies/refunds.md, paragraph 3]. Physical goods follow a different policy [chunk: policies/returns.md, paragraph 1].
```

### Code references

```sql
-- Adapted from the official Postgres 17 docs (JSON_TABLE):
-- https://www.postgresql.org/docs/17/functions-json.html
-- `orders` and its jsonb column `data` are illustrative names.
SELECT jt.*
FROM orders,
     JSON_TABLE(
         orders.data,
         '$[*]' COLUMNS (
             id   INT  PATH '$.id',
             name TEXT PATH '$.name'
         )
     ) AS jt;
```

### Confidence-tagged claims

```markdown
**Established:** Postgres 17 added json_table [docs link, accessed 2026-05-22].

**Likely:** Performance drops noticeably above 100K offset on tables over 1M rows [Cybertec benchmark, 2024-09]. Not verified on our hardware.

**Inferred:** Therefore the orders endpoint at our scale (~5M rows) should not use offset pagination, even though the team is more familiar with it.

**Open:** Whether `cursor_tuple_fraction` affects this trade-off was not addressed by any source I found.
```

## Common AI-output patterns to reject

| Pattern | Why wrong | Fix |
| --- | --- | --- |
| "Studies show..." no link | Vague | Cite the study or remove the claim |
| `[link](https://...)` to 404 | Hallucinated URL | Verify, replace, or remove |
| Quote with no source | Misattribution | Quote + cite + verify |
| "According to recent research..." vague | Appeal to authority | Name the research |
| Round stats no method ("20% faster") | Often invented | Bench cited + method stated |
| `// src/foo.ts:42` that doesn't exist | Hallucinated file ref | grep / read before citing |
| AI summary cited as primary | Compounded error | Cite the original |
| "Claude told me..." cited as fact | LLM is not a source | Restate or mark as inference |
| Reference list at bottom, no inline pointers | Decorative | Inline citations or drop |
| Long uninterrupted prose, no citations | Where did this come from? | Citation per claim |
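Several rows in this table are lexical, so text can be pre-screened before human review. In this sketch the phrase list is drawn from the table and kept deliberately short. A hit means "needs a named source or an inference marker", not "wrong". The pattern list and names are illustrative.

```ts
// Phrases that imply a source without naming one, or cite a model as a source.
const VAGUE_SOURCE_PATTERNS: RegExp[] = [
  /\bstudies (?:show|suggest|have shown)\b/i,
  /\bexperts (?:say|agree|believe)\b/i,
  /\baccording to (?:recent )?research\b/i,
  /\bresearch (?:shows|suggests)\b/i,
  /\b(?:claude|chatgpt|gpt-4|the (?:ai|llm|model)) (?:told me|says|said)\b/i,
];

interface VagueHit {
  line: number;   // 1-based
  phrase: string; // the matched text, original casing
}

function findVagueSources(text: string): VagueHit[] {
  const hits: VagueHit[] = [];
  text.split("\n").forEach((lineText, i) => {
    for (const re of VAGUE_SOURCE_PATTERNS) {
      const m = re.exec(lineText);
      if (m) hits.push({ line: i + 1, phrase: m[0] });
    }
  });
  return hits;
}
```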

## Worked example: a researched paragraph

Bad version:

> Postgres has supported cursor-based pagination for a long time, and it's much faster than offset pagination. Some studies show 10x improvements at scale. Most teams should use it.

Problems: "for a long time" - vague; "much faster" - vague; "some studies" - unnamed; "10x" - round number, no method; "most teams" - appeal to consensus; zero links.

Good version:

> Postgres has no dedicated keyset-pagination feature. Keyset (cursor-based) pagination is an ordinary `WHERE ... ORDER BY ... LIMIT` query over an indexed column, and the technique is documented on the Postgres wiki [accessed 2026-05-22].
>
> **Established:** Offset pagination requires the database to read and discard `N` rows before returning a page, making it O(N) per page [Postgres 17 docs §7.6, accessed 2026-05-22]. Cursor-based pagination on an indexed column is O(log N) [Postgres wiki "Pagination", accessed 2026-05-22].
>
> **Likely:** Cybertec's 2024 benchmark on a 5M-row table showed `OFFSET 10000 LIMIT 50` at ~250ms vs cursor at ~6ms on identical hardware (link). Numbers will vary; the order of magnitude is consistent across published benchmarks.
>
> **Inferred:** For our orders endpoint at 5M+ rows, cursor-based is the correct default.

## Workflow

1. **Identify the factual claims.** What needs sourcing?
2. **For each, identify or recall the source.**
3. **Verify the source exists and supports the claim.** Open the link or grep the file.
4. **Quote exactly, paraphrase honestly.**
5. **Mark confidence where less than certain.**
6. **Place citations inline at the claim.**

## Verification

Manual checklist:

- [ ] Every factual claim has either a citation or a confidence marker.
- [ ] All URLs cited actually resolve.
- [ ] All quoted text appears in the source.
- [ ] All cited file paths and symbols exist in the codebase.
- [ ] Inferences labeled as inferences.
- [ ] No AI-generated content cited as primary.
- [ ] Dates of access included for live web sources.

## When to skip this skill

- Creative writing where citations are inappropriate.
- One-shot conversational outputs.
- Internal-only outputs where the reader has the same context.

## Related skills

- [`forge-research`](../forge-research/SKILL.md) - the methodology that produces the claims.
- [`forge-web-search`](../forge-web-search/SKILL.md) - the search that produces the URLs.
- [`forge-rag`](../../llm/forge-rag/SKILL.md) - citation discipline for RAG outputs.
- [`forge-subagent-eval`](../../agents/forge-subagent-eval/SKILL.md) - verifying citations in subagent outputs.
