---
name: research-gather
description: Gathers and lists research resources (academic papers, patents, websites, business cases) for specified research domains. Works as the "resource collection" phase after domain mapping — takes clustering results, user keywords, or domain descriptions as input and produces structured resource lists per domain. Use this skill when the user wants to "collect papers for each area", "find patents in this domain", "gather resources for these topics", "list relevant papers and patents", "arXivで論文を集めて", "各領域のリソースを収集", "特許と論文のリストを作って", "この分野の文献を集めて", or any request to systematically find and list research materials across multiple domains. Also triggers when the user has clustering output and wants to proceed to resource collection, or when they provide keywords and want a literature/patent list.
---

# Research Gather — Resource Collection by Domain

Collects academic papers, patents, websites, and business cases for specified research domains and produces structured resource lists. This skill sits between domain mapping (research-clustering) and detailed reports (research-retrieval) in the research pipeline.

## Auto Mode (`--auto`)

When `$ARGUMENTS` contains `--auto`, run the entire workflow **non-interactively** — skip ALL AskUserQuestion calls and use the following defaults:

| Parameter | Default Value |
|-----------|--------------|
| Resource Types | 学術論文 + 特許 |
| Time Range | 直近4年 |
| Collection Depth | 標準（各5〜10件） |
| Domain Selection | すべてのクラスタ |
| Next Action (Step 6) | 完了（自動終了） |

In `--auto` mode, the remaining text in `$ARGUMENTS` (after removing `--auto`) is used as the input (file path or keywords). For example: `/research-gather --auto docs/research/clustering-result.md` → input is the clustering result file.

If `$ARGUMENTS` does NOT contain `--auto`, proceed with the normal interactive workflow below.
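
The flag handling described above can be sketched as follows. This is a minimal illustration, not the skill's actual implementation: `parse_arguments` and the keys of `AUTO_DEFAULTS` are hypothetical names, and the sketch assumes `$ARGUMENTS` arrives as one whitespace-separated string.

```python
from typing import Dict, Tuple

# Defaults copied from the Auto Mode table; the dictionary keys are hypothetical.
AUTO_DEFAULTS: Dict[str, object] = {
    "resource_types": ["学術論文", "特許"],
    "time_range": "直近4年",
    "collection_depth": "標準（各5〜10件）",
    "domain_selection": "すべてのクラスタ",
    "next_action": "完了",
}


def parse_arguments(arguments: str) -> Tuple[bool, str]:
    """Split the raw argument string into (auto_mode, remaining_input)."""
    tokens = arguments.split()
    auto_mode = "--auto" in tokens
    # Everything except the flag itself is treated as the input (path or keywords).
    remaining = " ".join(token for token in tokens if token != "--auto")
    return auto_mode, remaining


auto_mode, source = parse_arguments("--auto docs/research/clustering-result.md")
print(auto_mode, source)  # True docs/research/clustering-result.md
```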

## Pipeline Position

```
research-clustering → research-gather → research-retrieval
(domain mapping)      (resource lists)   (paper deep-dive)
```

## Workflow

### Step 1: Parse Input

Determine the input type and extract domain information.

**Supported input types:**

1. **Clustering output file** — Markdown file generated by research-clustering. Parse the cluster structure (names, keywords, overview) directly.
2. **User keywords/text** — Keywords, phrases, or natural-language descriptions provided in conversation. Extract domains and search terms from these.
3. **Existing Markdown file** — A user-prepared file listing research domains or topics.

Detect clustering output by its characteristic structure: a "Cluster Summary" table and "Cluster Details" sections that contain keywords and a research strategy. Use the cluster names, keywords, and strategies as the basis for resource collection.

For user keywords/text, group related terms into tentative domains before proceeding. If the grouping is ambiguous, confirm with the user.
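
One way to express the input-type decision is sketched below. It assumes the caller has already tried to read the input as a file and passes the text, or `None` when the input is not a readable file. The function name and the returned labels are hypothetical; the two marker strings are the ones named above.

```python
from typing import Optional

# Headings that characterize research-clustering output, as described above.
CLUSTERING_MARKERS = ("Cluster Summary", "Cluster Details")


def classify_input(file_text: Optional[str]) -> str:
    """Return 'clustering_output', 'markdown_file', or 'keywords'."""
    if file_text is None:
        # Not a file: treat the input as keywords or free text.
        return "keywords"
    if all(marker in file_text for marker in CLUSTERING_MARKERS):
        return "clustering_output"
    # A readable Markdown file without the clustering structure.
    return "markdown_file"
```

A plain substring check like this can misfire on a user-prepared file that happens to mention both headings, so treating the result as a first guess and confirming ambiguous cases with the user stays in line with the rule above.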

### Step 2: User Hearing

> **`--auto` mode**: Skip this entire step. Use the default values from the Auto Mode table above.

Confirm research parameters via AskUserQuestion. Skip hearings for parameters already specified by the user in their request.

#### Hearing 1: Resource Types

```
AskUserQuestion:
  question: "どの種類のリソースを収集しますか？（複数選択可）"
  header: "リソース種別"
  multiSelect: true
  options:
    - label: "学術論文"
      description: "arXiv、IEEE、ACM等の学術論文を検索"
    - label: "特許"
      description: "Google Patents、USPTO、J-PlatPat、Espacenet等から検索"
    - label: "技術情報"
      description: "技術ブログ、カンファレンス発表、OSSプロジェクト等"
    - label: "ビジネス事例"
      description: "企業導入事例、市場レポート、業界動向"
```

#### Hearing 2: Time Range

```
AskUserQuestion:
  question: "対象期間を指定してください"
  header: "対象期間"
  multiSelect: false
  options:
    - label: "直近4年（推奨）"
      description: "過去4年間の結果を対象"
    - label: "直近2年"
      description: "最新トレンドに絞る"
    - label: "直近7年"
      description: "より広い範囲をカバー"
    - label: "カスタム"
      description: "任意の期間を指定"
```

If "カスタム" is selected, ask a follow-up for the specific year range.

#### Hearing 3: Collection Depth

```
AskUserQuestion:
  question: "各領域あたりの収集件数はどの程度にしますか？"
  header: "収集件数"
  multiSelect: false
  options:
    - label: "標準（各5〜10件）（推奨）"
      description: "主要なリソースを網羅。バランスの良い量"
    - label: "広範（各10〜20件）"
      description: "できるだけ多くのリソースを収集。時間がかかる場合あり"
    - label: "簡潔（各3〜5件）"
      description: "代表的なリソースのみ。素早く概観を得たい場合"
```

#### Hearing 4: Domain Selection (clustering input only)

If the input is clustering output that contains multiple clusters, ask which domains to investigate:

```
AskUserQuestion:
  question: "どのクラスタのリソースを収集しますか？"
  header: "対象クラスタ"
  multiSelect: true
  options:
    (dynamically generated from cluster names — show up to 4; if more than 4 clusters, group or offer "すべて" as the first option)
```
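
The option-generation rule could look like the sketch below. The source leaves open whether to group clusters or to offer "すべて" when there are more than four; this sketch assumes the "すべて" variant and simply truncates the remaining names, which is only one possible reading. `build_cluster_options` is a hypothetical helper.

```python
from typing import List


def build_cluster_options(cluster_names: List[str], max_options: int = 4) -> List[str]:
    """Return at most max_options labels for the cluster selection question."""
    if len(cluster_names) <= max_options:
        return list(cluster_names)
    # Too many clusters: lead with "すべて" and fill the remaining slots in order.
    return ["すべて"] + list(cluster_names[: max_options - 1])
```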

### Step 3: Resource Collection

For each target domain, search for resources in parallel using the Agent tool to spawn subagents.

#### 3a: Academic Papers (arXiv-first)

arXiv is the primary source for papers because it provides open-access full text, stable URLs, and consistent metadata.

**Search strategy:**

1. **arXiv search via WebSearch**: Query `site:arxiv.org "{domain keyword}" {year range}` to find relevant papers. Also search for survey/review papers: `site:arxiv.org "{domain keyword}" survey OR review`.
2. **Semantic Scholar / Google Scholar fallback**: If arXiv results are insufficient (e.g., the domain is not well represented on arXiv), broaden to a general web search with `"{domain keyword}" paper {year}`.
3. **IEEE/ACM for specific domains**: For domains where conference proceedings are important (networking, systems, HCI), also search `site:ieee.org` or `site:dl.acm.org`.
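
The three query patterns above can be assembled as in this sketch. `build_paper_queries` and its parameters are hypothetical, and the `2022-2025` rendering of `{year range}` is an assumption, since the source does not fix a format.

```python
from typing import List


def build_paper_queries(
    keyword: str,
    start_year: int,
    end_year: int,
    proceedings_heavy: bool = False,
) -> List[str]:
    """Build the WebSearch query strings for one domain keyword."""
    year_range = f"{start_year}-{end_year}"  # assumed format for {year range}
    queries = [
        # 1. arXiv-first searches, including survey/review papers
        f'site:arxiv.org "{keyword}" {year_range}',
        f'site:arxiv.org "{keyword}" survey OR review',
        # 2. General web fallback when arXiv coverage is thin
        f'"{keyword}" paper {end_year}',
    ]
    if proceedings_heavy:
        # 3. Domains where conference proceedings matter (networking, systems, HCI)
        queries.append(f'site:ieee.org "{keyword}"')
        queries.append(f'site:dl.acm.org "{keyword}"')
    return queries
```

In practice the fallback and proceedings queries would only be issued when the arXiv queries return too few results; the sketch returns them all so the caller can decide.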

**For each paper, collect:**
- Title
- Authors (first author + "et al." for >3 authors)
- Year
- Venue (arXiv, conference name, journal)
- arXiv ID or DOI
- URL (prefer `arxiv.org/abs/` format)
- 1-2 sentence summary

**CRITICAL — Anti-hallucination rule for URLs:**
- Only record URLs that appear verbatim in WebSearch results. NEVER construct or guess arXiv IDs.
- If a search result shows a title but no direct URL, run a follow-up `WebSearch` for `site:arxiv.org "{exact paper title}"` to obtain the real URL.
- Do NOT fabricate arXiv IDs by combining partial numbers. Every URL must come from a search result or a WebFetch response.
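
A sketch of how URLs might be taken only from text that was actually returned: the pattern below pulls arXiv links out of a search-result string and rewrites them to the `arxiv.org/abs/` form. It covers only new-style identifiers (`YYMM.NNNNN`) and drops version suffixes, both of which are simplifying assumptions. The identifier in the usage line is made up for illustration.

```python
import re
from typing import List

# Matches abs/ and pdf/ links with new-style arXiv identifiers only.
ARXIV_URL = re.compile(
    r"https?://(?:www\.)?arxiv\.org/(?:abs|pdf)/(\d{4}\.\d{4,5})(?:v\d+)?(?:\.pdf)?"
)


def extract_arxiv_abs_urls(search_result_text: str) -> List[str]:
    """Return abs-format URLs for identifiers that appear verbatim in the text."""
    urls: List[str] = []
    for match in ARXIV_URL.finditer(search_result_text):
        url = f"https://arxiv.org/abs/{match.group(1)}"
        if url not in urls:  # keep first-seen order, skip repeats
            urls.append(url)
    return urls


sample = "PDF: https://arxiv.org/pdf/2301.12345v2.pdf (made-up ID for illustration)"
print(extract_arxiv_abs_urls(sample))
```

Because nothing is ever assembled from partial numbers, an identifier that does not appear in the input cannot appear in the output. The fetched-title check in Step 4 remains the actual safeguard.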

**Quality signals to prioritize:**
- High citation count (if visible in search results)
- Survey/review papers (valuable for overview)
- Papers from top venues (NeurIPS, ICML, CVPR, ACL, etc.)
- Recent papers with significant attention

#### 3b: Patents

Search across multiple patent databases to get broad coverage.

**Search strategy:**

1. **Google Patents** (primary): `site:patents.google.com "{domain keyword}"` — provides international coverage with English abstracts
2. **USPTO**: Search for US patents when the domain has strong US presence
3. **J-PlatPat**: Search in Japanese for Japan-specific patents — useful when keywords have Japanese equivalents
4. **Espacenet**: Search for European patents when relevant

**For each patent, collect:**
- Title
- Patent number (e.g., US11234567B2, JP2023-123456)
- Assignee/Applicant
- Filing year
- Patent office (USPTO/JPO/EPO/WIPO)
- URL
- 1 sentence summary of the invention

**CRITICAL — Anti-hallucination rule for URLs:**
- Only record patent numbers and URLs that appear verbatim in search results or WebFetch responses.
- NEVER fabricate patent numbers. If a search result mentions a patent without a clear number, run a follow-up search to obtain the exact number and URL.

**Prioritize:**
- Patents from major companies in the domain
- Recent patents (within the specified time range)
- Patents with many citations or family members

#### 3c: Technical Resources

Search for high-quality technical content.

**Search targets:**
- Technical blogs from major companies (Google AI Blog, Meta Research, Microsoft Research, etc.)
- Conference talks and presentations (from slides/video sharing sites)
- Notable OSS projects on GitHub
- Technical standards and specifications

**For each resource, collect:**
- Title
- Source/Author
- Year/Date
- Type (blog/talk/OSS/standard)
- URL
- 1 sentence description

#### 3d: Business Cases

Search for enterprise adoption and market information.

**Search targets:**
- Case studies from consulting firms and vendors
- Industry reports and market analysis
- Press releases about deployments
- Industry conference presentations

**For each case, collect:**
- Title
- Company/Organization
- Year
- Type (case study/report/press release)
- URL
- 1 sentence summary

### Step 4: Organize, Deduplicate, and Verify

After collection:
1. Remove duplicate entries (same paper appearing from different searches)
2. Sort within each category by domain, then by year (newest first), then by relevance (the same order as the table rules in Step 5)
3. Check that URLs are well formed; arXiv links in particular must use the `arxiv.org/abs/` form
4. **URL Verification (MANDATORY)** — verify every collected resource against its URL:

#### 4a: Academic Paper URL Verification

For each paper with an arXiv URL:
1. **WebFetch** the arXiv abstract page (`arxiv.org/abs/XXXX.XXXXX`)
2. Compare the fetched title with the collected title
3. Apply one of the following actions:
   - **Match**: Title matches (allowing minor formatting differences) → mark as `verified`
   - **Mismatch**: Title does not match → **discard the entry** and log a warning. Do NOT keep entries with mismatched title/URL pairs.
   - **Fetch failed**: URL is unreachable or returns an error → **discard the entry**

For papers with non-arXiv URLs (IEEE, ACM, etc.):
1. **WebFetch** the URL
2. Verify the page contains the expected paper title
3. Apply the same match / mismatch / fetch-failed handling as for arXiv papers
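
The match / mismatch / fetch-failed decision could be sketched as below. "Minor formatting differences" is interpreted here as case, punctuation, and whitespace. The similarity threshold of 0.9 is an arbitrary assumption, and both function names are hypothetical.

```python
import re
import unicodedata
from difflib import SequenceMatcher
from typing import Optional


def normalize_title(title: str) -> str:
    """Fold case, drop punctuation, and collapse whitespace."""
    text = unicodedata.normalize("NFKC", title).casefold()
    text = re.sub(r"[^\w\s]", " ", text)
    return " ".join(text.split())


def classify_title_check(
    collected_title: str,
    fetched_title: Optional[str],
    threshold: float = 0.9,  # assumed tolerance, not taken from the source
) -> str:
    """Return 'verified', 'mismatch', or 'fetch_failed'."""
    if fetched_title is None:
        # WebFetch failed or returned an error page.
        return "fetch_failed"
    ratio = SequenceMatcher(
        None, normalize_title(collected_title), normalize_title(fetched_title)
    ).ratio()
    return "verified" if ratio >= threshold else "mismatch"
```

Entries classified as `mismatch` or `fetch_failed` are discarded, as required above.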

#### 4b: Patent URL Verification

For each patent:
1. **WebFetch** the patent URL (Google Patents, USPTO, etc.)
2. Verify the patent number and title match
3. Discard entries where the patent number or title does not match
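
Patent numbers are often printed with spaces, commas, or hyphens, so a comparison may need light normalization first. The sketch below covers only the number check (the title check is not shown) and uses hypothetical helper names.

```python
import re


def normalize_patent_number(number: str) -> str:
    """Strip separators and upper-case, e.g. 'US 11,234,567 B2' -> 'US11234567B2'."""
    return re.sub(r"[^0-9A-Za-z]", "", number).upper()


def patent_number_matches(collected: str, fetched: str) -> bool:
    """Compare the collected number with the one shown on the fetched page."""
    return normalize_patent_number(collected) == normalize_patent_number(fetched)
```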

#### 4c: Technical Resource / Business Case URL Verification

For each resource:
1. **WebFetch** the URL
2. Verify the page is accessible and contains content related to the collected title
3. Discard entries where the URL is unreachable or content is unrelated

#### 4d: Verification Summary

After verification, log the results:
- Total collected: N entries
- Verified: M entries
- Discarded (mismatch): X entries
- Discarded (unreachable): Y entries

**Important**: It is better to have fewer verified entries than many unverified ones. Never include an entry in the output unless its URL has been verified. This prevents hallucinated or mismatched URL/title pairs from propagating to downstream tools (CSV lists, daily research pipeline).

If verification reduces the result set below the requested collection depth, run additional searches to find replacement resources, then verify those as well.
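
The tally and the top-up decision can be computed from the per-entry statuses, as in this sketch. The status strings and the `shortfall` field are hypothetical; the other keys mirror the placeholders used in the output template.

```python
from collections import Counter
from typing import Dict, List


def summarize_verification(statuses: List[str], requested_minimum: int) -> Dict[str, int]:
    """Count verification outcomes and how many replacements are still needed."""
    counts = Counter(statuses)
    verified = counts["verified"]
    return {
        "total_collected": len(statuses),
        "verified": verified,
        "mismatched": counts["mismatch"],
        "unreachable": counts["fetch_failed"],
        # Number of additional verified entries needed to reach the requested depth.
        "shortfall": max(0, requested_minimum - verified),
    }


statuses = ["verified", "verified", "verified", "mismatch", "fetch_failed"]
print(summarize_verification(statuses, requested_minimum=5))
```

A non-zero `shortfall` signals that more searches, followed by verification of the new candidates, are needed.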

### Step 5: Output File Generation

Generate a **single Markdown file** containing all collected resources in table format. Do not split into multiple files — everything goes into one file for easy scanning and downstream processing.

**Output filename:** `resources-{topic-slug}.md`

#### Output template

```markdown
# {Research Theme} — リソース収集結果

## 収集パラメータ

- **対象リソース**: {学術論文 / 特許 / 技術情報 / ビジネス事例}
- **対象期間**: {YYYY – YYYY}
- **収集日**: {YYYY-MM-DD}
- **入力元**: {clustering結果 / ユーザーキーワード / ファイル}

## 収集サマリ

| 領域 | 論文 | 特許 | 技術情報 | 事例 | 合計 |
|------|------|------|----------|------|------|
| {domain 1} | {n} | {n} | {n} | {n} | {n} |
| {domain 2} | {n} | {n} | {n} | {n} | {n} |
| **合計** | **{n}** | **{n}** | **{n}** | **{n}** | **{n}** |

## URL検証結果

| 項目 | 件数 |
|------|------|
| 収集 | {total_collected} |
| 検証済み | {verified} |
| 不一致で除外 | {mismatched} |
| アクセス不可で除外 | {unreachable} |

{The tables below include only verified entries. All URLs have been confirmed via WebFetch.}

## 全体の傾向

{3–5 sentences: 収集結果から見える全体的な傾向、注目すべきポイント}

---

## 学術論文

{Include this section only if academic papers were requested. All domains are combined into one table, with a 領域 column to distinguish them.}

| # | 領域 | タイトル | 著者 | 年 | Venue | 概要 |
|---|------|---------|------|-----|-------|------|
| 1 | {domain} | [{title}]({url}) | {authors} | {year} | {venue} | {summary} |
| 2 | ... | ... | ... | ... | ... | ... |

---

## 特許

{Include this section only if patents were requested.}

| # | 領域 | タイトル | 番号 | 出願人 | 年 | 特許庁 | 概要 |
|---|------|---------|------|--------|-----|--------|------|
| 1 | {domain} | [{title}]({url}) | {patent_no} | {assignee} | {year} | {office} | {summary} |

---

## 技術情報

{Include this section only if technical resources were requested.}

| # | 領域 | タイトル | ソース | 年 | 種別 | 概要 |
|---|------|---------|--------|-----|------|------|
| 1 | {domain} | [{title}]({url}) | {source} | {year} | {type} | {description} |

---

## ビジネス事例

{Include this section only if business cases were requested.}

| # | 領域 | タイトル | 企業/組織 | 年 | 種別 | 概要 |
|---|------|---------|-----------|-----|------|------|
| 1 | {domain} | [{title}]({url}) | {company} | {year} | {type} | {summary} |

---

## 次のステップ

- **論文の詳細調査**: research-retrieval スキルでこのリストの論文を詳しく調査できます
- **追加の領域マッピング**: research-clustering スキルで関連領域をさらに探索できます
```

**Table rules:**
- Sort rows by 領域 (grouped together), then by year (newest first)
- The 領域 column allows filtering/searching by domain within one table
- Only include resource type sections that were requested (skip empty sections)
- The column set varies by resource type (see the template above); do not omit any columns
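
As an illustration of the sort order and row layout for the 学術論文 table, consider the sketch below. The dictionary keys are hypothetical field names, and ordering the domain groups alphabetically is an assumption; the rules only require that rows of the same domain stay together.

```python
from typing import Dict, List


def cell(value: object) -> str:
    """Escape pipe characters so a value cannot break the Markdown table."""
    return str(value).replace("|", "\\|")


def render_paper_rows(papers: List[Dict[str, object]]) -> List[str]:
    """Sort by domain, then newest year first, and emit numbered table rows."""
    ordered = sorted(papers, key=lambda p: (str(p["domain"]), -int(p["year"])))
    rows = []
    for index, paper in enumerate(ordered, start=1):
        rows.append(
            f"| {index} | {cell(paper['domain'])} "
            f"| [{cell(paper['title'])}]({paper['url']}) "
            f"| {cell(paper['authors'])} | {paper['year']} "
            f"| {cell(paper['venue'])} | {cell(paper['summary'])} |"
        )
    return rows
```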

### Step 6: Output Confirmation and Next Actions

> **`--auto` mode**: Skip this step entirely. Treat the result as "完了" and finish.

After output is complete, confirm via AskUserQuestion:

```
AskUserQuestion:
  question: "リソース収集が完了しました。次のアクションを選択してください"
  header: "次のアクション"
  multiSelect: false
  options:
    - label: "完了"
      description: "現在の出力内容で確定する"
    - label: "特定の領域を追加収集"
      description: "指定した領域でさらにリソースを追加検索する"
    - label: "論文の詳細調査へ"
      description: "収集した論文リストをresearch-retrievalスキルで詳しく調査する"
    - label: "別のリソース種別を追加"
      description: "特許やビジネス事例など、別の種別のリソースも収集する"
```

## Output Location

**MUST READ FIRST**: Before deciding the output path, read `docs/research/README.md` (the single source of truth for the research directory layout) and `.claude/rules/research.md`.

### Path resolution

1. Identify the **domain** (`<domain>`, `snake_case`):
   - If the input is a clustering file under `docs/research/runs/<domain>/clustering/...`, use that `<domain>`.
   - Otherwise infer from the research theme or ask the user.
2. Identify the **cluster** (`<cluster>`):
   - From the clustering result section being processed (cluster ID like `metalearner` / `nl2sql-nl2code`).
   - If gathering across all clusters / no cluster context, use `all`.
3. If `docs/research/domains/<domain>/domain.yaml` defines `output_paths.gather`, use it.
4. Otherwise use the default path:

   ```
   docs/research/runs/<domain>/gather/<YYYYMMDD>_<cluster>/
   ```

   - Filename: `resources-<topic-slug>.md` inside this directory.
5. **Never write directly under `docs/research/domains/<domain>/resources/`**; that layer consists of symlinks.
6. **Never overwrite previous gather runs**; the runs directory is append-only.
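
The default path from item 4 can be built as in the sketch below. It does not cover the `output_paths.gather` override in `domain.yaml`, and `default_gather_path` and its parameters are hypothetical names. The `metalearner` cluster ID comes from the example above; the domain and slug in the usage line are made up.

```python
from datetime import date
from pathlib import Path
from typing import Optional


def default_gather_path(
    domain: str,
    cluster: Optional[str],
    topic_slug: str,
    run_date: date,
    root: Path = Path("docs/research"),
) -> Path:
    """Return runs/<domain>/gather/<YYYYMMDD>_<cluster>/resources-<topic-slug>.md."""
    cluster_id = cluster or "all"  # no cluster context -> "all"
    run_dir = root / "runs" / domain / "gather" / f"{run_date:%Y%m%d}_{cluster_id}"
    return run_dir / f"resources-{topic_slug}.md"


path = default_gather_path("example_domain", "metalearner", "example-topic", date(2025, 1, 15))
print(path.as_posix())
```

To honor the append-only rule, a caller would check `path.parent.exists()` before writing and choose a different directory name rather than overwrite an earlier run. How such a collision should be named is not specified in the source.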

### After writing

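Update the `latest_<cluster>` pointer and the domain-level symlink (run both commands from the directory that contains `docs/`):

```bash
# Point latest_<cluster> at the run directory that was just written
ln -snf <YYYYMMDD>_<cluster> docs/research/runs/<domain>/gather/latest_<cluster>
# Expose that pointer under the domain's resources/ layer via a relative symlink
ln -snf ../../../runs/<domain>/gather/latest_<cluster> docs/research/domains/<domain>/resources/<cluster>
```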

## Parallel Processing

Use the Agent tool to spawn subagents for concurrent resource searches across domains and resource types. Each subagent handles one domain × one resource type combination, so searches for multiple domains run side by side instead of one after another.
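
The fan-out amounts to a Cartesian product of domains and resource types. The sketch below only enumerates the work items; how each item is handed to the Agent tool is outside its scope, and `plan_subagent_tasks` is a hypothetical name.

```python
from itertools import product
from typing import Dict, List


def plan_subagent_tasks(domains: List[str], resource_types: List[str]) -> List[Dict[str, str]]:
    """Return one task per (domain, resource type) combination."""
    return [
        {"domain": domain, "resource_type": resource_type}
        for domain, resource_type in product(domains, resource_types)
    ]


tasks = plan_subagent_tasks(["domain-a", "domain-b"], ["学術論文", "特許"])
print(len(tasks))  # 4
```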

## Integration with Other Skills

- **research-clustering** (upstream): Accepts clustering output as input. The cluster names, keywords, and research strategies directly inform search queries.
- **research-retrieval** (downstream): The paper lists generated by this skill can be passed directly to research-retrieval for detailed per-paper analysis. The output format (title + URL table) is designed to be compatible.
- **research-prompt-builder**: Can generate focused research prompts from the collected resource lists.

## Language

- User interactions (AskUserQuestion, etc.) follow the project's response language setting
- Technical terms, paper titles, patent titles, and proper nouns are kept in their original language
- Web searches are conducted in both English and Japanese by default (adjust based on domain)
- **All user-facing output, reports, and summaries must be written in Japanese (すべてのユーザーへの出力は日本語にしてください)**