---
name: clinicaltrials-database-search
description: Query ClinicalTrials.gov API v2 for trial data. Search by condition, drug/intervention, location, sponsor, or phase; fetch details by NCT ID; filter by status; paginate; export CSV. For clinical research, patient matching, and trial portfolio analysis.
license: CC-BY-4.0
---

# ClinicalTrials.gov Database — Clinical Trial Search

## Overview

Query the ClinicalTrials.gov API v2 (public, no authentication) to search and retrieve clinical trial data worldwide. Supports searching by condition, intervention, location, sponsor, and status; retrieving detailed study information by NCT ID; paginating large result sets; and exporting to CSV.

## When to Use

- Searching for recruiting clinical trials for a specific condition or disease
- Finding trials testing a specific drug, device, or intervention
- Locating trials in a specific geographic region for patient referral
- Tracking a sponsor's or institution's clinical trial portfolio
- Retrieving detailed eligibility criteria, outcomes, and contacts for a specific trial
- Analyzing clinical trial trends (phases, enrollment, timelines) across a therapeutic area
- Exporting trial data for systematic reviews or meta-analyses
- Monitoring trial status changes and results postings
- For chemical compound bioactivity data use chembl-database-bioactivity instead; for published literature use pubmed-database

## Prerequisites

```bash
uv pip install requests pandas
```

**API details**:
- Base URL: `https://clinicaltrials.gov/api/v2`
- Authentication: None required (public API)
- Rate limit: ~50 requests/minute per IP
- Response formats: JSON (default), CSV
- Max page size: 1000 studies per request
- Date format: ISO 8601; text fields use CommonMark Markdown

## Quick Start

```python
import requests
import time

CT_API = "https://clinicaltrials.gov/api/v2"

def ct_search(params):
    """Reusable helper for ClinicalTrials.gov searches."""
    response = requests.get(f"{CT_API}/studies", params=params, timeout=30)
    response.raise_for_status()
    return response.json()

# Search for recruiting breast cancer trials
results = ct_search({
    "query.cond": "breast cancer",
    "filter.overallStatus": "RECRUITING",
    "pageSize": 10,
    "sort": "LastUpdatePostDate:desc"
})
print(f"Found {results['totalCount']} trials")
for study in results['studies'][:3]:
    nct = study['protocolSection']['identificationModule']['nctId']
    title = study['protocolSection']['identificationModule']['briefTitle']
    print(f"  {nct}: {title}")
```

## Key Concepts

### Response Data Structure

ClinicalTrials.gov returns deeply nested JSON. Key navigation paths:

| Data | Path |
|------|------|
| NCT ID | `study['protocolSection']['identificationModule']['nctId']` |
| Title | `study['protocolSection']['identificationModule']['briefTitle']` |
| Status | `study['protocolSection']['statusModule']['overallStatus']` |
| Phase | `study['protocolSection']['designModule']['phases']` |
| Enrollment | `study['protocolSection']['designModule']['enrollmentInfo']['count']` |
| Eligibility | `study['protocolSection']['eligibilityModule']` |
| Locations | `study['protocolSection']['contactsLocationsModule']['locations']` |
| Interventions | `study['protocolSection']['armsInterventionsModule']['interventions']` |
| Results | `study.get('resultsSection')` (None if no results posted) |

### Study Status Values

| Status | Description |
|--------|-------------|
| `RECRUITING` | Currently recruiting participants |
| `NOT_YET_RECRUITING` | Approved but not yet open |
| `ENROLLING_BY_INVITATION` | Invitation-only enrollment |
| `ACTIVE_NOT_RECRUITING` | Active, enrollment closed |
| `SUSPENDED` | Temporarily halted |
| `TERMINATED` | Stopped prematurely |
| `COMPLETED` | Study concluded |
| `WITHDRAWN` | Withdrawn before enrollment |

### Study Phase Values

| Phase | Description |
|-------|-------------|
| `EARLY_PHASE1` | Early Phase 1 (formerly Phase 0) |
| `PHASE1` | Phase 1 — safety and dosing |
| `PHASE2` | Phase 2 — efficacy and side effects |
| `PHASE3` | Phase 3 — large-scale efficacy |
| `PHASE4` | Phase 4 — post-market surveillance |
| `NA` | Not applicable (non-drug studies) |

### Query Parameters Reference

| Parameter | Type | Description | Example |
|-----------|------|-------------|---------|
| `query.cond` | string | Condition/disease | `lung cancer` |
| `query.intr` | string | Intervention/drug | `Pembrolizumab` |
| `query.locn` | string | Geographic location | `New York` |
| `query.spons` | string | Sponsor name | `National Cancer Institute` |
| `query.term` | string | General full-text search | `immunotherapy` |
| `filter.overallStatus` | string | Status filter (comma-separated) | `RECRUITING,COMPLETED` |
| `filter.phase` | string | Phase filter | `PHASE2,PHASE3` |
| `filter.ids` | string | NCT ID filter | `NCT04852770` |
| `sort` | string | Sort order | `LastUpdatePostDate:desc` |
| `pageSize` | int | Results per page (max 1000) | `100` |
| `pageToken` | string | Pagination token | (from previous response) |
| `format` | string | Response format | `json` or `csv` |

**Sort options**: `LastUpdatePostDate`, `EnrollmentCount`, `StartDate`, `StudyFirstPostDate` — each with `:asc` or `:desc`.

## Core API

### 1. Search by Condition

```python
results = ct_search({
    "query.cond": "type 2 diabetes",
    "filter.overallStatus": "RECRUITING",
    "pageSize": 20,
    "sort": "LastUpdatePostDate:desc"
})
print(f"Found {results['totalCount']} recruiting diabetes trials")
for study in results['studies'][:5]:
    proto = study['protocolSection']
    nct = proto['identificationModule']['nctId']
    title = proto['identificationModule']['briefTitle']
    print(f"  {nct}: {title}")
```

### 2. Search by Intervention/Drug

```python
# Find Phase 3 trials testing Pembrolizumab
results = ct_search({
    "query.intr": "Pembrolizumab",
    "filter.overallStatus": "RECRUITING,ACTIVE_NOT_RECRUITING",
    "filter.phase": "PHASE3",
    "pageSize": 50
})
print(f"Phase 3 Pembrolizumab trials: {results['totalCount']}")
```

### 3. Search by Location

```python
results = ct_search({
    "query.cond": "cancer",
    "query.locn": "New York",
    "filter.overallStatus": "RECRUITING",
    "pageSize": 20
})

# Extract location details
for study in results['studies'][:3]:
    locs = study['protocolSection'].get('contactsLocationsModule', {}).get('locations', [])
    for loc in locs:
        if 'New York' in loc.get('city', ''):
            print(f"  {loc.get('facility')}: {loc['city']}, {loc.get('state', '')}")
```

### 4. Search by Sponsor

```python
results = ct_search({
    "query.spons": "National Cancer Institute",
    "pageSize": 20
})

for study in results['studies'][:5]:
    sponsor_mod = study['protocolSection']['sponsorCollaboratorsModule']
    lead = sponsor_mod['leadSponsor']['name']
    collabs = [c['name'] for c in sponsor_mod.get('collaborators', [])]
    print(f"  Lead: {lead}, Collaborators: {collabs}")
```

### 5. Retrieve Study Details by NCT ID

```python
nct_id = "NCT04852770"
response = requests.get(f"{CT_API}/studies/{nct_id}", timeout=30)
response.raise_for_status()
study = response.json()

# Extract key information
proto = study['protocolSection']
print(f"Title: {proto['identificationModule']['briefTitle']}")
print(f"Status: {proto['statusModule']['overallStatus']}")

# Eligibility criteria
elig = proto.get('eligibilityModule', {})
print(f"Ages: {elig.get('minimumAge')} - {elig.get('maximumAge')}")
print(f"Sex: {elig.get('sex')}")
print(f"Criteria:\n{elig.get('eligibilityCriteria', 'N/A')[:300]}")
```

### 6. Pagination for Large Result Sets

```python
all_studies = []
page_token = None
max_pages = 10

for page in range(max_pages):
    params = {
        "query.cond": "cancer",
        "filter.overallStatus": "RECRUITING",
        "pageSize": 1000,
    }
    if page_token:
        params["pageToken"] = page_token

    results = ct_search(params)
    all_studies.extend(results['studies'])
    page_token = results.get('nextPageToken')

    if not page_token:
        break
    time.sleep(1.5)  # respect rate limits

print(f"Retrieved {len(all_studies)} studies across {page + 1} pages")
```

### 7. Export to CSV

```python
response = requests.get(f"{CT_API}/studies", params={
    "query.cond": "heart disease",
    "filter.overallStatus": "RECRUITING",
    "format": "csv",
    "pageSize": 1000
}, timeout=60)

with open("heart_disease_trials.csv", "w") as f:
    f.write(response.text)
print("Exported to heart_disease_trials.csv")
```

## Common Workflows

### Workflow 1: Multi-Criteria Trial Discovery

```python
import requests, time

CT_API = "https://clinicaltrials.gov/api/v2"

def ct_search(params):
    response = requests.get(f"{CT_API}/studies", params=params, timeout=30)
    response.raise_for_status()
    return response.json()

# Step 1: Search with multiple filters
results = ct_search({
    "query.cond": "lung cancer",
    "query.intr": "immunotherapy",
    "query.locn": "California",
    "filter.overallStatus": "RECRUITING,NOT_YET_RECRUITING",
    "pageSize": 100,
    "sort": "LastUpdatePostDate:desc"
})
print(f"Total matches: {results['totalCount']}")

# Step 2: Filter by phase
phase23 = [
    s for s in results['studies']
    if any(p in ['PHASE2', 'PHASE3']
           for p in s['protocolSection'].get('designModule', {}).get('phases', []))
]
print(f"Phase 2/3 trials: {len(phase23)}")

# Step 3: Extract summaries
for study in phase23[:5]:
    proto = study['protocolSection']
    nct = proto['identificationModule']['nctId']
    title = proto['identificationModule']['briefTitle']
    enrollment = proto.get('designModule', {}).get('enrollmentInfo', {}).get('count', 'N/A')
    print(f"  {nct}: {title} (n={enrollment})")
```

### Workflow 2: Completed Trials with Results Analysis

```python
# Step 1: Find completed trials with posted results
results = ct_search({
    "query.cond": "alzheimer disease",
    "filter.overallStatus": "COMPLETED",
    "pageSize": 100,
    "sort": "LastUpdatePostDate:desc"
})

with_results = [s for s in results['studies'] if s.get('hasResults', False)]
print(f"Completed with results: {len(with_results)} / {len(results['studies'])}")

# Step 2: Get detailed results for top trial
if with_results:
    nct = with_results[0]['protocolSection']['identificationModule']['nctId']
    detail = requests.get(f"{CT_API}/studies/{nct}", timeout=30).json()

    if 'resultsSection' in detail:
        outcomes = detail['resultsSection'].get('outcomeMeasuresModule', {})
        measures = outcomes.get('outcomeMeasures', [])
        for m in measures[:3]:
            print(f"  Outcome: {m.get('title')}")
            print(f"  Type: {m.get('type')}")
```

### Workflow 3: Sponsor Portfolio Comparison

```python
sponsors = ["Pfizer", "Novartis", "Roche"]
for sponsor in sponsors:
    results = ct_search({
        "query.spons": sponsor,
        "filter.overallStatus": "RECRUITING",
        "pageSize": 1
    })
    print(f"{sponsor}: {results['totalCount']} recruiting trials")
    time.sleep(1.5)
```

## Common Recipes

### Recipe: Rate-Limited Bulk Search

```python
def ct_search_with_retry(params, max_retries=3):
    for attempt in range(max_retries):
        try:
            response = requests.get(f"{CT_API}/studies", params=params, timeout=30)
            response.raise_for_status()
            return response.json()
        except requests.exceptions.HTTPError as e:
            if e.response.status_code == 429:
                wait = 60
                print(f"Rate limited. Waiting {wait}s...")
                time.sleep(wait)
            else:
                raise
        except requests.exceptions.RequestException:
            if attempt == max_retries - 1:
                raise
            time.sleep(2 ** attempt)
    raise Exception("Max retries exceeded")
```

### Recipe: Extract Study Summary

```python
def extract_summary(study):
    proto = study.get('protocolSection', {})
    ident = proto.get('identificationModule', {})
    status = proto.get('statusModule', {})
    design = proto.get('designModule', {})
    return {
        'nct_id': ident.get('nctId'),
        'title': ident.get('officialTitle') or ident.get('briefTitle'),
        'status': status.get('overallStatus'),
        'phases': design.get('phases', []),
        'enrollment': design.get('enrollmentInfo', {}).get('count'),
        'last_update': status.get('lastUpdatePostDateStruct', {}).get('date')
    }

# Usage
for study in results['studies'][:3]:
    s = extract_summary(study)
    print(f"{s['nct_id']}: {s['status']} | Phase: {s['phases']} | n={s['enrollment']}")
```

### Recipe: Safe Field Navigation

```python
def safe_get(study, *keys, default='N/A'):
    """Navigate nested study JSON safely."""
    current = study
    for key in keys:
        if isinstance(current, dict):
            current = current.get(key)
        else:
            return default
        if current is None:
            return default
    return current

# Usage — handles missing fields gracefully
nct = safe_get(study, 'protocolSection', 'identificationModule', 'nctId')
phases = safe_get(study, 'protocolSection', 'designModule', 'phases', default=[])
enrollment = safe_get(study, 'protocolSection', 'designModule', 'enrollmentInfo', 'count')
```

## Key Parameters

| Parameter | Endpoint | Default | Description |
|-----------|----------|---------|-------------|
| `query.cond` | search | — | Condition/disease search term |
| `query.intr` | search | — | Intervention/drug search term |
| `query.locn` | search | — | Geographic location filter |
| `query.spons` | search | — | Sponsor/organization filter |
| `query.term` | search | — | General full-text search |
| `filter.overallStatus` | search | all | Comma-separated status values |
| `filter.phase` | search | all | Comma-separated phase values |
| `pageSize` | search | 10 | Results per page (max 1000) |
| `sort` | search | relevance | `{field}:{asc\|desc}` |
| `format` | both | `json` | `json` or `csv` |
| `timeout` | (client) | 30s | Set in requests call |

## Troubleshooting

| Problem | Cause | Solution |
|---------|-------|----------|
| 429 Too Many Requests | Rate limit exceeded (~50/min) | Wait 60s; use max `pageSize=1000`; implement exponential backoff |
| Empty studies array | No trials match filters | Broaden search (remove status/phase filters); check spelling |
| 400 Bad Request | Invalid parameter value | Verify status/phase values match enumeration exactly (e.g., `RECRUITING` not `recruiting`) |
| Missing `resultsSection` | Trial has no posted results | Check `study['hasResults']` before accessing results |
| KeyError on nested field | Not all trials have all modules | Use `.get()` with defaults or `safe_get` helper (see Recipes) |
| Pagination stops early | `nextPageToken` absent | All results retrieved; check `totalCount` vs collected count |
| CSV format differs from JSON | Different field structure | CSV flattens nested structure; use JSON for programmatic access |
| Timeout on large exports | CSV with many results | Increase timeout; paginate with `pageSize=1000` instead |

## Best Practices

- **Use maximum page size** (1000) for bulk retrieval to minimize request count against rate limit
- **Always check `hasResults`** before accessing `resultsSection` — most trials have no posted results
- **Navigate safely** with `.get()` chains — not all trials populate all modules (especially `contactsLocationsModule`, `armsInterventionsModule`)
- **Specify multiple status values** with commas (e.g., `RECRUITING,NOT_YET_RECRUITING`) — don't make separate requests per status
- **Use `sort=LastUpdatePostDate:desc`** by default — returns most recently updated trials first
- **Date interpretation**: `lastUpdatePostDateStruct.date` is ISO 8601 string; `type` field indicates `ACTUAL` vs `ESTIMATED`

## Related Skills

- `pubmed-database` — Published literature search complementary to trial registry data
- `chembl-database-bioactivity` — Compound bioactivity data for drugs under investigation
- `bioservices-multi-database` — Alternative database access via unified Python interface

## References

- ClinicalTrials.gov API documentation: https://clinicaltrials.gov/data-api/api
- API migration guide (v1→v2): https://clinicaltrials.gov/data-api/about-api/api-migration
- ClinicalTrials.gov homepage: https://clinicaltrials.gov/
- OpenAPI specification: https://clinicaltrials.gov/data-api/about-api/api-spec

## Bundled Resources

**Self-contained entry**. Original total: 866 lines (SKILL.md 507 + api_reference.md 359). Scripts: 216 lines (query_clinicaltrials.py).

**Original file disposition**:
- `SKILL.md` (507 lines) → Core API modules 1-7 (condition, intervention, location, sponsor, details, pagination, CSV export). "Core Capabilities" sections 1-10 consolidated: Search by Condition → Module 1, Search by Intervention → Module 2, Geographic Search → Module 3, Search by Sponsor → Module 4, Retrieve Detailed Study → Module 5, Pagination → Module 6, Data Export → Module 7, Combined Query → Workflow 1, Extract Summary → Recipe. "Resources" section stub → removed, content consolidated inline. Per-use-case disposition: Patient Matching → When to Use bullet + Workflow 1; Research Analysis → When to Use + Workflow 2; Drug Tracking → When to Use + Module 2; Geographic Search → Module 3; Sponsor Tracking → Module 4 + Workflow 3; Data Export → Module 7; Trial Monitoring → When to Use bullet; Eligibility Screening → Module 5
- `references/api_reference.md` (359 lines) → Fully consolidated inline: endpoint parameters → Key Concepts "Query Parameters Reference" table; status/phase values → Key Concepts tables; response structure → Key Concepts "Response Data Structure" table; HTTP error codes → Troubleshooting table; rate limit guidance → Prerequisites + Best Practices; use cases → duplicated main SKILL.md examples, absorbed into Core API; data standards (ISO 8601, CommonMark) → Prerequisites note. Error handling patterns → Recipes "Rate-Limited Bulk Search"
- `scripts/query_clinicaltrials.py` (216 lines) → Helper function pattern: `search_studies()` → Quick Start `ct_search()` helper; `get_study_details()` → Module 5 inline; `search_with_all_results()` → Module 6 pagination pattern; `extract_study_summary()` → Recipe "Extract Study Summary". Thin-wrapper shortcut applied — each function was a thin wrapper around requests.get()

**Retention**: ~465 lines / 866 original (excl. scripts) = ~54%.
