---
name: bio-causal-genomics-pleiotropy-detection
description: Detect and adjust for horizontal pleiotropy in two-sample Mendelian randomization by distinguishing uncorrelated (UHP) from correlated (CHP) pleiotropy and choosing among Egger, MR-PRESSO, MR-RAPS, CAUSE, LHC-MR, LCV, MR-Clust, MR-Mix, and contamination-mixture methods. Use when validating an MR causal claim, running the STROBE-MR sensitivity battery, suspecting a shared heritable confounder, working under weak-instrument or polygenic-exposure regimes, or reconciling discordant estimates across robust methods.
tool_type: r
primary_tool: TwoSampleMR
---

## Version Compatibility

Reference examples tested with: TwoSampleMR 0.5.11+, MendelianRandomization 0.9.0+, MR-PRESSO 1.0+, CAUSE 1.2.0+, MR-Clust 0.1.0+, MRMix 0.1+, mr.raps 0.4.1+ (GitHub), LHC-MR 0.0.0.9000+ (GitHub), LCV (script-based, no version tag), simex 1.8+.

Before using code patterns, verify installed versions match. If versions differ:
- R: `packageVersion('<pkg>')` then `?function_name` to verify parameters
- For GitHub-only packages, check the repo HEAD vs the local install date

If code throws errors, introspect the installed package and adapt the example rather than retrying.

# Pleiotropy Detection in Mendelian Randomization

**"Validate my MR result against pleiotropic bias"** -> Decompose violations of the exclusion-restriction assumption into uncorrelated horizontal pleiotropy (UHP, addressable by Egger / median / mode / MR-PRESSO) and correlated horizontal pleiotropy (CHP, addressable only by CAUSE / LHC-MR / LCV), then run a method battery whose assumptions span both regimes.

- R: `TwoSampleMR::mr()` (IVW + Egger + median + mode), `mr_pleiotropy_test()`, `mr_heterogeneity()`, `mr_leaveoneout()`, `directionality_test()`
- R: `MRPRESSO::mr_presso()` for UHP outlier removal + distortion test
- R: `cause::cause()` for CHP-aware estimation; `mrclust::mr_clust_em()` for mechanism-heterogeneous instruments
- R: `MendelianRandomization::mr_conmix()` for contamination mixture; `MRMix::MRMix()` for mixture-of-distributions

## UHP vs CHP: The Central Postdoc-Grade Distinction

Horizontal pleiotropy comes in two regimes, and most "standard" MR sensitivity methods address only one of them.

| Regime | Definition | InSIDE assumption | Methods that handle it |
|--------|------------|--------------------|------------------------|
| UHP (uncorrelated horizontal pleiotropy) | Pleiotropic effect alpha_j independent of instrument-exposure effect gamma_j | Holds | IVW (balanced UHP only), MR-Egger, weighted median, weighted mode, MR-PRESSO, MR-RAPS, MR-Mix, contamination mixture |
| CHP (correlated horizontal pleiotropy) | alpha_j correlates with gamma_j through a shared upstream factor (heritable confounder, network mediator) | Violated | CAUSE, LHC-MR, LCV, MR-Clust (partial), Steiger-filtered MR (partial) |

**InSIDE = INstrument Strength Independent of Direct Effect** (Bowden 2015 IJE 44:512). Plain English: across SNPs, the per-SNP pleiotropic effect alpha and per-SNP instrument-exposure effect gamma are treated as independent random variables. CHP is the case where they covary because both flow from a shared upstream genetic factor.

**The trap (Morrison 2020 Nat Genet 52:740):** IVW, MR-Egger, MR-PRESSO, and GSMR are all blind to CHP. Under a shared heritable confounder they each return a plausible-looking corrected causal estimate that is systematically biased in the direction of the confounder. The MR-PRESSO global test does not flag CHP because correlated pleiotropy is not an outlier pattern, it is a population mean shift in the alpha distribution conditional on gamma.

**Operational rule:** If genetic correlation rg(exposure, outcome) is high (LDSC `>= 0.3`) or biology strongly suggests a shared upstream factor, the IVW / Egger / PRESSO triple is insufficient. Add CAUSE (preferred when sig SNPs `>= 100`) or LHC-MR (preferred for polygenic genome-wide IVs).

## Operational Decision Flow (4 Steps)

1. **Compute genetic correlation (LDSC).** Run `ldsc.py --rg <exposure.sumstats.gz>,<outcome.sumstats.gz>` (see causal-genomics/genetic-correlation). If `|rg| > 0.3`, CHP is plausible -> flag for Step 3 escalation. If the LDSC rg standard error spans zero broadly, treat low-rg evidence as weak rather than confirming absence of CHP.
2. **Standard battery.** IVW (random-effects when Cochran Q p < 0.05) + MR-Egger (with NOME I^2_GX check) + weighted median + weighted mode + MR-PRESSO (NbDistribution `>= 10000` for stringent reporting). Report all five with point estimate, SE, p, 95% CI, and n_SNPs_used. Compute Egger I^2_GX; apply SIMEX if I^2_GX < 0.9 (see examples/simex_egger_correction.R).
3. **CHP escalation.** Trigger when rg > 0.3 OR PRESSO global p < 0.05 with > 50% nominal outliers OR Egger / median / mode disagree by > 2 SE. Run CAUSE (if `>= 100` significant SNPs after pruning) or LHC-MR (any N; uses genome-wide sumstats). Report ELPD delta + z + q (CHP fraction) + gamma (CHP-adjusted causal estimate).
4. **Triangulate.** Pre-MR Steiger filter; bidirectional MR (examples/bidirectional_mr.R); LCV gcp; LDSC rg report. Consensus across methods supports a publication-ready claim. Disagreement requires narrowing the scope (e.g., subgroup, cis-MR, time-varying analysis) rather than reporting a single point estimate.

## Algorithmic Taxonomy

| Method | Models | UHP-robust | CHP-robust | Min #SNPs | Fails when | Citation |
|--------|--------|-----------|-----------|-----------|------------|----------|
| Inverse-variance weighted (IVW) | Weighted regression through origin | Balanced UHP only | No | 3 | Directional UHP; CHP; weak IV bias; heterogeneity | Burgess 2013 Genet Epidemiol 37:658 |
| MR-Egger intercept + slope | IVW + free intercept | Directional UHP | No | >=10 for power | NOME violated (I^2_GX < 0.9); <10 SNPs; CHP | Bowden 2015 IJE 44:512 |
| Weighted median | Median of Wald ratios | Up to 50% invalid | No | >=10 | >50% invalid; CHP | Bowden 2016 Genet Epidemiol 40:304 |
| Weighted mode (MBE) | Mode of estimate density | Plurality valid | Partial | >=10 | Multimodal estimates from CHP clusters | Hartwig 2017 IJE 46:1985 |
| Cochran Q | Heterogeneity across Wald ratios | Total heterogeneity flag, not direction-specific | No | 3 | Cannot distinguish UHP from heterogeneity from CHP | Greco 2015 Stat Med 34:2926 |
| MR-PRESSO | Detect + remove UHP outliers via RSS-out | Yes (assumes majority valid) | No | >=4 | >50% pleiotropic; any CHP; small n | Verbanck 2018 Nat Genet 50:693 |
| GSMR + HEIDI-outlier | Outlier removal via single-instrument estimate heterogeneity | Yes | No | >=10 | CHP (HEIDI-outlier is heterogeneity-driven) | Zhu 2018 Nat Commun 9:224 |
| MR-RAPS | Profile likelihood with overdispersion + Huber/Tukey loss | Yes; weak-IV robust | Partial via overdispersion | >=10 | Strong CHP | Zhao 2020 Ann Stat 48:1742 |
| MR-Mix | Mixture-of-distributions over valid + invalid | Yes | Partial | >=20 | Few SNPs; very heterogeneous CHP | Qi & Chatterjee 2019 Nat Commun 10:1941 |
| Contamination mixture | Profile likelihood over contamination fraction | Yes | Partial | >=20 | Few SNPs | Burgess 2020 Nat Commun 11:376 |
| MR-Clust | k-means over Wald estimates with NULL cluster | Yes | Diagnostic for CHP via clusters | >=20 | Single-mechanism exposure (no clustering signal) | Foley 2020 Bioinformatics 37:531 |
| CAUSE | Bayesian mixture: shared causal + shared-factor (CHP) components | Yes | Yes (explicit) | >=100 sig SNPs at p<5e-8 | <100 sig SNPs; non-overlapping GWAS samples | Morrison 2020 Nat Genet 52:740 |
| LHC-MR | Latent heritable confounder + bidirectional + heritability | Yes | Yes | Genome-wide GWAS sumstats | Heritability mis-estimated; severe sample overlap | Darrous 2021 Nat Commun 12:7274 |
| LCV (latent causal variable) | gcp parameter on genome-wide rg | N/A (not an MR method) | Diagnostic | Genome-wide GWAS sumstats | Heritability low; non-Gaussian effect distribution | O'Connor & Price 2018 Nat Genet 50:1728 |

Methodology evolves; verify against the Burgess & Thompson textbook (2nd ed 2021), Hemani 2018 (basic four-method battery), and Sanderson 2022 Nat Rev Methods Primers 2:6 before locking a sensitivity battery.

## Decision Tree by Scenario

| Scenario | Primary estimator | Sensitivity / triangulation |
|----------|-------------------|----------------------------|
| Many strong IVs, no biological shared trait suspected | IVW + Egger + weighted median + weighted mode + PRESSO | Cochran Q; leave-one-out; F-stat; Steiger filtering |
| LDSC rg(exposure, outcome) `>= 0.3` or strong shared-factor biology | CAUSE (if sig SNPs `>= 100`) OR LHC-MR | LCV gcp; cross-check IVW after Steiger filter |
| Many weak IVs (mean F < 20) | MR-RAPS with overdispersion + robust Huber loss | MR-Mix or contamination mixture; report F-stat range |
| Suspected heterogeneous causal mechanisms (e.g. LDL on CHD via multiple lipoprotein pathways) | MR-Clust; report per-cluster IVW | Pathway annotation of cluster instruments; Bayesian mixture |
| Cis-MR drug target (single locus, few SNPs in LD) | Colocalization (causal-genomics/colocalization-analysis) + Steiger | PWCoCo; conditional analysis; not Egger (low SNP count) |
| Polygenic exposure (heritability spread genome-wide; few significant loci) | LHC-MR (uses all SNPs) | LDSC rg; genome-wide IVW with weak-IV-aware methods (RAPS) |
| Reverse causation suspected | Bidirectional MR with Steiger filter; LHC-MR (jointly estimates both directions) | directionality_test; effect-size r2 comparison |
| Population-level summary discordant with biology | Re-examine instrument selection; check Winner's curse; LD pruning settings | Triangulate with cis-MR; family-based MR if available |

## Per-Method Failure Modes

### MR-Egger NOME violation

**Trigger:** I^2_GX = (Q_GX - df) / Q_GX is below 0.9, indicating measurement-error attenuation of the Egger slope (NOME = "no measurement error" in the exposure GWAS effect sizes).

**Mechanism:** MR-Egger regresses outcome effects on exposure effects with a free intercept. Imprecise exposure effects (high beta.exposure SE relative to beta.exposure variability across instruments) introduce regression dilution that pulls the Egger slope toward the null and inflates the intercept.

**Symptom:** Egger slope much closer to zero than IVW, weighted median, and weighted mode estimates; large Egger SE.

**Fix:** Apply SIMEX correction (Bowden 2016 IJE 45:1961; Cook & Stefanski 1994 JASA 89:1314 SIMEX framework) using the `simex` package on the Egger regression, treating beta.exposure SE as measurement error. See examples/simex_egger_correction.R. Alternative: use MR-RAPS, which models the exposure-effect error explicitly via profile likelihood and does not suffer the NOME failure.

### MR-PRESSO majority-outlier breakdown

**Trigger:** More than 50% of instruments are pleiotropic (UHP), e.g. when instrument set was loosely selected (genome-wide significant but unfiltered).

**Mechanism:** MR-PRESSO's global RSS-out statistic and outlier detection both assume a majority-valid set; outliers are defined relative to that majority. With a pleiotropic majority, PRESSO removes the valid minority.

**Symptom:** PRESSO-corrected estimate is similar in magnitude (and sign) to the uncorrected estimate even after dropping nominally "outlier" SNPs; distortion-test p-value paradoxically non-significant; few or no outliers detected despite obvious global-test significance.

**Fix:** Do not trust PRESSO corrected estimate. Re-examine instrument selection (drop loose p-thresholds, prune LD harder); switch to CAUSE or LHC-MR; consider weighted-mode estimator which is plurality-valid rather than majority-valid.

### MR-PRESSO false negative under CHP

**Trigger:** Strong shared heritable confounder (high rg) producing CHP. Confirmed by significant LDSC rg or LCV gcp.

**Mechanism:** Correlated pleiotropy is a population-level mean shift in alpha conditional on gamma; it is not an outlier pattern. PRESSO's RSS-out distance is invariant under such a mean shift, so the global test is not powered against CHP.

**Symptom:** PRESSO global p > 0.05 (no detected pleiotropy) while a CHP-aware method (CAUSE, LHC-MR) returns a substantially different (often null) causal estimate.

**Fix:** When CHP is plausible, ALWAYS run CAUSE or LHC-MR in addition to PRESSO; do not rely on PRESSO global non-significance as evidence of no pleiotropy.

### MR-Egger underpowered with few SNPs

**Trigger:** Fewer than 10 instruments.

**Mechanism:** Egger's intercept variance is driven by the spread of beta.exposure across instruments; with few SNPs the intercept CI is so wide that even strongly pleiotropic data give non-significant intercepts.

**Symptom:** Non-significant Egger intercept p-value alongside obviously discordant IVW and weighted-median estimates.

**Fix:** Report intercept point estimate and CI rather than a binary "pleiotropy present / absent" verdict; do not use Egger as the only sensitivity method when SNP count is low; weight evidence toward weighted-median, weighted-mode, and CAUSE / LHC-MR.

### Steiger filter inverted by exposure measurement error (Hemani & Tilling 2022 IJE)

**Trigger:** Exposure is imprecisely measured (lower heritability ascertained in the exposure GWAS) and outcome is well-measured.

**Mechanism:** Steiger compares r^2_GX vs r^2_GY per SNP. Measurement error in the exposure underestimates r^2_GX; well-measured outcome captures r^2_GY accurately. Per-SNP, the inequality can flip even when the true causal direction is exposure -> outcome.

**Symptom:** A large fraction of instruments fail Steiger (`steiger_dir == FALSE`) in a direction that conflicts with biological plausibility.

**Fix:** Interpret Steiger as one signal among many, not a hard gate; cross-check with bidirectional MR; verify exposure GWAS heritability and sample size; switch to LHC-MR which models both directions jointly and accounts for heritability.

### CAUSE underpowered with few significant SNPs

**Trigger:** Fewer than 100 genome-wide-significant instruments (p < 5e-8) after harmonization and LD pruning.

**Mechanism:** CAUSE fits a Bayesian mixture model over a shared-factor (CHP) component, a shared-causal component, and a null component. Posterior identification of the mixture weights requires substantial signal across many SNPs.

**Symptom:** CAUSE delta_ELPD CI crosses zero; Pareto-k diagnostic flags unstable points; posterior intervals on q (CHP fraction) span [0, 1].

**Fix:** Use LCV gcp for genome-wide directional inference (does not require many significant SNPs); use LHC-MR if heritability and sumstats are available; or report CAUSE alongside an explicit caveat about its underpowered regime.

### LCV gcp under non-Gaussian effect distributions

**Trigger:** Highly polygenic trait with substantial sparsity in true effects (mixture of large-effect and zero-effect loci).

**Mechanism:** LCV assumes a bivariate normal model for effect sizes after LDSC adjustment. Sparse architectures (e.g. immune traits with HLA dominance) violate this and bias gcp estimates.

**Symptom:** LCV gcp point estimate appears extreme but heritability LDSC z-scores are modest; partitioned heritability shows extreme HLA enrichment.

**Fix:** Exclude HLA region from LDSC inputs; complement with CAUSE / LHC-MR; report gcp with awareness of the polygenicity caveat.

## Quantitative Thresholds

| Metric | Threshold | Source / rationale |
|--------|-----------|--------------------|
| F-statistic per SNP | `>=10` strong; `<10` weak | Burgess 2011 IJE 40:755 (rule of thumb); weak-IV bias toward observational confounded estimate |
| I^2_GX (NOME) | `>=0.9` Egger reliable | Bowden 2016 IJE 45:1961 |
| I^2_GX (NOME) intermediate | 0.6-0.9 SIMEX-corrected Egger | Bowden 2016 IJE |
| I^2_GX (NOME) severe | `<0.6` drop Egger; use MR-RAPS or CAUSE | Bowden 2016 IJE; SIMEX unreliable below 0.6 |
| Egger min SNP count for adequate power | `>=10` | Bowden 2015 IJE 44:512 |
| Cochran Q significance | p < 0.05 indicates heterogeneity | Greco 2015 Stat Med 34:2926 |
| MR-PRESSO NbDistribution | 1000 exploratory; `>=5000` publication; `>=10000` stringent | Verbanck 2018 Nat Genet 50:693 (Methods) |
| MR-PRESSO global test p | < 0.05 -> heterogeneity / outliers present | Verbanck 2018 Nat Genet |
| MR-PRESSO distortion test p | < 0.05 -> outliers materially shifted estimate; if `>= 0.05` report uncorrected IVW | Verbanck 2018 Nat Genet |
| MR-PRESSO min instruments | `>=4` to run; `>=10` for non-degenerate global test | Verbanck 2018 Nat Genet |
| MR-PRESSO SignifThreshold | 0.05 default | Verbanck 2018 Nat Genet |
| MR-PRESSO majority-valid breakdown | Fails when `>50%` instruments pleiotropic | Verbanck 2018 Nat Genet (theoretical limit) |
| Weighted median validity | Robust to `<=50%` invalid IVs | Bowden 2016 Genet Epidemiol 40:304 |
| Weighted mode validity | Plurality-valid (largest valid subset is most common estimate) | Hartwig 2017 IJE 46:1985 |
| CAUSE min #SNPs | `>=100` p < 5e-8 SNPs after pruning | Morrison 2020 Nat Genet 52:740 (Supplement) |
| CAUSE delta_ELPD criterion | one-sided p < 0.05; z = delta_elpd / se(delta_elpd); z > 1.96 standard; z > 3.0 stringent | Morrison 2020 Nat Genet |
| LDSC rg suggesting CHP | `>= 0.3` flags need for CAUSE / LHC-MR | Operational rule; see causal-genomics/genetic-correlation |
| Steiger r^2 difference | Reverse-causal flag at any per-SNP r2_GY > r2_GX | Hemani 2017 PLoS Genet 13:e1007081 |
| Standard sensitivity battery | IVW + Egger + median + mode + PRESSO + Steiger + LOO | Hemani 2018 eLife 7:e34408 / STROBE-MR 2021 |

LCV gcp interpretation thresholds (0, 0.5, 0.6, 1) are tabulated in usage-guide.md.

## Standard Sensitivity Battery (Working Reference)

**Goal:** Run the canonical UHP-focused MR sensitivity suite on harmonized two-sample data.

**Approach:** Compute IVW + Egger + median + mode side-by-side; test Egger intercept and heterogeneity; run MR-PRESSO with `>=5000` distributions for publication or `>=10000` for stringent reporting; apply Steiger filter; leave-one-out; report all estimates.

```r
library(TwoSampleMR)
library(MRPRESSO)

methods <- c('mr_ivw', 'mr_egger_regression', 'mr_weighted_median', 'mr_weighted_mode')
res_mr <- mr(dat, method_list = methods)

het <- mr_heterogeneity(dat)
pleio <- mr_pleiotropy_test(dat)
loo <- mr_leaveoneout(dat)
steiger <- directionality_test(dat)

isq <- Isq(dat$beta.exposure, dat$se.exposure)
nome_pass <- isq >= 0.9

presso <- mr_presso(
    BetaOutcome='beta.outcome', BetaExposure='beta.exposure',
    SdOutcome='se.outcome', SdExposure='se.exposure',
    OUTLIERtest=TRUE, DISTORTIONtest=TRUE,
    data=dat, NbDistribution=10000, SignifThreshold=0.05)

global_p <- presso$`MR-PRESSO results`$`Global Test`$Pvalue
outlier_p <- presso$`MR-PRESSO results`$`Outlier Test`$Pvalue
distortion_p <- presso$`MR-PRESSO results`$`Distortion Test`$Pvalue
n_outliers <- sum(outlier_p < 0.05, na.rm=TRUE)
```

Full working pipeline incl SIMEX, MR-RAPS, contamination mixture, and STROBE-MR table: examples/sensitivity_battery.R.

## CAUSE for CHP-Aware Estimation

**Goal:** Distinguish causal from shared-factor (correlated horizontal pleiotropy) explanations of an exposure-outcome association.

**Approach:** Fit nuisance parameters (LD pruning + rho_GWAS sample-overlap correction) on a random SNP set; fit the sharing and causal posterior; compare ELPD (expected log predictive density) via Pareto-k smoothed importance sampling.

```r
library(cause)
params <- est_cause_params(dat_cause, variants = pruned_subset_snps)
res_cause <- cause(X = dat_cause, variants = pruned_snps, param_ests = params)
elpd <- summary(res_cause)$tab
```

Full posterior extraction + reporting: examples/cause_analysis.R.

**Interpreting CAUSE output:**

- `q`: posterior CHP fraction; 0 = no CHP, 1 = all instruments operate via the shared factor
- `eta`: shared-factor effect on Y (the "confounder pathway" magnitude)
- `gamma`: posterior causal effect of E on Y after partialling out CHP; report median + 95% credible interval
- `delta_ELPD` (sharing - causal): negative -> causal model preferred; z = delta_elpd / se(delta_elpd); z > 1.96 standard, z > 3.0 stringent; one-sided p reported alongside posterior gamma
- Pareto-k > 0.7 indicates unstable posterior on those points; if more than 10% of points are unstable, treat the posterior as unreliable; remediation: add more SNPs (loosen p-threshold one notch then re-prune in LD) or re-fit excluding flagged outliers

CAUSE requires sumstats from both exposure and outcome GWAS in matched effect-allele coding. The pruning step typically retains 100-5000 signature SNPs at LD r^2 < 0.01 in a 1 Mb window; nuisance estimation should use a larger random SNP subset (`>= 100,000` genome-wide SNPs) to fit rho (sample overlap) stably.

## MR-RAPS Loss Function and Overdispersion

**Trigger:** Weak instruments (mean F < 20) and/or suspected UHP requiring outlier-resistant estimation.

- `over.dispersion = TRUE` always for MR (horizontal-pleiotropy variance is real, not noise; turning this off underestimates SE)
- `loss.function = 'huber'` (default; outlier-resistant; suited to mild to moderate UHP)
- `loss.function = 'tukey'` (more aggressive; downweights extreme outliers more; choose when many obvious outliers suspected)
- `loss.function = 'l2'` (non-robust; equivalent to weighted least squares; do not use when UHP suspected)

Tukey is preferable when leave-one-out reveals 2+ SNPs single-handedly shifting the IVW estimate by > 1 SE.

## MR-Clust for Mechanism Heterogeneity

**Goal:** When a single causal estimate is misleading because instruments operate through multiple causal mechanisms (e.g. LDL on CHD via multiple lipoprotein subfractions), identify clusters of instruments with similar per-SNP Wald ratios.

```r
library(mrclust)
ratio_hat <- dat$beta.outcome / dat$beta.exposure
ratio_se <- abs(dat$se.outcome / dat$beta.exposure)
res_mc <- mr_clust_em(theta=ratio_hat, theta_se=ratio_se,
                     bx=dat$beta.exposure, by=dat$beta.outcome,
                     bxse=dat$se.exposure, byse=dat$se.outcome,
                     obs_names=dat$SNP)
per_cluster <- res_mc$results$best
```

Clusters with cluster_class = 'null' are pleiotropy-only instruments. Per-cluster IVW estimates may differ substantially; biological annotation of the SNPs in each cluster (pathway, target gene) is the interpretation step.

## LHC-MR Workflow

**Goal:** Jointly estimate forward causal effect, reverse causal effect, and the heritable-confounder contribution from genome-wide sumstats (not just significant SNPs).

```r
library(lhcMR)
ld <- list(ld_score_path='ldsc/eur_w_ld_chr/', hapmap_snp_path='ldsc/w_hm3.snplist')
sp_list <- sp_estimate(trait.df=list(X=df_x, Y=df_y), ld=ld)
res_lhc <- lhc_mr(sp_list, sample_overlap=overlap_matrix, n_cores=4)
```

LHC-MR is computationally heavy (hours on full sumstats) but among the most rigorous CHP-aware estimators when both GWAS are well-powered. Output includes axx, ayy, hxy (confounder effect on each trait), and bidirectional alpha_xy, alpha_yx.

**Choosing CAUSE vs LHC-MR (Darrous 2021):**

| Condition | Preferred method |
|-----------|------------------|
| `>= 100` genome-wide significant SNPs after pruning | CAUSE (Bayesian; CHP-explicit; mature posterior diagnostics) |
| Polygenic exposure with few significant loci | LHC-MR (uses genome-wide signal, not just significant SNPs) |
| Severe sample overlap between exposure and outcome GWAS | LHC-MR (jointly models overlap); CAUSE's rho correction is exposed to misspecification at high overlap |
| Bidirectionality of central interest | LHC-MR (jointly estimates alpha_xy and alpha_yx); CAUSE only models forward |
| Limited compute / quick turnaround | CAUSE (minutes to hours); LHC-MR may be > 24h on full sumstats |

When both apply, report both with the agreement / disagreement explicit in the discussion.

## Bidirectional MR Procedure

1. **Forward MR:** instrument exposure E, test effect on outcome Y (primary)
2. **Reverse MR:** instrument outcome Y, test effect on exposure E (using outcome-direction instruments)
3. **Steiger pre-filter both directions:** `steiger_filtering(dat)`; drop SNPs where outcome r^2 > exposure r^2 before primary IVW
4. **Compare estimates:** null reverse + significant forward strengthens the forward causal claim; bidirectional significance flags feedback / shared confounder / reciprocal causation
5. **LCV gcp orthogonal check:** genome-wide directional inference independent of the instrument set

Working code: examples/bidirectional_mr.R.

**Interpretation cheat-sheet:**

| Forward p | Reverse p | Reading |
|-----------|-----------|---------|
| significant | non-significant | Forward causal claim strengthened |
| non-significant | significant | Re-examine instrument-exposure assignment; the "outcome" may causally drive the "exposure" |
| significant | significant | Feedback loop, shared confounder, or reciprocal causation; resolve with LHC-MR |
| non-significant | non-significant | No evidence of causation in either direction |

When forward and reverse both clear Steiger and both IVW p < 0.05, run LHC-MR jointly rather than reporting two univariable estimates.

## LCV (Latent Causal Variable)

LCV uses LDSC-merged genome-wide sumstats and reports gcp (genetic causality proportion) on [-1, 1]. It is a complement to, not a replacement for, MR; gcp ~ 0 with high LDSC rg implies pure genetic correlation without partial causation.

```r
source('LCV/R/RunLCV.R')
res_lcv <- RunLCV(ldscores$L2, x$Z, y$Z)
# res_lcv$gcp; res_lcv$pval.gcpzero.2tailed
```

Full gcp interpretation table is in usage-guide.md.

## Required Supplementary Tables

**Instrument table** (one row per SNP retained for primary analysis):

| Column | Content |
|--------|---------|
| rsID, chr, pos | Variant identifier and genome position |
| EA, OA, EAF | Effect allele, other allele, effect allele frequency in exposure GWAS |
| beta_E, se_E, p_E, F | Exposure-side estimate, SE, p-value, per-SNP F-statistic |
| beta_Y, se_Y, p_Y | Outcome-side estimate, SE, p-value (harmonized to EA) |
| harmonization_action | 1=kept, 2=flipped, 3=dropped (palindromic ambiguity) |
| palindromic_flag | TRUE / FALSE; tracked per Hartwig 2016 IJE 45:1717 |
| Steiger_direction | forward / reverse / inconclusive |
| Steiger_p | per-SNP directionality p-value |

**Sensitivity-battery table** (one row per method):

| Column | Content |
|--------|---------|
| method | IVW / Egger / WM / WMode / PRESSO (raw + corrected) / RAPS / CAUSE |
| estimate, se, p, 95% CI | Point estimate and inference |
| n_SNPs_used | Post-harmonization, post-Steiger SNP count |
| heterogeneity_p | Q for IVW; Q' for Egger; global p for PRESSO |
| intercept_p | Egger only (directional UHP test) |
| ELPD_delta + z | CAUSE only (sharing - causal; negative + |z| > 1.96 -> causal) |

## Reconciliation Across Methods

| Pattern | Likely cause | Action |
|---------|--------------|--------|
| IVW and Egger agree (small Egger intercept); median and mode agree | Likely true causal; minimal pleiotropy | Report all; STROBE-MR; emphasize agreement |
| IVW significant; Egger non-significant with similar slope; PRESSO global p > 0.05 | Egger underpowered (few SNPs) OR Egger NOME violated | Check I^2_GX; SIMEX-correct if 0.6 < I^2_GX < 0.9 |
| IVW shifted relative to Egger / median / mode; Egger intercept significant | Directional UHP | Trust Egger slope; PRESSO-corrected IVW; mode estimator |
| IVW + Egger + median + mode + PRESSO all agree but CAUSE delta_ELPD non-significant or in opposite direction | CHP via shared confounder | Trust CAUSE; report all five UHP methods alongside but flag the discordance; check LDSC rg |
| All UHP methods agree; LCV gcp ~ 0 | Genetic correlation only, not causation | Causation evidence is weak; report rg explicitly; consider colocalization (cis-MR) instead |
| MR-Clust shows >=2 distinct non-null clusters | Heterogeneous mechanisms | Report per-cluster estimates; do not summarize as a single effect |
| Steiger fails on a substantial fraction of instruments | Reverse causation OR exposure measurement error | Run bidirectional MR; check exposure GWAS heritability; LHC-MR |
| MR-PRESSO global p < 0.05 but corrected estimate similar to uncorrected | >50% pleiotropic OR CHP masquerading as UHP | Re-prune instruments; switch to weighted-mode / CAUSE / LHC-MR |

**Operational rule for publication:** Report IVW (primary), Egger slope + intercept, weighted median, weighted mode, MR-PRESSO global + distortion + corrected, Cochran Q, Steiger directionality, F-statistic distribution, I^2_GX (for Egger validity), and at least one CHP-aware method (CAUSE or LHC-MR) when rg `>= 0.3` or biology suggests shared upstream. Failure to report a CHP-aware result when CHP is plausible is a reviewer-flagged red flag since 2020.

## Anticipated Reviewer Pushback

| Pushback | Standard response |
|----------|-------------------|
| "Was CHP checked for?" | LDSC rg reported (causal-genomics/genetic-correlation); if rg > 0.3, CAUSE or LHC-MR ran; q posterior reported |
| "Why CAUSE and not LHC-MR?" | CAUSE preferred when `>= 100` significant SNPs available (Morrison 2020). LHC-MR preferred when significant-SNP set is small or polygenic, using genome-wide sumstats (Darrous 2021) |
| "Egger NOME?" | I^2_GX computed; if 0.6 <= I^2_GX < 0.9, SIMEX correction applied; if < 0.6, Egger dropped in favor of MR-RAPS |
| "PRESSO doesn't catch CHP?" | Confirmed (Morrison 2020); CAUSE / LHC-MR reported alongside PRESSO for that reason |
| "Steiger filter applied pre-MR or post?" | Pre-MR: SNPs failing per-SNP Steiger directionality dropped before primary IVW |
| "Why no replication cohort?" | Two-sample design uses independent exposure and outcome cohorts; if same biobank, MRlap used or noted as a limitation |
| "Why not just trust the IVW?" | IVW assumes balanced UHP and no CHP; both violated routinely; sensitivity battery is the standard since STROBE-MR 2021 |
| "Effect size is implausibly large" | Re-examine F-statistic distribution for weak IV bias; check Winner's curse; consider Wald ratio at a single strong instrument as sanity check |

## STROBE-MR Reporting (Skrivankova 2021)

| Item | Required content |
|------|-----------------|
| 1-3 | Title / abstract / background indicates this is an MR study; pre-registered protocol |
| 4-7 | Study design, data sources, instrument selection criteria (p-threshold, LD clumping, MAF) |
| 8-11 | Harmonization, palindrome handling, allele alignment |
| 12-14 | F-statistic distribution; weak-instrument bias mitigation |
| 15-17 | Primary MR method + all sensitivity methods + CHP-aware method when relevant |
| 18-19 | Pleiotropy tests, Steiger, heterogeneity |
| 20 | Discussion of remaining assumption violations; limitations |

Sub-items (30 total) detail per-method reporting. The full statement (JAMA 326:1614) and explanation (BMJ 375:n2233) are now reviewer-required at most cardiovascular and psychiatric journals since 2022.

## Common Errors

| Error / symptom | Cause | Solution |
|-----------------|-------|----------|
| MR-PRESSO crashes with `Not enough intrumental variables` | Fewer than 4 SNPs | Need >=4 for PRESSO; for cis-MR with few SNPs use colocalization |
| Egger intercept p < 0.05 but I^2_GX = 0.5 | NOME violated; intercept is artifactually inflated | SIMEX-correct or do not trust Egger; use MR-RAPS instead |
| `Isq()` not found | TwoSampleMR version where Isq is unexported | Compute manually: Q_GX = sum((beta_GX/se_GX)^2); I2 = (Q_GX - (n-1))/Q_GX, clipped to [0,1] |
| MR-RAPS `package not found` after CRAN install | CRAN-archived 2025-03-01 | `remotes::install_github('qingyuanzhao/mr.raps')`; call via `TwoSampleMR::mr_raps()` wrapper (MendelianRandomization does NOT export `mr_raps`) |
| CAUSE delta_ELPD CI spans zero; Pareto-k > 0.7 | <100 sig SNPs OR severe sample overlap | Use LHC-MR; or report CAUSE with the explicit caveat |
| Steiger labels most instruments reverse-causal | Exposure GWAS imprecise OR sample size mismatch | Treat as one signal; cross-check with bidirectional MR |
| LHC-MR runtime > 24h | Default n_cores=1 on full sumstats | Use n_cores >= 4; restrict to LDSC-overlapping SNPs first |
| MR-PRESSO outliers all on same chromosome | Genome-wide LD not properly pruned; clumping window too narrow | Re-clump at r^2 < 0.001 in 10 Mb window |
| MR-Mix returns NA | Few SNPs OR no variation in mixture support | Need >=20 SNPs; default mixture grid may need tuning |

## References

- Bowden J et al 2015 Int J Epidemiol 44:512 (MR-Egger; InSIDE)
- Bowden J et al 2016 Int J Epidemiol 45:1961 (NOME, I^2_GX, SIMEX)
- Cook JR & Stefanski LA 1994 JASA 89:1314 (SIMEX framework)
- Burgess S 2020 Nat Commun 11:376 (contamination mixture)
- Burgess S & Thompson SG 2021 (Mendelian Randomization 2nd ed)
- Darrous L et al 2021 Nat Commun 12:7274 (LHC-MR)
- Foley CN et al 2020 Bioinformatics 37:531 (MR-Clust)
- Hartwig FP et al 2017 Int J Epidemiol 46:1985 (weighted mode)
- Hemani G et al 2017 PLoS Genet 13:e1007081 (Steiger orientation)
- Hemani G et al 2018 eLife 7:e34408 (TwoSampleMR framework)
- Hemani G & Tilling K 2022 Int J Epidemiol (Steiger limitations under measurement error)
- Morrison J et al 2020 Nat Genet 52:740 (CAUSE; CHP)
- O'Connor LJ & Price AL 2018 Nat Genet 50:1728 (LCV)
- Sanderson E et al 2022 Nat Rev Methods Primers 2:6 (MR Primer)
- Skrivankova VW et al 2021 JAMA 326:1614 (STROBE-MR statement); BMJ 375:n2233 (explanation)
- Verbanck M et al 2018 Nat Genet 50:693 (MR-PRESSO)
- Zhao Q et al 2020 Ann Stat 48:1742 (MR-RAPS)

## Related Skills

- causal-genomics/mendelian-randomization - Primary causal estimation that this sensitivity battery validates
- causal-genomics/genetic-correlation - LDSC rg required for Step 1 of the decision flow; CHP escalation trigger
- causal-genomics/colocalization-analysis - Required for cis-MR drug-target signals where instruments are too few for Egger
- causal-genomics/fine-mapping - Identify causal variants underlying instrument loci
- causal-genomics/mediation-analysis - Multivariable MR for mediator-adjusted causal estimates
- population-genetics/association-testing - GWAS summary statistics underlying MR instruments
- clinical-biostatistics/effect-measures - Translate MR estimates to clinical effect measures
