--- name: bio-causal-genomics-mediation-analysis description: Decompose total effects into direct and indirect paths through mediators using mediation, CMAverse 4-way, HIMA/HIMA2 high-dimensional, BAMA, two-step / MVMR mediation, or double-ML medDML. Use when testing whether a molecular phenotype (expression, methylation, protein) mediates a treatment-outcome relationship, decomposing exposure-mediator interaction via VanderWeele 4-way, screening high-dimensional EWAS mediators, or running MR-based mediation when sequential ignorability is implausible. tool_type: mixed primary_tool: mediation --- ## Version Compatibility Reference examples tested with: R 4.3+, mediation 4.5.0+, CMAverse 0.1.0+ (GitHub `BS1125/CMAverse`), HIMA >= 2.3.0 (CRAN), bama 1.3+, causalweight 1.0.5+ (medDML), MVMR 0.4+, TwoSampleMR 0.6+, EValue 4.1+, gesttools 1.3+, ipw 1.0.11+. Before using code patterns, verify installed versions match. If versions differ: - R: `packageVersion('')` then `?function_name` to verify parameters - HIMA must be pinned at `>= 2.3.0` for the code patterns below; the formula interface `hima(formula, data.pheno, data.M, mediator.type, penalty, ...)` was introduced in 2.3.0. Users on HIMA 2.2.x must instead call `hima_classic()` with positional arguments (`X=`, `Y=`, `M=`, `COV.XM=`, `Y.family=`) and lose the formula interface; HIMA 2.2.x is NOT API-compatible with the examples here. - `hima_classic()` is still available in 2.3+ for the original Zhang 2016 SIS+MCP pipeline; use it only to reproduce 2016-2021 papers. If code throws an error, introspect the installed package (`?hima`, `args(cmest)`) and adapt the example to match the actual API rather than retrying. # Mediation Analysis **"Does expression of GENE_X mediate the SNP-to-disease effect?"** -> Decompose the total effect of a treatment (genotype, exposure) on an outcome into direct and indirect paths through one or more mediators, with explicit handling of exposure-mediator interaction, sensitivity to unmeasured confounding, and high-dimensional mediator screening. - R (single-mediator, observational, sequential-ignorability assumed): `mediation::mediate(med_model, out_model, treat='X', mediator='M', boot=TRUE, sims=5000)` - R (4-way decomposition with exposure-mediator interaction): `CMAverse::cmest(...EMint=TRUE, estimation='paramfunc', inference='bootstrap', nboot=1000)` - R (high-dimensional / EWAS mediators): `HIMA::hima(Y ~ X + covariates, data.pheno, data.M, mediator.type='gaussian', penalty='DBlasso')` (modern v2.3+ formula interface) - R (MR-based mediation): two-step `TwoSampleMR` with independent instruments OR `MVMR::ivw_mvmr` for joint direct effect - R (doubly-robust double-ML): `causalweight::medDML(y, d, m, x)` Sequential ignorability (no unmeasured confounder of treatment-mediator, mediator-outcome, treatment-outcome) is the single load-bearing assumption of observational mediation and is fundamentally untestable. Every report should include a sensitivity result (Imai's rho via `medsens()` or a mediational E-value). ## Algorithmic Taxonomy | Method | Framework | Handles E-M interaction | High-D mediators | Min n | Fails when | |--------|-----------|--------------------------|------------------|-------|------------| | Baron-Kenny (1986) | Additive regression-based product/difference | No | No | ~100 | Any non-linearity, interaction, or binary outcome; deprecated for causal inference | | Imai mediation R (Imai 2010 Psychol Methods) | Counterfactual ACME/ADE with bootstrap | Yes (via interaction term in outcome model) | No | ~200 | Sequential ignorability violated; exposure-induced M-Y confounder; rare binary outcome with logistic outcome model | | VanderWeele 4-way (VanderWeele 2014 Epidemiology 25:749; 2015 OUP) | CDE + PIE + INTref + INTmed decomposition | Native | No | ~300 | Without interaction term reduces to standard mediation; binary outcome needs rare-disease assumption | | CMAverse (Shi 2021 Epidemiology 32:e20) | 6 estimators: regression (`rb`), weighting (`wb`), IORW (`iorw`), natural effect models (`ne`), MSM (`msm`), g-formula (`gformula`) | Yes | No (single M, or M-vector) | ~300 | Estimator-specific; `wb` fails with rare exposure; `msm` needs censoring weights for survival | | HIMA1 / hima_classic (Zhang 2016 Bioinformatics 32:3150) | SIS screen by beta (M->Y) + MCP penalty | No | Yes (up to ~10k) | ~150 + p>>n | Misses mediators with strong alpha and weak beta; screening-step false-negatives | | HIMA2 / hima (Perera 2022 BMC Bioinformatics 23:296) | SIS screen by alpha*beta (indirect effect) + MCP | No (linear by default) | Yes | ~150 | Outcome family limited (gaussian/binomial); HIMA-Cox for survival; HIMA-Pois for count | | HILAMA (Zhang 2025) | High-D mediation with latent confounders | No | Yes (>= 100k) | ~500 | Newer; benchmarks evolving; requires latent-factor specification | | BAMA (Song 2020 Biostatistics) | Bayesian high-D continuous shrinkage | No | Yes (~5k) | ~200 | Slow MCMC; prior sensitivity for very weak mediators | | Two-step MR / network MR (Burgess 2017 Eur J Epidemiol 32:377) | IV-based at each step with INDEPENDENT instruments | Implicit (no interaction modeling) | One mediator at a time | Large summary-stat samples | Same SNP used for E and M (violates exclusion); horizontal pleiotropy; Steiger reversal of M->E direction | | MVMR-mediation (Carter & Sanderson 2021 Eur J Epidemiol 36:465-478) | Total minus direct via MVMR | Implicit | Single mediator | Large GWAS samples for both E and M | Conditional F < 10 for either exposure; correlated instruments | | medDML (Farbmacher 2022 Econometrics J 25:277) | Double-debiased ML, doubly-robust | Limited (depends on learner) | Moderate (sparsity-friendly) | ~500 | Severe overlap violations; cross-fitting variance with small n | Methodology evolves; verify against the current CMAverse vignette and the Steen / Vansteelandt natural-effects-model literature before locking analytic choices. Difference-in-coefficients and product-of-coefficients give identical estimates in fully linear-Gaussian models but DIVERGE for any non-linear outcome model (logistic, Cox, Poisson); the counterfactual ACME from `mediation::mediate()` is the correct quantity for non-linear outcomes. ## 4-Way Decomposition Framework VanderWeele 4-way explicitly separates effects from exposure-mediator interaction: ``` Total Effect = CDE + INTref + INTmed + PIE ``` | Component | Meaning | Active when | |-----------|---------|-------------| | CDE | Controlled direct effect (with mediator fixed at reference level) | E directly affects Y | | INTref | Interaction-reference -- needs interaction AND exposure | E*M interaction with mediator at reference | | INTmed | Mediated interaction -- needs interaction AND exposure AND mediation | E shifts M which then interacts with E | | PIE | Pure indirect effect (older "mediation" quantity) | E shifts M which shifts Y additively | Without an exposure-mediator interaction term, INTref = INTmed = 0 and the decomposition collapses to CDE + PIE (= ADE + ACME). With interaction present, traditional ACME mixes PIE and INTmed; the 4-way separation is the only framework that disentangles them. Most epidemiology applications include the interaction term and report all four components (Valeri & VanderWeele 2013 Psychol Methods). ## Decision Tree by Scenario | Scenario | Recommended pipeline | |----------|---------------------| | Observational, single measured mediator, no plausible E-M interaction, continuous outcome | `mediation::mediate()` with `boot=TRUE, sims=5000`; always run `medsens()` | | Observational, single mediator, suspected E-M interaction, any outcome family | `CMAverse::cmest(..., EMint=TRUE)` -> read CDE, PIE, INTref, INTmed | | Observational, BINARY outcome, rare disease (< 10%) | `cmest(yreg='logistic', EMint=TRUE, casecontrol=FALSE)` -- OR-based 4-way decomposition is valid under rare-disease | | Observational, survival outcome | `cmest(yreg='coxph')` OR `HIMA::hima_cox` for high-D; report HRs | | High-D mediators (EWAS, transcriptome-wide), continuous outcome | `HIMA::hima(formula, data.pheno, data.M, mediator.type='gaussian', penalty='DBlasso')`; report `sigcut` (FDR threshold, default 0.05) | | High-D mediators with latent confounding (very-high-D EWAS) | `HILAMA` (2025) | | High-D mediators with Bayesian shrinkage (small n, ~5k features) | `bama::bama()` | | Strong genetic IVs for exposure available, single mediator with own IVs | Two-step MR with independent instruments + Steiger filter on mediator | | Both E and M have IVs but instruments are weak / correlated | MVMR-mediation with conditional F > 10 each | | Observational with rich confounder set, want doubly-robust estimate | `causalweight::medDML` (double-debiased ML) | | Longitudinal with time-varying confounding | g-formula via `CMAverse::cmest(estimation='gformula')` OR `gfoRmula` package | | Exposure-induced confounder of M-Y exists | Interventional indirect effects (Vansteelandt & Daniel 2017); `CMAverse::cmest(estimation='msm')` | ## Sequential Ignorability and Why It Always Needs Sensitivity Observational mediation requires three no-unmeasured-confounding assumptions. The third (M-Y unmeasured confounder, after conditioning on E) is the most common violator in genomic mediation because biological confounders (cell composition, batch effects, technical mediators) frequently affect both M and Y. ### Sequential ignorability untestable **Trigger:** Always, by design. **Mechanism:** No statistical test can detect an unmeasured confounder of M-Y. Bootstrap CIs assume the assumption holds; they do NOT propagate uncertainty about it. **Symptom:** Significant ACME with no sensitivity reported -> reviewer rejects. **Fix:** Report at least one of: - Imai's rho sensitivity: `medsens(med_result, rho.by=0.05, sims=1000)`; the critical rho where ACME crosses 0; |rho_crit| > 0.3 is "reasonably robust" (Imai 2010), |rho_crit| < 0.1 is highly sensitive. - Mediational E-value (Smith & VanderWeele 2019 Epidemiology 30:835): minimum risk-ratio strength of an unmeasured confounder needed to nullify the observed indirect effect; computed via `EValue::evalues.OLS()` for linear outcomes or by-hand from ACME risk ratio bounds. - Reporting BOTH rho-based and E-value sensitivity is standard for high-stakes claims. ### Methods-Section Defense of Sequential Ignorability Template sentence for the methods write-up: "We assumed sequential ignorability conditional on {age, sex, ancestry PCs, cell composition, batch, smoking}. Robustness was assessed via Imai rho_crit at the ACME contrast (`medsens`, sims = 1000) and the mediational E-value on the risk-ratio scale (Smith & VanderWeele 2019 Epidemiology 30:835)." Quantitative interpretation thresholds: - rho_crit: |rho_crit| > 0.3 robust; 0.1-0.3 moderately sensitive; < 0.1 highly sensitive (Imai 2010). - E-value: E > 2 robust to plausible biological confounding; 1.5-2 moderate; < 1.5 fragile (Smith & VanderWeele 2019). For high-stakes claims (clinical, drug-target, regulatory submissions) report BOTH rho_crit and the mediational E-value; for exploratory work either alone suffices. ### Exposure-induced M-Y confounder **Trigger:** A covariate L sits between E and Y, AND is affected by E, AND confounds M-Y. **Mechanism:** Standard regression-based mediation cannot adjust for L without blocking part of the indirect effect (collider stratification bias). Adjusting biases CDE; not adjusting biases ACME. **Symptom:** Sensitivity to confounder set; ACME flips sign when L is added vs removed. **Operational identification:** From the DAG, L is a covariate of M and Y that is also affected by E. Vansteelandt 2009 (Stat Sci 24:471) gives the criterion: if L is adjusted, part of the indirect E -> L -> M -> Y pathway is blocked; if L is not adjusted, L confounds the M-Y leg. Both are wrong under natural-effects; the natural indirect effect is simply not identified. **Fix:** Switch to **interventional indirect effects** (Vansteelandt & Daniel 2017 Epidemiology 28:258), NOT natural indirect effects. Use `CMAverse::cmest(estimation='msm')` with stabilized inverse-probability weights (yields the randomized-interventional analogue), `gfoRmula` (parametric g-formula), or randomized indirect effects (Lin 2017 Biometrics 73:1109). The interventional indirect is identified under weaker assumptions than the natural indirect. ### HIMA covariate or data.pheno error **Trigger:** `data.pheno` contains factor columns with NA, or formula references columns missing from `data.pheno`. **Mechanism:** HIMA v2.3+ uses a formula interface and constructs the design matrix internally from `data.pheno`; missing values or unparseable formulas surface as cryptic `glmnet` errors. **Symptom:** Pipeline fails inside `hima()` with a non-obvious `storage.mode` or `model.matrix` error. **Fix:** Pre-clean `data.pheno` (drop NA rows for the variables in the formula; convert factors with `factor()`; ensure all RHS variables exist as columns). Example: ```r dat <- na.omit(dat[, c('outcome', 'exposure', 'age', 'sex', 'batch', 'pc1', 'pc2')]) dat$batch <- factor(dat$batch) result <- hima(outcome ~ exposure + age + sex + batch + pc1 + pc2, data.pheno=dat, data.M=M_matrix, mediator.type='gaussian') ``` ### HIMA mediator-type vs outcome-type mismatch **Trigger:** Survival outcome (`Surv()` on LHS) with `mediator.type='compositional'` chosen against text mediator panel; or count mediators passed as `'gaussian'`. **Mechanism:** HIMA v2.3+ auto-detects outcome family from the LHS of `formula` (continuous, binary, survival, count); `mediator.type` is set for the mediator data only (`'gaussian'`, `'negbin'`, `'compositional'`). Mismatching mediator-type to the actual mediator distribution biases the screening step. **Symptom:** Hazard / rate ratios for indirect effects look implausible; many "significant" mediators fail replication. **Fix:** Set `mediator.type='gaussian'` for continuous (e.g., methylation beta, log-CPM expression), `'negbin'` for raw count (RNA-seq), `'compositional'` for relative-abundance microbiome. Verify by `?hima` in the installed version since the catalogue of mediator types has expanded across releases. ### Bootstrap iterations too low **Trigger:** `sims=100` or `sims=500` in early exploration left in for the final report. **Mechanism:** ACME CIs from bootstrap have Monte-Carlo error that scales as 1/sqrt(sims); at sims=500 the 95% CI bounds have ~5% MC noise, enough to flip the conclusion at the boundary. **Symptom:** Re-running `mediate()` with a different `set.seed()` gives substantially different CI bounds. **Fix:** `sims=1000` minimum for any reported result; `sims=5000` for publication; `sims=10000` if proximity to zero matters. BCa CIs (`boot.ci.type='bca'` in `mediate()`) are slightly more accurate than percentile CIs near zero but require more sims for stability. ### Two-step MR instrument independence **Trigger:** Same set of SNPs used as instruments for E in step 1 and for M in step 2. **Mechanism:** If a SNP affects both E and M, the M-instrument violates exclusion restriction (the SNP-Y association is not exclusively through M). Estimates are biased toward the direct effect. **Symptom:** Two-step MR shows large indirect effect; replacing M-instruments with non-overlapping SNPs makes it vanish. **Fix:** Apply Steiger filter on the mediator: keep only SNPs where the SNP-M F-statistic exceeds SNP-E F-statistic (or where SNP explains more variance in M than E). For MVMR-mediation, require conditional F > 10 for each exposure independently (Sanderson 2019 IJE 48:713). ### Difference vs product of coefficients diverge for non-linear outcomes **Trigger:** Binary or survival outcome modeled with logistic / Cox. **Mechanism:** Difference = total - direct; product = alpha * beta. Equivalent under linear-Gaussian; diverge under any link function. Counterfactual ACME from `mediation::mediate()` is the correct quantity; hand-computed product-of-coefficients on logistic output is biased except under rare-disease. **Fix:** Report only counterfactual ACME (Imai or CMAverse). For OR-based 4-way decomposition on rare outcomes (<= 10%), Valeri & VanderWeele 2013 formulas apply; for common outcomes use risk-ratio scale or marginal effects rather than ORs. ### Required Reporting for Publication | Component | Required | |-----------|----------| | ACME estimate + 95% CI (BCa preferred) | Yes | | ADE + 95% CI | Yes | | Total effect | Yes | | Proportion mediated | Yes when total > effect-size threshold | | Bootstrap method + sims | percentile / BCa; min 1000, recommend 5000 | | Sequential ignorability sensitivity | rho_crit (medsens) OR mediational E-value | | Exposure-mediator interaction test | Coefficient + p; 4-way decomposition if significant | | Confounder set justification | DAG description | | Mediator measurement reliability | Cite | | Sample size + missing-data handling | Yes | | Mediator / exposure scale | Standardized? log? raw? | Reference: AGReMA-Mediation guideline (Lee 2021 BMJ 372:n122) and MacKinnon 2008 Introduction to Statistical Mediation Analysis. ## Reconciliation: Observational vs MR Mediation | Pattern | Likely cause | Action | |---------|--------------|--------| | Observational ACME significant; MR-mediation null | Unmeasured M-Y confounding inflated observational estimate; OR weak IVs in MR | Re-run observational with `medsens()`; if rho_crit < 0.1, trust MR null | | Observational ACME null; MR-mediation significant | Measurement error in M attenuated observational estimate | Trust MR (regression-dilution-free) IF instruments pass Steiger and pleiotropy tests (MR-Egger intercept, MR-PRESSO) | | Both significant with same sign | Convergent evidence | High-confidence mediation; report effect size from the more-conservative estimate | | Both significant with opposite signs | At least one is biased; revisit confounder structure and IV assumptions | Do not pool; investigate via cross-method sensitivity | **Operational rule for high-stakes claims (clinical / drug-target mediation):** Require (1) significant observational ACME, (2) Imai rho_crit > 0.2 OR mediational E-value > 1.5, (3) directionally consistent MR-mediation result OR documented absence of valid instruments. Single-method mediation claims should be reported as exploratory. ## Anticipated Reviewer Pushback | Pushback | Standard response | |----------|-------------------| | "Sequential ignorability?" | Imai rho_crit reported via `medsens`; mediational E-value reported on the risk-ratio scale | | "Exposure-induced confounder of M-Y?" | DAG drawn; if L present, switch to `CMAverse::cmest(estimation='msm')` for interventional indirect effect (Vansteelandt & Daniel 2017) | | "Why this bootstrap method?" | BCa with sims=5000 for publication; percentile fallback when BCa fails to converge (acceleration estimate unstable at boundary) | | "Why was MR-mediation not done?" | If valid IVs for E and M exist: two-step MR or MVMR-mediation done (see code below); if not, documented absence of trans-instruments | | "Mediator measured with error?" | Regression calibration (Carroll 2006 Measurement Error in Nonlinear Models) OR sensitivity analysis assuming reliability r = 0.7 (Valeri & VanderWeele 2014) | | "Why HIMA2 not BAMA?" | HIMA2 = frequentist + FDR control + faster; BAMA = Bayesian when prior information is available; sample-size justification given against simulation rule-of-thumb | | "Proportion mediated unstable?" | When |total| < 2*SE(total), proportion-mediated CI is unreliable (denominator near zero); report indirect effect alone with absolute effect size | ## Quantitative Thresholds | Quantity | Threshold | Source / Rationale | |----------|-----------|--------------------| | `sims` (bootstrap iterations) | >= 1000 exploratory, >= 5000 publication | Imai 2010; MC error scales 1/sqrt(sims) | | Proportion mediated -- meaningful | > 0.2 | Convention; weak guideline only -- effect size in absolute terms matters more (MacKinnon 2008) | | Proportion mediated -- "most of the effect" | > 0.5-0.8 | Convention | | Imai rho_crit -- robust | > 0.3 | Imai 2010 Psychol Methods 15:309 | | Imai rho_crit -- sensitive | < 0.1 | Same | | Mediational E-value -- robust | > 2.0 (working convention; the original Smith & VanderWeele 2019 E-value framework does not prescribe a specific cutoff -- magnitude is context-dependent) | Smith & VanderWeele 2019 Epidemiology 30:835 | | HIMA FDR cutoff | BH FDR < 0.05 | Default; report q-values not raw p | | MVMR conditional F per exposure | > 10 each | Sanderson 2019 IJE 48:713 | | Two-step MR -- F for both stages | > 10 each | Burgess weak-instrument convention | | Sample size -- single-mediator (Imai) | >= 200 for stable bootstrap | Simulation rule-of-thumb | | Sample size -- HIMA EWAS | >= 150 with p_mediators up to ~10k | Zhang 2016 simulations | | Rare-outcome cutoff for OR-based 4-way | outcome prevalence <= 10% | Valeri & VanderWeele 2013 | ## Working Code Patterns ### Single-Mediator Observational with Sensitivity **Goal:** Decompose a genotype-disease effect via measured molecular mediator with explicit sensitivity to unmeasured M-Y confounding. **Approach:** Fit mediator and outcome models, bootstrap ACME/ADE, run `medsens()` for Imai rho-based sensitivity. ```r library(mediation) med_model <- lm(expression ~ genotype + age + sex + pc1 + pc2 + pc3, data=dat) out_model <- glm(disease ~ genotype + expression + age + sex + pc1 + pc2 + pc3, data=dat, family=binomial) med_result <- mediate(med_model, out_model, treat='genotype', mediator='expression', boot=TRUE, sims=5000, boot.ci.type='bca') summary(med_result) sens <- medsens(med_result, rho.by=0.05, effect.type='indirect', sims=1000) summary(sens) ``` `d0`, `z0`, `n0`, `tau.coef` slots return ACME, ADE, proportion mediated, total effect. ### 4-Way Decomposition with Exposure-Mediator Interaction **Goal:** Separate CDE, PIE, INTref, INTmed when exposure-mediator interaction is biologically plausible (e.g., gene-environment interaction modifying mediator effect). **Approach:** Use CMAverse regression-based estimator with `EMint=TRUE`; bootstrap CIs. ```r library(CMAverse) result_4way <- cmest( data=dat, model='rb', outcome='disease', exposure='genotype', mediator='expression', basec=c('age','sex','pc1','pc2'), EMint=TRUE, mreg=list('linear'), yreg='logistic', astar=0, a=1, mval=list(0), estimation='paramfunc', inference='bootstrap', nboot=1000 ) summary(result_4way) ``` CMAverse reports the 4-way decomposition (Vanderweele 2014): for continuous outcomes the components are `cde`, `intref`, `intmed`, `pnie` (or `pie`), `te`, `pm`; for non-continuous outcomes (logistic / Cox / Poisson) the ratio versions `Rcde`, `Rpnde`, `Rtnde`, `Rpnie`, `Rtnie` are reported. When `EMint=TRUE`, additional proportion-attributable-to-interaction terms (`int`, `pe`) are included. Verify column names with `summary(result)$results` in the installed CMAverse version, since naming has evolved. ### High-Dimensional EWAS Mediation (HIMA2) **Goal:** Among thousands of candidate CpG mediators, identify those mediating an exposure-outcome effect with FDR control. **Approach:** HIMA v2.3+ uses a formula interface and auto-detects outcome family (Gaussian / binomial / Cox / Poisson). Screening + MCP/DBlasso penalisation + joint significance with BH; the `sigcut` argument controls the FDR threshold (default 0.05). ```r library(HIMA) dat <- na.omit(dat[, c('outcome', 'exposure', 'age', 'sex', 'cell_pc1', 'cell_pc2')]) M_matrix <- as.matrix(beta_values) result <- hima( outcome ~ exposure + age + sex + cell_pc1 + cell_pc2, data.pheno=dat, data.M=M_matrix, mediator.type='gaussian', # 'negbin' for count, 'compositional' for microbiome penalty='DBlasso', # default; alternatives 'MCP', 'SCAD', 'lasso' scale=TRUE, sigcut=0.05, parallel=TRUE, ncore=8, verbose=TRUE ) # result is a data.frame of significant mediators below sigcut ``` For survival outcomes wrap the LHS as `Surv(time, status)`; HIMA auto-routes to Cox. The old `hima_classic()` (Zhang 2016 original) is still exported but screens by beta only and misses mediators with strong alpha + weak beta -- prefer the wrapper `hima()` unless reproducing a 2016-2021 paper. For highly-correlated mediators (CpG-island clusters, gene-module co-expression): HIMA uses joint significance with BH-FDR on max(p_alpha, p_beta) and handles correlation only weakly. Within `hima()`, set `penalty='MCP'` for stronger correlation handling; alternatively pre-reduce the mediator panel by principal components or by clustering correlated mediators and screening the cluster centroid (VanderWeele & Vansteelandt 2014 Epidemiol Methods 2:95). ### Time-Varying Mediation Methods When the mediator is measured at multiple timepoints (or exposure varies over time), natural-effects estimands are not identified; switch to one of: - g-formula (parametric or Monte Carlo): `gfoRmula::gformula_continuous_eof()`; `CMAverse::cmest(estimation='gformula')` - g-estimation of a structural nested mean model: `gesttools::gestSingle()` / `gestMultiple()` - Marginal structural model with stabilized IPTW: `ipw::ipwtm()` followed by `glm(..., weights=sw)` - Sequential mediation for K timepoints: VanderWeele & Tchetgen Tchetgen 2017 JRSSB 79:917 Choose based on the experimental structure: - >= 3 timepoints required for g-methods to identify time-varying indirect effects - Longitudinal mediator measurement at EACH timepoint is required (not just baseline) - MSM is preferred when treatment is binary and time-varying; g-formula when continuous - Sequential mediation when the causal ordering of multiple mediators is known and stable across time ### MR-Mediation: Two-Step vs MVMR-Mediation Decision tree: - Independent instrument sets available for E and M -> two-step MR (Burgess 2017 Eur J Epidemiol 32:377) - E and M share instruments (common in cis-eQTL / cis-pQTL mediator cases) -> MVMR-mediation (Carter & Sanderson 2021 Eur J Epidemiol 36:465) - Both feasible -> report both (triangulation) Two-step code sketch: ```r library(TwoSampleMR) exp_E <- extract_instruments('ieu-a-2', clump=TRUE) m_E <- extract_outcome_data(exp_E$SNP, 'ieu-b-30') dat_EM <- harmonise_data(exp_E, m_E) mr_EM <- mr(dat_EM) exp_M <- extract_instruments('ieu-b-30', clump=TRUE) exp_M_indep <- exp_M[!exp_M$SNP %in% exp_E$SNP, ] exp_M_indep <- steiger_filtering(exp_M_indep) out_M <- extract_outcome_data(exp_M_indep$SNP, 'ieu-a-7') dat_MY <- harmonise_data(exp_M_indep, out_M) mr_MY <- mr(dat_MY) ``` Indirect effect = beta_EM * beta_MY (product of coefficients). CI via delta method or parametric bootstrap of the joint (beta_EM, beta_MY) distribution. Steiger filter on M-instruments is mandatory to ensure the M -> Y direction (not Y -> M). ### MR-Mediation: Total Minus Direct via MVMR **Goal:** Estimate the proportion of a genetic-instrument-identified causal effect that flows through a mediator, using independent IVs for E and (E + M). **Approach:** Univariable MR for total E->Y; MVMR for direct E->Y conditional on M; indirect = total - direct via delta-method CI. ```r library(TwoSampleMR); library(MVMR) total <- mr_ivw(harmonised_data_E_Y) mvmr_dat <- format_mvmr(BXGs=cbind(beta_E, beta_M), BYG=beta_Y, seBXGs=cbind(se_E, se_M), seBYG=se_Y, RSID=snps) fstat <- strength_mvmr(mvmr_dat, gencov=0) mvmr_fit <- ivw_mvmr(mvmr_dat) direct <- mvmr_fit$coef[1, 'Estimate'] direct_se <- mvmr_fit$coef[1, 'Std. Error'] indirect <- total$b - direct indirect_se <- sqrt(total$se^2 + direct_se^2) indirect_ci <- indirect + c(-1.96, 1.96) * indirect_se ``` Require `fstat` conditional F > 10 for both E and M independently. If `fstat < 10`, use Q-statistic-adjusted IVW (`qhet_mvmr`) or report the result as weak-instrument-limited. ### Double-ML Doubly-Robust Mediation **Goal:** Avoid model misspecification of both mediator and outcome models via cross-fitted ML nuisance estimators. **Approach:** `causalweight::medDML` uses random forests (or other learners) with sample splitting to estimate nuisance parameters; final estimator is doubly robust. ```r library(causalweight) result_dml <- medDML( y=dat$outcome, d=dat$treatment, m=dat$mediator, x=as.matrix(dat[, covariates]), trim=0.05, order=1 ) ``` Reports direct, indirect (via mediator), and total effects with influence-function-based standard errors. Robust to non-linearity and interactions; assumes sequential ignorability still. ### Mediational E-Value for Sensitivity **Goal:** Report the minimum strength of an unmeasured M-Y confounder required to nullify the observed indirect effect. **Approach:** Convert ACME and its CI to a risk-ratio scale, then apply VanderWeele E-value formula. ```r library(EValue) acme_rr <- exp(med_result$d0) acme_lower_rr <- exp(med_result$d0.ci[1]) evalues.RR(acme_rr, lo=acme_lower_rr, hi=NULL) ``` For binary outcomes, convert ACME on probability scale to RR; for continuous, use `evalues.OLS()` with the standardized indirect effect. E-value > 2 indicates a confounder would need >2-fold associations with both M and Y to nullify the indirect effect (Smith & VanderWeele 2019). ## Tool Install Notes | Package | Source | Notes | |---------|--------|-------| | mediation | CRAN | `install.packages('mediation')`; actively maintained (Imai group) | | CMAverse | GitHub | `remotes::install_github('BS1125/CMAverse')`; NOT on CRAN; 6 estimators in one interface | | HIMA | CRAN | `install.packages('HIMA')`; v2.x renamed `hima()` to HIMA2 -- verify with `?hima` | | bama | CRAN | `install.packages('bama')`; Bayesian; slow MCMC | | causalweight | CRAN | `install.packages('causalweight')`; medDML for double-ML mediation | | EValue | CRAN | `install.packages('EValue')`; for mediational E-values | | TwoSampleMR | r-universe | See causal-genomics/mendelian-randomization for setup | | MVMR | r-universe | `remotes::install_github('WSpiller/MVMR')`; for MVMR-mediation | | gfoRmula | CRAN | For longitudinal / time-varying confounders | ## Common Errors | Error / symptom | Cause | Solution | |-----------------|-------|----------| | `Error in storage.mode(x) <- "double"` inside `hima()` | NA in `data.pheno` columns referenced by formula, or unconverted factors | `na.omit(data.pheno)` first; ensure all RHS vars in formula are numeric or factor | | ACME significant, ADE significant, total NOT significant | Suppression / inconsistent mediation | Report transparently; effect partitioning can exceed total in suppression | | `medsens()` errors on glm outcome | `medsens` requires linear OR probit (not logit) outcome | Refit outcome as `glm(..., family=binomial(link='probit'))` | | `mediate()` runs forever with binary outcome | `sims=5000` with bootstrap and small n | Use `sims=1000` exploratory; verify model converges first; consider parallel via `parallel='multicore'` | | CMAverse `cmest()` reports NaN for `pm` | Total effect crosses zero -> proportion ill-defined | Report ACME and TE separately; pm is unstable when |TE| is small | | Different ACME between `mediation` and CMAverse `rb` | Default `astar/a` levels differ; binary mediator handled differently | Set `astar=0, a=1` explicitly; for binary mediator pass `mval=list(0)` | | HIMA returns zero significant mediators | Screening too aggressive; or no true mediators | Try `topN=2*sqrt(n)` instead of default; verify with permutation null | | Two-step MR shows indirect > total | Steiger reversal: M actually causes E; or pleiotropic SNPs | Run MR-Steiger filter; use MR-PRESSO for pleiotropy | | `medDML` trim removes most data | Severe positivity violation -- few units with overlapping treatment/mediator distributions | Tighten covariate set; check propensity score distributions | ## References - Baron RM, Kenny DA 1986 J Pers Soc Psychol 51:1173 (original product-of-coefficients) - Imai K, Keele L, Tingley D 2010 Psychol Methods 15:309 (counterfactual mediation, sequential ignorability) - VanderWeele TJ 2014 Epidemiology 25:749 (4-way decomposition) - VanderWeele TJ 2015 Explanation in Causal Inference (OUP) -- canonical textbook - Valeri L, VanderWeele TJ 2013 Psychol Methods 18:137 (binary outcomes; rare-disease 4-way) - Vansteelandt S, Daniel RM 2017 Epidemiology 28:258 (interventional / randomized indirect effects) - Shi B et al 2021 Epidemiology 32:e20 (CMAverse package; 6 estimators) - Zhang H et al 2016 Bioinformatics 32:3150 (HIMA original) - Perera C et al 2022 BMC Bioinformatics 23:296 (HIMA2 alpha-beta screening) - Song Y et al 2020 Biostatistics 21:806 (BAMA Bayesian high-D mediation) - Burgess S et al 2017 Eur J Epidemiol 32:377 (two-step / network MR) - Sanderson E et al 2019 IJE 48:713 (MVMR conditional F-statistic) - Sanderson E et al 2021 IJE 50:1651 (MVMR-mediation tutorial) - Farbmacher H et al 2022 Econometrics J 25:277 (medDML / double-ML mediation) - Smith LH, VanderWeele TJ 2019 Epidemiology 30:835 (mediational E-value) ## Related Skills - causal-genomics/mendelian-randomization - IV-based causal inference; foundation for MR-mediation - causal-genomics/pleiotropy-detection - MR-mediation instrument validity hinges on pleiotropy diagnostics (MR-Egger intercept, MR-PRESSO) - causal-genomics/colocalization-analysis - Confirm shared causal variant before causal mediation - causal-genomics/fine-mapping - Identify the causal variant driving the exposure - methylation-analysis/differential-cpg-testing - Per-CpG inputs for HIMA EWAS mediation - differential-expression/deseq2-basics - Expression inputs for eQTL mediation - multi-omics-integration/mofa-integration - Multi-layer mediator construction - population-genetics/association-testing - GWAS summary statistics for MR-mediation - clinical-biostatistics/effect-measures - Risk-ratio / odds-ratio scales for binary outcomes - machine-learning/model-validation - Cross-fitting and sample splitting for medDML