---
name: bio-clinical-biostatistics-logistic-regression
description: Performs logistic regression for clinical trial outcomes (binary, ordinal, multinomial) with marginal-vs-conditional estimand reporting per FDA 2023 covariate adjustment guidance, g-computation/standardisation for marginal effects, modified Poisson for RR, Brant test for proportional odds, Firth penalty for separation, and Hauck-Donner detection. Use when modeling binary or ordinal endpoints in confirmatory or exploratory clinical trials.
tool_type: python
primary_tool: statsmodels
---

## Version Compatibility

Reference examples tested with: statsmodels 0.14+, scipy 1.12+, numpy 1.26+, pandas 2.1+, firthmodels 0.3+, marginaleffects 0.0.13+ (Python) / 0.20+ (R). R packages cited: RobinCar, marginaleffects, brant, MASS, VGAM.

Before using code patterns, verify installed versions match. If versions differ:
- Python: `pip show <package>` then `help(module.function)` to check signatures
- R: `packageVersion('<pkg>')` then `?function_name`

If code throws ImportError, AttributeError, or TypeError, introspect the installed package and adapt the example to match the actual API rather than retrying.

# Logistic Regression for Clinical Outcomes

**"Model clinical outcomes with logistic regression"** -> Estimate the marginal or conditional treatment effect on a binary or ordinal endpoint using a model that respects randomisation stratification, declares its estimand, and survives covariate misspecification.

**Conditional vs marginal (the non-collapsibility subtlety):** the OR is non-collapsible. The conditional OR from logistic regression is a *different parameter* than the marginal OR, even when there is NO confounding and randomisation is perfect. This is mathematical, not statistical bias. FDA 2023 favours marginal RD (via g-computation) for primary reporting to avoid parameter ambiguity. See clinical-biostatistics/effect-measures and Permutt 2020.
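
Non-collapsibility can be checked with plain arithmetic. The sketch below uses made-up risks (hypothetical numbers, not trial data) in two equally sized strata of a prognostic covariate; the treatment OR is 4 within each stratum, yet the marginal OR is smaller, while the RD is the same at both levels.

```python
def odds(p):
    """Convert a risk to odds."""
    return p / (1.0 - p)

# Hypothetical risks in two equally sized strata of a prognostic covariate X.
# Within each stratum the treatment OR is 4 by construction.
strata = {
    "low_risk": {"control": 0.2, "active": 0.5},
    "high_risk": {"control": 0.5, "active": 0.8},
}

conditional_ors = {
    name: odds(r["active"]) / odds(r["control"]) for name, r in strata.items()
}

# Randomisation gives both arms the same 50/50 mix of strata, so each
# marginal risk is the simple average of the stratum-specific risks.
marg_control = sum(r["control"] for r in strata.values()) / len(strata)
marg_active = sum(r["active"] for r in strata.values()) / len(strata)

marginal_or = odds(marg_active) / odds(marg_control)
marginal_rd = marg_active - marg_control
stratum_rds = {name: r["active"] - r["control"] for name, r in strata.items()}

print("conditional ORs:", conditional_ors)
print("marginal OR:", round(marginal_or, 3))
print("stratum RDs:", stratum_rds, "marginal RD:", round(marginal_rd, 3))
```

No confounding is involved: the covariate is balanced across arms by construction, and the gap between the conditional and marginal OR comes entirely from averaging on the probability scale.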

## Algorithmic Taxonomy

| Approach | Estimand | Inference | Strength | Fails when |
|----------|----------|-----------|----------|------------|
| Unadjusted logistic / chi-square | Marginal OR | Wald or LR | Simple; transparent | Loses efficiency vs adjusted (Senn 2013); inflates SE under stratified randomisation (Kahan-Morris 2012) |
| Logistic with covariates (ML, Wald CI) | **Conditional** log-OR | Wald | Standard; widely available | Conditional OR != marginal OR due to non-collapsibility (Permutt 2020); not the FDA 2023 primary estimand |
| Logistic + g-computation / standardisation | **Marginal** RD/RR/OR | Influence function or bootstrap SE | FDA 2023 recommended primary estimand for binary | Needs post-fit standardisation and robust SE machinery; under randomisation, outcome-model misspecification costs efficiency, not consistency (Tsiatis 2008) |
| Targeted Maximum Likelihood (TMLE) | Marginal RD/RR/OR | Influence function | Provably efficient; doubly robust in observational | Implementation heavier; mostly R (`tmle`, `tmle3`); rare in confirmatory submissions |
| Modified Poisson with sandwich SE | Marginal RR | HC1/HC3 sandwich | Direct RR estimation when prevalence >10% | Slightly less efficient than log-binomial when log-binomial converges |
| Log-binomial regression | Marginal RR | Wald | Direct RR estimation | Frequent convergence failure when predicted risk near 1 |
| Firth penalised logistic | Conditional OR (penalised) | Penalised LR test preferred | Handles separation, rare events (<5% prevalence) | Wald CI/p liberal; must use PLR test (Heinze-Schemper 2002) |
| Ordinal logistic (proportional odds) | Common conditional OR across cut-points | Wald or LR | Preserves ordering information | Proportional odds assumption violation (Brant test) |
| Partial proportional odds | PO holds for some covariates, not others | Hybrid | Salvages ordinal model when PO fails for one predictor | Increased complexity; harder interpretation |
| Multinomial logistic | Per-category log-OR | Wald | No PO assumption needed | Loses efficiency; harder communication |

**Postdoc reading list:** Permutt 2020 *Stat Biopharm Res* 12:45 (conditional vs marginal estimand); FDA May 2023 Final Guidance "Adjusting for Covariates in RCTs" (the regulatory rulebook); Tsiatis et al 2008 *Stat Med* 27:4658 (robust ANCOVA framework); Senn 2013 *Stat Med* 32:1439 (precision from adjustment even under perfect balance); Kahan-Morris 2012 *Stat Med* 31:328 (must adjust for stratification factors).

## Decision Tree by Scenario

| Scenario | Recommended approach | Why |
|----------|---------------------|-----|
| RCT, binary primary endpoint, no covariate adjustment in SAP | Unadjusted logistic with marginal OR; chi-square supportive | Simple regulatory case; consider whether covariate adjustment would gain power (Senn 2013) |
| RCT, binary primary, covariates pre-specified, FDA 2023 compliant | **Logistic adjusted + g-computation for marginal RD**; conditional OR supportive | FDA 2023 final: marginal estimand for primary; mention HC3 sandwich SE |
| RCT, stratified randomisation (sex/region/severity) | Include strata as covariates in logistic; this adjustment is not optional | Kahan-Morris 2012: ignoring strata overstates the SE, so CIs are too wide, Type-I error falls below nominal, and power is lost |
| Common outcome (prevalence >10%) where RR is the policy quantity | Modified Poisson + HC1/HC3 OR log-binomial regression | Avoid OR misinterpretation; Zou 2004; cite EMA 2015 on covariate adjustment |
| Rare event (<5% prevalence) or separation observed | Firth penalty + penalised LR test for p-value | Heinze-Schemper 2002; Wald p from Firth is liberal |
| Ordinal outcome, PO assumption supportable | Proportional odds; verify with Brant test (R `brant::brant`) | Most efficient when PO holds; common in toxicity grading and PROs |
| Ordinal outcome, PO fails on one or two predictors | Partial PO model (R `VGAM::vglm(..., cumulative(parallel=FALSE~X))`) | Salvages most efficiency; only the offending coefficient gets per-cut-point estimate |
| Ordinal outcome, PO fails widely | Multinomial logistic | Worst case; loses ordering info but valid |
| Single arm observational with strong confounding | Logistic adjusted + g-computation + propensity weighting | Doubly robust; cite Hernan-Robins; consider TMLE |
| Binary endpoint, longitudinal repeated measures | GEE or generalised linear mixed model | Sandwich SE for GEE; mixed-model OR is subject-specific not marginal |

## Standard Workflow

**Goal:** Fit a covariate-adjusted logistic regression for a binary clinical endpoint with explicit reference category and pre-specified covariates.

**Approach:** Use the formula API for automatic categorical handling and explicit reference; report both the conditional log-OR and the marginal RD via g-computation.

```python
import statsmodels.formula.api as smf
import pandas as pd
import numpy as np

# CRITICAL: set reference category explicitly. statsmodels default is alphabetical,
# so 'Active' sorts before 'Placebo' and the OR direction silently flips.
model = smf.logit(
    'outcome ~ C(ARM, Treatment(reference="Placebo")) + age + C(sex) + baseline_score',
    data=df
).fit()

# Conditional ORs (Wald)
or_table = pd.DataFrame({
    'OR': np.exp(model.params),
    'Lower_CI': np.exp(model.conf_int()[0]),
    'Upper_CI': np.exp(model.conf_int()[1]),
    'p_value': model.pvalues
})

# Marginal RD via g-computation (the FDA 2023-recommended primary estimand):
df_active = df.assign(ARM='Active')
df_placebo = df.assign(ARM='Placebo')
risk_active = model.predict(df_active).mean()
risk_placebo = model.predict(df_placebo).mean()
marginal_rd = risk_active - risk_placebo
# For SE, use influence-function or bootstrap; see RobinCar / marginaleffects in R
```

**The reference-category trap is the single most common silent bug.** `smf.logit('y ~ C(ARM)')` orders levels alphabetically and takes the first as the reference. With `ARM` in `['Active', 'Placebo']`, 'Active' becomes the reference and the reported coefficient, `C(ARM)[T.Placebo]`, is Placebo vs Active: the inverse of the intended OR. With three arms (`['Active', 'Control', 'Placebo']`) every contrast is against 'Active'. Always pass `Treatment(reference="Placebo")` or equivalent. In R, use `glm(y ~ relevel(ARM, ref='Placebo'), family = binomial)` with `ARM` stored as a factor.

## Marginal vs Conditional Estimand -- The FDA 2023 Pivot

**The single most important methodological shift in clinical biostatistics 2020-2025.** Permutt 2020 established and FDA 2023 codified: the maximum-likelihood coefficient on Z in `glm(Y ~ Z + X, family=binomial)` is the **conditional log-OR** -- the OR comparing Z=1 to Z=0 holding X fixed. Due to OR non-collapsibility, this is a *different parameter* than the **marginal log-OR** (the OR comparing the entire treated population to the entire control population), even under perfect randomisation.

**The FDA 2023 final guidance** ("Adjusting for Covariates in Randomized Clinical Trials," May 2023) endorses covariate-adjusted nonlinear-model analysis *provided the analyst targets a marginal estimand*. Reporting the conditional OR from a multivariable logistic regression as "the treatment effect" silently changes the estimand from what was pre-specified.

### G-computation / Standardisation

```python
# Marginal effects on three scales via g-computation:
df_z1 = df.assign(ARM='Active')
df_z0 = df.assign(ARM='Placebo')
p_z1 = model.predict(df_z1)
p_z0 = model.predict(df_z0)

marg_p1 = p_z1.mean()
marg_p0 = p_z0.mean()
marg_rd = marg_p1 - marg_p0
marg_rr = marg_p1 / marg_p0
marg_or = (marg_p1 / (1 - marg_p1)) / (marg_p0 / (1 - marg_p0))
```

**For valid SE:** the delta-method/influence-function variance and HC3 sandwich are required. Python `marginaleffects` v0.0.13+ implements this via `marginaleffects.avg_comparisons(model, variables='ARM', vcov='HC3')`. R is more mature: `marginaleffects::avg_comparisons` (Arel-Bundock-Greifer-Heiss 2024 *JSS* 111:9), `RobinCar` (purpose-built for FDA 2023), `riskCommunicator`.
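
Where the delta-method tooling is unavailable, a nonparametric bootstrap is a simple fallback for the SE of the g-computation RD. The sketch below runs on simulated data; the data-generating values, the seed, the 200 replicates, and the helper name `gcomp_rd` are all illustrative assumptions. It resamples patients with replacement and ignores any stratified randomisation, which a real analysis would need to respect.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated two-arm trial (hypothetical effect sizes, for illustration only)
rng = np.random.default_rng(2023)
n = 400
sim = pd.DataFrame({
    "ARM": rng.choice(["Active", "Placebo"], size=n),
    "age": rng.normal(60, 10, size=n),
})
lin = -0.5 + 0.8 * (sim["ARM"] == "Active") + 0.03 * (sim["age"] - 60)
sim["outcome"] = rng.binomial(1, 1 / (1 + np.exp(-lin)))

FORMULA = 'outcome ~ C(ARM, Treatment(reference="Placebo")) + age'

def gcomp_rd(data):
    """Fit the adjusted logistic model and standardise to a marginal RD."""
    fit = smf.logit(FORMULA, data=data).fit(disp=0)
    p1 = fit.predict(data.assign(ARM="Active")).mean()
    p0 = fit.predict(data.assign(ARM="Placebo")).mean()
    return p1 - p0

rd_hat = gcomp_rd(sim)

# Refit on each resample so the SE reflects both model fitting and standardisation
seeds = rng.integers(0, 2**31 - 1, size=200)
boot = np.array([
    gcomp_rd(sim.sample(n=len(sim), replace=True, random_state=int(s)))
    for s in seeds
])
rd_se = boot.std(ddof=1)
rd_ci = np.percentile(boot, [2.5, 97.5])
print(f"marginal RD {rd_hat:.3f}, bootstrap SE {rd_se:.3f}, 95% CI {rd_ci.round(3)}")
```

The whole pipeline (fit plus standardisation) is repeated inside each replicate; bootstrapping only the final averaging step would understate the uncertainty.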

**Tsiatis et al 2008 robustness guarantee:** under randomisation Z ⊥ X, the g-computation marginal estimator is *consistent* for the marginal ATE even if the outcome model is misspecified. Efficiency depends on model quality; consistency does not. This is why FDA 2023 accepts the marginal RD via g-computation without requiring proof of correct logistic mean structure.

**Reporting template (post-FDA-2023):**

> "Primary estimand: marginal risk difference of -8.5 percentage points (95% CI -12.7 to -4.2, HC3 SE), computed by g-computation/standardisation from a logistic regression adjusted for age, sex, and baseline severity. Supportive: conditional OR 0.48 (95% CI 0.34-0.67, Wald) from the same model."

## Modified Poisson for Common Outcomes -- Direct RR

When prevalence > 10% and the policy quantity is RR (not OR), modified Poisson with sandwich SE is the de facto modern standard (Zou 2004 *AJE* 159:702):

```python
import statsmodels.api as sm
import numpy as np

X = sm.add_constant(df[['treatment', 'age', 'baseline_score']])
poisson_model = sm.GLM(df['outcome'], X, family=sm.families.Poisson()).fit(cov_type='HC1')
rr = np.exp(poisson_model.params)
rr_ci = np.exp(poisson_model.conf_int())
```

`cov_type='HC1'` corrects the variance misspecification: Poisson assumes Var(Y) = mean, whereas a binary outcome has Var(Y) = p(1 - p) < p, so model-based Poisson SEs are too large and the CIs too wide. The sandwich estimator is needed because binary data are NOT Poisson. For n <= 250, HC3 is preferred (Long-Ervin 2000). **What postdocs argue about:** modified Poisson vs log-binomial -- log-binomial directly models RR but frequently fails to converge when predicted risk approaches 1; modified Poisson rarely has convergence problems but can return fitted risks above 1 and is marginally less efficient when log-binomial works.

## Proportional Odds and Brant Test

**Proportional odds (PO)** assumption: the effect of each predictor is constant across all cut-points of the ordinal outcome. Must be tested or the model parameters are invalid.

```python
from scipy.stats import chi2
from statsmodels.miscmodels.ordinal_model import OrderedModel
from statsmodels.api import MNLogit
import pandas as pd

df['severity'] = pd.Categorical(df['severity'], categories=['mild', 'moderate', 'severe'], ordered=True)

po_model = OrderedModel.from_formula('severity ~ treatment + age', data=df, distr='logit').fit(method='bfgs', disp=0)
mn_model = MNLogit.from_formula('severity ~ treatment + age', data=df).fit(disp=0)

# Omnibus LR-type comparison of PO vs MN. The two models are not strictly nested
# (cumulative vs baseline-category logits), so treat the p-value as approximate.
lr_stat = 2 * (mn_model.llf - po_model.llf)
# df = difference in estimated parameters:
# MN has (J-1)(K+1); PO has K slopes + (J-1) thresholds
lr_df = mn_model.params.size - po_model.params.size
lr_p = chi2.sf(lr_stat, lr_df)

# Per-coefficient Brant test in R: brant::brant(po_model_object)
# The Brant test localises which predictor(s) violate PO -- essential before deciding on partial PO
```

**The omnibus LR test is necessary but not sufficient.** Brant test (Brant 1990 *Biometrics* 46:1171; R `brant::brant`) provides per-coefficient PO tests, identifying *which* predictor violates PO. If only one or two predictors violate PO, fit a **partial proportional odds model** (R `VGAM::vglm(..., cumulative(parallel=FALSE~X1+X2))`) rather than abandoning the model entirely.

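**OrderedModel intercept gotcha:** do NOT add an intercept term. Threshold parameters (cut-points between ordinal levels) replace the intercept, so with the array interface skip `sm.add_constant`; an explicit constant column causes non-identifiability and optimiser failure. With `from_formula`, the statsmodels documentation describes the explicit formula intercept as being removed automatically, so the formula above needs no `- 1`; confirm this on the installed version.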
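
A Python-side stand-in for the idea behind the Brant test is to fit one binary logistic model per cumulative split and compare the slopes. This is an informal screen, not the Brant test itself (no joint covariance, no p-value). The data below are simulated under proportional odds, and the column names `severity_code` and `above_cut` are hypothetical.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulate a 3-level ordinal outcome from a latent logistic model,
# so proportional odds holds by construction (true slopes 0.7 and 0.4).
rng = np.random.default_rng(11)
n = 3000
sim = pd.DataFrame({
    "treatment": rng.integers(0, 2, size=n),
    "age": rng.normal(0, 1, size=n),
})
latent = 0.7 * sim["treatment"] + 0.4 * sim["age"] + rng.logistic(size=n)
sim["severity_code"] = np.digitize(latent, bins=[-0.5, 1.0])  # 0=mild, 1=moderate, 2=severe

# One binary logit per cumulative split: P(severity >= cut)
split_slopes = {}
for cut in (1, 2):
    sim["above_cut"] = (sim["severity_code"] >= cut).astype(int)
    fit = smf.logit("above_cut ~ treatment + age", data=sim).fit(disp=0)
    split_slopes[f"severity >= {cut}"] = fit.params[["treatment", "age"]]

slopes = pd.DataFrame(split_slopes)
print(slopes.round(3))
```

Under PO the two columns should agree up to sampling noise; a predictor whose slope changes markedly between splits is the one to examine with `brant::brant` before moving to a partial PO model.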

## Separation and Firth Penalty

**Detection:**

```python
from firthmodels import FirthLogisticRegression, detect_separation
import numpy as np

sep_result = detect_separation(X, y)
if sep_result.separation:
    print(sep_result.summary())
# Manual signs: coefficient > 10, SE > 100, convergence warnings
```
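
When no separation detector is installed, a crude screen for categorical predictors is to look for empty cells in the predictor-by-outcome table. The helper `zero_cell_predictors` below is a hypothetical name, and the check is only a sketch: it cannot see separation caused by a continuous predictor or by a linear combination of covariates.

```python
import pandas as pd

def zero_cell_predictors(data, outcome, predictors):
    """Return categorical predictors with an empty predictor-by-outcome cell."""
    flagged = []
    for col in predictors:
        table = pd.crosstab(data[col], data[outcome])
        # A missing outcome column or any zero count signals (quasi-)separation
        if table.shape[1] < 2 or (table == 0).to_numpy().any():
            flagged.append(col)
    return flagged

# Toy data: every biomarker-positive patient has the event
toy = pd.DataFrame({
    "outcome":       [0, 0, 0, 1, 1, 1, 1, 0],
    "biomarker_pos": [0, 0, 0, 1, 1, 1, 0, 0],
    "sex":           ["F", "M", "F", "M", "F", "M", "F", "M"],
})
flagged = zero_cell_predictors(toy, "outcome", ["biomarker_pos", "sex"])
print(flagged)
```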

**Firth penalty (Firth 1993 *Biometrika* 80:27):**

```python
firth = FirthLogisticRegression()
firth.fit(X, y)
or_firth = np.exp(firth.coef_)

# IMPORTANT: Wald p-values from Firth are LIBERAL (anti-conservative).
# Prefer the penalised likelihood-ratio test (PLRT).
# Some Python `firthlogist` releases expose a PLRT attribute (e.g. `pvalues_lrt_`)
# but its presence and name vary by release -- check `dir(firth)` against the
# installed version, otherwise compute the PLRT manually from the penalised
# log-likelihoods of nested models.
```

**Heinze-Schemper 2002 *Stat Med* 21:2409** showed Wald inference from Firth is **liberal** (anti-conservative); the penalised likelihood-ratio test (PLRT) is the recommended inference. PLRT attributes (e.g. `pvalues_lrt_`) appear in some Python Firth packages but the exact attribute name varies by release -- introspect the installed package; compute the PLRT manually if no attribute is exposed:

```python
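from scipy.stats import chi2

def penalised_lrt(pll_full, pll_reduced, df_diff=1):
    # pll_* = penalised log-likelihood (log L + 0.5 * log|I(beta)|) at each fit's optimum;
    # how to extract it depends on the installed package (see Heinze-Schemper 2002)
    stat = 2.0 * (pll_full - pll_reduced)
    return stat, chi2.sf(stat, df_diff)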
```

Firth's method was originally designed for finite-sample bias reduction, not separation per se -- it adds the Jeffreys prior penalty to the likelihood, keeping coefficients finite under separation as a side effect. Also recommended for rare events (<5% prevalence) where ML bias is non-negligible.
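
For readers who want to see the mechanics without relying on a particular package, here is a minimal NumPy sketch of the Firth fit and a single-coefficient PLRT. It is an illustration under stated assumptions, not a replacement for a maintained implementation: the function names are hypothetical, step-halving is omitted, and the null fit holds the tested coefficient at zero while computing the penalty on the full design matrix, which is my reading of the Heinze-Schemper formulation.

```python
import numpy as np
from scipy.special import expit
from scipy.stats import chi2

def firth_fit(X, y, fixed_zero=None, max_iter=200, tol=1e-10):
    """Return (beta, penalised log-likelihood); optionally hold one coefficient at 0."""
    k = X.shape[1]
    free = [j for j in range(k) if j != fixed_zero]
    beta = np.zeros(k)
    for _ in range(max_iter):
        p = expit(X @ beta)
        w = p * (1 - p)
        info = X.T @ (X * w[:, None])                        # Fisher information X'WX
        h = w * np.einsum("ij,jk,ik->i", X, np.linalg.inv(info), X)  # hat diagonal
        score = X.T @ (y - p + h * (0.5 - p))                # Firth-modified score
        step = np.linalg.solve(info[np.ix_(free, free)], score[free])
        beta[free] += np.clip(step, -5, 5)                   # cap the step for stability
        if np.max(np.abs(step)) < tol:
            break
    p = expit(X @ beta)
    loglik = np.sum(y * np.log(p) + (1 - y) * np.log1p(-p))
    _, logdet = np.linalg.slogdet(X.T @ (X * (p * (1 - p))[:, None]))
    return beta, loglik + 0.5 * logdet                       # Jeffreys-prior penalty

def penalised_lrt_coef(X, y, j):
    """Penalised LR test of H0: beta_j = 0 (1 degree of freedom)."""
    beta_full, pll_full = firth_fit(X, y)
    _, pll_null = firth_fit(X, y, fixed_zero=j)
    stat = 2.0 * (pll_full - pll_null)
    return beta_full[j], stat, chi2.sf(stat, df=1)

# Toy quasi-separated data: 10/10 events on active, 3/10 on control
treat = np.repeat([1.0, 0.0], 10)
y = np.concatenate([np.ones(10), np.repeat([1.0, 0.0], [3, 7])])
X = np.column_stack([np.ones(20), treat])

log_or, plr_stat, plr_p = penalised_lrt_coef(X, y, j=1)
print(f"Firth log-OR {log_or:.3f}, PLR chi-square {plr_stat:.2f}, p = {plr_p:.4f}")
```

Ordinary ML has no finite estimate here because no active-arm patient is event-free. For this saturated two-group design the Firth estimate coincides with adding 0.5 to each cell of the 2x2 table, which gives a quick sanity check on the fit.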

## Hauck-Donner Effect Detection

**The Hauck-Donner effect (1977 *JASA* 72:851; revived by Yee 2022 *JASA* 117:1763):** the Wald test statistic is non-monotonic in the parameter estimate near the boundary. A large log-OR can produce a small Wald chi-square -- so the Wald test fails to reject when LR/profile-likelihood would. Common in small samples with strong predictors.

```python
# In R, detect with VGAM::hdeff() and replace Wald with profile likelihood:
# confint(model) on a glm object returns profile-likelihood CIs
# (method supplied by MASS before R 4.4.0, by stats from 4.4.0 on)
# Python equivalent: bootstrap or manual profile likelihood
```

**When to suspect Hauck-Donner:** large coefficient magnitude with non-significant Wald p; large SE with finite estimate; switching from Wald to LR test changes significance. The fix is always: switch to profile likelihood or LR inference.
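
The non-monotonicity is visible in a 2x2 table with closed-form statistics. The sketch below uses hypothetical counts (50 patients per arm, control fixed at 25 events) and pushes the active-arm event count towards the boundary: the OR and the LR chi-square keep growing while the Wald z peaks and then falls.

```python
import math

def wald_z(a, b, c, d):
    """Wald z for the log-OR; a/b = events/non-events on active, c/d on control."""
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return log_or / se

def lr_chi2(a, b, c, d):
    """Likelihood-ratio (G-squared) statistic for independence in a 2x2 table."""
    n = a + b + c + d
    cells = (
        (a, a + b, a + c), (b, a + b, b + d),
        (c, c + d, a + c), (d, c + d, b + d),
    )
    return sum(2.0 * obs * math.log(obs * n / (row * col)) for obs, row, col in cells)

results = {}
for events_active in (40, 45, 47, 49):
    a, b = events_active, 50 - events_active
    results[events_active] = {
        "or": (a * 25) / (b * 25),
        "wald_z": wald_z(a, b, 25, 25),
        "lr_chi2": lr_chi2(a, b, 25, 25),
    }
    r = results[events_active]
    print(f"events={events_active}  OR={r['or']:.1f}  "
          f"Wald z={r['wald_z']:.2f}  LR chi2={r['lr_chi2']:.1f}")
```

The Wald statistic drops because the SE grows faster than the log-OR once a cell count approaches zero; the LR statistic is not affected in the same way.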

## Reconciliation: When Methods Disagree

| Pattern | Likely cause | Action |
|---------|--------------|--------|
| Conditional OR from logistic vs marginal RD from g-computation give different conclusions | Non-collapsibility (OR is non-collapsible; RD is collapsible) | Report marginal RD as primary per FDA 2023; conditional OR as supportive with explicit parameter label; cite Permutt 2020 |
| ML logistic diverges (large coefficient, huge SE); Firth penalised converges | Complete or quasi-complete separation in covariate-outcome relationship | Firth penalty with penalised LR test (NOT Wald p-values); cite Heinze-Schemper 2002 |
| Modified Poisson and log-binomial give different RR estimates | Log-binomial convergence fragility when predicted risk approaches 1 | Modified Poisson with sandwich SE preferred for robust convergence; cite Zou 2004 |
| Proportional-odds (cumulative logit) and multinomial give different inferences | PO assumption violated for one or more predictors (Brant test rejects) | Localise via Brant test; if 1-2 predictors violate, partial-PO model in VGAM; if widely, multinomial |
| Adjusted vs unadjusted treatment effect differ substantially | Confounding (in observational) OR non-collapsibility (in RCT) | In RCT, non-collapsibility expected (Permutt 2020); in observational, investigate confounding via DAG |
| Large OR with non-significant Wald p-value | Hauck-Donner effect: Wald non-monotone near boundary (Yee 2022) | Use profile-likelihood CI (R `confint()` on the `glm` fit); detect via `VGAM::hdeff()` |
| Stratified analysis significant; unstratified not | Achieved SE smaller in stratified analysis | Include stratification factors in primary model (Kahan-Morris 2012); ignoring them overstates the SE and loses power |
| Significant treatment effect on conditional model vanishes when mediator included | Adjusting for post-treatment variable on causal pathway | Never adjust for mediators in primary; use causal DAG to distinguish confounder vs mediator |

## Per-Method Failure Modes

### Reference-category silent reversal

- **Trigger:** Treatment variable coded as character with `Active` and `Placebo`; reference not explicitly set.
- **Mechanism:** statsmodels defaults to alphabetical ordering; 'Active' becomes the reference.
- **Symptom:** OR for "ARM[T.Placebo]" appears in output; direction is the opposite of intended.
- **Fix:** Always pass `C(ARM, Treatment(reference="Placebo"))` or numeric encoding with explicit comment.

### Adjusting for a mediator

- **Trigger:** Including a post-randomisation variable that lies on the causal pathway.
- **Mechanism:** Adjustment blocks the causal effect, attenuating treatment effect estimate toward null.
- **Symptom:** Significant unadjusted effect becomes non-significant after adjustment for "mechanism" variable (e.g., inflammation marker).
- **Fix:** Use causal DAG to distinguish confounder from mediator; never adjust for post-treatment variables in the primary analysis. See ICH E9(R1).

### Hauck-Donner non-monotonicity

- **Trigger:** Small samples, strong predictor, cell counts approaching zero.
- **Mechanism:** Wald test statistic is non-monotone in the parameter near the boundary.
- **Symptom:** Large OR magnitude with non-significant Wald p; switching to LR test changes significance.
- **Fix:** Use profile likelihood (R `confint()` on the `glm` fit) or LR inference; detect with `VGAM::hdeff()`.

### Complete or quasi-complete separation

- **Trigger:** A predictor perfectly predicts outcome in a subset of the data.
- **Mechanism:** Likelihood is monotone in that coefficient; ML estimate diverges to infinity.
- **Symptom:** Coefficient >10, SE >100; statsmodels raises a perfect-separation error or warning, and R `glm` warns "fitted probabilities numerically 0 or 1 occurred."
- **Fix:** Firth penalty (`firthmodels`); inference via penalised LR test, not Wald.

### Proportional odds violation undetected

- **Trigger:** Ordinal outcome fit with cumulative-logit (PO) model without testing assumption.
- **Mechanism:** PO assumes a constant effect across cut-points; violation invalidates model parameters.
- **Symptom:** Brant test rejects PO for one or more predictors; per-cut-point effects in a saturated model diverge.
- **Fix:** Brant test first; if PO fails on one predictor, partial PO model; if widely fails, multinomial logistic.

### Stratified randomisation not in analysis

- **Trigger:** Stratification by site/region/severity at randomisation; analysis ignores strata.
- **Mechanism:** Stratified randomisation removes between-stratum variability, so the achieved SE is smaller than the SE that an unadjusted analysis calculates.
- **Symptom:** Conservative inference: CIs too wide, Type-I error below nominal, and power lost (Kahan-Morris 2012).
- **Fix:** Include stratification factors as covariates in the logistic; this is non-optional per ICH E9, FDA 2023, EMA 2015.

## Model Diagnostics

| Diagnostic | Method | Threshold | Caveat |
|-----------|--------|-----------|--------|
| Discrimination | ROC-AUC (`sklearn.metrics.roc_auc_score`) | >0.7 acceptable, >0.8 good | Trial-level, not patient-individual; depends on outcome prevalence |
| Calibration -- primary | Calibration plot (observed vs predicted in deciles) | Curve along the diagonal | Visual; preferred over H-L |
| Calibration -- secondary | Hosmer-Lemeshow chi-square | p > 0.05 | Low power n<200; oversensitive n>2000 |
| Pseudo R-squared | model.prsquared (McFadden) | >0.2 excellent | NOT comparable to OLS R-squared |
| Events per variable | n_events / n_covariates | >=10 EPV (Peduzzi 1996) | Below this: bias, overfitting; consider Firth |
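
The EPV check is a one-liner, but two conventions are worth making explicit. In this sketch (the function name and the example counts are hypothetical) the limiting count is the smaller of events and non-events, and the denominator counts estimated coefficients excluding the intercept, so a k-level factor contributes k-1. Both are common conventions that I am assuming here rather than rules stated in the table.

```python
def events_per_variable(n_events, n_nonevents, n_parameters):
    """EPV using the rarer outcome class as the limiting sample size."""
    if n_parameters <= 0:
        raise ValueError("n_parameters must be positive")
    return min(n_events, n_nonevents) / n_parameters

# Example: 62 events among 400 patients; treatment + age + sex + 3-level region
# gives 1 + 1 + 1 + 2 = 5 estimated coefficients (intercept excluded)
epv = events_per_variable(n_events=62, n_nonevents=338, n_parameters=5)
print(f"EPV = {epv:.1f}")
```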

### Hosmer-Lemeshow

```python
from scipy.stats import chi2
import pandas as pd

def hosmer_lemeshow(y_true, y_pred, n_groups=10):
    df_hl = pd.DataFrame({'y': y_true, 'prob': y_pred})
    # duplicates='drop' can yield fewer than n_groups bins when probabilities tie
    df_hl['group'] = pd.qcut(df_hl['prob'], n_groups, duplicates='drop')
    # observed=True keeps only non-empty bins and avoids the pandas 2.1+ FutureWarning
    grouped = df_hl.groupby('group', observed=True).agg(obs=('y', 'sum'), n=('y', 'count'), pred=('prob', 'mean'))
    grouped['expected'] = grouped['n'] * grouped['pred']
    hl_stat = (((grouped['obs'] - grouped['expected']) ** 2) /
               (grouped['n'] * grouped['pred'] * (1 - grouped['pred']))).sum()
    actual_groups = len(grouped)
    # df = groups - 2; chi2.sf is numerically safer than 1 - cdf in the upper tail
    return hl_stat, chi2.sf(hl_stat, actual_groups - 2)
```

**H-L is supplementary, not primary:** Hosmer-Lemeshow has low power when n<200 (it rarely rejects even for poor calibration) and is oversensitive when n>2000 (it rejects for trivial miscalibration). The decile choice is also arbitrary -- `pd.qcut(..., duplicates='drop')` can reduce the actual number of groups when probabilities tie, changing the df. Use calibration plots (smoothed observed vs predicted) as primary and the Hosmer-Lemeshow p-value as supplementary.
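
The numbers behind a binned calibration plot can be produced with a few lines of pandas. This is a minimal sketch on simulated, well-calibrated predictions; the helper name `calibration_table` and the column names are hypothetical, and a smoothed curve (for example loess) would replace the bins in a polished figure.

```python
import numpy as np
import pandas as pd

def calibration_table(y_true, y_pred, n_groups=10):
    """Observed event rate vs mean predicted risk within quantile bins."""
    cal = pd.DataFrame({"y": np.asarray(y_true), "prob": np.asarray(y_pred)})
    cal["bin"] = pd.qcut(cal["prob"], n_groups, duplicates="drop")
    out = cal.groupby("bin", observed=True).agg(
        n=("y", "size"),
        mean_predicted=("prob", "mean"),
        observed_rate=("y", "mean"),
    )
    out["gap"] = out["observed_rate"] - out["mean_predicted"]
    return out.reset_index(drop=True)

# Simulated predictions that are calibrated by construction
rng = np.random.default_rng(5)
pred = rng.uniform(0.05, 0.95, size=5000)
obs = rng.binomial(1, pred)

table = calibration_table(obs, pred)
print(table.round(3))
```

Plotting `observed_rate` against `mean_predicted` with the diagonal as reference gives the calibration plot; the `gap` column shows where and in which direction the model is off.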

## Quantitative Thresholds

| Threshold | Source | Rationale |
|-----------|--------|-----------|
| 10 events per variable (EPV) | Peduzzi et al 1996 *J Clin Epidemiol* 49:1373 | Below this, coefficient bias and CI mis-coverage |
| Prevalence > 10% -> prefer marginal RR via modified Poisson | Zou 2004 *AJE* 159:702 | OR overstates RR; modified Poisson directly estimates RR |
| Marginal estimand for primary regulatory analysis | FDA 2023 Final Guidance | Conditional OR is a different parameter than marginal (Permutt 2020) |
| Brant test for PO assumption | Brant 1990 *Biometrics* 46:1171 | Omnibus LR test misses per-coefficient violations |
| Firth penalty with penalised LR test, not Wald | Heinze-Schemper 2002 *Stat Med* 21:2409 | Wald is liberal under Firth penalty |
| HC3 sandwich SE for n <=250 | Long-Ervin 2000 *Am Stat* 54:217 | HC1 (Stata default) anti-conservative in small samples |
| Stratification factors in analysis when used in randomisation | Kahan-Morris 2012 *Stat Med* 31:328 | Ignoring them overstates the SE; Type-I error falls below nominal and power is lost |

## Common Errors

| Error / symptom | Cause | Solution |
|-----------------|-------|----------|
| OR direction opposite of expected | Alphabetical reference; 'Active' sorts before 'Placebo' | `C(ARM, Treatment(reference="Placebo"))` always |
| Conditional and marginal OR differ substantially | Non-collapsibility, not confounding | Cite Permutt 2020; report both with explicit labels |
| Treatment effect vanishes after adjustment | Adjusted for a mediator (post-treatment variable) | Causal DAG check; never adjust for post-treatment in primary |
| Huge coefficient, huge SE, non-significant Wald | Separation OR Hauck-Donner | Detect separation: `firthmodels.detect_separation`; switch to Firth + PLRT |
| H-L p > 0.5 with obvious miscalibration | Low power at n<200 | Use calibration plot as primary; H-L supplementary |
| H-L p < 0.001 with great-looking plot | Oversensitive at n>2000 | Use calibration plot; cite Steyerberg 2019 for cautions |
| `OrderedModel` fails to converge with intercept | Threshold parameters replace intercept | Drop the explicit constant |
| `firth.pvalues_` looks too low | Wald p from Firth is liberal | Inspect the installed Firth package for a PLRT attribute (e.g. `pvalues_lrt_`); compute PLRT manually if none is exposed |
| GLM(family=Binomial) and Logit give the same point estimates but different output | `sm.Logit` provides `pred_table()`, `prsquared`; `sm.GLM` does not | Use Logit unless changing link functions |
| Robust SE not reported for modified Poisson | Missing `cov_type='HC1'` or 'HC3' | Always specify sandwich SE for modified Poisson |

## Anticipated Reviewer Pushback

| Pushback | Response |
|----------|----------|
| "Why is the marginal RD different from the conditional OR's implied RD?" | Non-collapsibility of OR; cite Permutt 2020 and FDA 2023. Marginal RD via g-computation is the primary per FDA 2023. |
| "Adjustment for stratification factors?" | Pre-specified strata included in the model. Cite Kahan-Morris 2012 and ICH E9. |
| "How were covariates chosen?" | Pre-specified in the SAP based on prior literature/clinical knowledge. No data-driven selection (which would inflate Type-I). |
| "Why not use stepwise selection?" | Stepwise leaks signal from the outcome and inflates Type-I. Pre-specification is the regulatory norm; cite FDA 2023. |
| "PH check for the longitudinal binary?" | Binary doesn't have PH; if longitudinal, used GEE with sandwich SE or GLMM; see ICH E9 estimand for treatment policy. |
| "Why Firth not standard logistic?" | Separation detected (or rare events <5%); standard ML diverges; Firth penalty + PLRT recommended (Heinze-Schemper 2002). |
| "Brant test result?" | PO holds for predictors X1, X2, X3; PO fails for X4 -> partial PO model fit per VGAM. |
| "Calibration?" | Calibration plot in supplement; H-L p reported as supplementary, not primary. |

## References

- Arel-Bundock V, Greifer N, Heiss A. 2024. How to interpret statistical models using marginaleffects in R and Python. *J Stat Softw* 111:9.
- Brant R. 1990. Assessing proportionality in the proportional odds model for ordinal logistic regression. *Biometrics* 46:1171-1178.
- FDA. 2023. Adjusting for Covariates in Randomized Clinical Trials for Drugs and Biological Products. Final Guidance, May 2023.
- Firth D. 1993. Bias reduction of maximum likelihood estimates. *Biometrika* 80:27-38.
- Hauck WW, Donner A. 1977. Wald's test as applied to hypotheses in logit analysis. *JASA* 72:851-853.
- Heinze G, Schemper M. 2002. A solution to the problem of separation in logistic regression. *Stat Med* 21:2409-2419.
- Kahan BC, Morris TP. 2012. Improper analysis of trials randomised using stratified blocks or minimisation. *Stat Med* 31:328-340.
- Lin DY, Wei LJ. 1989. The robust inference for the Cox proportional hazards model. *JASA* 84:1074-1078.
- Long JS, Ervin LH. 2000. Using heteroscedasticity consistent standard errors in the linear regression model. *Am Stat* 54:217-224.
- Moore KL, van der Laan MJ. 2009. Covariate adjustment in randomized trials with binary outcomes: TMLE. *Stat Med* 28:39-64.
- Peduzzi P, Concato J, Kemper E, Holford TR, Feinstein AR. 1996. A simulation study of the number of events per variable in logistic regression analysis. *J Clin Epidemiol* 49:1373-1379.
- Permutt T. 2020. Do covariates change the estimand? *Stat Biopharm Res* 12:45-53.
- Senn S. 2013. Seven myths of randomisation in clinical trials. *Stat Med* 32:1439-1450.
- Steyerberg EW. 2019. *Clinical Prediction Models* (2nd ed). Springer.
- Tsiatis AA, Davidian M, Zhang M, Lu X. 2008. Covariate adjustment for two-sample treatment comparisons in randomized clinical trials. *Stat Med* 27:4658-4677.
- Wang B, Susukida R, Mojtabai R, Amin-Esmaeili M, Rosenblum M. 2021. Model-robust inference for clinical trials that improve precision by stratified randomization and covariate adjustment. *JASA* 116:1856-1870.
- Yee TW. 2022. On the Hauck-Donner effect in Wald tests. *JASA* 117:1763-1774.
- Zou G. 2004. A modified Poisson regression approach to prospective studies with binary data. *AJE* 159:702-706.

## Related Skills

- clinical-biostatistics/cdisc-data-handling - Prepare analysis datasets from CDISC SDTM/ADaM domains
- clinical-biostatistics/effect-measures - Modern CI methods; marginal vs conditional in depth
- clinical-biostatistics/categorical-tests - Chi-square and Fisher alternatives for unadjusted tests
- clinical-biostatistics/subgroup-analysis - Interaction terms and HTE detection
- clinical-biostatistics/survival-analysis - Cox regression for time-to-event analogues
- clinical-biostatistics/multiplicity-graphical - Bretz-Maurer graphs for multiple endpoints from one logistic model
- clinical-biostatistics/trial-reporting - CONSORT 2025 and ICH E9(R1) reporting of logistic analyses
