---
name: bio-clinical-biostatistics-adaptive-designs
description: Designs adaptive clinical trials including group-sequential (O'Brien-Fleming, Pocock, Lan-DeMets spending), sample-size re-estimation (blinded Friede-Kieser, unblinded Cui-Hung-Wang, Mehta-Pocock promising zone), seamless Phase 2/3 with treatment-arm selection, population enrichment, and response-adaptive randomisation. Covers FDA 2019 Final Adaptive Designs Guidance, FDA 2022 Master Protocols, and ICH E20 Step 2b/3 draft (June 2025, NOT final). Use when planning interim analyses, sample-size re-estimation, or master/platform-trial designs.
tool_type: r
primary_tool: rpact
goal_approach_exempt: true
---

## Version Compatibility

Reference examples tested with: R `rpact` 4.2+ (Wassmer/Brannath), `gsDesign` 3.6+ and `gsDesign2` 1.1+ (Anderson/Merck), `adaptr`, `simtrial`. Commercial: East/EastHorizon (Cytel), ADDPLAN (ICON), FACTS (Berry Consultants).

Before using code patterns, verify installed versions match. If versions differ:
- R: `packageVersion('<pkg>')` then `?function_name`
- Python adaptive packages are limited; R is the regulatory de facto standard

If code throws an error, introspect the installed package and adapt the example to match the actual API rather than retrying.

# Adaptive Clinical Trial Designs

**"Design an adaptive trial"** -> Pre-specify a design with one or more interim adaptations (early stopping, sample-size re-estimation, treatment selection, population enrichment, randomisation ratio changes) that strongly controls Type-I error at the trial-wide level via combination tests or the Conditional Rejection Probability principle.

## Regulatory Status -- The 2024-2026 Landscape

**FDA 2019 Final Adaptive Designs Guidance** (Federal Register 2019-25986, Dec 2 2019) finalised the 2010 and 2018 drafts. Recognises 5 design types: group-sequential, blinded SSR, unblinded SSR, adaptive enrichment, adaptive randomisation.

**FDA 2022 Final Master Protocols Guidance** (March 2022, NOT 2018 — common citation error): basket (one drug, many diseases), umbrella (multiple drugs, one disease), platform (perpetual, drugs enter/exit).

**ICH E20 Adaptive Clinical Trials**: **Step 2b draft June 25 2025; Step 3 public consultation (EU deadline Nov 30 2025; FDA Federal Register Sept 30 2025); Step 4 final targeted late 2026.** As of May 2026, ICH E20 is NOT final. The EFPIA/PhRMA position paper preceded the formal ICH work; Berry Consultants public comment letter is one of the more important submissions.

**FDA CDER Bayesian Methodology Draft (Jan 2026)** (FDA-2025-D-3217): first-ever drug-side Bayesian guidance; permits Bayesian primary inference in pivotals with simulation-based Type-I error calibration.

**Project Optimus (FDA OCE, 2021-2024)**: rewrites Phase I/II oncology by requiring randomised dose comparison before registration, replacing MTD-and-go. Made BOIN, mTPI-2, and multi-arm dose-finding the default.

## Algorithmic Taxonomy

| Design type | Adaptation | Type-I preservation | Software | Strength | Fails when |
|-------------|-----------|---------------------|----------|----------|------------|
| Group-sequential (O'Brien-Fleming) | Early stopping for efficacy/futility | Boundary calculation; very conservative early, near-nominal at end | rpact, gsDesign | FDA's preferred adaptive design | More complex SAP; IDMC firewall essential |
| Group-sequential (Pocock) | Early stopping | Constant nominal alpha at each look | rpact, gsDesign | Easy early stopping | Large penalty at final analysis |
| Wang-Tsiatis power family | Early stopping | Parameterised by Delta | rpact | Tunable conservatism | Δ choice matters |
| Lan-DeMets spending function | Early stopping (flexible timing) | Alpha-spending function | rpact, gsDesign | Operational flexibility; analyses don't need pre-specified number | FDA's de facto preferred framework |
| Blinded SSR (Friede-Kieser 2006) | Re-estimate variance/event-rate; recompute n | No Type-I inflation; agency-uncontroversial | rpact | EMA/FDA endorsed | Variance estimate must be blinded |
| Unblinded SSR (Cui-Hung-Wang 1999) | Increase n based on interim effect estimate | Requires CHW weights for control; or Mehta-Pocock promising zone | rpact | Recovers power if interim promising | IDMC firewall must be perfect; Jennison-Turnbull 2015 critique |
| Mehta-Pocock promising zone (2011) | Increase n if conditional power in (0.3, 0.8) | Calibrated so Type-I inflation negligible (~0.001) | rpact | Operational simplicity | "Stealth alpha inflation" critique (Jennison 2015) |
| Bauer-Köhne 1994 combination | Combine stagewise p-values via Fisher product | Any pre-specified design modification | rpact | Most flexible; theoretical foundation | Power loss vs designed group-sequential |
| Müller-Schäfer 2001 CRP principle | Preserve null conditional rejection probability | Any adaptation at any time | rpact | Modern theoretical bedrock | Implementation complexity |
| Adaptive enrichment | Drop sub-populations failing futility | Closed-test stage-wise | rpact, adaptr | Recovers power on responders | Selection bias on enriched population |
| Response-adaptive randomisation | Update allocation probabilities | Stratification + time-trend covariates required | adaptr, FACTS | Patient-welfare; learn-and-confirm | Drift bias, estimator bias; controversial (Hey-Kimmelman 2015 ethics) |
| Bayesian platform (I-SPY 2 style) | RAR + biomarker stratification + graduation criterion | Frequentist OCs via simulation | FACTS, custom Stan/JAGS | Modern oncology adaptive | Operational complexity; requires IDMC sophistication |

**Postdoc reading list:**

- Bauer P, Köhne K 1994 *Biometrics* 50:1029 (combination test; original adaptive)
- Cui L, Hung HMJ, Wang SJ 1999 *Biometrics* 55:853 (CHW weighted test for unblinded SSR)
- Müller HH, Schäfer H 2001 *Biometrics* 57:886 (CRP principle — theoretical bedrock)
- Mehta CR, Pocock SJ 2011 *Stat Med* 30:3267 (promising zone)
- Jennison C, Turnbull BW 2015 *Stat Med* 34(29):3793-3810 (Mehta-Pocock critique)
- Friede T, Kieser M 2006 *Biom J* 48:537 (blinded SSR)
- Lan KKG, DeMets DL 1983 *Biometrika* 70:659 (alpha spending function)
- O'Brien PC, Fleming TR 1979 *Biometrics* 35:549 (OBF boundary)
- Pocock SJ 1977 *Biometrika* 64:191 (Pocock boundary)
- Hey SP, Kimmelman J 2015 *Clin Trials* 12:102 (RAR ethics critique)
- Berry DA 2015 commentary *Clin Trials* 12:107 (counter)
- Robertson DS, Lee KM, López-Kolkovska BC, Villar SS 2023 *Stat Sci* 38:185 (canonical modern RAR review)
- Wassmer G, Brannath W 2016 *Group Sequential and Confirmatory Adaptive Designs in Clinical Trials* (Springer)
- Jennison C, Turnbull BW 2000 *Group Sequential Methods with Applications to Clinical Trials* (CRC)

## Decision Tree by Scenario

| Scenario | Recommended approach | Why |
|----------|---------------------|-----|
| Confirmatory trial wanting interim early stopping | Group-sequential with O'Brien-Fleming boundaries via gsDesign | FDA-preferred; near-nominal final alpha |
| Group-sequential with flexible look timing | Lan-DeMets spending function | Operational flexibility; FDA de facto preferred |
| Phase 3 with uncertain nuisance parameter (variance, event rate) | Blinded SSR (Friede-Kieser) | No Type-I inflation; agency-uncontroversial |
| Phase 3 wanting to increase n if interim shows promise | Mehta-Pocock promising zone with CHW weights | Recovers power; calibrated Type-I |
| Seamless Phase 2/3 with arm selection | Bauer-Köhne combination test + closed testing | Most flexible; cite Müller-Schäfer CRP |
| Adaptive enrichment (drop subpopulation) | Adaptive enrichment with closed-test stage-wise | Recovers power on responders |
| Multi-arm oncology platform | Bayesian platform with RAR (I-SPY 2 model) | Patient-welfare argument strong for multi-arm |
| 2-arm phase 3 oncology with potential RAR | Avoid RAR; group-sequential preferred | Hey-Kimmelman 2015 ethics + drift bias |
| Continuous endpoint, treatment discontinuation, follow-up data available | Hybrid: J2R imputation for treatment-discontinuation ICEs, MMRM-MAR for other missingness | Aprocitentan PRECISION precedent (2024); FDA de facto standard 2024-2025 for treatment-policy estimands |
| Phase 1 dose-finding | BOIN (FDA Fit-for-Purpose qualified 2021) | Transparent, tabulated decisions; no bedside Bayesian software |
| Phase 1b/2 dose-optimisation (Project Optimus) | Multi-arm BOIN-12 or multi-dose randomised | FDA Aug 2024 final dose-optimisation guidance |
| Basket trial (one drug, multiple diseases) | EXNEX or robust MAP via RBesT | Borrows across baskets while permitting one to detach |
| Umbrella trial (one disease, multiple drugs) | Bayesian platform with shared control | FDA Master Protocols 2022 |
| Pediatric extrapolation borrowing from adults | Power prior with discount γ in 0.3-0.6 | FDA Bayesian Jan 2026 draft endorses |

## Group-Sequential Designs

### O'Brien-Fleming -- the regulatory default

```r
library(gsDesign)

# OBF boundaries; 3 interim looks at 33%, 67%, 100% information
design <- gsDesign(
    k = 4,                  # total analyses including final
    test.type = 1,          # 1-sided efficacy
    alpha = 0.025,
    beta = 0.10,            # power = 0.90
    sfu = sfLDOF,           # Lan-DeMets approximation of OBF
    timing = c(0.25, 0.50, 0.75, 1.0)
)
print(design)
plot(design)
```

OBF is **very conservative at early looks** (nominal alpha approximately 0.0001 at 25% info) and **near-nominal at final analysis** (~0.024 of 0.025). Preferred by FDA because the final-analysis penalty is small.

### Pocock -- constant nominal

Constant nominal alpha at each look. Easy early stopping but large final-analysis penalty (~0.018 of 0.025 with k=4). Rarely used in confirmatory.

### Lan-DeMets spending function -- the modern flexibility

```r
# Lan-DeMets OBF-like spending function (sfLDOF)
# Allows analysis timing to differ from pre-specified
design_flex <- gsDesign(
    k = 3,
    sfu = sfLDOF,      # OBF-like spending
    alpha = 0.025,
    beta = 0.10
)

# Actual analyses can occur at different information fractions
# Spending function returns alpha to spend at each look based on actual timing
```

**The flexibility:** sponsor can perform analyses at different information fractions than originally planned. FDA's de facto preferred framework.

### Sample-size for group-sequential

```r
# Time-to-event group-sequential
library(gsDesign)
n_gs <- gsSurv(
    k = 3,
    test.type = 2,    # 2-sided
    alpha = 0.025,
    beta = 0.10,
    sfu = sfLDOF,
    lambdaC = 0.04,   # control hazard per month
    hr = 0.70,        # treatment HR
    eta = 0.005,      # dropout hazard
    T = 24,           # total study duration
    minfup = 12       # minimum follow-up
)
print(n_gs)
```

## Sample-Size Re-Estimation

### Blinded SSR (Friede-Kieser 2006)

Re-estimate nuisance parameter (variance σ² for continuous, control event rate p_0 for binary, overall event rate for survival) from blinded interim data. **No Type-I error inflation when test statistic ignores the SSR.**

```r
library(rpact)
# Blinded SSR for continuous outcome
design_blinded_ssr <- getDesignGroupSequential(
    kMax = 2,
    alpha = 0.025,
    beta = 0.20,
    sided = 1,
    informationRates = c(0.5, 1)
)

# At interim, re-estimate variance and recompute n
# (manual implementation; rpact has built-in support via getDesignInverseNormal for unblinded)
```

EMA Reflection Paper 2007 and FDA 2019 explicitly endorse blinded SSR. **Uncontroversial.**

### Unblinded SSR (Cui-Hung-Wang 1999)

Interim effect estimate triggers sample-size change. **Type-I inflation if naive:** Cui-Hung-Wang showed 8% Type-I vs 2.5% target.

The Cui-Hung-Wang weighted test uses pre-specified weights from the original design:

```
Z_weighted = w_1 * Z_1 + w_2 * Z_2_residual
```

where w_1, w_2 are the pre-specified weights (based on original n_1, n_2) and Z_2_residual is the test statistic on the data after the interim. **Pre-specified weights preserve alpha** even if the actual n at stage 2 differs.

```r
library(rpact)
design_unblinded_ssr <- getDesignInverseNormal(
    kMax = 2,
    alpha = 0.025,
    beta = 0.20,
    sided = 1,
    informationRates = c(0.5, 1),
    typeOfDesign = 'WT',  # Wang-Tsiatis power family
    deltaWT = 0.25
)

# Use inverse normal combination for adaptive SSR
analysis_result <- getAnalysisResults(
    design_unblinded_ssr,
    dataInput = getDataMeans(...)
)
```

### Mehta-Pocock Promising Zone (2011)

At interim, compute conditional power (CP) given observed effect:

- **Unfavourable zone** (CP < ~30%): stop or continue without modification
- **Promising zone** (CP in 30-80%): increase n to recover power; NO Type-I penalty if increase rule pre-specified and uses original test statistic with original weights
- **Favourable zone** (CP > 80%): continue without change

```r
# rpact implementation
# Sample size recalculation in promising zone
n_increased <- getSampleSizeMeans(
    design_unblinded_ssr,
    alternative = 5,        # detect mean diff of 5
    stDev = 12,
    groups = 2
)
```

**The mathematical sleight:** promising zone is constructed so unconditional Type-I error inflation is negligible (~0.001) even WITHOUT CHW weighting. **Jennison-Turnbull 2015 critique:** stealth alpha inflation in unpublished simulation assumptions; inefficient relative to CHW-weighted GSD. Mehta defends on operational grounds.

**Hsiao et al 2020 *Trials* 21:1003** is the systematic review.

## Combination Tests and CRP Principle

**Bauer-Köhne 1994** *Biometrics* 50:1029: combine stagewise p-values via Fisher's product test. Permits design modifications post-interim while controlling Type-I error.

**Müller-Schäfer 2001** *Biometrics* 57:886: **Conditional Rejection Probability (CRP) principle** — preserve the null conditional rejection probability at every adaptation, and unconditional Type-I is preserved. The theoretical bedrock of all post-2001 confirmatory adaptive designs.

Müller-Schäfer 2004 *Stat Med* 23:2497 extended to ANY design change at ANY time.

```r
# rpact natively supports combination tests
design_comb <- getDesignFisher(
    kMax = 3,
    alpha = 0.025,
    sided = 1
)

# Or inverse normal combination
design_inv_norm <- getDesignInverseNormal(
    kMax = 3,
    alpha = 0.025,
    informationRates = c(0.33, 0.67, 1.0)
)
```

## Adaptive Enrichment

Drop sub-populations failing futility; re-power on responders. **Closed-test stage-wise** to control familywise error across full and enriched populations.

```r
# rpact: enrichment design via getDesignEnrichmentSubgroup
# Standard implementation requires explicit definition of full population (F)
# and enriched population (S)
```

**Postdoc concern:** selection bias on the enriched population — the observed treatment effect on the enriched subgroup is biased upward by selection. Bias-correction via simulation or hierarchical Bayesian.

## Response-Adaptive Randomisation -- The Ethics Fight

**Hey & Kimmelman 2015 *Clin Trials* 12:102 "Are outcome-adaptive allocation trials ethical?"** Argued RAR's purported ethical advantage (equipoise, sub-optimal exposure minimisation) **fails in two-arm and early-phase settings** because:

1. Drift bias inflates Type-I error / biases estimates (time trends confounded with allocation)
2. Consent dynamics confused — patients believe allocation is "personalised" when stochastic
3. Marginal patient-welfare benefit is statistical and small while operational risks real

**Counter-arguments:**

- **Berry DA 2015 commentary** *Clin Trials* 12:107: RAR enables learn-and-confirm, multi-arm platforms (I-SPY 2 model) where the patient-welfare argument IS the point and equal allocation would be unethical given accumulating evidence.
- **Saville & Berry 2016 *Clin Trials* 13:358:** RAR's operating characteristics are competitive in multi-arm; bias and inflation problems disappear with stratification, time-trend covariates, and proper analysis weights.
- **Buyse 2015 *Clin Trials* 12:108:** Hey-Kimmelman correct for 2-arm but wrong for multi-arm.

**Consensus position (2020s; ICH E20):** RAR appropriate when (a) multi-arm (>=3 arms), (b) rare disease / limited pool, (c) strong PoC of differential biomarker response, (d) robust drift-bias adjustment and pre-specified analysis weights. **Inappropriate for confirmatory 2-arm trials.**

**Robertson, Lee, López-Kolkovska, Villar 2023 *Stat Sci* 38:185 ("Response-adaptive randomization: from myths to practical considerations")** is the canonical modern review settling the debate.

## Bayesian Platform Trials

**I-SPY 2 (Barker-Sigman 2009 *Clin Pharmacol Ther*; Park-Liu 2016 *NEJM* 375:11):** neoadjuvant breast cancer; 10 biomarker-defined subtypes × multiple arms; Bayesian RAR; graduation criterion = posterior predictive probability of success in 300-patient Phase 3 ≥ 85%. Berry Consultants designed the engine. Multiple drugs graduated (neratinib, veliparib, pembrolizumab).

**GBM AGILE (Alexander 2018; published readouts beginning 2024):** glioblastoma; response-adaptive Bayesian; first global registrational platform in neuro-oncology. Regorafenib readout 2025 *JCO* JCO-25-01137.

**REMAP-CAP (Angus 2020 *JAMA*):** severe pneumonia, repurposed for COVID-19 in 2020; **Bayesian factorial multi-domain design** — multiple intervention domains tested simultaneously and combinatorially. Generated corticosteroid signal in COVID independently of RECOVERY.

### Drop-the-loser vs promising-the-winner

- **Adaptive arm-dropping (futility):** Bayesian posterior probability of beating control drops below threshold -> arm closes. Mathematically straightforward; FDA-acceptable.
- **"Promising-the-winner" (graduate to Phase 3):** introduces selection bias. Bias-adjusted estimators (Robertson 2023; conditional MLE) now standard in I-SPY 2 reports.

## Phase I Dose-Finding -- BOIN, mTPI, CRM

| Design | Citation | Idea | Where it wins |
|--------|----------|------|---------------|
| CRM | O'Quigley-Pepe-Fisher 1990 | Single-parameter logistic/power model; updates posterior MTD probability after each cohort | Statistically efficient; skeleton calibration needed |
| EWOC | Babb-Rogatko-Zacks 1998 | CRM-like with explicit overdose-control constraint (P(dose > MTD) <= 0.25) | Safer than CRM in small trials |
| mTPI | Ji et al 2010 *Clin Trials* 7:653 | Beta-binomial; UPM decision rule on under/proper/over-dosing intervals | Pre-tabulated decisions; documented over-shoot bias |
| mTPI-2 / Keyboard | Guo-Wang-Chen-Ji 2017 | Fixes mTPI Ockham bias by equal-width intervals | Default mTPI replacement |
| BOIN | Liu-Yuan 2015 *J R Stat Soc C* 64:507 | Pre-tabulated escalation interval bounds optimised to minimise incorrect-decision probability | **FDA Fit-for-Purpose qualified Dec 2021**; near-CRM with no bedside software |

**Why FDA prefers BOIN operationally:** qualified as Fit-for-Purpose under FDA Drug Development Tools program (review document FDA-2020-X-XXXX, posted 2021). Investigator uses pre-printed escalation table — no real-time Bayesian software at the bedside.

R packages: `BOIN`, `dfcrm` (Cheung — author of CRM textbook), `trialr` (Brock — includes EffTox), `escalation` (Brock — unified framework).

## Reconciliation: When Methods Disagree

| Pattern | Likely cause | Action |
|---------|--------------|--------|
| Blinded SSR n vs unblinded SSR n differ substantially | Unblinded SSR uses interim effect estimate; blinded uses nuisance parameter only | Blinded is Type-I-clean; unblinded requires CHW weighting; pre-specify the approach in SAP |
| Group-sequential rejects at interim; Cui-Hung-Wang weighted final test does not | Naive interim rejection used original test statistic; CHW weights downweight late data | Pre-specify boundary and weights; do NOT switch tests mid-stream |
| Mehta-Pocock promising-zone vs CHW-weighted GSD give different n increases | Promising zone calibrated for Type-I (~0.001 inflation); CHW more efficient under known effect | Jennison-Turnbull 2015 critique: promising zone "stealth alpha"; pre-specify with simulation OCs |
| Adaptive enrichment selects subgroup at interim; replication shows smaller effect | Selection bias on enriched population (Sun 2010 winner's curse) | Bias-correction via conditional MLE or hierarchical Bayesian; cite Robertson 2023 |
| RAR posterior allocation favours active in 2-arm trial; randomisation drift bias suspected | Time trends confounded with allocation changes | Pre-specify time-trend covariates in analysis; use proper analysis weights; cite Robertson 2023 RAR consensus (RAR INAPPROPRIATE for 2-arm confirmatory) |
| BOIN vs CRM choose different MTD on same data | CRM uses model; BOIN uses tabulated boundaries; differ when skeleton mis-calibrated | BOIN Fit-for-Purpose qualified (Dec 2021); CRM more efficient under correct skeleton; report OCs over both |
| I-SPY 2 graduation criterion met but Phase 3 replication fails | Selection bias on graduated arm; PP threshold not bias-corrected | Apply conditional MLE; cite Robertson 2023; report both raw and bias-corrected estimates |
| Müller-Schäfer CRP preserved but ad hoc rule appears Type-I-inflated in simulation | Implementation deviation from formal CRP | Verify CRP equation precisely; report OCs via simulation; cite Müller-Schäfer 2001 |

## Per-Method Failure Modes

### Unblinded SSR with naive sample increase

- **Trigger:** Sponsor increases n based on interim effect without CHW weighting.
- **Mechanism:** Type-I inflation up to 8% (Cui-Hung-Wang 1999 simulation).
- **Symptom:** Independent reanalysis finds Type-I > 5%.
- **Fix:** Pre-specify CHW weights from original design; use combination test in rpact.

### Mehta-Pocock promising-zone "stealth alpha"

- **Trigger:** Promising-zone applied without sufficient simulation.
- **Mechanism:** Jennison-Turnbull 2015 critique — Type-I inflation hidden in unpublished simulation assumptions.
- **Symptom:** Independent reanalysis finds Type-I 5.3% vs nominal 5%.
- **Fix:** Pre-specify increase rule transparently; report simulation OCs.

### RAR drift bias

- **Trigger:** RAR in trial with time trends (calendar effects, learning curves).
- **Mechanism:** Time trends confounded with allocation changes; biased effect estimate.
- **Symptom:** Effect estimate sensitive to time-trend adjustment.
- **Fix:** Pre-specify time-trend covariates in analysis; use proper analysis weights; cite Robertson 2023.

### Schoenfeld formula under immunotherapy delayed effect

- **Trigger:** Sample size calculated via Schoenfeld 1981 assuming PH.
- **Mechanism:** Delayed effect violates PH; events under-estimated by 20-50%.
- **Symptom:** Trial under-powered; observed events insufficient.
- **Fix:** Lakatos 1988 or simulation under expected HR(t); cite Lin 2020 NPH Working Group.

### IDMC firewall failure in unblinded SSR (IDMC = Independent Data Monitoring Committee; the regulatory-standard term)

- **Trigger:** Interim effect estimate leaks beyond IDMC.
- **Mechanism:** Sponsor inference from increase decision reveals direction of interim effect.
- **Symptom:** Regulator audit reveals unblinding.
- **Fix:** Strict firewall SOP; only "increase / no increase" communicated to sponsor; cite ICH E20.

### Adaptive enrichment selection bias

- **Trigger:** Enriched population effect reported without bias correction.
- **Mechanism:** Selection on subgroup with promising interim effect inflates estimate.
- **Symptom:** Independent replication on enriched subgroup gives smaller effect.
- **Fix:** Bias-correction via simulation or hierarchical Bayesian; cite Sun 2010 winner's curse.

### RAR in 2-arm confirmatory

- **Trigger:** RAR applied to confirmatory 2-arm trial.
- **Mechanism:** Hey-Kimmelman 2015 critique — drift bias, consent confusion, marginal benefit.
- **Symptom:** Reviewer rejects RAR as inappropriate for setting.
- **Fix:** Group-sequential with futility/efficacy boundaries instead; cite Robertson 2023 consensus.

### CRM skeleton mis-calibration

- **Trigger:** CRM applied with default skeleton without simulation.
- **Mechanism:** Skeleton dictates target dose; mis-calibration biases MTD.
- **Symptom:** MTD selection differs systematically from clinical expectation.
- **Fix:** Calibrate skeleton via Lee-Cheung 2009 indifference-interval method; or switch to BOIN.

## Quantitative Thresholds

| Threshold | Source | Rationale |
|-----------|--------|-----------|
| FDA Fit-for-Purpose BOIN qualification (Dec 2021) | FDA Drug Development Tools program | First dose-finding design with formal FDA endorsement |
| Mehta-Pocock promising zone CP 30-80% | Mehta-Pocock 2011 | Mathematical calibration for Type-I preservation |
| RAR appropriate >= 3 arms | Robertson 2023 consensus | Multi-arm patient-welfare argument |
| OBF nominal alpha ~0.024 at final / 0.025 | gsDesign | Small final penalty preferred by FDA |
| Schoenfeld under non-PH under-estimates 20-50% | Lin 2020 NPH WG | Use Lakatos or simulation |
| I-SPY 2 graduation: PP success in Phase 3 >= 85% | Barker 2009 | Bayesian platform standard |
| Power prior discount γ 0.3-0.6 for pediatric extrapolation | FDA Bayesian Jan 2026 draft | Partial borrowing default |

## Common Errors

| Error / symptom | Cause | Solution |
|-----------------|-------|----------|
| Unblinded SSR with naive sample increase | No CHW weighting | Pre-specify CHW weights; cite Cui-Hung-Wang 1999 |
| Mehta-Pocock without simulation OCs | Stealth alpha inflation | Report simulation OCs; cite Jennison 2015 |
| RAR in 2-arm confirmatory | Misapplication | Group-sequential instead; cite Robertson 2023 |
| Schoenfeld for immuno-oncology | PH assumption violated | Lakatos or simulation; cite Lin 2020 |
| Adaptive enrichment effect reported uncorrected | Selection bias | Bias-correction; cite Sun 2010 |
| CRM with default skeleton | Mis-calibration | Calibrate via Lee-Cheung 2009 or switch to BOIN |
| ICH E20 cited as "finalised April 2024" | Confusion with EFPIA position paper | ICH E20 is Step 2b/3 draft (June 2025); not final |
| FDA Master Protocols "2018" | 2018 was draft | March 2022 was the final |
| Bauer-Köhne combination test as "old-fashioned" | Misunderstanding | Foundational; cited in modern combination-test implementations |
| Stop-for-efficacy at first interim with OBF | OBF nominal alpha ~0.0001 at 25% info | Trial must show very strong evidence to stop early; expected |

## Anticipated Reviewer Pushback

| Pushback | Response |
|----------|----------|
| "How is Type-I error controlled?" | Closed testing via Müller-Schäfer CRP principle; specific implementation is inverse normal combination test in rpact |
| "Why these boundaries?" | OBF via Lan-DeMets sfLDOF spending function; preserves final-analysis power; pre-specified in SAP |
| "Pre-specification of SSR rule?" | Promising zone CP in (0.3, 0.8) triggers increase to n_max via CHW-weighted statistic; pre-specified n_max in SAP |
| "IDMC firewall?" | IDMC receives interim effect estimate; sponsor receives only "increase / no increase" decision; SOP documented; pre-specified |
| "RAR ethics?" | Multi-arm (4 arms) setting; Berry 2015 consensus that patient-welfare argument valid; drift-bias adjustment in primary analysis |
| "Promising zone vs CHW-weighted GSD?" | Operational simplicity preferred; OCs from simulation confirm Type-I ~5%; supportive Cui-Hung-Wang analysis |
| "Adaptive enrichment bias?" | Bias-correction via simulation; conditional MLE for enriched-population effect; cite Sun 2010 |
| "Phase 1 BOIN vs CRM?" | BOIN Fit-for-Purpose qualified by FDA Dec 2021; tabulated decisions; no bedside Bayesian software |

## References

- Babb J, Rogatko A, Zacks S. 1998. Cancer Phase I clinical trials: efficient dose escalation with overdose control. *Stat Med* 17:1103-1120.
- Bauer P, Köhne K. 1994. Evaluation of experiments with adaptive interim analyses. *Biometrics* 50:1029-1041.
- Berry DA. 2015. Commentary on Hey & Kimmelman. *Clin Trials* 12:107-110.
- Cui L, Hung HMJ, Wang SJ. 1999. Modification of sample size in group sequential clinical trials. *Biometrics* 55:853-857.
- FDA. 2019. Adaptive Designs for Clinical Trials of Drugs and Biologics. Final Guidance.
- FDA. 2022. Master Protocols: Efficient Clinical Trial Design Strategies to Expedite Development of Oncology Drugs and Biologics. Final Guidance, March 2022.
- FDA. 2026. Use of Bayesian Methodology in Clinical Trials. Draft Guidance, January 2026.
- Friede T, Kieser M. 2006. Sample size recalculation in internal pilot study designs. *Biom J* 48:537-555.
- Hey SP, Kimmelman J. 2015. Are outcome-adaptive allocation trials ethical? *Clin Trials* 12:102-106.
- Jennison C, Turnbull BW. 2015. Adaptive sample size modification in clinical trials: start small then ask for more? *Stat Med* 34(29):3793-3810.
- Lan KKG, DeMets DL. 1983. Discrete sequential boundaries for clinical trials. *Biometrika* 70:659-663.
- Liu S, Yuan Y. 2015. Bayesian optimal interval designs for phase I clinical trials. *JRSS-C* 64:507-523.
- Mehta CR, Pocock SJ. 2011. Adaptive increase in sample size when interim results are promising. *Stat Med* 30:3267-3284.
- Müller HH, Schäfer H. 2001. Adaptive group sequential designs for clinical trials: combining the advantages of adaptive and of classical group sequential approaches. *Biometrics* 57:886-891.
- O'Brien PC, Fleming TR. 1979. A multiple testing procedure for clinical trials. *Biometrics* 35:549-556.
- O'Quigley J, Pepe M, Fisher L. 1990. Continual reassessment method: a practical design for phase 1 clinical trials in cancer. *Biometrics* 46:33-48.
- Pocock SJ. 1977. Group sequential methods in the design and analysis of clinical trials. *Biometrika* 64:191-199.
- Robertson DS, Lee KM, López-Kolkovska BC, Villar SS. 2023. Response-adaptive randomization in clinical trials: from myths to practical considerations. *Stat Sci* 38:185-208.
- Wassmer G, Brannath W. 2016. *Group Sequential and Confirmatory Adaptive Designs in Clinical Trials*. Springer.

## Related Skills

- clinical-biostatistics/power-and-sample-size - Sample size for adaptive designs
- clinical-biostatistics/multiplicity-graphical - Closed testing in adaptive contexts
- clinical-biostatistics/bayesian-trials - Bayesian platform trials, BOIN/CRM/EWOC
- clinical-biostatistics/trial-reporting - Reporting adaptive trial results per CONSORT 2025
- clinical-biostatistics/survival-analysis - Adaptive designs for TTE endpoints
- experimental-design/sample-size - General sample-size methods
