--- name: bio-clip-seq-differential-clip description: Identify differentially bound regions across CLIP-seq conditions (knockdown vs control, treatment vs vehicle, disease vs healthy) using DEWSeq (sliding-window DESeq2), Flipper (Skipper-downstream), ASpeak, edgeR, or limma-voom. Use when computing condition-level changes in RBP binding intensity, choosing peak-level vs window-level vs crosslink-level testing, designing replicate experiments, or distinguishing biological binding shifts from technical confounders. tool_type: r primary_tool: DEWSeq --- ## Version Compatibility Reference examples tested with: DEWSeq 1.18+, htseq-clip 2.0+, DESeq2 1.44+, edgeR 4.2+, limma 3.60+, Flipper (commit 2024.04+), Skipper (commit 2023.05+), pybedtools 0.10+, pyranges 0.0.129+. Before using code patterns, verify installed versions match. If versions differ: - R: `packageVersion('')` then `?function_name` to verify parameters - Python: `pip show ` then `help(module.function)` to check signatures - CLI: ` --version` then ` --help` to confirm flags If code throws unexpected errors, introspect the installed package and adapt the example to match the actual API rather than retrying. # Differential CLIP-seq Analysis **"Identify regions with changed RBP binding across conditions"** -> Test for condition-level differences in IP enrichment relative to SMInput, accounting for replicate variance and (where available) sequencing depth normalization. Three statistical scales are possible: peak-level (test each peak as a unit), window-level (test fixed transcriptome windows; DEWSeq, Flipper), or crosslink-site level (test single-nt positions). The choice depends on the biology (narrow regulatory shift vs broad binding-mode change) and on which upstream peak caller was used (CLIPper -> peak-level; Skipper -> window-level; PureCLIP -> CL-level). - R (window-level, DEWSeq + htseq-clip): `library(DEWSeq); dds <- DESeqDataSetFromSlidingWindows(counts, colData, design=~condition); dds <- DESeq(dds); res <- results(dds)` - R (Skipper downstream, Flipper): `flipper differential -i skipper_out/ --design design.tsv --contrast treatment vs control -o flipper_out/` - R (peak-level, edgeR): `dge <- DGEList(counts=peak_counts, group=condition); dge <- calcNormFactors(dge); design <- model.matrix(~condition); dge <- estimateDisp(dge, design); fit <- glmQLFit(dge, design); res <- glmQLFTest(fit)` - R (peak-level, limma-voom): `v <- voom(dge, design); fit <- lmFit(v, design); fit <- eBayes(fit); res <- topTable(fit, coef=2, number=Inf)` - CLI (htseq-clip preprocessing): `htseq-clip extract -i annotation.gff -o annotation_windows.bed -w 50 -s 20 && htseq-clip count -i sample.bam -w annotation_windows.bed -o sample.counts.txt` DEWSeq is the EMBL/Hentze-group windowed-NB approach designed specifically for differential CLIP. Flipper is the Skipper-companion tool (Tu 2024) for the modern Skipper workflow. Peak-level edgeR/limma-voom work when the upstream peak caller produced a comparable peak BED across conditions (e.g., consensus peaks from CLIPper). ## Algorithmic Taxonomy | Tool | Scale | Statistical model | Replicate requirement | Strength | Fails when | |------|-------|-------------------|----------------------|----------|------------| | DEWSeq (Schwarzl 2020) | 50-100 nt sliding window | Negative binomial GLM (DESeq2 internals) | >= 2 reps per condition | Designed specifically for CLIP; integrates SMInput | Slow on dense libraries; output window-resolution | | Flipper (Tu 2024) | Skipper window (100 nt) | Negative binomial; designed for Skipper output | >= 2 reps per condition | Modern; pairs with Skipper peak caller | Only useful if upstream is Skipper | | ASpeak | Peak | Negative binomial | >= 2 reps | Peak-level for CLIPper output | Less popular; legacy | | edgeR (general) | Peak or window | Quasi-likelihood F-test on NB | >= 2 reps | Mature, widely cited | Generic; not CLIP-aware; needs careful normalization | | limma-voom | Peak or window | Linear model with mean-variance trend | >= 2 reps | Fast; well-validated; handles small samples | Generic; treats counts as continuous after voom | | DESeq2 (direct) | Peak or window | Negative binomial GLM | >= 2 reps | Mature; same engine as DEWSeq | Same as edgeR caveats | | MASTR-seq (Banks 2023) | Cell-resolved (scCLIP) | Cell-mixture model | Cells | Single-cell CLIP differential | Specialized; few published applications | | diffbind (CLIP adaptation) | Peak | DESeq2 or edgeR backend | >= 2 reps | Familiar from ChIP/ATAC | Designed for ChIP; needs CLIP-specific normalization | | MAnorm2 | Peak | Hierarchical empirical Bayes | >= 2 reps | Tested on ChIP; less on CLIP | Less CLIP-specific | Methodology evolves; verify DEWSeq vignette and Flipper paper for current best practice. The DEWSeq + htseq-clip pipeline (Schwarzl 2020) is the most-published CLIP-specific differential framework; Flipper (Tu 2024) is the modern Skipper-coupled alternative. ## Critical Decision: The Interaction-Term Design **Use `~ type + condition + type:condition` and test the interaction-term coefficient.** This is the single most consequential statistical choice in differential CLIP. - `type` = `ip` vs `sminput` (whether the library is IP or size-matched input) - `condition` = `treated` vs `untreated` (or KD vs control) - `type:condition` interaction = "Does the IP-vs-input ratio shift across conditions?" A naive `~ condition` design tests whether read counts differ regardless of whether they come from IP or SMInput - this confounds binding changes with expression changes. The interaction term explicitly tests for **differential binding** (the biologically meaningful signal) rather than differential expression at peak loci. When testing, extract the interaction-term coefficient: `results(dds, name = 'typeip.conditiontreat')`. The log2FoldChange returned is the change in IP/input ratio in `treated` vs `untreated` - this is what "differential binding" means. ## Critical Choice: Peak-Level vs Window-Level vs Crosslink-Level Three scales differ in resolution and statistical power: **Peak-level (edgeR, limma-voom, DESeq2 on CLIPper peaks):** Test each consensus peak's IP/SMInput log2 FC across conditions. Pro: peak boundaries are biologically meaningful; output interpretable. Con: peak set changes across conditions (a new peak in treatment but missing in control complicates testing); SMInput normalization must be applied consistently. **Window-level (DEWSeq, Flipper):** Tile transcriptome into fixed 50-100 nt windows; test each window. Pro: comparable across conditions (windows are pre-defined); handles binding-mode shifts within a peak; high statistical power. Con: window-resolution; multiple-testing burden (millions of windows); biological meaning of a window needs translation. **Crosslink-level (custom):** Test each single-nt CL position. Pro: nucleotide resolution; captures motif-level shifts. Con: very low coverage per position; massive multiple-testing burden; rarely used in published differential CLIP. | Goal | Scale | Tool | |------|-------|------| | ENCODE-style peak-level differential (CLIPper upstream) | Peak | DESeq2 / edgeR / limma-voom on CLIPper consensus peaks | | Maximum sensitivity windowed differential | Window | DEWSeq (with htseq-clip) or Flipper (with Skipper) | | Modern Skipper-coupled workflow | Window | Flipper | | Single-cell scCLIP differential | Cell | MASTR-seq or scCLAM (sparse) | | Compare binding-mode shifts within a peak | Window | DEWSeq | | Allele-specific differential | CL site | BEAPR + custom logistic | | RBP-KD effect on binding profile | Peak/window | DEWSeq (handles KD-effect on RBP itself) | ## DEWSeq Workflow (Window-Level Differential) DEWSeq is the EMBL/Hentze-group differential framework specifically designed for CLIP. The pipeline is: **Goal:** Identify transcriptome windows where the IP-vs-SMInput ratio shifts across conditions, accounting for replicate variance with the negative-binomial GLM. **Approach:** Use htseq-clip to extract sliding 50 nt windows across annotated features, count reads per window per sample, build a DESeqDataSetFromSlidingWindows object with the `~ type + condition + type:condition` interaction design, extract the `typeip.conditiontreat` interaction coefficient as the differential-binding effect size, and aggregate adjacent significant windows with `bedtools merge -d 100`. ```bash # Step 1: htseq-clip generates sliding-window count matrices htseq-clip extract \ -i gencode.v38.annotation.gff \ -o annotation_windows.bed \ --window-size 50 \ --window-step 20 \ --feature-type CDS,UTR # Step 2: count IP and SMInput reads per window per sample for sample in ip_rep1 ip_rep2 sminput_rep1 sminput_rep2; do htseq-clip count \ -i ${sample}.dedup.bam \ -a annotation_windows.bed \ -o ${sample}.counts.txt \ --mate 2 done # (For eCLIP, --mate 2 because R2 5' is the truncation site; for iCLIP single-end use --mate 1) # Step 3: DEWSeq differential testing htseq-clip mergeCounts \ -i ip_rep1.counts.txt ip_rep2.counts.txt sminput_rep1.counts.txt sminput_rep2.counts.txt \ -o merged_counts.tsv ``` ```r library(DEWSeq) counts <- read.table('merged_counts.tsv', sep='\t', header=TRUE, row.names=1) colData <- data.frame( sample = c('ip_rep1','ip_rep2','sminput_rep1','sminput_rep2'), type = c('ip','ip','sminput','sminput'), condition = c('treated','treated','untreated','untreated') ) dds <- DESeqDataSetFromSlidingWindows( countData = counts, colData = colData, annotObj = 'annotation_windows.bed', design = ~ type + condition ) dds <- DESeq(dds) # In a model with `~ type + condition`, the IP-vs-input contrast is the simple condition # main effect; to test the differential CLIP signal between conditions use the interaction # term name from the design matrix (matches the skill's interaction-term guidance below): res <- results(dds, name='typeip.conditiontreated') # Window-level FDR adjustment res_filtered <- res[!is.na(res$padj) & res$padj < 0.05 & abs(res$log2FoldChange) > 1, ] # Aggregate adjacent significant windows into differential regions sig_windows <- as.data.frame(res_filtered) sig_windows$chr <- gsub('_.*', '', rownames(sig_windows)) # Custom reduce: combine adjacent windows within 100 nt ``` ## Flipper Workflow (Skipper-Coupled) Flipper (Tu 2024) is the differential companion to Skipper, using the same 100 nt feature-respecting windows and the same beta-binomial framework. ```bash # Assume Skipper has been run on all samples; Skipper output is at skipper_out/ flipper differential \ -i skipper_out/ \ --design design.tsv \ --contrast treatment vs control \ -o flipper_out/ # design.tsv format: # sample_id condition replicate ip_or_input # ip_treat_r1 treatment 1 ip # ip_treat_r2 treatment 2 ip # in_treat_r1 treatment 1 input # ... ``` Output: differential window BED with log2 FC, p, padj per window. ## Peak-Level Differential (CLIPper Upstream) ```r library(DESeq2) library(GenomicRanges) library(Rsubread) # Step 1: union of CLIPper peaks across conditions # (See bedtools merge upstream) peaks <- read.table('consensus_peaks.bed', sep='\t', col.names=c('chr','start','end','name','score','strand')) # Step 2: count reads per peak per sample with featureCounts saf <- data.frame( GeneID = peaks$name, Chr = peaks$chr, Start = peaks$start + 1, # 1-based for featureCounts End = peaks$end, Strand = peaks$strand ) counts_ip <- featureCounts( files = c('ip_treat_r1.bam','ip_treat_r2.bam','ip_ctrl_r1.bam','ip_ctrl_r2.bam'), annot.ext = saf, isGTFAnnotationFile = FALSE, strandSpecific = 1, isPairedEnd = TRUE )$counts counts_in <- featureCounts( files = c('in_treat_r1.bam','in_treat_r2.bam','in_ctrl_r1.bam','in_ctrl_r2.bam'), annot.ext = saf, isGTFAnnotationFile = FALSE, strandSpecific = 1, isPairedEnd = TRUE )$counts # Step 3: DESeq2 with IP vs SMInput interaction all_counts <- cbind(counts_ip, counts_in) colData <- data.frame( type = rep(c('ip','input'), each=4), condition = rep(c('treat','treat','ctrl','ctrl'), 2), replicate = rep(c('r1','r2','r1','r2'), 2) ) dds <- DESeqDataSetFromMatrix(countData = all_counts, colData = colData, design = ~ type + condition + type:condition) dds <- DESeq(dds) # The interaction term `typeip.conditiontreat` tests: # does the IP/input ratio differ in treatment vs control? res <- results(dds, name = 'typeip.conditiontreat') # Filter res_sig <- res[!is.na(res$padj) & res$padj < 0.05 & abs(res$log2FoldChange) > 1, ] ``` The interaction-term design (`type:condition`) is the correct statistical model for differential CLIP: it tests whether the IP-vs-input ratio differs across conditions, which is what "differential binding" means. Naive testing of just `condition` (ignoring SMInput) confounds binding changes with expression changes. ## RBP Knockdown Experiment Design The canonical differential CLIP design is to knock down the RBP and observe what binding sites are lost. Caveats: | Issue | Implication | Mitigation | |-------|-------------|------------| | RBP KD also depletes the RBP protein in cells | siRNA/shRNA reduces RBP -> reduces IP yield -> reduces unique fragments | Normalize against SMInput WITHIN each condition; the relative IP/SMInput captures binding, not protein level | | RBP KD changes transcript abundance | mRNA stability regulators (HuR, PUM2) when knocked down change target abundance | Both IP and SMInput see the change; ratio still works | | Off-target effects of siRNA | Multiple binding profiles change | Use multiple independent siRNAs; require concordance | | KD efficiency varies | Lower KD -> smaller binding-loss signal | Validate KD by WB on the same IP lysate; > 70% protein loss target | | Rescue requires re-introducing RBP | siRNA-resistant RBP cDNA for rescue | The standard differential validation experiment | ## Per-Tool Failure Modes ### DEWSeq -- Slow on dense libraries **Trigger:** Whole-genome window tiling at 20 nt step; dense library (50M unique fragments); 4+ samples. **Mechanism:** DEWSeq runs DESeq2 internals on millions of windows; the dispersion fit on this many features is slow. **Symptom:** Runtime > 6 h; out-of-memory; "size of object exceeds vector limit". **Fix:** Increase window step size to 50 nt; pre-filter windows with low counts; or restrict to expressed transcripts only. DEWSeq vignette suggests `keep <- rowSums(counts(dds)) >= 30; dds <- dds[keep,]` before testing. ### DEWSeq -- Custom adjacency aggregation needed **Trigger:** Windows are 50 nt; biological binding sites are 50-500 nt; user expects DEWSeq to output continuous "differential regions" but gets individual windows. **Mechanism:** DEWSeq outputs per-window results; aggregating adjacent significant windows into regions is a separate step. **Symptom:** Output has 10,000 individual windows; user expects 1,000 biological regions. **Fix:** Use the DEWSeq utility `resultsDEWSeq()` then `bedtools merge -d 100` on the significant-window BED. Or use the `top_hits_to_bed.R` script from DEWSeq examples. ### Peak-level differential -- Peak set differs between conditions **Trigger:** CLIPper called peaks separately per condition; treatment has peaks at sites missing in control (and vice versa). **Mechanism:** Peak unification requires a consensus peakset; testing on a "treatment-only" peak underestimates evidence in control (zero reads) and produces spurious DE. **Symptom:** "Treatment-specific" peaks dominate DE results; biologically implausible. **Fix:** Generate consensus peakset across all conditions (bedtools merge of all per-condition peak BEDs); count reads per consensus peak across all samples; THEN run differential. The Yeo lab convention is consensus peakset across all samples. ### Interaction term forgotten **Trigger:** DESeq2 design `~ condition` instead of `~ type + condition + type:condition`. **Mechanism:** Simple `~ condition` tests whether read counts differ between treatment and control regardless of whether reads are from IP or SMInput. A condition-driven expression change in SMInput is detected as DE binding. **Symptom:** DE results dominated by transcripts with global expression changes (housekeeping shifts). **Fix:** Always use the interaction-term design. The biologically meaningful test is the interaction `type:condition` p-value. ### Normalization assumptions **Trigger:** edgeR `calcNormFactors(method='TMM')` on CLIP-seq data. **Mechanism:** TMM assumes most features (genes) are not differentially expressed. CLIP-seq peak counts can be globally shifted if the RBP itself is knocked down; TMM normalization would force the shift to be invisible. **Symptom:** Knockdown experiment shows ~0 DE peaks; expected hundreds. **Fix:** Use SMInput as the control library; spike-in normalization if available; or skip TMM and use library-size normalization only. Some CLIP-specific tools (DEWSeq, Flipper) handle this internally. ### Flipper requires Skipper upstream **Trigger:** Flipper called on CLIPper output. **Mechanism:** Flipper expects Skipper's window-level output format with beta-binomial estimates. **Symptom:** Flipper crashes or produces nonsense. **Fix:** Use DEWSeq with htseq-clip for CLIPper-upstream workflows; use Flipper only with Skipper. ## Decision Tree by Scenario | Scenario | Tool + design | Why | |----------|---------------|-----| | KD vs control eCLIP, CLIPper upstream | DEWSeq + htseq-clip | CLIP-specific NB GLM with interaction term | | KD vs control eCLIP, Skipper upstream | Flipper | Skipper companion; matches windowing | | Treatment vs vehicle (small effect) | DEWSeq (window-level higher power) | Sliding windows capture small shifts | | Multiple time points | DEWSeq with time as covariate | Continuous design with time vector | | Allele-specific differential | BEAPR per-allele + custom logistic | See clip-seq/clip-alignment for WASP | | Single-cell CLIP differential | MASTR-seq or scCLAM | Specialized; few options | | Differential motif occupancy | Window-level + DEWSeq + motif overlap | Combine differential windows with motif BED | | RBP overexpression vs control | Same as KD reversed | Same statistical framework | | Compare two RBPs | NOT differential CLIP; use SPIDR or separate CLIPs | Different RBPs need separate IPs | | Spike-in normalization needed | DEWSeq + spike-in size factors | For global occupancy shifts | | chimeric eCLIP differential miRNA targets | Custom; treat each miRNA-target chimera as feature | Specialized; see clip-seq/ago-clip-mirna-targets | ## Reconciliation: When Differential Tools Disagree | Pattern | Likely cause | Action | |---------|--------------|--------| | DEWSeq finds many DE windows; edgeR peak-level finds few | Window-level higher power for narrow shifts | Aggregate DEWSeq windows; cross-check | | edgeR many DE; DEWSeq few | edgeR not accounting for SMInput | Re-run edgeR with interaction term | | DE peaks dominated by expression changes | No interaction term | Use `~ type + condition + type:condition` | | KD experiment shows ~0 DE | TMM over-corrects global shift | Switch to library-size norm only; use SMInput | | siRNA replicates discordant | Off-target effects vary | Use multiple independent siRNAs; require concordance | | Treatment-only peaks dominate DE | No consensus peakset | Generate consensus first; then test on unified set | | Significant windows scattered | Window aggregation step skipped | `bedtools merge -d 100` on significant-window BED | | Same gene appears in many DE windows | Multiple binding sites per gene differential | Report at gene-level too; not just window-level | | Flipper fails with non-Skipper input | Upstream mismatch | Use DEWSeq for non-Skipper workflows | | DESeq2 dispersion fit fails | Too few replicates (n=2 per condition); too few features after filtering | Increase replicates; or relax filtering | **Operational rule for high-confidence differential reporting:** (a) Use SMInput-aware design (`~ type + condition + type:condition`); (b) generate consensus peakset across conditions; (c) require padj < 0.05 AND |log2FC| > 1; (d) require concordance with at least one orthogonal method (e.g., DEWSeq + edgeR peak-level on same data); (e) for KD experiments, validate KD efficiency by WB and require multiple independent siRNAs. ## Common Errors | Error / symptom | Cause | Solution | |-----------------|-------|----------| | DESeq2 `~ condition` finds many DE | No interaction with type | Use `~ type + condition + type:condition` and test interaction | | DEWSeq output is gene-level not window-level | `keep <- ...` filter too aggressive | Loosen pre-filter | | Adjacent significant windows not aggregated | Forgot bedtools merge | `bedtools merge -d 100` on sig windows | | edgeR TMM fits all libraries to one value | Most-features-not-DE assumption violated | Use library-size norm; or DEWSeq for CLIP-specific | | Flipper crashes on CLIPper input | Tool mismatch | Switch to DEWSeq | | Few replicates -> unstable estimates | n=2 not enough for dispersion | Increase n; or use limma-voom (more tolerant) | | Peaks differ across conditions | Per-condition peak calls | Unify with consensus peakset | | KD experiment yields 0 DE | Normalization over-corrected; OR KD efficiency too low | Validate KD WB; check normalization | | Global shift in IP relative to SMInput | RBP itself knocked down so IP yield lower | Normalize WITHIN each condition | | Lots of "treatment-only" peaks | Caller stringency higher in one condition | Use consensus peakset for fairness | ## References - Schwarzl T et al 2020 Bioconductor pkg DEWSeq (windowed CLIP differential) - Sahadevan S et al 2022 Methods Mol Biol 2404:189 (DEWSeq + htseq-clip pipeline) - Tu Y et al 2024 PMC PMC13060198 (Flipper, Skipper-companion differential) - Boyle EA et al 2023 Cell Genomics 3:100317 (Skipper, parent of Flipper) - Love MI et al 2014 Genome Biol 15:550 (DESeq2) - McCarthy DJ et al 2012 Nucleic Acids Res 40:4288 (edgeR) - Ritchie ME et al 2015 Nucleic Acids Res 43:e47 (limma) - Wu B et al 2018 Nat Commun 9:5117 (BEAPR allele-specific differential) - Van Nostrand EL et al 2020 Nature 583:711 (ENCODE 150 RBP shRNA + eCLIP comparison) ## Related Skills - clip-seq/clip-peak-calling - CLIPper / Skipper outputs feed differential - clip-seq/clip-qc - Replicate QC required for valid differential - clip-seq/binding-site-annotation - Annotate differential regions - clip-seq/clip-motif-analysis - Motif analysis on differential windows - clip-seq/ago-clip-mirna-targets - Differential miRNA targeting from chimeric eCLIP - differential-expression/deseq2-basics - Underlying NB GLM model - differential-expression/de-results - DE results interpretation - differential-expression/edger-basics - edgeR for peak counts - chip-seq/differential-binding - DNA-protein analogue