--- name: bio-chipseq-visualization description: Visualizes ChIP-seq data using deepTools (computeMatrix, plotHeatmap, plotProfile, bamCoverage, bamCompare), pyGenomeTracks (modern INI-driven track plots), Gviz (R browser-style), EnrichedHeatmap (ComplexHeatmap-based), ChIPseeker tag heatmaps, and IGV batch screenshots. Handles bigWig normalization choices (CPM, BPM, RPGC, spike-in scaled), bamCompare operations (log2 ratio, subtract, SES), k-means clustering of heatmaps for biological subgrouping, and spike-in-scaled tracks for global-shift experiments. Use when generating publication-quality ChIP-seq signal heatmaps, profile plots, genome-browser tracks, or comparing samples visually. tool_type: mixed primary_tool: deepTools goal_approach_exempt: true --- ## Version Compatibility Reference examples tested with: deepTools 3.5+, pyGenomeTracks 3.9+, Gviz 1.46+, EnrichedHeatmap 1.32+, ChIPseeker 1.38+, IGV 2.17+, samtools 1.19+, bedtools 2.31+. # ChIP-seq Visualization **"Visualize ChIP-seq signal around features of interest"** -> Generate normalized signal tracks (bigWig), heatmaps centered on TSS/peaks, average profile plots, and genome-browser views — with normalization that supports the biological claim (within-sample vs cross-sample vs spike-in scaled). - CLI (production): deepTools `bamCoverage` -> `computeMatrix` -> `plotHeatmap` / `plotProfile` - CLI (config-driven tracks): pyGenomeTracks (replaces Gviz for many use cases) - R (publication): Gviz, EnrichedHeatmap, ChIPseeker tag heatmaps - GUI: IGV with batch scripts for reproducible screenshots The single most consequential choice is **bigWig normalization** — it determines whether visual comparison reflects biology. Get this right before generating any heatmap or browser view. ## bigWig Normalization Decision Tree | Goal | Method | When to use | |------|--------|------------| | Within-sample profile of a single ChIP | `--normalizeUsing CPM` | Standard; reads per million; comparable within one library | | Within-sample, length-aware | `--normalizeUsing BPM` | TPM-analog; useful for variable-width regions; less common for ChIP-seq | | Cross-sample with equal effective depth | `--normalizeUsing RPGC --effectiveGenomeSize ` | "1x genome coverage" — assumes equal sequencing genome-wide; ENCODE convention | | Cross-condition with global signal change | `--scaleFactor ` (skip `--normalizeUsing`) | HDACi / BETi / EZH2i; see chip-seq/spike-in-normalization | | ChIP vs input ratio | `bamCompare --operation log2` | Visualize enrichment over input | | ChIP vs input control-subtracted | `bamCompare --operation subtract` | Absolute signal above background | | ChIP vs input SES-corrected | `bamCompare --operation SES` | More robust to library size; uses signal-extraction-scaling | **ENCODE convention:** RPGC with read-length-matched effective genome size. For visual comparison of treatment vs control on a fold-change biology, log2 bamCompare against shared input. **Spike-in scaled tracks (the right way):** ```bash # Compute scale factor from spike-in reads (ChIP-Rx Drosophila or CUT&RUN E. coli) SCALE=$(echo "scale=6; 1.0 / $SPIKE_IN_READS_M" | bc) # 1 per million spike reads bamCoverage -b chip.bam -o chip.bw --scaleFactor $SCALE --binSize 10 # DO NOT also pass --normalizeUsing; mutually exclusive ``` ## deepTools Workflow ### bigWig generation ```bash # Standard within-sample (CPM) bamCoverage -b chip.bam -o chip.bw \ --normalizeUsing CPM --binSize 10 \ --extendReads 200 --numberOfProcessors 8 # Cross-sample at 1x genome coverage (ENCODE) bamCoverage -b chip.bam -o chip.bw \ --normalizeUsing RPGC --effectiveGenomeSize 2701495761 \ --binSize 10 --extendReads 200 # ChIP vs Input log2 ratio (visualization of enrichment) bamCompare -b1 chip.bam -b2 input.bam -o chip_vs_input.bw \ --operation log2 --binSize 50 --extendReads 200 \ --pseudocount 1 --skipZeroOverZero ``` ### Signal matrix and heatmap (reference-point: TSS / peak summit) ```bash # Compute matrix centered on TSS computeMatrix reference-point \ --referencePoint TSS \ -b 3000 -a 3000 \ -R genes.bed \ -S chip.bw input.bw \ -o matrix.gz \ --outFileSortedRegions sorted_regions.bed \ --numberOfProcessors 8 \ --skipZeros # Heatmap with k-means clustering (biology emerges from clusters) plotHeatmap -m matrix.gz \ -o heatmap.pdf \ --kmeans 3 \ --colorMap RdBu_r \ --zMin -3 --zMax 3 \ --refPointLabel TSS \ --heatmapHeight 12 \ --whatToShow 'heatmap and colorbar' # Profile plot (average signal across regions) plotProfile -m matrix.gz \ -o profile.pdf \ --perGroup \ --plotTitle 'H3K4me3 around TSS' ``` ### Scale-regions (gene-body scaled to common length) ```bash computeMatrix scale-regions \ -R genes.bed \ -S chip.bw \ -b 3000 -a 3000 \ -m 5000 \ -o matrix_genebody.gz \ --numberOfProcessors 8 plotProfile -m matrix_genebody.gz -o genebody_profile.pdf --perGroup ``` ### Sample correlation ```bash multiBamSummary bins -b sample1.bam sample2.bam sample3.bam \ --binSize 10000 -o results.npz \ --numberOfProcessors 8 plotCorrelation -in results.npz \ --corMethod spearman \ --whatToPlot heatmap \ --plotNumbers -o correlation.pdf \ --outFileCorMatrix correlation.tab # Replicates should correlate > 0.8 (narrow), > 0.6 (broad) ``` ## pyGenomeTracks (Modern Browser-Style Plotting) INI-driven, config-as-code; better than Gviz for complex layouts or pipeline integration. ```ini # tracks.ini [x-axis] [chip-h3k27ac] file = h3k27ac.bw color = darkblue height = 3 title = H3K27ac [chip-h3k4me3] file = h3k4me3.bw color = darkred height = 3 title = H3K4me3 [peaks-narrowpeak] file = peaks.narrowPeak file_type = narrow_peak color = black height = 0.5 title = MACS peaks [se-bed] file = super_enhancers.bed color = orange height = 0.5 title = Super-enhancers [genes] file = genes.gtf color = darkgreen prefered_name = gene_name height = 4 ``` ```bash pyGenomeTracks --tracks tracks.ini --region chr1:1000000-1500000 -o region.pdf ``` For pipeline-driven figure generation across multiple regions, pyGenomeTracks is easier to script than Gviz. For one-off publication figures with complex annotation, Gviz remains useful. ## R: Gviz and EnrichedHeatmap ```r library(Gviz) library(GenomicRanges) library(TxDb.Hsapiens.UCSC.hg38.knownGene) chr <- 'chr1'; start <- 1e6; end <- 1.1e6 itrack <- IdeogramTrack(genome = 'hg38', chromosome = chr) gtrack <- GenomeAxisTrack() dtrack <- DataTrack(range = 'sample.bw', genome = 'hg38', type = 'histogram', name = 'ChIP', col.histogram = 'darkblue') grtrack <- GeneRegionTrack(TxDb.Hsapiens.UCSC.hg38.knownGene, genome = 'hg38', chromosome = chr, name = 'Genes') plotTracks(list(itrack, gtrack, dtrack, grtrack), from = start, to = end, chromosome = chr) ``` ```r library(EnrichedHeatmap) library(rtracklayer) # Normalize bigWig signal to a matrix around target sites signal <- import('sample.bw') tss <- promoters(txdb, upstream = 0, downstream = 1) mat <- normalizeToMatrix(signal, tss, extend = 3000, mean_mode = 'w0', w = 50) # Heatmap with customization EnrichedHeatmap(mat, name = 'Signal', col = c('white', 'red'), top_annotation = HeatmapAnnotation(lines = anno_enriched())) ``` ## ChIPseeker Tag Heatmap (R) ```r library(ChIPseeker) library(TxDb.Hsapiens.UCSC.hg38.knownGene) peaks <- readPeakFile('peaks.narrowPeak') promoter <- getPromoters(TxDb = TxDb.Hsapiens.UCSC.hg38.knownGene, upstream = 3000, downstream = 3000) tagMatrix <- getTagMatrix(peaks, windows = promoter) # Tag heatmap and average profile tagHeatmap(tagMatrix, xlim = c(-3000, 3000), color = 'red') plotAvgProf(tagMatrix, xlim = c(-3000, 3000), conf = 0.95, xlab = 'Distance from TSS (bp)', ylab = 'Peak density') ``` ## IGV Batch Scripts ```bash # IGV batch script for reproducible screenshots cat > igv.batch << 'EOF' new genome hg38 load chip.bw load peaks.bed load super_enhancers.bed goto chr1:1000000-1100000 snapshot region1.png goto chr2:50000000-51000000 snapshot region2.png exit EOF igv.sh -b igv.batch ``` ## Per-Tool Failure Modes ### bamCoverage -- `--normalizeUsing` and `--scaleFactor` conflict **Trigger:** Passing both `--normalizeUsing CPM` and `--scaleFactor X`. **Mechanism:** deepTools applies scaleFactor first, then normalizes; the normalization undoes the scale. **Symptom:** Spike-in scaling appears to have no effect; tracks look like CPM. **Fix:** Use ONE — `--scaleFactor` alone for spike-in; `--normalizeUsing` alone otherwise. Never both. ### bamCompare -- log2 with zeros produces -Inf **Trigger:** `bamCompare --operation log2` without pseudocount; many bins have zero reads. **Mechanism:** log2(0/x) = -Inf; downstream tools (plotHeatmap) may color these as NaN or fail. **Fix:** Add `--pseudocount 1` to both samples; or use `--skipZeroOverZero` to skip bins with zero in both samples. ### computeMatrix -- Stranded bigWig vs unstranded reference points **Trigger:** Using stranded bigWigs (separate plus/minus) with `reference-point` mode on a BED without strand info. **Mechanism:** computeMatrix doesn't auto-detect strand; signal is plotted in genomic-strand orientation, breaking TSS-centered plots. **Fix:** Use unstranded merged bigWig OR ensure BED has strand column 6. ### plotHeatmap `--kmeans` -- Order depends on first sample only **Trigger:** Using k-means with multiple samples and expecting consistent clustering. **Mechanism:** k-means clusters by signal in the first `-S` bigWig only; other samples are plotted in the same row order. **Fix:** Order samples in `-S` so the most-discriminating one is first; for combined clustering across samples, use `--hclust` or run k-means externally on combined matrix. ### Spike-in scaled bigWig -- Wrong scale factor direction **Trigger:** Computing `scale_factor = spike_reads / 1e6` and passing to `--scaleFactor`. **Mechanism:** deepTools multiplies signal by scaleFactor; the INVERSE is correct (sample with fewer spike reads gets larger scale factor to compensate). **Symptom:** Treatment samples appear lower than control even when biology says higher. **Fix:** `scale_factor = MIN(spike_reads_all_samples) / spike_reads_this_sample`. Always verify against known internal-control regions (blacklist should show no signal change post-scaling). ### Gviz / EnrichedHeatmap -- Memory failure on whole-genome bigWigs **Trigger:** Loading a 3 GB bigWig into R as a GRanges. **Mechanism:** Gviz loads the entire bigWig into memory for genome-wide views. **Fix:** Use `chromosome` parameter to restrict; use `import.bw(con, which = GRanges(...))` to subset; consider pyGenomeTracks for whole-chromosome views. ### pyGenomeTracks -- INI parsing strict **Trigger:** Custom INI keys not recognized; or section names with spaces. **Mechanism:** pyGenomeTracks expects exact key names; case-sensitive section labels. **Fix:** Run `make_tracks_file --trackFiles sample.bw -o tracks.ini` to generate a template; modify from there. ## Reconciliation: When Visualizations Disagree | Pattern | Likely cause | Action | |---------|--------------|--------| | Heatmap shows enrichment; profile plot doesn't | Signal concentrated at few regions; profile averages them out | Both correct; heatmap shows distribution, profile shows central tendency | | Replicate heatmaps differ at peak edges | Different normalization or stranded vs unstranded bigWigs | Verify bigWig parameters identical; use same `--normalizeUsing` | | Spike-in scaled tracks show opposite trend from CPM | Global shift; CPM forces median to control levels | Spike-in is correct; CPM is fooled by composition | | ChIPseeker tag heatmap differs from deepTools heatmap | ChIPseeker uses peak density; deepTools uses signal coverage | Different metrics; pick one per analysis | | Profile plot loose-replicate band wide | Genuine biological variability OR one replicate failed | Check per-replicate metrics (chipseq-qc); don't average across failing rep | ## Common Errors | Error / symptom | Cause | Solution | |-----------------|-------|----------| | bigWig has all zeros | Wrong chromosome naming (chr vs no chr) | `samtools view -H bam | head` to check; convert if needed | | computeMatrix "all regions skipped" | BED chromosome naming mismatches bigWig | Match seqlevels | | plotHeatmap colors compressed | `--zMin/--zMax` not set; outliers dominate | Set `--zMin -3 --zMax 3` or use percentile-based | | IGV batch hangs | `exit` command missing; IGV waits for input | Always end batch script with `exit` | | pyGenomeTracks region out of range | Region exceeds chromosome length | Verify region from `samtools view -H bam` | | Spike-in scaled track has artifact stripes | Scale factor too extreme (>10x) | Verify spike-in reads adequate (>100k); check titration | ## References - Ramírez F et al 2016 Nucleic Acids Res 44:W160 (deepTools) - Lopez-Delisle L et al 2021 Bioinformatics 37:422 (pyGenomeTracks) - Hahne F & Ivanek R 2016 Methods Mol Biol 1418:335 (Gviz) - Gu Z et al 2018 Bioinformatics 34:2879 (EnrichedHeatmap) - Yu G et al 2015 Bioinformatics 31:2382 (ChIPseeker) - Thorvaldsdóttir H et al 2013 Brief Bioinform 14:178 (IGV) - ENCODE 2012 quality metrics (NSC/RSC; for cross-correlation context) ## Related Skills - chip-seq/peak-calling - Peak files for heatmap reference regions - chip-seq/chipseq-qc - QC plots (fingerprint, correlation) complement visualization - chip-seq/spike-in-normalization - Spike-in-scaled bigWig generation - chip-seq/differential-binding - Visualize differential peak signal - chip-seq/super-enhancers - SE region visualization in tracks - data-visualization/genome-tracks - General genome track patterns + IGV batch + pyGenomeTracks - data-visualization/heatmaps-clustering - General heatmap conventions