Tools and versions

Quality-control assessment tools

The chrom-seek pipeline runs a series of quality-control (QC) tools to assess the sequencing and library quality of each sample. The tools used in the pipeline are listed below, along with their versions and a brief description of their purpose.

| Tool | Version | Notes |
|------|---------|-------|
| FastQC [1] | 0.11.9 | Assess sequencing quality; run before and after adapter trimming |
| Kraken [2] | 2.1.2 | Assess microbial taxonomic composition |
| KronaTools [3] | 2.8.1 | Visualize Kraken output |
| FastQ Screen [4] | 0.9.3 | Assess contamination; additional dependencies: bowtie2/2.5.1, perl/5.36 |
| Preseq [5] | 3.1.2 | Estimate library complexity |
| MultiQC [6] | 1.14 | Aggregate sample statistics and quality-control information across all samples |


Data processing tools

The pipeline is composed of a series of data processing steps that include adapter trimming, read alignment, duplicate removal, and bigwig creation. The tools used in the pipeline are listed below, along with their versions and a brief description of their purpose.

| Tool | Version | Notes |
|------|---------|-------|
| Cutadapt [1] | 4.4 | Remove adapter sequences and perform quality trimming |
| BWA mem [2] | 0.7.17 | Read alignment, first to identify reads aligning to blacklisted regions and then against the remainder of the genome |
| Picard [3] | 2.27.3 | Run SamToFastq (for blacklist read removal) and MarkDuplicates (to remove PCR duplicates in PE data) |
| SAMtools [4] | 1.17 | Remove reads with MAPQ less than 6. Also run flagstat and idxstats to calculate alignment statistics. |
| MACS [5] | | Run filterdup on SE data (--keep-dup="auto") to remove PCR duplicates |
| Bedtools [6] | 2.27.1 | Run intersect and bedtobam to convert .tag.Align.gz to .bam for use with deepTools (specific to SE data) and MEME |
| ppqt [7,8] | 1.1.2 | Also known as phantompeakqualtools. Calculates the estimated fragment length (used for bigwig creation and peak calling for SE data). Also produces the QC metrics NSC and RSC. |
| deepTools [9] | 3.5.1 | Used for bigwig creation and multiple QC metrics. Use bamCoverage to create RPGC-normalized data: --binSize 25 --smoothLength 75 --normalizeUsing RPGC. For PE data, add --centerReads. For control SE, add -e 200. For ChIP SE, add -e [estimated fragment length]. For control subtraction (inputnorm), use bigwigCompare: --binSize 25 --operation 'subtract'. Run multiBigWigSummary, plotCorrelation, plotPCA, plotFingerprint, computeMatrix, plotHeatmap, and plotProfile for QC plots. Note: not all of these have been reincorporated into the pipeline. |
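
The deepTools notes above describe three bamCoverage variants (PE, control SE, ChIP SE) plus a bigwigCompare step for control subtraction. The sketch below assembles those argument lists in Python so the branching is easy to see. It is not the pipeline's own code: the function names and file names are hypothetical, and `--effectiveGenomeSize` is an assumption added here because bamCoverage needs it for RPGC normalization, even though the notes above do not list it.

```python
from typing import List, Optional


def bamcoverage_args(
    bam: str,
    bigwig: str,
    effective_genome_size: int,
    paired_end: bool,
    is_control: bool = False,
    fragment_length: Optional[int] = None,
) -> List[str]:
    """Build a bamCoverage argument list following the notes in the table."""
    args = [
        "bamCoverage", "-b", bam, "-o", bigwig,
        "--binSize", "25",
        "--smoothLength", "75",
        "--normalizeUsing", "RPGC",
        # Assumption: not listed in the notes, but required for RPGC.
        "--effectiveGenomeSize", str(effective_genome_size),
    ]
    if paired_end:
        # PE data: center reads on the fragment.
        args.append("--centerReads")
    elif is_control:
        # Control SE data: extend reads to a fixed 200 bp.
        args += ["-e", "200"]
    else:
        # ChIP SE data: extend reads to the estimated fragment length.
        if fragment_length is None:
            raise ValueError("ChIP SE data needs an estimated fragment length")
        args += ["-e", str(fragment_length)]
    return args


def inputnorm_args(chip_bw: str, control_bw: str, out_bw: str) -> List[str]:
    """Build a bigwigCompare argument list for control subtraction."""
    return [
        "bigwigCompare", "-b1", chip_bw, "-b2", control_bw,
        "--binSize", "25", "--operation", "subtract", "-o", out_bw,
    ]


if __name__ == "__main__":
    # Placeholder genome size; use the value appropriate for your genome.
    print(" ".join(bamcoverage_args("chip.bam", "chip.bw", 2913022398,
                                    paired_end=False, fragment_length=180)))
    print(" ".join(inputnorm_args("chip.bw", "input.bw", "chip.inputnorm.bw")))
```

Returning a list rather than a single string means the result can be passed directly to `subprocess.run` without shell quoting concerns.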


Peak calling and differential peak calling tools

The pipeline includes peak calling and differential peak calling tools to identify enriched regions of interest in the genome. The tools used in the pipeline are listed below, along with their versions and a brief description of their purpose.

| Tool | Version | Notes |
|------|---------|-------|
| MACS [1] | | macsNarrow: the macs2 caller optimized for narrow peaks, and one of the most widely used peak calling algorithms. Typically used in large databases, it identifies peaks in the range of 150 bp to 10 kb. It was originally designed to handle peaks with a single maximum (summit), and its false discovery rate (FDR) is greatly improved by adding an "input" control. It is generally more accurate than most other peak callers, even without controls. macsBroad: the macs2 caller for slightly broader peaks, sharing a similar algorithm with macsNarrow. It is particularly useful when peaks have more than one maximum (summit). |
| Sicer [2] | 2-1.0.3 | Sicer is a broad peak caller that can be highly effective for certain histone marks. However, it may not perform well for extra-broad domains such as lamins or some repressive marks. It allows small gaps between peaks, and users may need to adjust the window and gap parameters for optimal results. |
| Genrich [3] | 0.6 | Designed with ATAC-seq data in mind, Genrich can yield excellent results. However, some collaborators may not favor it because it has not been formally published or reviewed. |
| MANorm [4] | 1.1.4 | Differential peak calling when no replicates are available. This tool has not been incorporated into the pipeline. |
| DiffBind [5,6] | 2.15.2 | Used for differential peak calling; integrates with DESeq2 and edgeR for the analysis. See DiffBind's documentation. |
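
To make the macsNarrow and macsBroad distinction concrete, here is a minimal sketch of how the two modes could map onto `macs2 callpeak` arguments. The function name, file names, and the choice of options are assumptions for illustration only; the table above does not state which options the pipeline actually passes, so treat this as one plausible invocation rather than the pipeline's command.

```python
from typing import List, Optional


def macs2_callpeak_args(
    treatment: str,
    name: str,
    genome: str = "hs",
    control: Optional[str] = None,
    broad: bool = False,
    paired_end: bool = False,
) -> List[str]:
    """Build a hypothetical macs2 callpeak argument list."""
    args = ["macs2", "callpeak", "-t", treatment, "-n", name, "-g", genome]
    if control is not None:
        # An "input" control sample, as discussed in the MACS notes.
        args += ["-c", control]
    if paired_end:
        # Paired-end BAM input must be requested explicitly.
        args += ["-f", "BAMPE"]
    if broad:
        # Broad mode, for regions with more than one summit.
        args.append("--broad")
    return args


if __name__ == "__main__":
    narrow = macs2_callpeak_args("chip.bam", "sample_narrow", control="input.bam")
    broad = macs2_callpeak_args("chip.bam", "sample_broad", control="input.bam",
                                broad=True)
    print(" ".join(narrow))
    print(" ".join(broad))
```

In this sketch the only difference between the two modes is the `--broad` switch, which mirrors the note that both callers share a similar algorithm.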


Annotations, motifs, and QC metrics

The pipeline includes tools for peak annotation, motif calling, and quality-control metrics. The tools used in the pipeline are listed below, along with their versions and a brief description of their purpose.

| Tool | Version | Notes |
|------|---------|-------|
| Uropa [1] | 4.0.2 | Used for peak annotation, with comprehensive annotation features. See Uropa's documentation, and the glossary for the options used in this pipeline. |
| Homer [2] | 4.11.1 | Used for motif calling |
| MEME [3] | 5.5.5 | Used for motif analysis. The MEME suite includes MEME-ChIP for de novo motif discovery and AME for known motif analysis. Note that the CentriMo subcomponent of MEME-ChIP may produce inaccurate results for broad peak calling tools. See the MEME suite's documentation. |
| IDR [4] | 2.0.3 | One method for identifying consensus peaks. Only works for 2 replicates. This tool is not currently in the pipeline. |
| Jaccard | NA | Calculates peak call consistency between two conditions. This is not currently included in the pipeline. Requires: pybedtools, pysam. |
| FRiP | NA | The "fraction of reads in peaks", calculated as the proportion of aligned reads that fall within the peaks called by a specific tool for a given sample. Requires: pybedtools, pysam. |
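
The FRiP and Jaccard metrics above can both be illustrated on plain interval lists. The sketch below is a self-contained pure-Python version, not the pipeline's implementation (which, per the table, relies on pybedtools and pysam). It assumes half-open BED-style coordinates, counts a read as "in a peak" if it overlaps a peak by at least one base, and computes the Jaccard index at the base-pair level as intersection over union. All function and type names are hypothetical.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]           # half-open [start, end), BED-style
PeakSet = Dict[str, List[Interval]]  # chromosome -> list of intervals
Read = Tuple[str, int, int]          # chromosome, start, end


def merge(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or touching intervals into a sorted list."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def covered_bp(intervals: List[Interval]) -> int:
    """Number of bases covered by at least one interval."""
    return sum(end - start for start, end in merge(intervals))


def frip(reads: List[Read], peaks: PeakSet) -> float:
    """Fraction of reads overlapping any peak by at least one base."""
    if not reads:
        return 0.0
    merged = {chrom: merge(ivs) for chrom, ivs in peaks.items()}
    in_peaks = sum(
        1
        for chrom, start, end in reads
        if any(start < p_end and p_start < end
               for p_start, p_end in merged.get(chrom, []))
    )
    return in_peaks / len(reads)


def jaccard(a: PeakSet, b: PeakSet) -> float:
    """Base-pair Jaccard index: intersection bp divided by union bp."""
    intersection = union = 0
    for chrom in set(a) | set(b):
        ivs_a, ivs_b = a.get(chrom, []), b.get(chrom, [])
        union_bp = covered_bp(ivs_a + ivs_b)
        union += union_bp
        # |A and B| = |A| + |B| - |A or B|
        intersection += covered_bp(ivs_a) + covered_bp(ivs_b) - union_bp
    return intersection / union if union else 0.0


if __name__ == "__main__":
    reads = [("chr1", 100, 150), ("chr1", 500, 550),
             ("chr2", 10, 60), ("chr1", 195, 245)]
    peaks = {"chr1": [(120, 200)]}
    print(frip(reads, peaks))  # 2 of 4 reads overlap the peak
    print(jaccard({"chr1": [(0, 100)]}, {"chr1": [(50, 150)]}))
```

The linear scan over peaks per read is fine for a small example; real data would call for an indexed lookup or the interval tools named in the table.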


Last update: 2025-01-24
