Purpose
This is a pipeline to find transcription factor footprints in ATAC-seq or DNase-seq data.
Inputs
- alignment BAM file
- A BAM file from the ATAC-seq or DNase-seq experiment.
- chromosome length
- A tab-delimited file with 2 columns.
- The first column is the chromosome name and the second column is the chromosome length for the appropriate organism and genome build.
- Example: chr1 10000000
- coordinates of motif
- A 6-column BED file with the coordinates of motif matches (e.g., resulting from scanning the genome with a PWM) for the transcription factor of interest.
- The 6 columns should contain chromosome, start coordinate, end coordinate, name, score and strand, in this order. The coordinates should be 1-based and closed (both the start and the end position belong to the match). Note that this differs from the standard BED convention, which is 0-based and half-open.
- Example: chr1 24782 24800 . 11.60 -
- There should not be any additional columns.
- transcription factor
- The name of the transcription factor of interest supplied by the user, e.g. CTCF.
- cleavage/transposition bias
- The cleavage/transposition bias of the sequencing protocol, given for all 6-mers.
- Provided options: the ATAC, DNase double-hit and DNase single-hit protocols.
- coordinates of ChIP-seq peaks
- A file with the coordinates of the ChIP-seq peaks for the transcription factor of interest.
- The format is flexible as long as the first 3 columns (chromosome, start coordinate, end coordinate) are present.
- Example: chr1 237622 237882
- number of components
- Total number of footprint and background components to be learned from the data.
- Options are 2 (1 footprint and 1 background) or 3 (2 footprint and 1 background) components.
- background components
- The mode of initialization for the background component. Options are "Flat" or "Seq".
- Choosing "Flat" initializes this component as a uniform distribution.
- Choosing "Seq" initializes it as the signal profile that would be expected solely due to the protocol bias (given by the cleavage/transposition bias file).
- fixed background component
- Whether the background component should be kept fixed.
- Options are TRUE or FALSE.
- Setting "TRUE" keeps this component fixed, whereas setting "FALSE" lets it be re-estimated during training.
- In general, if the background is initialized from the protocol bias (option "Seq"), it is recommended to keep it fixed.
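Because standard BED files are 0-based and half-open, motif matches stored in that convention need their start coordinate shifted by one to match the 1-based, closed layout described above. A minimal Python sketch of that conversion, which also checks each row against the chromosome-length file. It assumes whitespace-separated input and that every motif match carries a "+" or "-" strand; the function names are illustrative and not part of the pipeline.

```python
import io


def read_chrom_sizes(handle):
    """Parse the chromosome-length file into a {chromosome: length} dict."""
    sizes = {}
    for line in handle:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        sizes[fields[0]] = int(fields[1])
    return sizes


def bed_to_closed(handle, chrom_sizes):
    """Convert 0-based, half-open 6-column BED rows to 1-based, closed rows.

    Raises ValueError if a row does not match the 6-column layout.
    """
    rows = []
    for lineno, line in enumerate(handle, start=1):
        fields = line.split()
        if not fields:
            continue
        if len(fields) != 6:
            raise ValueError(f"line {lineno}: expected 6 columns, got {len(fields)}")
        chrom, start, end, name, score, strand = fields
        # 0-based start -> 1-based start; a half-open end is already the
        # 1-based position of the last base, so it stays unchanged.
        start, end = int(start) + 1, int(end)
        # Assumption: motif matches always carry '+' or '-' (not '.').
        if strand not in ("+", "-"):
            raise ValueError(f"line {lineno}: strand must be '+' or '-'")
        if chrom not in chrom_sizes:
            raise ValueError(f"line {lineno}: unknown chromosome {chrom}")
        if not 1 <= start <= end <= chrom_sizes[chrom]:
            raise ValueError(f"line {lineno}: invalid coordinates {start}-{end}")
        rows.append([chrom, str(start), str(end), name, score, strand])
    return rows


# Example: the motif from the text, written in standard BED coordinates.
sizes = read_chrom_sizes(io.StringIO("chr1\t10000000\n"))
rows = bed_to_closed(io.StringIO("chr1\t24781\t24800\t.\t11.60\t-\n"), sizes)
print(rows)  # [['chr1', '24782', '24800', '.', '11.60', '-']]
```

The converted rows can then be written back out tab-delimited with no extra columns.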
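The difference between the "Flat" and "Seq" background initializations can be illustrated with a short sketch. This is not the pipeline's implementation, and it rests on several assumptions: the bias table is passed as a Python dict mapping each 6-mer to a relative bias (the layout of the provided bias files is not described here), each profile position is scored by the 6-mer starting at that position, and reverse-complementing minus-strand motifs is omitted. The function names are hypothetical.

```python
from itertools import product


def flat_profile(width):
    """'Flat' initialization: a uniform distribution over the window."""
    return [1.0 / width] * width


def seq_profile(sequences, bias, k=6):
    """'Seq'-style initialization: cut profile expected from k-mer bias alone.

    sequences: equal-length DNA strings; a string of length W + k - 1 yields a
        W-position profile, where position i is scored by seq[i:i + k]
        (this k-mer/cut-site alignment is an assumption).
    bias: dict mapping each k-mer to its relative cleavage/transposition bias.
    """
    width = len(sequences[0]) - k + 1
    totals = [0.0] * width
    for seq in sequences:
        if len(seq) - k + 1 != width:
            raise ValueError("all sequences must have the same length")
        for i in range(width):
            # k-mers absent from the table (e.g. containing N) contribute 0.
            totals[i] += bias.get(seq[i:i + k].upper(), 0.0)
    norm = sum(totals)
    if norm == 0:
        return flat_profile(width)  # no usable k-mers: fall back to uniform
    return [t / norm for t in totals]


# Toy bias table: every 6-mer has bias 1.0 except AAAAAA, which is 3x higher.
bias = {"".join(p): 1.0 for p in product("ACGT", repeat=6)}
bias["AAAAAA"] = 3.0
print(seq_profile(["AAAAAAC"], bias))  # [0.75, 0.25]
```

Summing over all motif windows before normalizing yields a single profile that reflects the bias expected at each position across the motif instances, which is why such a component can reasonably be kept fixed.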
Outputs
- results
- The results of the footprinting analysis.
- The first 6 columns contain the motif information (identical to the 'coordinates of motif' input).
- The 7th column has the footprint score (log-odds of footprint versus background) for each motif instance.
- The following columns show the probabilities for the individual footprint and background components.
- parameters
- Contains the trained parameters for the footprint and background components.
- It has one line per component (e.g., the first line contains the parameters of the first component).
- plot 1
- A plot with two panels, showing the initial components above and the final trained components below.
- The plotted values for the final components are given in the 'parameters' output file explained above.
- plot 2
- A plot only with the final trained components.
- In a model with 2 components, this plot is identical to the bottom panel of plot 1.
- When 3 components are used, this plot shows the weighted average of the 2 footprint components as the final footprint profile.
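One plausible reading of the footprint score in the 'results' file is the natural log of the summed footprint-component probability over the background-component probability. The sketch below parses a results line using the column layout described above and recomputes the score under that assumption. The `bg_index` parameter, the log base and the example line are all hypothetical; if the pipeline computes the score differently (e.g. from likelihoods rather than probabilities), the recomputed value will not match column 7.

```python
import math


def parse_result_line(line):
    """Split one line of the results file into motif, score and probabilities."""
    fields = line.split()
    motif = fields[:6]                      # columns 1-6: motif information
    score = float(fields[6])                # column 7: footprint score
    probs = [float(x) for x in fields[7:]]  # columns 8+: component probabilities
    return motif, score, probs


def footprint_log_odds(probs, bg_index):
    """Natural-log odds of 'any footprint component' versus the background.

    bg_index (hypothetical) is the position of the background component among
    the probability columns; all other columns are treated as footprints.
    """
    bg = probs[bg_index]
    fp = sum(p for i, p in enumerate(probs) if i != bg_index)
    return math.log(fp / bg)


# Made-up 3-component line, for illustration only.
motif, score, probs = parse_result_line("chr1 24782 24800 . 11.60 - 1.386 0.6 0.2 0.2")
print(round(footprint_log_odds(probs, bg_index=2), 3))  # 1.386
```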
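The combined footprint profile shown in plot 2 for a 3-component model is a position-wise weighted average of the two footprint components. A minimal sketch, assuming each line of the 'parameters' file is a whitespace-separated profile, that the first two lines are the footprint components, and that the caller supplies the weights (for example, mixture proportions); the source does not specify which weights are used, and the profiles below are made up.

```python
def read_parameters(lines):
    """Parse the parameters output, one component per line.

    Assumes each line is a whitespace-separated list of numbers.
    """
    return [[float(x) for x in line.split()] for line in lines if line.strip()]


def combined_footprint(profiles, weights):
    """Position-wise weighted average of footprint profiles."""
    if len(profiles) != len(weights):
        raise ValueError("need one weight per profile")
    width = len(profiles[0])
    if any(len(p) != width for p in profiles):
        raise ValueError("profiles must have equal length")
    total = sum(weights)
    return [sum(w * p[i] for w, p in zip(weights, profiles)) / total
            for i in range(width)]


# Made-up 2-position profiles: two footprint components, then the background.
params = read_parameters(["0.2 0.8", "0.6 0.4", "0.5 0.5"])
fp1, fp2 = params[0], params[1]
print(combined_footprint([fp1, fp2], weights=[3, 1]))
```

Normalizing by the sum of the weights keeps the combined profile on the same scale as the individual components, so it can be plotted alongside the background.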