footprint (version 1.0.0)

Purpose

This is a pipeline to find transcription factor footprints in ATAC-seq or DNase-seq data.


Inputs

alignment bam file
  • A BAM file with the aligned reads from the ATAC-seq or DNase-seq experiment.
chromosome length
  • A tab-delimited file with 2 columns.
  • The first column is the chromosome name and the second column is the chromosome length for the appropriate organism and genome build.
  • Example: chr1 10000000
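
As a minimal sketch of the layout described above, the following Python snippet reads a 2-column tab-delimited chromosome length file into a dictionary and rejects malformed lines. The function name read_chrom_lengths and the chr2 entry are hypothetical and only serve as an illustration; this is not part of the tool itself.

```python
import csv
import io


def read_chrom_lengths(handle):
    """Return {chromosome name: length} from a 2-column tab-delimited file."""
    lengths = {}
    for line_no, row in enumerate(csv.reader(handle, delimiter="\t"), start=1):
        if not row:
            continue  # skip blank lines
        if len(row) != 2:
            raise ValueError(f"line {line_no}: expected 2 columns, got {len(row)}")
        name, length = row
        lengths[name] = int(length)  # raises ValueError if the length is not an integer
    return lengths


# In-memory stand-in for a real file; the chr2 line is made up for the example.
example = io.StringIO("chr1\t10000000\nchr2\t8000000\n")
chrom_lengths = read_chrom_lengths(example)
print(chrom_lengths)
```
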
coordinates of motif
  • A 6-column BED file with the coordinates of motif matches (e.g. resulting from scanning the genome with a PWM) for the transcription factor of interest.
  • The 6 columns should contain chromosome, start coordinate, end coordinate, name, score and strand information, in this order. The coordinates should be closed (1-based), i.e. both the start and the end position belong to the motif match.
  • Example: chr1 24782 24800 . 11.60 -
  • There should not be any additional columns.
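
The requirements above (exactly 6 tab-separated columns, closed 1-based coordinates) can be checked before uploading a file. The sketch below is one possible check in Python; the function names check_motif_bed and motif_width are hypothetical. Under closed coordinates the motif width is end - start + 1, which the example line from above illustrates.

```python
import io


def check_motif_bed(handle):
    """Parse a 6-column motif file and return a list of records."""
    records = []
    for line_no, line in enumerate(handle, start=1):
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) != 6:
            raise ValueError(f"line {line_no}: expected 6 columns, got {len(fields)}")
        chrom, start, end, name, score, strand = fields
        start, end = int(start), int(end)
        # Closed, 1-based coordinates: start is at least 1 and end is not before start.
        if start < 1 or end < start:
            raise ValueError(f"line {line_no}: invalid coordinates {start}-{end}")
        records.append((chrom, start, end, name, float(score), strand))
    return records


def motif_width(record):
    """Width of a motif match when both start and end are included."""
    _, start, end, _, _, _ = record
    return end - start + 1


example = io.StringIO("chr1\t24782\t24800\t.\t11.60\t-\n")
records = check_motif_bed(example)
print(records[0], motif_width(records[0]))
```
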
transcription factor
  • The name of the transcription factor of interest supplied by the user, e.g. CTCF.
cleavage/transposition bias
  • The cleavage/transposition bias of the different protocols, for all 6-mers.
  • Provided options: ATAC, DNase double hit or DNase single hit protocols.
coordinates of ChIP-seq peaks
  • A file with the coordinates of the ChIP-seq peaks for the transcription factor of interest.
  • The format is flexible as long as the first 3 columns (chromosome, start coordinate, end coordinate) are present.
  • Example: chr1 237622 237882
number of components
  • Total number of footprint and background components that should be learned from the data.
  • Options are 2 components (1 footprint and 1 background) and 3 components (2 footprint and 1 background).
background components
  • The mode of initialization for the background component. Options are "Flat" or "Seq".
  • Choosing "Flat" initializes this component as a uniform distribution.
  • Choosing "Seq" initializes it as the signal profile that would be expected solely due to the protocol bias (given by the cleavage/transposition bias file).
fixed background component
  • Whether the background component should be kept fixed.
  • Options are TRUE or FALSE.
  • Setting "TRUE" keeps this component fixed, whereas setting "FALSE" lets it be re-estimated during training.
  • In general, if the background is estimated from bias (option "Seq"), it is recommended to keep it fixed.
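
To make the interplay of the last three options concrete, here is a small Python sketch that validates a combination of settings and flags the one case the recommendation above addresses ("Seq" background that is not kept fixed). The class FootprintSettings, its field names, and the helper flat_profile are hypothetical and do not reflect how the tool stores its parameters; flat_profile only illustrates what a uniform ("Flat") initialization over a window of a given width looks like.

```python
from dataclasses import dataclass


@dataclass
class FootprintSettings:
    n_components: int = 2           # 2 or 3
    background: str = "Flat"        # "Flat" or "Seq"
    fixed_background: bool = False  # corresponds to TRUE / FALSE

    def check(self):
        """Raise on invalid options; return a list of advisory notes."""
        if self.n_components not in (2, 3):
            raise ValueError("number of components must be 2 or 3")
        if self.background not in ("Flat", "Seq"):
            raise ValueError('background must be "Flat" or "Seq"')
        notes = []
        if self.background == "Seq" and not self.fixed_background:
            notes.append('background is "Seq" but not fixed; keeping it fixed is recommended')
        return notes


def flat_profile(width):
    """Uniform distribution over `width` positions (illustrative only)."""
    return [1.0 / width] * width


print(FootprintSettings(3, "Seq", False).check())
print(FootprintSettings(2, "Seq", True).check())
```
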

Outputs

results
  • The results of the footprinting analysis.
  • The first 6 columns contain the motif information (identical to the 'coordinates of motif' input).
  • The 7th column has the footprint score (log-odds of footprint versus background) for each motif instance.
  • The following columns show the probabilities for the individual footprint and background components.
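
For readers unfamiliar with log-odds scores, the sketch below shows the general idea in Python: the score is the logarithm of the ratio between the footprint probability and the background probability, so positive values favor a footprint. This is an illustration only. The exact formula, the log base (natural log is assumed here), and the way two footprint components are combined (summed here) are assumptions and are not stated in this help text.

```python
import math


def log_odds(footprint_probs, background_prob):
    """Log-odds of footprint versus background.

    footprint_probs: probabilities of the footprint component(s) for one motif
    background_prob: probability of the background component for the same motif
    Assumption: footprint probabilities are summed and the natural log is used.
    """
    p_footprint = sum(footprint_probs)
    return math.log(p_footprint / background_prob)


# Made-up probabilities for one motif instance in a 2-component model.
score = log_odds([0.9], 0.1)
print(score)  # positive: the footprint component is favored
```
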
parameters
  • Gives the trained parameters for the footprint and background components.
  • It includes as many lines as there are components (e.g. the first line has the parameters for the first component).
plot 1
  • A plot with two panels, showing the initial components above and the final trained components below.
  • The plotted values for the final components are given in the 'parameters' output file explained above.
plot 2
  • A plot with only the final trained components.
  • In a model where 2 components are used, this plot is identical to the bottom panel of plot 1.
  • When 3 components are used, this plot shows the weighted average of the 2 footprint components as the final footprint profile.
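
The weighted average mentioned for the 3-component case can be sketched as follows. The weights used by the tool are not specified in this help text; the example assumes, hypothetically, that each footprint component carries a non-negative weight (for instance its share of motif instances) and that the two profiles have the same length. All numbers are made up.

```python
def weighted_average(profile_a, profile_b, weight_a, weight_b):
    """Position-wise weighted average of two footprint profiles."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must have the same length")
    total = weight_a + weight_b
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return [
        (weight_a * a + weight_b * b) / total
        for a, b in zip(profile_a, profile_b)
    ]


# Two toy profiles over 2 positions, with the first component weighted 3:1.
combined = weighted_average([0.2, 0.8], [0.6, 0.4], 3, 1)
print(combined)
```
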