estimateReadFiltering (version 3.5.4+galaxy0)

Sample order matters:

By default, the order of samples given to the program depends on their order in your history. If the order of the samples is vital to you, select Yes below.

BAM/CRAM file:

Would you like custom sample labels?:

By default, the names of the samples in your history are used.

Bin size in bp:

Length in bases of the window used to sample the genome. (--binSize)

Distance between bins:

To reduce the computation time, not every possible genomic bin is sampled. This option allows you to set the distance between the bins that are actually sampled. Larger numbers are sufficient for high-coverage samples, while smaller values are useful for lower-coverage samples. Note that if you specify a value that results in too few sampled reads (<1000), the value will be decreased.
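The interaction between bin size and bin spacing can be sketched in a few lines. This is only an illustration, not the tool's implementation: the function name `sampled_bins` and the example values are hypothetical, and it assumes the distance is measured from the end of one bin to the start of the next.

```python
def sampled_bins(chrom_length, bin_size, distance_between_bins):
    """Return (start, end) pairs for the bins that would be sampled.

    Assumption: the distance is counted from the end of one bin to the
    start of the next, so bin starts are bin_size + distance apart.
    """
    step = bin_size + distance_between_bins
    return [
        (start, min(start + bin_size, chrom_length))  # clip the last bin
        for start in range(0, chrom_length, step)
    ]


# Illustrative values only: a 10 Mb chromosome, 1 kb bins, 9 kb gaps.
bins = sampled_bins(chrom_length=10_000_000, bin_size=1_000,
                    distance_between_bins=9_000)
covered = sum(end - start for start, end in bins)
print(len(bins), covered / 10_000_000)
```

Under these assumptions, a larger distance shrinks the sampled fraction of the genome, which is why low-coverage samples may need a smaller value to collect enough reads.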

Only include reads originating from fragments from the forward or reverse strand:

By default (the no option), all reads are processed, regardless of the strand they originated from. For RNAseq, it can be useful to separately create bigWig files for the forward or reverse strands. Note that this tool assumes that a dUTP-based method was used, so fragments will be assigned to the reverse strand if the second read in a pair is reverse complemented.

Ignore duplicates:

If set, reads that have the same orientation and start position will be considered only once. If reads are paired, the mate position also has to coincide to ignore a read.
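The duplicate rule described above can be sketched as a key comparison. This is a minimal illustration, not deepTools code: the `Alignment` record and `drop_duplicates` function are hypothetical, and including the chromosome in the key is an assumption.

```python
from typing import NamedTuple, Optional


class Alignment(NamedTuple):
    """Hypothetical minimal alignment record used for illustration."""
    chrom: str
    start: int
    is_reverse: bool
    mate_start: Optional[int] = None  # None for single-end reads


def drop_duplicates(alignments):
    """Keep only the first alignment seen for each duplicate key."""
    seen = set()
    kept = []
    for aln in alignments:
        # Same orientation and start position; for paired reads the
        # mate position must coincide as well.
        key = (aln.chrom, aln.start, aln.is_reverse, aln.mate_start)
        if key in seen:
            continue
        seen.add(key)
        kept.append(aln)
    return kept


reads = [
    Alignment("chr1", 100, False, 300),
    Alignment("chr1", 100, False, 300),  # duplicate of the first read
    Alignment("chr1", 100, False, 350),  # different mate position: kept
    Alignment("chr1", 100, True, 300),   # different orientation: kept
]
print(len(drop_duplicates(reads)))
```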

Minimum mapping quality:

If set, only reads with a mapping quality score at least this high are considered.

Include reads based on the SAM flag:

For example, to get only reads that are the first mate, use a flag of 64. This is useful to count properly paired reads only once; otherwise, the second mate will also be considered for the coverage.

Exclude reads based on the SAM flag:

For example, to get only reads that map to the forward strand, use --samFlagExclude 16, where 16 is the SAM flag for reads that map to the reverse strand.
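Both flag options amount to bitwise tests on the SAM flag of each alignment. The sketch below is illustrative only; the function `passes_flag_filters` is a hypothetical name, and it assumes that every bit of the include value must be set while any bit of the exclude value rejects the read.

```python
def passes_flag_filters(flag, sam_flag_include=0, sam_flag_exclude=0):
    """Return True if a SAM flag satisfies both filters.

    Assumption: all bits in sam_flag_include must be present, and no
    bit in sam_flag_exclude may be present.
    """
    has_required = (flag & sam_flag_include) == sam_flag_include
    has_forbidden = (flag & sam_flag_exclude) != 0
    return has_required and not has_forbidden


# 99  = paired (1) + proper pair (2) + mate reverse (32) + first mate (64)
# 147 = paired (1) + proper pair (2) + read reverse (16) + second mate (128)
print(passes_flag_filters(99, sam_flag_include=64))    # first mate only
print(passes_flag_filters(147, sam_flag_include=64))
print(passes_flag_filters(99, sam_flag_exclude=16))    # forward strand only
print(passes_flag_filters(147, sam_flag_exclude=16))
```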

Blacklisted regions in BED/GTF format:

One or more files containing regions to exclude from the analysis.

What it does

This tool estimates the number of alignments that would be excluded from one or more BAM files given a variety of filtering criteria. This is useful for estimating the duplication rate in an experiment or more generally seeing what the effect of various option choices will be in other deepTools tools without actually spending the time to run them.

Output

The output file is a simple text file with the following columns:

Total reads (including unmapped)

Mapped reads

Reads in blacklisted regions (--blackListFileName)

The following metrics are estimated according to the --binSize and --distanceBetweenBins parameters

Estimated mapped reads filtered (the total number of mapped reads filtered for any reason)
Alignments with a below-threshold MAPQ (--minMappingQuality)
Alignments with at least one missing flag (--samFlagInclude)
Alignments with undesirable flags (--samFlagExclude)
Duplicates determined by deepTools (--ignoreDuplicates)
Duplicates marked externally (e.g., by Picard)
Singletons (paired-end reads with only one mate aligning)
Wrong strand (due to --filterRNAstrand)

The sum of these may be more than the total number of reads. Note that alignments are sampled from bins of size --binSize spaced --distanceBetweenBins apart.
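One plausible reason the per-category counts can add up to more than the total is that a single alignment may fail several filters and then be counted once in each category. The sketch below illustrates that idea with two filters only; the function name, the category labels, and the input tuples are hypothetical and do not reflect the tool's internals.

```python
from collections import Counter


def tally_filter_reasons(alignments, min_mapq, sam_flag_exclude):
    """Count filtered alignments overall and per reason.

    alignments is an iterable of (mapq, flag) tuples. An alignment that
    fails both filters is counted once in each category.
    """
    counts = Counter()
    for mapq, flag in alignments:
        reasons = []
        if mapq < min_mapq:
            reasons.append("below_mapq")
        if flag & sam_flag_exclude:
            reasons.append("excluded_flag")
        if reasons:
            counts["filtered"] += 1   # counted once, whatever the reasons
            counts.update(reasons)    # counted once per failing filter
    return counts


alignments = [(5, 16), (3, 16), (60, 0)]  # made-up (MAPQ, flag) pairs
counts = tally_filter_reasons(alignments, min_mapq=10, sam_flag_exclude=16)
per_category = counts["below_mapq"] + counts["excluded_flag"]
print(counts["filtered"], per_category, len(alignments))
```

In this made-up input, two of the three alignments are filtered, yet the two category counts together exceed the number of alignments.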

For more information on the tools, please visit our help site.

For support or questions, please post to Biostars. For bug reports and feature requests, please open an issue on GitHub.

This tool is developed by the Bioinformatics and Deep-Sequencing Unit at the Max Planck Institute for Immunobiology and Epigenetics.