Select uninterrupted STRs (version 1.0.0)

What it does

This tool keeps only uninterrupted STRs (microsatellites). Interrupted STRs (e.g. ATATATATAATATAT) and STRs with attached non-STR bases (e.g. ATATATATATG) are removed.
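The selection rule above can be sketched in Python as a periodicity check. This is only an illustration, not the tool's actual code: the function name `is_uninterrupted_str` and the `motif_len` parameter are hypothetical, and the sketch assumes that a partial final copy of the motif (e.g. ATATA) still counts as uninterrupted.

```python
def is_uninterrupted_str(seq, motif_len):
    """Return True if every base equals the base motif_len positions before it.

    Hypothetical helper for illustration. Assumption: a trailing partial
    copy of the motif (e.g. ATATA for motif AT) is still accepted.
    """
    seq = seq.upper()
    if motif_len < 1 or len(seq) < motif_len:
        return False
    # A perfect tandem repeat is periodic with period motif_len.
    return all(seq[i] == seq[i - motif_len] for i in range(motif_len, len(seq)))


print(is_uninterrupted_str("ATATATATAT", 2))       # perfect AT repeat
print(is_uninterrupted_str("ATATATATAATATAT", 2))  # interrupted by an extra A
print(is_uninterrupted_str("ATATATATATG", 2))      # trailing non-STR base
```

Under these assumptions, only the first sequence passes the check; the other two correspond to the two kinds of input that the tool removes.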

A second application is specific to the STR-FM pipeline (profiling STRs in short read data): the tool removes cases where flanking bases were misread as part of the STR because of sequencing errors, so that the remaining read profile reflects only variation in STR length from expansion or contraction. For example, suppose that the sequence around an STR in the reference genome is AGCGACGaaaaaaGCGATCA. If we observe a read with sequence AGCGACGaaaaaaaaaaGCGATCA, we can infer an STR expansion. However, if we observe another read with sequence AGCGACGaaaaaaaCGATCA, this is more likely a substitution of G to A in the flanking sequence than a change in STR length. This tool removes such cases.

You can use the tool combine mapped flanking bases to get the reference sequence that lies between the mapped flanking regions of each read. If the reads are mapped around uninterrupted STRs in the reference, the reference sequence between each pair of flanks should be an uninterrupted STR, regardless of any expansion or contraction of the STR in the short read data. However, if a flanking base was substituted, or if the fluorescent signal from the previous run made it look like a substitution, the reference sequence between the pair will not be an uninterrupted STR. This tool therefore removes those cases and keeps only STR expansions and contractions.
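The example above can be worked through with a small Python sketch. It is a toy illustration under stated assumptions, not the STR-FM implementation: the flanks are located with exact string search instead of a read mapper, and the helper names `reference_between_flanks` and `is_uninterrupted` are hypothetical.

```python
REFERENCE = "AGCGACGaaaaaaGCGATCA"  # example reference from the text


def is_uninterrupted(seq, motif_len):
    """Hypothetical check: seq is periodic with period motif_len."""
    seq = seq.upper()
    return len(seq) >= motif_len and all(
        seq[i] == seq[i - motif_len] for i in range(motif_len, len(seq))
    )


def reference_between_flanks(reference, left_flank, right_flank):
    """Return the reference bases between two flanks, or None if not found.

    Assumption: exact matching with str.find stands in for read mapping.
    """
    ref = reference.upper()
    left_start = ref.find(left_flank.upper())
    if left_start == -1:
        return None
    left_end = left_start + len(left_flank)
    right_start = ref.find(right_flank.upper(), left_end)
    if right_start == -1:
        return None
    return ref[left_end:right_start]


# Read AGCGACGaaaaaaaaaaGCGATCA: flanks AGCGACG and GCGATCA
expansion = reference_between_flanks(REFERENCE, "AGCGACG", "GCGATCA")
# Read AGCGACGaaaaaaaCGATCA: the G was read as A, so the right flank is CGATCA
substitution = reference_between_flanks(REFERENCE, "AGCGACG", "CGATCA")

for label, between in [("expansion", expansion), ("substitution", substitution)]:
    print(label, between, "keep" if is_uninterrupted(between, 1) else "remove")
```

For the expansion read, the reference sequence between the flanks is the pure poly-A run, so the read is kept. For the substitution read, the shorter right flank maps one base later, the reference sequence between the flanks includes the G, and the read is removed.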

Citation

When you use this tool, please cite: Fungtammasan A, Ananda G, Hile SE, Su MS, Sun C, Harris R, Medvedev P, Eckert K, Makova KD. 2015. Accurate Typing of Short Tandem Repeats from Genome-wide Sequencing Data and its Applications. Genome Research.

Input

The input can be any tab-delimited file.

If this tool is used in STR-FM for STR profiling, the input should contain:

Output

The output has the same format as the input.
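As a rough sketch of what "same format" means in practice, the following Python filter passes rows of a tab-delimited file through unchanged and drops the rows whose sequence column is not an uninterrupted STR. The function name `filter_uninterrupted`, the zero-based `seq_col` index, and the `motif_len` argument are hypothetical and do not describe the tool's real parameters; skipping rows that have too few columns is also an assumption.

```python
def filter_uninterrupted(lines, seq_col, motif_len):
    """Keep the lines whose seq_col field is periodic with period motif_len.

    Hypothetical sketch: kept lines are returned byte-for-byte as given,
    so the output has exactly the same columns as the input.
    """
    kept = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if seq_col >= len(fields):
            continue  # assumption: rows without the sequence column are dropped
        seq = fields[seq_col].upper()
        periodic = len(seq) >= motif_len and all(
            seq[i] == seq[i - motif_len] for i in range(motif_len, len(seq))
        )
        if periodic:
            kept.append(line)
    return kept


rows = [
    "chr1\t100\tATATATAT\n",   # uninterrupted
    "chr1\t200\tATATAATAT\n",  # interrupted
    "chr2\t50\tATATATG\n",     # trailing non-STR base
]
for row in filter_uninterrupted(rows, seq_col=2, motif_len=2):
    print(row, end="")
```

Because each kept row is written back untouched, downstream tools that accept the input file should also accept the filtered output.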