What it does
This tool fetches the flanking regions around STRs from the reads output by the "STR detection" step, screens the quality scores at the STRs and their adjacent flanking bases, and outputs two FASTQ files containing the flanking regions in forward-forward direction.
Citation
When you use this tool, please cite: Fungtammasan A, Ananda G, Hile SE, Su MS, Sun C, Harris R, Medvedev P, Eckert K, Makova KD. 2015. Accurate Typing of Short Tandem Repeats from Genome-wide Sequencing Data and its Applications. Genome Research.
Input
The input file must be in the same format as the output of the STR detection step. Each line of this format contains, in order: length of the repeat, length of the left flanking region, length of the right flanking region, repeat motif, Hamming (editing) distance, read name, read sequence, and read quality scores.
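The column layout above can be read with a short parser. This is a minimal sketch, not the tool's own code: it assumes the eight columns are tab-separated (the read name in the example contains a space, so splitting on whitespace would not work), and the names `StrRecord` and `parse_record` are hypothetical.

```python
from typing import NamedTuple


class StrRecord(NamedTuple):
    """One line of STR detection output (field names are illustrative)."""
    repeat_len: int
    left_flank_len: int
    right_flank_len: int
    motif: str
    distance: int
    read_name: str
    sequence: str
    quality: str


def parse_record(line: str) -> StrRecord:
    """Parse one input line; assumes the eight columns are tab-separated."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 8:
        raise ValueError(f"expected 8 columns, got {len(fields)}")
    rep, left, right, motif, dist, name, seq, qual = fields
    if len(seq) != len(qual):
        raise ValueError("sequence and quality string differ in length")
    return StrRecord(int(rep), int(left), int(right), motif, int(dist),
                     name, seq, qual)
```

Keeping the read name as a single field preserves any spaces inside it, which matters because the name is reused as the FASTQ header.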
Output
The output is two FASTQ files: the first contains the left flanking bases and the second contains the right flanking bases.
Example
Starting with this test input:
6 40 54 G 0 SRR345592.75000006 HS2000-192_107:1:63:5822:176818_1_per1_1 TACCCTCCTGTCTTCCCAGACTGATTTCTGTTCCTGCCCTggggggTTCTTGACTCCTCTGAATGGGTACGGGAGTGTGGACCTCAGGGAGGCCCCCTTG GGGGGGGGGGGGGGGGGFGGGGGGGGGFEGGGGGGGGGGG?FFDFGGGGGG?FFFGGGGGDEGGEFFBEFCEEBD@BACB*?=99(/=5'6=4:CCC*AA
If we want FASTQ files of the flanking regions around the detected STRs with a quality score of at least 20, the program will report these two FASTQ files:
@SRR345592.75000006 HS2000-192_107:1:63:5822:176818_1_per1_1
TACCCTCCTGTCTTCCCAGACTGATTTCTGTTCCTGCCCT
+SRR345592.75000006 HS2000-192_107:1:63:5822:176818_1_per1_1
GGGGGGGGGGGGGGGGGFGGGGGGGGGFEGGGGGGGGGGG

@SRR345592.75000006 HS2000-192_107:1:63:5822:176818_1_per1_1
TTCTTGACTCCTCTGAATGGGTACGGGAGTGTGGACCTCAGGGAGGCCCCCTTG
+SRR345592.75000006 HS2000-192_107:1:63:5822:176818_1_per1_1
GGGGG?FFFGGGGGDEGGEFFBEFCEEBD@BACB*?=99(/=5'6=4:CCC*AA
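The splitting and screening shown in this example can be sketched in Python. This is an illustration under stated assumptions, not the tool's implementation: quality scores are assumed to be Phred+33 encoded, and `screen_len` (how many flanking bases on each side of the STR are screened) is a hypothetical parameter whose default of 5 is chosen only for the sketch. The function name `flanks_to_fastq` is likewise hypothetical.

```python
from typing import Optional, Tuple


def flanks_to_fastq(name: str, seq: str, qual: str,
                    repeat_len: int, left_len: int, right_len: int,
                    min_qual: int = 20,
                    screen_len: int = 5) -> Optional[Tuple[str, str]]:
    """Return (left_record, right_record) as FASTQ text, or None if the
    STR or its adjacent flanking bases fall below min_qual.

    Assumptions: Phred+33 qualities; screen_len is a hypothetical
    parameter, not a documented option of the tool.
    """
    str_start = left_len
    str_end = left_len + repeat_len

    # Screen the STR plus screen_len bases on each side of it.
    lo = max(0, str_start - screen_len)
    hi = min(len(seq), str_end + screen_len)
    if any(ord(c) - 33 < min_qual for c in qual[lo:hi]):
        return None

    # Both flanks are written as they appear in the read.
    right_start = len(seq) - right_len
    left = f"@{name}\n{seq[:left_len]}\n+{name}\n{qual[:left_len]}\n"
    right = f"@{name}\n{seq[right_start:]}\n+{name}\n{qual[right_start:]}\n"
    return left, right
```

Called with the test input above (repeat length 6, left flank 40, right flank 54), the sketch keeps the read, because every screened base has a Phred score of at least 20. The low-quality bases near the 3' end of the read lie outside the screened window in this sketch, which is consistent with the read being reported in the example.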