comparison microsatpurity.xml @ 2:d5ed5c2e25c3 draft

Uploaded
author arkarachai-fungtammasan
date Wed, 22 Apr 2015 12:48:40 -0400
parents 07588b899c13
children
1:f2bab38e3cbd 2:d5ed5c2e25c3
1 <tool id="microsatpurity" name="Select uninterrupted microsatellites" version="1.0.0"> 1 <tool id="microsatpurity" name="Select uninterrupted STRs" version="1.0.0">
2 <description> of a specific column</description> 2 <description> of a specific column</description>
3 <command interpreter="python">microsatpurity.py $input $period $column_n > $output </command> 3 <command interpreter="python">microsatpurity.py $input $period $column_n > $output </command>
4 4
5 <inputs> 5 <inputs>
6 <param name="input" type="data" label="Select input" /> 6 <param name="input" type="data" label="Select input" />
26 26
27 .. class:: infomark 27 .. class:: infomark
28 28
29 **What it does** 29 **What it does**
30 30
31 This tool is used to select only the uninterrupted microsatellites. Interrupted microsatellites (e.g. ATATATATAATATAT) or sequences of microsatellites with non-microsatellite parts (e.g. ATATATATATG) will be removed. 31 This tool is used to select only the uninterrupted STRs/microsatellites. Interrupted STRs (e.g. ATATATATAATATAT) or sequences of STRs with non-STR parts (e.g. ATATATATATG) will be removed.
32 32
33 For TRFM pipeline (profiling microsatellites in short read data), this tool can be used to avoid the cases that flanking bases were misread as microsatellite. Thus, the read profile will only reflect the variation of TR length from expansion/contraction. 33 As another application of this tool, specifically for STR-FM pipeline (profiling STRs in short read data), it can be used to avoid the cases where flanking bases were misread as STRs (sequencing errors). Thus, the remaining read profile will only reflect the variation of TR length from expansion/contraction.
34 For example, suppose that the sequence around microsatellite is AGCGACGaaaaaaGCGATCA. If we observe read with sequence AGCGACGaaaaaaaaaaGCGATCA, we can indicate that this is microsatellite expansion. However, if we observe AGCGACGaaaaaaaCGATCA, this is more like a substitution of G to A. These incidents can be removed with this tool. 34 For example, suppose that the sequence around an STR in the reference genome is AGCGACGaaaaaaGCGATCA. If we observe a read with sequence AGCGACGaaaaaaaaaaGCGATCA, we can indicate that this is an STR expansion. However, if we observe another read with sequence AGCGACGaaaaaaaCGATCA, this is likely a substitution of G to A. Such incidents can be removed with this tool.
35 You can use the tool **combine mapped flaked bases** to get the microsatellites in reference that correspond to sequence between mapped reads. If the user map these reads around the uninterrupted microsatelites in reference, the corresponding sequences between these pairs should be the uninterrupted microsatellites regardless of expansion/contraction of microsatellites in short read data. However, if the substitution of flanking base or if the fluorescent signal from the previous run make it look like substitution, the corresponding sequences in reference in between the pairs will not be uninterrupted microsatellites. Thus this tool can remove those cases and keep only microsatellite expansion/contraction. 35 You can use the tool **combine mapped flanking bases** to get the STRs in reference that correspond to sequence between mapped reads. If the user map these reads around the uninterrupted STRs in reference, the corresponding sequences between these pairs should be the uninterrupted STRs regardless of expansion/contraction of STRs in short read data. However, if the substitution of flanking base or if the fluorescent signal from the previous run make it look like substitution, the corresponding sequences in reference in between the pairs will not be uninterrupted STRs. Thus this tool can remove those cases and keep only STR expansion/contraction.
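The purity rule described in the help text above can be sketched as follows. This is a minimal illustration, not the code in ``microsatpurity.py``: the function name ``is_uninterrupted`` is hypothetical, and the sketch assumes that a sequence counts as uninterrupted when every base equals the base one period before it, so a partial trailing repeat unit is accepted.

```python
def is_uninterrupted(seq, period):
    """Return True if `seq` is a pure tandem repeat with the given motif size.

    Assumption (not taken from microsatpurity.py): a sequence is pure when
    every base matches the base `period` positions earlier, so a partial
    trailing repeat unit is allowed.
    """
    seq = seq.upper()  # soft-masked (lowercase) bases compare like uppercase
    if period < 1 or len(seq) < period:
        return False
    return all(seq[i] == seq[i - period] for i in range(period, len(seq)))


# Examples taken from the help text.
print(is_uninterrupted("ATATATATAATATAT", 2))  # interrupted STR
print(is_uninterrupted("ATATATATATG", 2))      # STR followed by a non-STR base
print(is_uninterrupted("aaaaaaG", 1))          # reference span when a flanking G is misread as A
print(is_uninterrupted("aaaaaaaaaa", 1))       # pure expansion
```

Under this assumption only the last call reports an uninterrupted repeat; the first three are the cases the tool is meant to remove.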
36 36
37 37
38 **Citation** 38 **Citation**
39 39
40 When you use this tool, please cite **Fungtammasan A, Ananda G, Hile SE, Su MS, Sun C, Harris R, Medvedev P, Eckert K, Makova KD. 2015. Accurate Typing of Short Tandem Repeats from Genome-wide Sequencing Data and its Applications, Genome Research** 40 When you use this tool, please cite **Fungtammasan A, Ananda G, Hile SE, Su MS, Sun C, Harris R, Medvedev P, Eckert K, Makova KD. 2015. Accurate Typing of Short Tandem Repeats from Genome-wide Sequencing Data and its Applications, Genome Research**
41 41
42 **Input** 42 **Input**
43 43
44 The input files can be any tab delimited file. 44 The input files can be any tab delimited file.
45 45
46 If this tool is used in TRFM microsatellite profiling, it should contains: 46 If this tool is used in STR-FM for STRs profiling, it should contains:
47 47
48 - Column 1 = microsatellite location in reference chromosome 48 - Column 1 = STR location in reference chromosome
49 - Column 2 = microsatellite location in reference start 49 - Column 2 = STR location in reference start
50 - Column 3 = microsatellite location in reference stop 50 - Column 3 = STR location in reference stop
51 - Column 4 = microsatellite location in reference motif 51 - Column 4 = STR location in reference motif
52 - Column 5 = microsatellite location in reference length 52 - Column 5 = STR location in reference length
53 - Column 6 = microsatellite location in reference motif size 53 - Column 6 = STR location in reference motif size
54 - Column 7 = length of microsatellites (bp) 54 - Column 7 = length of STR (bp)
55 - Column 8 = length of left flanking regions (bp) 55 - Column 8 = length of left flanking region (bp)
56 - Column 9 = length of right flanking regions (bp) 56 - Column 9 = length of right flanking region (bp)
57 - Column 10 = repeat motif (bp) 57 - Column 10 = repeat motif (bp)
58 - Column 11 = hamming distance 58 - Column 11 = hamming distance
59 - Column 12 = read name 59 - Column 12 = read name
60 - Column 13 = read sequence with soft masking of microsatellites 60 - Column 13 = read sequence with soft masking of STR
61 - Column 14 = read quality (the same Phred score scale as input) 61 - Column 14 = read quality (the same Phred score scale as input)
62 - Column 15 = read name (The same as column 12) 62 - Column 15 = read name (The same as column 12)
63 - Column 16 = chromosome 63 - Column 16 = chromosome
64 - Column 17 = left flanking region start 64 - Column 17 = left flanking region start
65 - Column 18 = left flanking region stop 65 - Column 18 = left flanking region stop
66 - Column 19 = microsatellite start as infer from pair-end 66 - Column 19 = STR start as infer from pair-end
67 - Column 20 = microsatellite stop as infer from pair-end 67 - Column 20 = STR stop as infer from pair-end
68 - Column 21 = right flanking region start 68 - Column 21 = right flanking region start
69 - Column 22 = right flanking region stop 69 - Column 22 = right flanking region stop
70 - Column 23 = microsatellite length in reference 70 - Column 23 = STR length in reference
71 - Column 24 = microsatellite sequence in reference 71 - Column 24 = STR sequence in reference
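Applied to a tab-delimited file, the filtering step could look like the sketch below. The helper name ``select_pure_rows`` is hypothetical, and treating ``column_n`` as a 1-based column index is an assumption; the actual behavior of ``microsatpurity.py`` may differ. In the STR-FM layout listed above, column 24 (STR sequence in reference) would be the natural column to test.

```python
import io


def select_pure_rows(handle, period, column_n):
    """Yield rows (lists of fields) whose chosen column is a pure repeat.

    Assumptions, not taken from microsatpurity.py: `column_n` is 1-based,
    rows too short to hold that column are skipped, and a sequence is pure
    when every base matches the base `period` positions earlier.
    """
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < column_n:
            continue
        seq = fields[column_n - 1].upper()
        if len(seq) >= period and all(
            seq[i] == seq[i - period] for i in range(period, len(seq))
        ):
            yield fields


# Two-column toy input; a real STR-FM file would carry 24 columns.
demo = io.StringIO(
    "read1\tATATATATAT\n"
    "read2\tATATATATAATATAT\n"
    "read3\tATATATATATG\n"
)
kept = [row[0] for row in select_pure_rows(demo, period=2, column_n=2)]
print(kept)
```

Rows are passed through unchanged, which matches the help text's statement that the output has the same format as the input.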
72 72
73 **Output** 73 **Output**
74 74
75 The same as input format. 75 The same as input format.
76 76