comparison microsatcompat.xml @ 2:d5ed5c2e25c3 draft

Uploaded
author arkarachai-fungtammasan
date Wed, 22 Apr 2015 12:48:40 -0400
parents 07588b899c13
children
1:f2bab38e3cbd 2:d5ed5c2e25c3
1 <tool id="microsatcompat" name="Check microsatellites motif compatibility" version="1.0.0"> 1 <tool id="microsatcompat" name="Check STR motif compatibility between reference and read STRs" version="1.0.0">
2 <description> </description> 2 <description> </description>
3 <command interpreter="python">microsatcompat.py $input $column1 $column2 > $output </command> 3 <command interpreter="python">microsatcompat.py $input $column1 $column2 > $output </command>
4 4
5 <inputs> 5 <inputs>
6 <param name="input" type="data" label="Select input" /> 6 <param name="input" type="data" label="Select input" />
26 26
27 .. class:: infomark 27 .. class:: infomark
28 28
29 **What it does** 29 **What it does**
30 30
31 This tool is used to select only the input lines which have compatible microsatellite motifs between two columns. Compatible here is defined as the microsatellites motif that are complementary or have the same sequence when change starting point of motif. For example, **A** is the same as **T**. Also, **AGG** is the same as **GAG**. 31 This tool is used to select only those input lines that have compatible STR motifs between the two user-specified columns. Two STR motifs are called compatible if they are either identical, or complementary, or produce the same sequence on rotating the start of the motif. For example, **A** is considered compatible with **A** and its reverse complement **T**. Similarly, **AGG** is considered compatible with **AGG**, its reverse complement **TCC**, and their rotations **GGA**, **GAG**, **CCT** and **CTC**.
32 32
33 For TRFM pipeline (profiling microsatellites in short read data), this tool can be used to make sure that the microsatellites in the reads have the same motif as the microsatellites in the reference at the corresponding mapped location. 33 For the STR-FM pipeline (profiling STRs in short read data), this tool can be used to make sure that the STRs in the reads have a motif compatible with that of the STRs in the reference at the corresponding mapped location.
34 34
35 **Citation** 35 **Citation**
36 36
37 When you use this tool, please cite **Fungtammasan A, Ananda G, Hile SE, Su MS, Sun C, Harris R, Medvedev P, Eckert K, Makova KD. 2015. Accurate Typing of Short Tandem Repeats from Genome-wide Sequencing Data and its Applications, Genome Research** 37 When you use this tool, please cite **Fungtammasan A, Ananda G, Hile SE, Su MS, Sun C, Harris R, Medvedev P, Eckert K, Makova KD. 2015. Accurate Typing of Short Tandem Repeats from Genome-wide Sequencing Data and its Applications, Genome Research**
38 38
39 **Input** 39 **Input**
40 40
41 The input files can be any tab delimited file. 41 The input files can be any tab delimited file.
42 42
43 If this tool is used in TRFM microsatellite profiling, it should contains: 43 If this tool is used in the STR-FM pipeline for STR profiling, it should contain:
44 44
45 - Column 1 = microsatellite location in reference chromosome 45 - Column 1 = STR location in reference chromosome
46 - Column 2 = microsatellite location in reference start 46 - Column 2 = STR location in reference start
47 - Column 3 = microsatellite location in reference stop 47 - Column 3 = STR location in reference stop
48 - Column 4 = microsatellite location in reference motif 48 - Column 4 = STR location in reference motif
49 - Column 5 = microsatellite location in reference length 49 - Column 5 = STR location in reference length
50 - Column 6 = microsatellite location in reference motif size 50 - Column 6 = STR location in reference motif size
51 - Column 7 = length of microsatellites (bp) 51 - Column 7 = length of STR (bp)
52 - Column 8 = length of left flanking regions (bp) 52 - Column 8 = length of left flanking region (bp)
53 - Column 9 = length of right flanking regions (bp) 53 - Column 9 = length of right flanking region (bp)
54 - Column 10 = repeat motif (bp) 54 - Column 10 = repeat motif (bp)
55 - Column 11 = hamming distance 55 - Column 11 = hamming distance
56 - Column 12 = read name 56 - Column 12 = read name
57 - Column 13 = read sequence with soft masking of microsatellites 57 - Column 13 = read sequence with soft masking of STR
58 - Column 14 = read quality (the same Phred score scale as input) 58 - Column 14 = read quality (the same Phred score scale as input)
59 - Column 15 = read name (The same as column 12) 59 - Column 15 = read name (The same as column 12)
60 - Column 16 = chromosome 60 - Column 16 = chromosome
61 - Column 17 = left flanking region start 61 - Column 17 = left flanking region start
62 - Column 18 = left flanking region stop 62 - Column 18 = left flanking region stop
63 - Column 19 = microsatellite start as infer from pair-end 63 - Column 19 = STR start as inferred from paired-end
64 - Column 20 = microsatellite stop as infer from pair-end 64 - Column 20 = STR stop as inferred from paired-end
65 - Column 21 = right flanking region start 65 - Column 21 = right flanking region start
66 - Column 22 = right flanking region stop 66 - Column 22 = right flanking region stop
67 - Column 23 = microsatellite length in reference 67 - Column 23 = STR length in reference
68 - Column 24 = microsatellite sequence in reference 68 - Column 24 = STR sequence in reference
69 69
70 **Output** 70 **Output**
71 71
72 The same as input format. 72 The same as input format.
73 73
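The compatibility rule described under **What it does** can be illustrated with a minimal Python sketch. This is not the code in microsatcompat.py, which is not shown in this comparison. The function names (`reverse_complement`, `rotations`, `compatible`, `filter_lines`) are hypothetical, and the sketch assumes that column numbers are 1-based, that motifs use only the letters A, C, G and T, and that two motifs of different lengths are never compatible.

```python
# Hypothetical helpers illustrating the compatibility rule in the help text;
# none of these names are taken from microsatcompat.py.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(motif):
    """Return the reverse complement of an upper-case DNA motif."""
    return motif.translate(_COMPLEMENT)[::-1]


def rotations(motif):
    """Return every rotation of the motif (every choice of starting point)."""
    return {motif[i:] + motif[:i] for i in range(len(motif))}


def compatible(motif_a, motif_b):
    """True if motif_b is a rotation of motif_a or of its reverse complement."""
    a, b = motif_a.upper(), motif_b.upper()
    if not a or len(a) != len(b):
        # Assumption: empty or different-length motifs are not compatible.
        return False
    return b in rotations(a) | rotations(reverse_complement(a))


def filter_lines(lines, column1, column2):
    """Yield tab-delimited lines whose two motif columns are compatible.

    column1 and column2 are assumed to be 1-based column numbers.
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < max(column1, column2):
            continue  # skip lines too short to contain both columns
        if compatible(fields[column1 - 1], fields[column2 - 1]):
            yield line
```

With the STR-FM column layout listed under **Input**, a call such as `filter_lines(open("input.tsv"), 4, 10)` would compare the reference motif with the read motif; the file name here is only a placeholder.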