annotate test-data/microsatpurity.xml @ 4:ecfc9041bcc5

Deleted selected files
author arkarachai-fungtammasan
date Wed, 01 Apr 2015 14:05:54 -0400
<tool id="microsatpurity" name="Select uninterrupted microsatellites" version="1.0.0">
  <description> of a specific column</description>
  <command interpreter="python">microsatpurity.py $input $period $column_n > $output </command>

  <inputs>
    <param name="input" type="data" label="Select input" />
    <param name="period" type="integer" label="motif size" value="1"/>
    <param name="column_n" type="integer" value="0" label="Select column that contains microsatellites of interest (0 = last column)" />
  </inputs>
  <outputs>
    <data format="tabular" name="output" />

  </outputs>
  <tests>
    <!-- Test data with valid values -->
    <test>
      <param name="input" value="microsatpurity_in.txt"/>
      <param name="period" value="2"/>
      <param name="column_n" value="0"/>
      <output name="output" file="microsatpurity_out.txt"/>
    </test>

  </tests>
  <help>


.. class:: infomark

**What it does**

This tool selects only uninterrupted microsatellites. Interrupted microsatellites (e.g. ATATATATAATATAT) and microsatellites that include non-microsatellite bases (e.g. ATATATATATG) are removed.
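
The purity check described above can be sketched as follows. This is a minimal illustration, not the code of microsatpurity.py: the function name ``is_uninterrupted`` is hypothetical, and it assumes that the motif is the first ``period`` bases of the sequence and that a trailing partial copy of the motif (e.g. ATATATA) still counts as uninterrupted.

```python
def is_uninterrupted(sequence, period):
    """Return True if `sequence` is a pure repeat of its first `period` bases.

    Assumption (not taken from the tool's source): a trailing partial copy
    of the motif is accepted, and the comparison ignores case.
    """
    sequence = sequence.upper()
    # Reject empty input and motif sizes that do not fit the sequence.
    if not sequence or period not in range(1, len(sequence) + 1):
        return False
    motif = sequence[:period]
    # Every base must equal the motif base at the same offset in the repeat unit.
    return all(base == motif[i % period] for i, base in enumerate(sequence))


print(is_uninterrupted("ATATATAT", 2))         # True
print(is_uninterrupted("ATATATATAATATAT", 2))  # False: extra A inside the repeat
print(is_uninterrupted("ATATATATATG", 2))      # False: trailing G is not part of the motif
```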

For the TRFM pipeline (profiling microsatellites in short-read data), this tool can be used to exclude cases in which flanking bases were misread as part of the microsatellite. The read profile then reflects only the variation in TR length caused by expansion/contraction.
For example, suppose that the sequence around a microsatellite is AGCGACGaaaaaaGCGATCA. If we observe a read with the sequence AGCGACGaaaaaaaaaaGCGATCA, we can infer a microsatellite expansion. However, if we observe AGCGACGaaaaaaaCGATCA, a substitution of G to A is the more likely explanation. This tool removes such cases.
You can use the tool **combine mapped flanked bases** to get the microsatellites in the reference that correspond to the sequence between mapped reads. If the user maps these reads around uninterrupted microsatellites in the reference, the corresponding sequences between the pairs should be uninterrupted microsatellites regardless of expansion/contraction of the microsatellites in the short-read data. However, if a flanking base is substituted, or if the fluorescent signal from the previous run makes it look like a substitution, the corresponding reference sequence between the pair will not be an uninterrupted microsatellite. This tool therefore removes those cases and keeps only microsatellite expansions/contractions.
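
The example above can be traced with a short sketch. It is only an illustration under simplifying assumptions: the flanks are located in the reference by exact string matching instead of a read mapper, the flank sequences are given by hand, and the helper names ``reference_between_flanks`` and ``is_uninterrupted`` are hypothetical.

```python
def is_uninterrupted(sequence, period):
    """Return True if `sequence` is a pure, case-insensitive repeat of its first `period` bases."""
    sequence = sequence.upper()
    if not sequence or period not in range(1, len(sequence) + 1):
        return False
    motif = sequence[:period]
    return all(base == motif[i % period] for i, base in enumerate(sequence))


def reference_between_flanks(reference, left_flank, right_flank):
    """Return the reference sequence between the two flanks, or None if a flank is not found.

    Exact matching stands in for read mapping here (an assumption of this sketch).
    """
    left_at = reference.find(left_flank)
    if left_at == -1:
        return None
    start = left_at + len(left_flank)
    right_at = reference.find(right_flank, start)
    if right_at == -1:
        return None
    return reference[start:right_at]


reference = "AGCGACGaaaaaaGCGATCA"

# Read AGCGACGaaaaaaaaaaGCGATCA: both flanks are intact, so the reference
# sequence between them is the pure repeat and the read is kept.
expansion = reference_between_flanks(reference, "AGCGACG", "GCGATCA")
print(expansion, is_uninterrupted(expansion, 1))        # aaaaaa True

# Read AGCGACGaaaaaaaCGATCA: the right flank starts one base later, so the
# reference sequence between the flanks includes the G and the read is removed.
substitution = reference_between_flanks(reference, "AGCGACG", "CGATCA")
print(substitution, is_uninterrupted(substitution, 1))  # aaaaaaG False
```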


**Citation**

When you use this tool, please cite **Fungtammasan A, Ananda G, Hile SE, Su MS, Sun C, Harris R, Medvedev P, Eckert K, Makova KD. 2015. Accurate Typing of Short Tandem Repeats from Genome-wide Sequencing Data and its Applications, Genome Research**

**Input**

The input can be any tab-delimited file.

If this tool is used in TRFM microsatellite profiling, the file should contain:

- Column 1 = microsatellite location in reference chromosome
- Column 2 = microsatellite location in reference start
- Column 3 = microsatellite location in reference stop
- Column 4 = microsatellite location in reference motif
- Column 5 = microsatellite location in reference length
- Column 6 = microsatellite location in reference motif size
- Column 7 = length of microsatellites (bp)
- Column 8 = length of left flanking regions (bp)
- Column 9 = length of right flanking regions (bp)
- Column 10 = repeat motif (bp)
- Column 11 = Hamming distance
- Column 12 = read name
- Column 13 = read sequence with soft masking of microsatellites
- Column 14 = read quality (the same Phred score scale as input)
- Column 15 = read name (the same as column 12)
- Column 16 = chromosome
- Column 17 = left flanking region start
- Column 18 = left flanking region stop
- Column 19 = microsatellite start as inferred from paired-end reads
- Column 20 = microsatellite stop as inferred from paired-end reads
- Column 21 = right flanking region start
- Column 22 = right flanking region stop
- Column 23 = microsatellite length in reference
- Column 24 = microsatellite sequence in reference
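
As a rough sketch of how rows of such a file might be filtered, the example below keeps only the lines whose selected column holds an uninterrupted microsatellite. It is not the code of microsatpurity.py: the function names are hypothetical, and it assumes that the column number is 1-based, that 0 selects the last column (as the parameter label states), and that rows with too few columns are skipped.

```python
def is_uninterrupted(sequence, period):
    """Return True if `sequence` is a pure, case-insensitive repeat of its first `period` bases."""
    sequence = sequence.upper()
    if not sequence or period not in range(1, len(sequence) + 1):
        return False
    motif = sequence[:period]
    return all(base == motif[i % period] for i, base in enumerate(sequence))


def select_uninterrupted(lines, period, column_n=0):
    """Keep the tab-delimited lines whose chosen column is an uninterrupted microsatellite."""
    kept = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if column_n > len(fields):
            continue  # assumption of this sketch: skip rows that lack the column
        # column_n is treated as 1-based; 0 maps to index -1, i.e. the last column.
        candidate = fields[column_n - 1]
        if is_uninterrupted(candidate, period):
            kept.append(line)
    return kept


rows = [
    "chr1\t10\t16\tATATAT\n",
    "chr1\t30\t41\tATATATATATG\n",
]
for row in select_uninterrupted(rows, period=2):
    print(row, end="")  # prints only the first row
```

Because the kept lines are returned unchanged, the output keeps the same columns as the input.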

**Output**

The output has the same format as the input.


  </help>
</tool>