comparison fastq_name_affixer.xml @ 3:e320ef2d105a draft

Uploaded
author petr-novak
date Thu, 05 Sep 2019 09:04:56 -0400
parents
children c2c69c6090f0
2:ff658cf87f16 3:e320ef2d105a
1 <tool id="names_affixer" name="FASTQ Read name affixer" version="1.0.0">
2 <description> Appends a prefix and suffix to sequence names </description>
3 <command interpreter="python">
4 ${__tool_directory__}/name_affixer.py -f $input -p "$prefix" -s "$suffix" -n $nspace > $output
5 </command>
6
7 <inputs>
8 <param format="fastq" type="data" name="input" label="Choose your fastq file" />
9 <param name="prefix" type="text" size="10" value="" label="Prefix" help="Enter a prefix to add to every sequence name" />
10 <param name="suffix" type="text" size="10" value="" label="Suffix" help="Enter a suffix to add to every sequence name"/>
11 <param name="nspace" type="integer" size="10" value="0" min="0" max="1000" label="Number of spaces in name to ignore" help="By default (0), the sequence name is the text before the first space. To keep spaces within the name, enter how many to keep; everything from the next space onward is dropped."/>
12 </inputs>
13
14
15 <outputs>
16 <data format="fastq" name="output" label="fastq dataset ${input.hid} with modified sequence names" />
17 </outputs>
18
19 <help>
20 **What it does**
21
22 Adds a prefix and a suffix to the sequence names in a FASTQ-formatted file.
23
24 **Example**
25
26 The following Solexa-FASTQ file:
27
28 ::
29
30 @CSHL_4_FC042GAMMII_2_1_517_596
31 GGTCAATGATGAGTTGGCACTGTAGGCACCATCAAT
32 +CSHL_4_FC042GAMMII_2_1_517_596
33 40 40 40 40 40 40 40 40 40 40 38 40 40 40 40 40 14 40 40 40 40 40 36 40 13 14 24 24 9 24 9 40 10 10 15 40
34
35 is renamed to:
36
37 ::
38
39 @prefixCSHL_4_FC042GAMMII_2_1_517_596suffix
40 GGTCAATGATGAGTTGGCACTGTAGGCACCATCAAT
41 +prefixCSHL_4_FC042GAMMII_2_1_517_596suffix
42 40 40 40 40 40 40 40 40 40 40 38 40 40 40 40 40 14 40 40 40 40 40 36 40 13 14 24 24 9 24 9 40 10 10 15 40
43
44 A different header format:
45
46
47 ::
48
49 @HISEQ1:92:c0190acxx:8:1101:1252:2230 2:N:0:CGATGT
50 AGAGGAAAAAACATAGTTCTTGTCTAAAAAAATCCCTTGAAAAAGGGCAGATGTATAGAAATAGAAAATTTCAAAGAAAAACTCTCTACAAATGGAAGAGA
51 +
52 CCCFFFFFHHHHHJJJJIJJJJJJJJJJJJJJJIJJJJJIIJJJJJJGIJIJIHHHHHHHHFFFFFFDEEEEEDCDDDDDDDCCDDDEDDDDD>CCCCB@9
53
54 is renamed to:
55
56 ::
57
58 @prefixHISEQ1:92:c0190acxx:8:1101:1252:2230suffix
59 AGAGGAAAAAACATAGTTCTTGTCTAAAAAAATCCCTTGAAAAAGGGCAGATGTATAGAAATAGAAAATTTCAAAGAAAAACTCTCTACAAATGGAAGAGA
60 +
61 CCCFFFFFHHHHHJJJJIJJJJJJJJJJJJJJJIJJJJJIIJJJJJJGIJIJIHHHHHHHHFFFFFFDEEEEEDCDDDDDDDCCDDDEDDDDD>CCCCB@9
62
63 Note that the string after the first space is omitted.
64
65 Sequence names sometimes contain spaces that separate the actual name from additional information. By default, anything
66 after the first space is excluded from the sequence name. For example, given the sequence:
67
68 ::
69
70 @SRR352150.23846180 HWUSI-EAS1786:7:119:15910:19280/1
71 CTGGATTCTATACCTTTGGCAACTACTTCTTGGTTGATCAGGAAATTAACACTAGTAGTTTAGGCAATTTGGAATGGTGCCAAAGATGTATAGAACTTTC
72 +
73 IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIGIIIHIIIIIFIIIIIIHDHBBIHFIHIIBHHDDHIFHIHIIIHIHGGDFDEI@EGEGFGFEFB@ECG
74
75 When **Number of spaces in name to ignore** is set to 0 (the default), the output will be:
76
77 ::
78
79 @prefixSRR352150.23846180suffix
80 CTGGATTCTATACCTTTGGCAACTACTTCTTGGTTGATCAGGAAATTAACACTAGTAGTTTAGGCAATTTGGAATGGTGCCAAAGATGTATAGAACTTTC
81 +
82 IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIGIIIHIIIIIFIIIIIIHDHBBIHFIHIIBHHDDHIFHIHIIIHIHGGDFDEI@EGEGFGFEFB@ECG
83
84 To keep spaces, set **Number of spaces in name to ignore** to 1, which yields:
85
86 ::
87
88 @prefixSRR352150.23846180 HWUSI-EAS1786:7:119:15910:19280/1suffix
89 CTGGATTCTATACCTTTGGCAACTACTTCTTGGTTGATCAGGAAATTAACACTAGTAGTTTAGGCAATTTGGAATGGTGCCAAAGATGTATAGAACTTTC
90 +
91 IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIGIIIHIIIIIFIIIIIIHDHBBIHFIHIIBHHDDHIFHIHIIIHIHGGDFDEI@EGEGFGFEFB@ECG
92
93
94 </help>
95 </tool>
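
The renaming rule described in the help text can be sketched in Python as follows. The `name_affixer.py` script itself is not part of this changeset view, so this is only an illustration under stated assumptions, not the script's actual code: the function names `affix_name` and `affix_fastq` are hypothetical, and records are assumed to span exactly four lines each.

```python
def affix_name(line, prefix="", suffix="", nspace=0):
    """Add prefix and suffix to one FASTQ header ('@...') or separator ('+...') line.

    Hypothetical helper: the name is the text up to the (nspace + 1)-th space;
    everything from that space onward is dropped.
    """
    line = line.rstrip("\n")
    if not line:
        return line
    marker, rest = line[0], line[1:]
    if not rest:
        # A bare "+" separator carries no name, so it is left unchanged.
        return marker
    name = " ".join(rest.split(" ")[: nspace + 1])
    return f"{marker}{prefix}{name}{suffix}"


def affix_fastq(lines, prefix="", suffix="", nspace=0):
    """Yield renamed lines, assuming strict four-line FASTQ records."""
    for i, line in enumerate(lines):
        if i % 4 in (0, 2):
            # Lines 1 and 3 of each record hold the name (or a bare "+").
            yield affix_name(line, prefix, suffix, nspace)
        else:
            # Sequence and quality lines pass through untouched.
            yield line.rstrip("\n")
```

Selecting header lines by position rather than by a leading `@` avoids misreading quality strings, which may themselves begin with `@`.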