comparison fastq_name_affixer.xml @ 22:58807b35777a draft

planemo upload commit 20bdf879b52796d3fb251a20807191ff02084d3c-dirty
author petr-novak
date Wed, 02 Aug 2023 11:31:12 +0000
parents c2c69c6090f0
children 36c418bca8b2
21:f4ed6a65a2ff 22:58807b35777a
1 <tool id="names_affixer" name="FASTQ Read name affixer" version="1.0.0"> 1 <tool id="names_affixer" name="FASTQ Read name affixer" version="1.0.0">
2 <description> Tool appending suffix and prefix to sequences names </description> 2 <description>Tool appending suffix and prefix to sequences names</description>
3 <command interpreter="python"> 3 <required_files>
4 ${__tool_directory__}/name_affixer.py -f $input -p "$prefix" -s "$suffix" -n $nspace > $output 4 <include type="literal" path="name_affixer.py"/>
5 </command> 5 </required_files>
6 <command>
7 ${__tool_directory__}/name_affixer.py -f $input -p "$prefix" -s "$suffix" -n
8 $nspace > $output
9 </command>
6 10
7 <inputs> 11 <inputs>
8 <param format="fastq" type="data" name="input" label="Choose your FASTQ file" /> 12 <param format="fastq" type="data" name="input" label="Choose your FASTQ file"/>
9 <param name="prefix" type="text" size="10" value="" label="Prefix" help="Enter prefix which will be added to all sequences names" /> 13 <param name="prefix" type="text" size="10" value="" label="Prefix"
10 <param name="suffix" type="text" size="10" value="" label="Suffix" help="Enter suffix which will be added to all sequences names"/> 14 help="Enter prefix which will be added to all sequences names"/>
11 <param name="nspace" type="integer" size="10" value="0" min="0" max="1000" label="Number of spaces in sequence name to ignore" help="Sequence name is a string before the first space. If you want name to include spaces in name, enter positive integer. All other characters beyond ignored spaces are omitted"/> 15 <param name="suffix" type="text" size="10" value="" label="Suffix"
12 </inputs> 16 help="Enter suffix which will be added to all sequences names"/>
17 <param name="nspace" type="integer" size="10" value="0" min="0" max="1000"
18 label="Number of spaces in sequence name to ignore"
19 help="Sequence name is a string before the first space. If you want name to include spaces in name, enter positive integer. All other characters beyond ignored spaces are omitted"/>
20 </inputs>
13 21
14 22
15 <outputs> 23 <outputs>
16 <data format="fastq" name="output" label="FASTQ dataset ${input.hid} with modified sequence names" /> 24 <data format="fastq" name="output"
17 </outputs> 25 label="FASTQ dataset ${input.hid} with modified sequence names"/>
26 </outputs>
18 27
19 <help> 28 <help>
20 **What is does** 29 **What is does**
21
22 Tool for appending prefix and suffix to sequences names in fastq formated sequences.
23 30
24 **Example** 31 Tool for appending prefix and suffix to sequences names in fastq formated
32 sequences.
25 33
26 The following Solexa-FASTQ file: 34 **Example**
27
28 ::
29
30 @CSHL_4_FC042GAMMII_2_1_517_596
31 GGTCAATGATGAGTTGGCACTGTAGGCACCATCAAT
32 +CSHL_4_FC042GAMMII_2_1_517_596
33 40 40 40 40 40 40 40 40 40 40 38 40 40 40 40 40 14 40 40 40 40 40 36 40 13 14 24 24 9 24 9 40 10 10 15 40
34
35 is renamed to:
36 35
37 :: 36 The following Solexa-FASTQ file:
38
39 @prefixCSHL_4_FC042GAMMII_2_1_517_596suffix
40 GGTCAATGATGAGTTGGCACTGTAGGCACCATCAAT
41 +prefixCSHL_4_FC042GAMMII_2_1_517_596suffix
42 40 40 40 40 40 40 40 40 40 40 38 40 40 40 40 40 14 40 40 40 40 40 36 40 13 14 24 24 9 24 9 40 10 10 15 40
43 37
44 different format: 38 ::
45
46 39
47 :: 40 @CSHL_4_FC042GAMMII_2_1_517_596
48 41 GGTCAATGATGAGTTGGCACTGTAGGCACCATCAAT
49 @HISEQ1:92:c0190acxx:8:1101:1252:2230 2:N:0:CGATGT 42 +CSHL_4_FC042GAMMII_2_1_517_596
50 AGAGGAAAAAACATAGTTCTTGTCTAAAAAAATCCCTTGAAAAAGGGCAGATGTATAGAAATAGAAAATTTCAAAGAAAAACTCTCTACAAATGGAAGAGA 43 40 40 40 40 40 40 40 40 40 40 38 40 40 40 40 40 14 40 40 40 40 40 36 40 13 14 24
51 + 44 24 9 24 9 40 10 10 15 40
52 CCCFFFFFHHHHHJJJJIJJJJJJJJJJJJJJJIJJJJJIIJJJJJJGIJIJIHHHHHHHHFFFFFFDEEEEEDCDDDDDDDCCDDDEDDDDD>CCCCB@9
53 45
54 is renamed to: 46 is renamed to:
55 47
56 :: 48 ::
57
58 @prefixHISEQ1:92:c0190acxx:8:1101:1252:2230suffix
59 AGAGGAAAAAACATAGTTCTTGTCTAAAAAAATCCCTTGAAAAAGGGCAGATGTATAGAAATAGAAAATTTCAAAGAAAAACTCTCTACAAATGGAAGAGA
60 +
61 CCCFFFFFHHHHHJJJJIJJJJJJJJJJJJJJJIJJJJJIIJJJJJJGIJIJIHHHHHHHHFFFFFFDEEEEEDCDDDDDDDCCDDDEDDDDD>CCCCB@9
62
63 note that string after first space is omitted!
64 49
65 Because sequence names sometimes containg spaces which delimit the actual name. By default, anything after spaces is 50 @prefixCSHL_4_FC042GAMMII_2_1_517_596suffix
66 excluded from sequences name. In example sequence: 51 GGTCAATGATGAGTTGGCACTGTAGGCACCATCAAT
67 52 +prefixCSHL_4_FC042GAMMII_2_1_517_596suffix
68 :: 53 40 40 40 40 40 40 40 40 40 40 38 40 40 40 40 40 14 40 40 40 40 40 36 40 13 14 24
69 54 24 9 24 9 40 10 10 15 40
70 @SRR352150.23846180 HWUSI-EAS1786:7:119:15910:19280/1
71 CTGGATTCTATACCTTTGGCAACTACTTCTTGGTTGATCAGGAAATTAACACTAGTAGTTTAGGCAATTTGGAATGGTGCCAAAGATGTATAGAACTTTC
72 +
73 IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIGIIIHIIIIIFIIIIIIHDHBBIHFIHIIBHHDDHIFHIHIIIHIHGGDFDEI@EGEGFGFEFB@ECG
74 55
75 when **Number of spaces in name to ignore** is set to 0 (default) the output will be: 56 different format:
76 57
77 :: 58
78 59 ::
79 @prefixSRR352150.23846180suffix 60
80 CTGGATTCTATACCTTTGGCAACTACTTCTTGGTTGATCAGGAAATTAACACTAGTAGTTTAGGCAATTTGGAATGGTGCCAAAGATGTATAGAACTTTC 61 @HISEQ1:92:c0190acxx:8:1101:1252:2230 2:N:0:CGATGT
81 + 62 AGAGGAAAAAACATAGTTCTTGTCTAAAAAAATCCCTTGAAAAAGGGCAGATGTATAGAAATAGAAAATTTCAAAGAAAAACTCTCTACAAATGGAAGAGA
82 IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIGIIIHIIIIIFIIIIIIHDHBBIHFIHIIBHHDDHIFHIHIIIHIHGGDFDEI@EGEGFGFEFB@ECG 63 +
83 64 CCCFFFFFHHHHHJJJJIJJJJJJJJJJJJJJJIJJJJJIIJJJJJJGIJIJIHHHHHHHHFFFFFFDEEEEEDCDDDDDDDCCDDDEDDDDD>CCCCB@9
84 If you want to keep spaces the setting **Number of spaces in name to ignore** to 1 will yield 65
85 66 is renamed to:
86 :: 67
87 68 ::
88 @prefixSRR352150.23846180 HWUSI-EAS1786:7:119:15910:19280/1suffix 69
89 CTGGATTCTATACCTTTGGCAACTACTTCTTGGTTGATCAGGAAATTAACACTAGTAGTTTAGGCAATTTGGAATGGTGCCAAAGATGTATAGAACTTTC 70 @prefixHISEQ1:92:c0190acxx:8:1101:1252:2230suffix
90 + 71 AGAGGAAAAAACATAGTTCTTGTCTAAAAAAATCCCTTGAAAAAGGGCAGATGTATAGAAATAGAAAATTTCAAAGAAAAACTCTCTACAAATGGAAGAGA
91 IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIGIIIHIIIIIFIIIIIIHDHBBIHFIHIIBHHDDHIFHIHIIIHIHGGDFDEI@EGEGFGFEFB@ECG 72 +
92 73 CCCFFFFFHHHHHJJJJIJJJJJJJJJJJJJJJIJJJJJIIJJJJJJGIJIJIHHHHHHHHFFFFFFDEEEEEDCDDDDDDDCCDDDEDDDDD>CCCCB@9
93 74
94 </help> 75 note that string after first space is omitted!
76
77 Because sequence names sometimes containg spaces which delimit the actual name. By
78 default, anything after spaces is
79 excluded from sequences name. In example sequence:
80
81 ::
82
83 @SRR352150.23846180 HWUSI-EAS1786:7:119:15910:19280/1
84 CTGGATTCTATACCTTTGGCAACTACTTCTTGGTTGATCAGGAAATTAACACTAGTAGTTTAGGCAATTTGGAATGGTGCCAAAGATGTATAGAACTTTC
85 +
86 IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIGIIIHIIIIIFIIIIIIHDHBBIHFIHIIBHHDDHIFHIHIIIHIHGGDFDEI@EGEGFGFEFB@ECG
87
88 when **Number of spaces in name to ignore** is set to 0 (default) the output will
89 be:
90
91 ::
92
93 @prefixSRR352150.23846180suffix
94 CTGGATTCTATACCTTTGGCAACTACTTCTTGGTTGATCAGGAAATTAACACTAGTAGTTTAGGCAATTTGGAATGGTGCCAAAGATGTATAGAACTTTC
95 +
96 IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIGIIIHIIIIIFIIIIIIHDHBBIHFIHIIBHHDDHIFHIHIIIHIHGGDFDEI@EGEGFGFEFB@ECG
97
98 If you want to keep spaces the setting **Number of spaces in name to ignore** to 1
99 will yield
100
101 ::
102
103 @prefixSRR352150.23846180 HWUSI-EAS1786:7:119:15910:19280/1suffix
104 CTGGATTCTATACCTTTGGCAACTACTTCTTGGTTGATCAGGAAATTAACACTAGTAGTTTAGGCAATTTGGAATGGTGCCAAAGATGTATAGAACTTTC
105 +
106 IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIGIIIHIIIIIFIIIIIIHDHBBIHFIHIIBHHDDHIFHIHIIIHIHGGDFDEI@EGEGFGFEFB@ECG
107
108
109 </help>
95 </tool> 110 </tool>
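
The renaming rule described in the help text above can be sketched in Python. This is only an illustration under stated assumptions, not the contents of name_affixer.py, which this comparison does not show. The function names `affix_name` and `affix_fastq` are hypothetical, and the sketch assumes strict four-line FASTQ records in which a bare `+` separator is left unchanged, as in the examples above.

```python
def affix_name(header, prefix="", suffix="", nspace=0):
    """Add prefix and suffix to the name on a FASTQ '@' or '+' line.

    Hypothetical helper: the name is taken to be the first nspace + 1
    space-delimited fields; everything after that is dropped.
    """
    marker, rest = header[0], header[1:].rstrip("\n")
    if marker == "+" and not rest:
        # A bare '+' separator carries no name, so it stays as-is.
        return "+"
    name = " ".join(rest.split(" ")[: nspace + 1])
    return f"{marker}{prefix}{name}{suffix}"


def affix_fastq(lines, prefix="", suffix="", nspace=0):
    """Yield renamed lines, assuming strict 4-line FASTQ records."""
    for i, line in enumerate(lines):
        line = line.rstrip("\n")
        # Lines 0 and 2 of each record hold the name; decide by position,
        # because quality strings may also start with '@' or '+'.
        if i % 4 in (0, 2):
            yield affix_name(line, prefix, suffix, nspace)
        else:
            yield line


record = [
    "@SRR352150.23846180 HWUSI-EAS1786:7:119:15910:19280/1",
    "CTGGATTC",
    "+",
    "IIIIIIII",
]
for out_line in affix_fastq(record, prefix="prefix", suffix="suffix", nspace=0):
    print(out_line)
```

With `nspace=0` the text after the first space is dropped from the name; with `nspace=1` the first space and the field after it are kept, which matches the two SRR352150 examples in the help text.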