
diff fasta_affixer.xml @ 0:a4cd8608ef6b draft
Uploaded
author: petr-novak
date: Mon, 01 Apr 2019 07:56:36 -0400
children: c2c69c6090f0
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/fasta_affixer.xml	Mon Apr 01 07:56:36 2019 -0400
@@ -0,0 +1,81 @@
+<tool id="fasta_affixer" name="FASTA read name affixer" version="1.0.0">
+<description>Adds a prefix and a suffix to sequence names</description>
+<command interpreter="python3">
+fasta_affixer.py -f $input -p "$prefix" -s "$suffix" -n $nspace -o $output
+</command>
+
+ <inputs>
+  <param format="fasta" type="data" name="input" label="Choose your fasta file" />
+  <param name="prefix" type="text" size="10" value="" label="Prefix" help="Enter a prefix to add to all sequence names" />
+  <param name="suffix" type="text" size="10" value="" label="Suffix" help="Enter a suffix to add to all sequence names"/>
+  <param name="nspace" type="integer" size="10" value="0" min="0" max="1000" label="Number of spaces in name to ignore" help="By default the sequence name is the string before the first space. To keep spaces in the name, enter a positive integer: that many spaces are retained, and everything from the next space onward is omitted."/>
+ </inputs>
+
+
+ <outputs>
+ 	<data format="fasta" name="output" label="fasta dataset ${input.hid} with modified sequence names" />
+ </outputs>
+
+ <tests>
+   <test>
+     <param name="input" value="single_output.fasta" />
+     <param name="prefix" value="TEST" />
+     <param name="suffix" value="OK"/>
+     <param name="nspace" value="0" />
+     <output name="output" value="prefix_suffix.fasta" />
+   </test>
+ </tests>
+ <help>
+**What it does**
+ 
+Adds a prefix and a suffix to sequence names in FASTA-formatted sequences. This tool is useful
+if you want to do comparative analysis with RepeatExplorer and need to
+add sample codes to sequence identifiers.
+
+**Example**
+
+The following FASTA file::
+
+
+  >123454
+  acgtactgactagccatgacg
+  >234235
+  acgtactgactagccatgacg
+
+is renamed to:
+
+::
+
+  >prefix123454suffix
+  acgtactgactagccatgacg
+  >prefix234235suffix
+  acgtactgactagccatgacg
+
+
+By default, everything after the first space is
+excluded from the sequence name. For the example sequence:
+
+::
+  
+ >SRR352150.23846180 HWUSI-EAS1786:7:119:15910:19280/1
+ CTGGATTCTATACCTTTGGCAACTACTTCTTGGTTGATCAGGAAATTAACACTAGTAGTTTAGGCAATTTGGAATGGTGCCAAAGATGTATAGAACTTTC
+ IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIGIIIHIIIIIFIIIIIIHDHBBIHFIHIIBHHDDHIFHIHIIIHIHGGDFDEI@EGEGFGFEFB@ECG
+
+When **Number of spaces in name to ignore** is set to 0 (the default), the output is:
+ 
+::
+ 
+ >prefixSRR352150.23846180suffix
+ CTGGATTCTATACCTTTGGCAACTACTTCTTGGTTGATCAGGAAATTAACACTAGTAGTTTAGGCAATTTGGAATGGTGCCAAAGATGTATAGAACTTTC
+
+ 
+To keep the text after the first space, set **Number of spaces in name to ignore** to 1, which yields:
+ 
+:: 
+ 
+ >prefixSRR352150.23846180 HWUSI-EAS1786:7:119:15910:19280/1suffix
+ CTGGATTCTATACCTTTGGCAACTACTTCTTGGTTGATCAGGAAATTAACACTAGTAGTTTAGGCAATTTGGAATGGTGCCAAAGATGTATAGAACTTTC
+
+
+</help>
+</tool>
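
The renaming rule described in the help text can be sketched in Python as follows. This is a minimal illustration and not the code of fasta_affixer.py, which is not part of this changeset. The function name `affix_name` and its signature are hypothetical, and it assumes that the name is cut at the space following the `nspace` retained spaces, as the help examples suggest.

```python
def affix_name(header, prefix="", suffix="", nspace=0):
    """Return a FASTA header with prefix and suffix applied.

    Hypothetical helper: assumes the sequence name ends at the
    (nspace + 1)-th space, matching the examples in the tool help.
    """
    # Drop the leading '>' and any trailing newline.
    body = header[1:] if header.startswith(">") else header
    body = body.rstrip("\n")
    # Keep the first nspace + 1 space-separated fields; discard the rest.
    name = " ".join(body.split(" ")[: nspace + 1])
    return ">" + prefix + name + suffix


header = ">SRR352150.23846180 HWUSI-EAS1786:7:119:15910:19280/1"
print(affix_name(header, "prefix", "suffix", nspace=0))
print(affix_name(header, "prefix", "suffix", nspace=1))
```

With `nspace=0` only the text before the first space is kept. With `nspace=1` the single space in this header is retained, so the whole header line becomes the name.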