diff fasta_affixer.xml @ 0:a4cd8608ef6b draft
Uploaded
author | petr-novak |
---|---|
date | Mon, 01 Apr 2019 07:56:36 -0400 |
parents | |
children | c2c69c6090f0 |
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/fasta_affixer.xml Mon Apr 01 07:56:36 2019 -0400
@@ -0,0 +1,81 @@
+<tool id="fasta_affixer" name="FASTA read name affixer" version="1.0.0">
+<description> Tool appending suffix and prefix to sequence names </description>
+<command interpreter="python3">
+fasta_affixer.py -f $input -p "$prefix" -s "$suffix" -n $nspace -o $output
+</command>
+
+ <inputs>
+  <param format="fasta" type="data" name="input" label="Choose your fasta file" />
+  <param name="prefix" type="text" size="10" value="" label="Prefix" help="Enter a prefix which will be added to all sequence names" />
+  <param name="suffix" type="text" size="10" value="" label="Suffix" help="Enter a suffix which will be added to all sequence names"/>
+  <param name="nspace" type="integer" size="10" value="0" min="0" max="1000" label="Number of spaces in name to ignore" help="The sequence name is the string before the first space. If you want the name to include spaces, enter a positive integer. All characters beyond the ignored spaces are omitted"/>
+ </inputs>
+
+
+ <outputs>
+  <data format="fasta" name="output" label="fasta dataset ${input.hid} with modified sequence names" />
+ </outputs>
+
+ <tests>
+  <test>
+   <param name="input" value="single_output.fasta" />
+   <param name="prefix" value="TEST" />
+   <param name="suffix" value="OK"/>
+   <param name="nspace" value="0" />
+   <output name="output" value="prefix_suffix.fasta" />
+  </test>
+ </tests>
+ <help>
+**What it does**
+
+Tool for appending a prefix and suffix to sequence names in FASTA-formatted sequences. This tool is useful
+if you want to do comparative analysis with RepeatExplorer and need to
+append sample codes to sequence identifiers.
+
+**Example**
+The following fasta file:
+
+::
+
+  >123454
+  acgtactgactagccatgacg
+  >234235
+  acgtactgactagccatgacg
+
+is renamed to:
+
+::
+
+  >prefix123454suffix
+  acgtactgactagccatgacg
+  >prefix234235suffix
+  acgtactgactagccatgacg
+
+
+By default, anything after the first space is
+excluded from the sequence name. For the example sequence:
+
+::
+
+  >SRR352150.23846180 HWUSI-EAS1786:7:119:15910:19280/1
+  CTGGATTCTATACCTTTGGCAACTACTTCTTGGTTGATCAGGAAATTAACACTAGTAGTTTAGGCAATTTGGAATGGTGCCAAAGATGTATAGAACTTTC
+  IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIGIIIHIIIIIFIIIIIIHDHBBIHFIHIIBHHDDHIFHIHIIIHIHGGDFDEI@EGEGFGFEFB@ECG
+
+when **Number of spaces in name to ignore** is set to 0 (default), the output will be:
+
+::
+
+  >prefixSRR352150.23846180suffix
+  CTGGATTCTATACCTTTGGCAACTACTTCTTGGTTGATCAGGAAATTAACACTAGTAGTTTAGGCAATTTGGAATGGTGCCAAAGATGTATAGAACTTTC
+
+
+If you want to keep spaces, setting **Number of spaces in name to ignore** to 1 will yield:
+
+::
+
+  >prefixSRR352150.23846180 HWUSI-EAS1786:7:119:15910:19280/1suffix
+  CTGGATTCTATACCTTTGGCAACTACTTCTTGGTTGATCAGGAAATTAACACTAGTAGTTTAGGCAATTTGGAATGGTGCCAAAGATGTATAGAACTTTC
+
+
+</help>
+</tool>
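
The renaming rule described in the help text (keep the name up to the first space, or up to the (N+1)-th space when **Number of spaces in name to ignore** is N) can be sketched in Python as below. This is only an illustration under assumptions: `fasta_affixer.py` itself is not part of this changeset, so the function names `affix_name` and `affix_fasta` are hypothetical, and the real script may handle tabs, empty headers, or other edge cases differently.

```python
from typing import Iterable, Iterator


def affix_name(header: str, prefix: str, suffix: str, nspace: int = 0) -> str:
    """Rename one FASTA header line (hypothetical helper, not from fasta_affixer.py)."""
    name = header.rstrip("\n")[1:]  # drop the trailing newline and the leading '>'
    # Keep the first nspace + 1 space-separated fields; the rest is discarded.
    kept = " ".join(name.split(" ")[: nspace + 1])
    return f">{prefix}{kept}{suffix}"


def affix_fasta(
    lines: Iterable[str], prefix: str, suffix: str, nspace: int = 0
) -> Iterator[str]:
    """Yield FASTA lines with renamed headers; sequence lines pass through unchanged."""
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            yield affix_name(line, prefix, suffix, nspace)
        else:
            yield line
```

With the header from the help text, `affix_name(header, "prefix", "suffix", 0)` keeps only `SRR352150.23846180`, while passing `nspace=1` also keeps the second field `HWUSI-EAS1786:7:119:15910:19280/1`.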