diff fasta_merge_files_and_filter_unique_sequences.xml @ 2:379c41d859aa draft
planemo upload for repository https://github.com/galaxyproteomics/tools-galaxyp/tree/master/tools/fasta_merge_files_and_filter_unique_sequences commit 240d1baaa04767c7d6ad6e36c854c2b54093e92f
author | galaxyp |
---|---|
date | Wed, 01 Feb 2017 13:24:16 -0500 |
parents | 74144834b0bd |
children | 9ad0d336e5ed |
--- a/fasta_merge_files_and_filter_unique_sequences.xml	Fri Dec 16 05:19:27 2016 -0500
+++ b/fasta_merge_files_and_filter_unique_sequences.xml	Wed Feb 01 13:24:16 2017 -0500
@@ -5,13 +5,17 @@
     </requirements>
     <command>
         python '$__tool_directory__/fasta_merge_files_and_filter_unique_sequences.py'
-        '$output'
+        '$output' $uniqueness_criterion
         #for $input in $inputs:
             '$input'
         #end for
     </command>
     <inputs>
         <param name="inputs" format="fasta" multiple="True" type="data" label="Input FASTA files"/>
+        <param name="uniqueness_criterion" type="select" label="How are sequences judged to be unique?">
+            <option value="sequence" selected="true">Accession and Sequence</option>
+            <option value="accession">Accession Only</option>
+        </param>
     </inputs>
     <outputs>
         <data format="fasta" name="output" label="Merged and Filtered FASTA from ${on_string}"/>
@@ -19,7 +23,21 @@
     <tests>
         <test>
             <param name="inputs" value="1.fa,2.fa" ftype="fasta" />
-            <output name="output" file="res.fa" ftype="fasta" />
+            <param name="uniqueness_criterion" value="sequence" />
+            <output name="output" file="res-sequence.fa" ftype="fasta" />
+            <assert_stdout>
+                <has_line line="Skipping protein '>one_2' with duplicate sequence (first seen as '>one')" />
+                <has_line line="Skipping protein '>two_2' with duplicate sequence (first seen as '>two')" />
+                <has_line line="Skipping protein '>three_2' with duplicate header" />
+            </assert_stdout>
+        </test>
+        <test>
+            <param name="inputs" value="1.fa,2.fa" ftype="fasta" />
+            <param name="uniqueness_criterion" value="accession" />
+            <output name="output" file="res-accession.fa" ftype="fasta" />
+            <assert_stdout>
+                <has_line line="Skipping protein '>three_2' with duplicate header" />
+            </assert_stdout>
         </test>
     </tests>
     <help>
@@ -27,7 +45,11 @@
 **What it does**
 
 Concatenate FASTA database files together.
-Only first appearence of each unique sequence will appear in output.
+
+If the uniqueness criterion is "Accession and Sequence", only the first appearence of each unique sequence will appear in the output.
+Otherwise, duplicate sequences are allowed, but only the first appearance of each accession will appear in the output.
+
+In the context of this script, the accession is the entire header line.
 
 ------
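
The behaviour described in the new help text and exercised by the two tests can be sketched roughly as follows. This is an illustrative sketch only, not the code in `fasta_merge_files_and_filter_unique_sequences.py`: the function name `merge_unique`, the `(header, sequence)` record tuples, and the order in which the header and sequence checks run are all assumptions. Only the two `criterion` values and the wording of the "Skipping protein" messages are taken from the diff.

```python
def merge_unique(records, criterion="sequence"):
    """Filter (header, sequence) pairs, keeping the first occurrence of each.

    Hypothetical helper for illustration. `criterion` mirrors the tool's
    select values: "sequence" (Accession and Sequence) or "accession"
    (Accession Only). The accession is the entire header line.
    """
    seen_headers = set()
    first_header_by_sequence = {}  # sequence -> header that introduced it
    kept, messages = [], []

    for header, sequence in records:
        # A repeated header is skipped under both criteria.
        if header in seen_headers:
            messages.append(
                "Skipping protein '%s' with duplicate header" % header
            )
            continue

        # A repeated sequence is skipped only under the "sequence" criterion.
        if criterion == "sequence" and sequence in first_header_by_sequence:
            messages.append(
                "Skipping protein '%s' with duplicate sequence "
                "(first seen as '%s')"
                % (header, first_header_by_sequence[sequence])
            )
            continue

        seen_headers.add(header)
        first_header_by_sequence.setdefault(sequence, header)
        kept.append((header, sequence))

    return kept, messages


# Made-up records standing in for the concatenated contents of two inputs.
records = [
    (">one", "AAA"), (">two", "CCC"), (">three_2", "GGG"),
    (">one_2", "AAA"), (">two_2", "CCC"), (">three_2", "TTT"),
]
kept_seq, msgs_seq = merge_unique(records, "sequence")
kept_acc, msgs_acc = merge_unique(records, "accession")
```

With these made-up records, the "sequence" criterion drops `>one_2`, `>two_2`, and the second `>three_2`, while the "accession" criterion drops only the second `>three_2`, which matches the pattern of `has_line` assertions in the two tests.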