comparison validateFASTA.xml @ 0:daf36a052a01 draft
planemo upload commit 486a143038c57c7a2368c66a55877cda12f694ed-dirty
author | caleb-easterly |
---|---|
date | Thu, 22 Jun 2017 18:27:45 -0400 |
parents | |
children | d61a95fe20e4 |
-1:000000000000 | 0:daf36a052a01 |
---|---|
1 <tool id="validateFASTA" name="Check FASTA Headers" version="0.1.0"> | |
2 <requirements> | |
3 </requirements> | |
4 <stdio> | |
5 <exit_code range="1" level="fatal" description="Invalid FASTA headers detected, was asked to fail"/> | |
6 </stdio> | |
7 <command detect_errors="exit_code"><![CDATA[ | |
8 java -jar "$__tool_directory__/FastaHeader-1.0-SNAPSHOT-jar-with-dependencies.jar" "$FASTA" "$goodFasta" "$badFasta" "$crashIfInvalid" | |
9 ]]></command> | |
10 <inputs> | |
11 <param type="data" name="FASTA" format="fasta" label="Select input FASTA dataset"/> | |
12 <param type="boolean" name="crashIfInvalid" label="Fail job if invalid FASTA headers detected?"/> | |
13 </inputs> | |
14 <outputs> | |
15 <data name="goodFasta" format="fasta" label="Validate FASTA: Passed Sequences"/> | |
16 <data name="badFasta" format="fasta" label="Validate FASTA: Failed Sequences"/> | |
17 </outputs> | |
18 <tests> | |
19 <test> | |
20 <param name="FASTA" value="fastaFilteringTest_IN.fasta"/> | |
21 <output name="goodFasta" file="fastaFilteringTest_OUT1.fasta" /> | |
22 <output name="badFasta" file="fastaFilteringTest_OUT2.fasta" /> | |
23 </test> | |
24 </tests> | |
25 <help> | |
26 <![CDATA[ | |
27 **Notes** | |
28 | |
29 Validates the headers of a FASTA database against the schema used by Compomics (the developers of SearchGUI and PeptideShaker). | |
30 Custom FASTA databases may contain invalid headers, which can cause SearchGUI to crash. | |
31 | |
32 **Output** | |
33 | |
34 The main output of this tool, "Validate FASTA: Passed Sequences", is a FASTA database that can be run through SearchGUI without error. | |
35 The failed sequences, written to "Validate FASTA: Failed Sequences", may be examined for typos and other errors. | |
36 | |
37 In addition, the tool prints the database types assigned by the Compomics utility (e.g., UniProt), as a quick check of the validity of the custom FASTA database. | |
38 | |
39 Sequences whose headers are not valid examples of one of the following formats may cause the tool to report an exception: | |
40 * UniProt | |
41 * SwissProt (starts with ">sw|" or ">SW|") | |
42 * NCBI (starts with ">gi|" or ">GI|") | |
43 * Halobacterium from Max Planck (starts with "OE") | |
44 * H Influenza, from Novartis (starts with ">hflu_") | |
45 * C Trachomatis (starts with ">C.tr\_" or "C\_trachomatis\_") | |
46 * M Tuberculosis (starts with ">M. tub") | |
47 * Saccharomyces Genome Database (contains "SGDID") | |
48 * Genome translation (e.g., ">dm345\_3L-sense [2343534-234353938]") | |
49 * Genome Annotation Framework for Flexible Analysis (GAFFA) (starts with ">GAFFA") | |
50 * UPS (contains "\_HUMAN\_UPS") | |
51 | |
52 Many sequences are reported as Generic, which may or may not allow for extraction of the accession number. | |
53 ]]> | |
54 </help> | |
55 <citations> | |
56 <citation type="bibtex"> | |
57 @misc{fastaValidation, | |
58 author = {The GalaxyP Team}, | |
59 date = {22 June 2017}, | |
60 title = {FASTA Database Validation Tool} | |
61 } | |
62 </citation> | |
63 </citations> | |
64 </tool> |
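
The help text above lists header formats mostly by a prefix or a contained marker. As a rough illustration of that list only, the following Python sketch classifies a header using the rules quoted in the help text. It is not the Compomics parser and not what the bundled jar does: the function name `classify_header`, the table names, and the choice to ignore a leading ">" before matching are all assumptions made for this example. UniProt and genome-translation headers are left out because the help text gives no prefix rule for them, so they fall through to "Generic" here.

```python
# Hypothetical sketch: rules are copied from the tool's help text.
# This is NOT the Compomics header parser used by the real tool.

# Formats the help text describes with "starts with ...".
STARTS_WITH = {
    "SwissProt": ("sw|", "SW|"),
    "NCBI": ("gi|", "GI|"),
    "Halobacterium (Max Planck)": ("OE",),
    "H Influenza (Novartis)": ("hflu_",),
    "C Trachomatis": ("C.tr_", "C_trachomatis_"),
    "M Tuberculosis": ("M. tub",),
    "GAFFA": ("GAFFA",),
}

# Formats the help text describes with "contains ...".
CONTAINS = {
    "Saccharomyces Genome Database": "SGDID",
    "UPS": "_HUMAN_UPS",
}


def classify_header(header: str) -> str:
    """Return a format label for one FASTA header line.

    Assumption: a leading '>' is dropped before matching, because the
    help text quotes some prefixes with it and some without.
    """
    body = header[1:] if header.startswith(">") else header
    for name, prefixes in STARTS_WITH.items():
        if body.startswith(prefixes):  # str.startswith accepts a tuple
            return name
    for name, marker in CONTAINS.items():
        if marker in body:
            return name
    # Anything else is treated as "Generic" in this sketch.
    return "Generic"


if __name__ == "__main__":
    for line in (">gi|129295|example", ">sw|P12345|example", ">my_custom_protein"):
        print(line, "->", classify_header(line))
```

A check like this could be run over a custom database before submitting it to the tool, to get a first impression of how many headers would fall into the "Generic" bucket.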