annotate peptide_indexer.xml @ 4:1183846e70a1 draft

Uploaded
author galaxyp
date Wed, 19 Jun 2013 13:15:44 -0400
parents cf0d72c7b482
children
rev   line source
2
cf0d72c7b482 Update.
galaxyp
parents:
diff changeset
1 <tool id="openms_peptide_indexer" version="0.1.0" name="Peptide Indexer">
2 <description>
3 Refreshes the protein references for all peptide hits from an idXML file.
4 </description>
5 <macros>
6 <import>macros.xml</import>
7 </macros>
8 <expand macro="stdio" />
9 <expand macro="requires" />
10 <command interpreter="python">
11 openms_wrapper.py --executable 'PeptideIndexer' --config $config
12 </command>
13 <configfiles>
14 <configfile name="config">[simple_options]
15 in=$input1
16 fasta=$database
17 out=$output
18 decoy_string=$decoy_string
4
1183846e70a1 Uploaded
galaxyp
parents: 2
diff changeset
19 prefix=$prefix
2
20 $extact_search
21 $write_protein_sequence
22 $keep_unreferenced_proteins
23 aaa_max=$aaa_max
24 </configfile>
25 </configfiles>
26 <inputs>
27 <param name="input1" label="Identification Input" type="data" format="idxml" />
28 <param name="database" label="Database" type="data" format="fasta" />
29 <param name="decoy_string" type="text" value="_rev" label="Decoy string"/>
4
30 <param name="prefix" type="select" label="Decoy Position">
31 <option value="false" selected="true">Suffix</option>
32 <option value="true">Prefix</option>
2
33 </param>
34 <param name="extact_search" label="Exact Search" type="boolean" truevalue="" falsevalue="full_tolerant_search=true" checked="true" />
35 <param name="write_protein_sequence" type="boolean" truevalue="write_protein_sequence=true" falsevalue="" checked="false" label="Store Protein Sequences" />
36 <param name="keep_unreferenced_proteins" label="Keep Unreferenced Proteins" truevalue="keep_unreferenced_proteins=true" falsevalue="" type="boolean" />
37 <param name="aaa_max" type="integer" value="4" label="Maximum Number of Ambiguous Amino Acids" help="Maximum number of ambiguous amino acids (AAAs) allowed when matching against a protein DB that contains AAAs. AAAs are 'B', 'Z', and 'X'." />
38 </inputs>
39 <outputs>
40 <data format="idxml" name="output" />
41 </outputs>
42 <help>
43 **What it does**
44
45 Each peptide hit is annotated with a target_decoy string that indicates whether the peptide sequence is found in a 'target' protein, a 'decoy' protein, or both ('target+decoy'). This information is crucial for the FalseDiscoveryRate and IDPosteriorErrorProbability tools.
46
47 Note:
48 Make sure that the protein names in your database contain a correctly formatted decoy string. This can be ensured by using DecoyDatabase. If the decoy identifier is not recognized, all proteins are assumed to stem from the target part of the query.
49 E.g., "sw|P33354_REV|YEHR_ECOLI Uncharacterized lipop..." is invalid, since the tool has no knowledge of how SwissProt entries are built up. A correct identifier could be "rev_sw|P33354|YEHR_ECOLI Uncharacterized li ..." or "sw|P33354|YEHR_ECOLI_rev Uncharacterized li", depending on whether you use prefix annotation.
50 This tool also reports some target/decoy statistics when it is done. Look carefully!
51
52 By default, the tool fails if an unmatched peptide occurs, i.e., the database does not contain the corresponding protein. You can force the tool to return successfully in this case by using the flag 'allow_unmatched'.
53
54 Some search engines (such as Mascot) replace ambiguous AAs in the protein database with unambiguous AAs in the reported peptides, e.g., exchange 'X' with 'H'. Such a peptide cannot be found by exactly matching its sequence. These cases are recovered by tolerant search, which is applied automatically. In all cases, ambiguous AAs in the peptide sequence must match exactly in the protein DB (i.e., 'X' in a peptide only matches 'X').
55
56 Two search modes are available:
57
58 exact: Peptide sequences require an exact match in the protein database. If no protein can be found for a peptide, tolerant matching is automatically used for that peptide. Thus, the results for these peptides are identical in both search modes.
59 tolerant: Allows ambiguous AAs in the protein sequence, e.g., 'M' in a peptide will match 'X' in a protein. This mode might yield more protein hits for some peptides (even though they have exact matches as well).
60 The exact mode is much faster (about 10x) and consumes less memory (about 2.5x less), but might fail to report a few proteins with ambiguous AAs for some peptides. Usually these proteins are putative, however.
61
62 **Citation**
63
64 For the underlying tool, please cite ``Marc Sturm, Andreas Bertsch, Clemens Gröpl, Andreas Hildebrandt, Rene Hussong, Eva Lange, Nico Pfeifer, Ole Schulz-Trieglaff, Alexandra Zerck, Knut Reinert, and Oliver Kohlbacher, 2008. OpenMS – an Open-Source Software Framework for Mass Spectrometry. BMC Bioinformatics 9: 163. doi:10.1186/1471-2105-9-163.``
65
66 If you use this tool in Galaxy, please cite Chilton J, et al. https://bitbucket.org/galaxyp/galaxyp-toolshed-openms
67 </help>
68 </tool>
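
The help text of this wrapper describes two behaviours that are easier to follow with a small example: how the decoy string (as a prefix or suffix of the protein identifier) determines the target_decoy annotation, and how exact and tolerant matching differ. The following is a minimal Python sketch of those rules, not the PeptideIndexer implementation. All function names are hypothetical, the search is a naive sliding window, and the way `aaa_max` is counted here (positions matched only through an ambiguous residue in the protein) is an assumption. The residue sets for 'B' (D or N) and 'Z' (E or Q) follow the standard amino acid ambiguity codes.

```python
# Hypothetical helpers for illustration; not part of OpenMS or this wrapper.
AMBIGUOUS = {"B": "DN", "Z": "EQ"}  # 'X' is handled separately: any residue


def is_decoy(identifier, decoy_string="_rev", prefix=False):
    """Return True if the protein accession carries the decoy string."""
    accession = identifier.split()[0]  # drop the free-text description
    if prefix:
        return accession.startswith(decoy_string)
    return accession.endswith(decoy_string)


def target_decoy(identifiers, decoy_string="_rev", prefix=False):
    """Summarize the proteins of one peptide hit as a target_decoy string."""
    if not identifiers:
        raise ValueError("peptide has no matching proteins")
    flags = {is_decoy(i, decoy_string, prefix) for i in identifiers}
    if flags == {True}:
        return "decoy"
    if flags == {False}:
        return "target"
    return "target+decoy"


def residue_matches(pep, prot, tolerant):
    """Compare one peptide residue with one protein residue."""
    if pep == prot:
        return True  # also covers 'X' in peptide matching 'X' in protein
    if not tolerant or pep in "BZX":
        return False  # ambiguous residues in the peptide must match exactly
    if prot == "X":
        return True
    return pep in AMBIGUOUS.get(prot, "")


def find_matches(peptide, protein, tolerant=False, aaa_max=4):
    """Return start offsets where the peptide matches the protein."""
    hits = []
    for start in range(len(protein) - len(peptide) + 1):
        window = protein[start:start + len(peptide)]
        pairs = list(zip(peptide, window))
        if not all(residue_matches(p, q, tolerant) for p, q in pairs):
            continue
        # Assumption: aaa_max limits positions resolved via ambiguity.
        if sum(1 for p, q in pairs if p != q) <= aaa_max:
            hits.append(start)
    return hits
```

With the identifiers from the help text, `is_decoy("sw|P33354_REV|YEHR_ECOLI Uncharacterized lipop...", "_REV")` is false because the decoy string sits in the middle of the accession, so such a protein would be counted as a target. Likewise, `find_matches("PEMTIDE", "AAPEXTIDEAA")` finds nothing in exact mode but reports offset 2 with `tolerant=True`.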