annotate peptide_indexer.xml @ 2:cf0d72c7b482 draft

Update.
author galaxyp
date Fri, 10 May 2013 17:31:05 -0400
parents
children 1183846e70a1
<tool id="openms_peptide_indexer" version="0.1.0" name="Peptide Indexer">
  <description>
    Refreshes the protein references for all peptide hits from an idXML file.
  </description>
  <macros>
    <import>macros.xml</import>
  </macros>
  <expand macro="stdio" />
  <expand macro="requires" />
  <command interpreter="python">
    openms_wrapper.py --executable 'PeptideIndexer' --config $config
  </command>
  <configfiles>
    <configfile name="config">[simple_options]
in=$input1
fasta=$database
out=$output
decoy_string=$decoy_string
#if $decoy_string_position == "prefix"
prefix=true
#end if
$extact_search
$write_protein_sequence
$keep_unreferenced_proteins
aaa_max=$aaa_max
</configfile>
  </configfiles>
  <inputs>
    <param name="input1" label="Identification Input" type="data" format="idxml" />
    <param name="database" label="Database" type="data" format="fasta" />
    <param name="decoy_string" type="text" value="_rev" label="Decoy string"/>
    <param name="decoy_string_position" type="select" label="Decoy Position">
      <option value="suffix" selected="true">Suffix</option>
      <option value="prefix">Prefix</option>
    </param>
    <param name="extact_search" label="Exact Search" type="boolean" truevalue="" falsevalue="full_tolerant_search=true" checked="true" />
    <param name="write_protein_sequence" type="boolean" truevalue="write_protein_sequence=true" falsevalue="" checked="false" label="Store Protein Sequences" />
    <param name="keep_unreferenced_proteins" label="Keep Unreferenced Proteins" truevalue="keep_unreferenced_proteins=true" falsevalue="" type="boolean" />
    <param name="aaa_max" type="integer" value="4" label="Maximum Number of Ambiguous Amino Acids" help="Maximum number of ambiguous amino acids (AAAs) allowed when matching to a protein DB that contains AAAs. AAAs are 'B', 'Z', and 'X'." />
  </inputs>
  <outputs>
    <data format="idxml" name="output" />
  </outputs>
  <help>
**What it does**

Each peptide hit is annotated with a target_decoy string indicating whether the peptide sequence is found in a 'target' protein, a 'decoy' protein, or both ('target+decoy'). This information is crucial for the FalseDiscoveryRate and IDPosteriorErrorProbability tools.

Note:
Make sure that the protein names in your database contain a correctly formatted decoy string. This can be ensured by using DecoyDatabase. If the decoy identifier is not recognized, all proteins will be assumed to stem from the target part of the query.
E.g., "sw|P33354_REV|YEHR_ECOLI Uncharacterized lipop..." is invalid, since the tool has no knowledge of how SwissProt entries are built up. A correct identifier could be "rev_sw|P33354|YEHR_ECOLI Uncharacterized li ..." or "sw|P33354|YEHR_ECOLI_rev Uncharacterized li", depending on whether you are using prefix annotation or not.
This tool will also give you some target/decoy statistics when it is done. Look carefully!

By default, the tool will fail if an unmatched peptide occurs, i.e., the database does not contain the corresponding protein. You can force the tool to return successfully in this case by using the flag 'allow_unmatched'.

Some search engines (such as Mascot) will replace ambiguous AAs in the protein database with unambiguous AAs in the reported peptides, e.g., exchange 'X' with 'H'. As a result, such a peptide cannot be found by exactly matching its sequence. These cases can be recovered by using tolerant search (done automatically). In all cases, ambiguous AAs in the peptide sequence are required to match exactly in the protein DB (i.e., 'X' in a peptide only matches 'X').

Two search modes are available:

- exact: Peptide sequences require an exact match in the protein database. If no protein can be found for a peptide, tolerant matching is automatically used for that peptide. Thus, the results for these peptides are identical in both search modes.
- tolerant: Allows ambiguous AAs in the protein sequence, e.g., 'M' in the peptide will match 'X' in the protein. This mode might yield more protein hits for some peptides (even though they have exact matches as well).

The exact mode is much faster (about 10x) and consumes less memory (about 2.5x less), but might fail to report a few proteins with ambiguous AAs for some peptides. Usually these proteins are putative, however.

**Citation**

For the underlying tool, please cite ``Marc Sturm, Andreas Bertsch, Clemens Gröpl, Andreas Hildebrandt, Rene Hussong, Eva Lange, Nico Pfeifer, Ole Schulz-Trieglaff, Alexandra Zerck, Knut Reinert, and Oliver Kohlbacher, 2008. OpenMS – an Open-Source Software Framework for Mass Spectrometry. BMC Bioinformatics 9: 163. doi:10.1186/1471-2105-9-163.``

If you use this tool in Galaxy, please cite Chilton J, et al. https://bitbucket.org/galaxyp/galaxyp-toolshed-openms
  </help>
</tool>
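
The decoy-string rule described in the help text can be illustrated with a short Python sketch. This is not PeptideIndexer's implementation: the function names `classify_protein` and `annotate_peptide` are hypothetical, and the sketch assumes that the accession is the first whitespace-delimited token of a FASTA header and that matching is a plain case-sensitive prefix or suffix test.

```python
def classify_protein(header, decoy_string="_rev", prefix=False):
    """Label a protein as 'target' or 'decoy' (illustrative assumption only).

    The accession is taken to be the first whitespace-delimited token of
    the header; the decoy string must sit at its very start or very end.
    """
    accession = header.split(None, 1)[0]
    if prefix:
        is_decoy = accession.startswith(decoy_string)
    else:
        is_decoy = accession.endswith(decoy_string)
    return "decoy" if is_decoy else "target"


def annotate_peptide(headers, decoy_string="_rev", prefix=False):
    """Combine the labels of all proteins a peptide maps to."""
    labels = {classify_protein(h, decoy_string, prefix) for h in headers}
    # Reverse alphabetical order puts 'target' before 'decoy'.
    return "+".join(sorted(labels, reverse=True))


if __name__ == "__main__":
    # Suffix annotation with the wrapper's default decoy string "_rev".
    print(classify_protein("sw|P33354|YEHR_ECOLI_rev Uncharacterized li"))
    # Prefix annotation with a decoy string of "rev_".
    print(classify_protein("rev_sw|P33354|YEHR_ECOLI Uncharacterized li ...",
                           decoy_string="rev_", prefix=True))
    # Decoy marker buried inside the identifier: treated as a target.
    print(classify_protein("sw|P33354_REV|YEHR_ECOLI Uncharacterized lipop..."))
```

Under these assumptions, a marker placed in the middle of the identifier (as in the invalid SwissProt example from the help text) is never seen, so the protein silently counts as a target. This is why the target/decoy statistics printed by the tool are worth checking.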