Repository 'myrimatch'
hg clone https://toolshed.g2.bx.psu.edu/repos/galaxyp/myrimatch

Changeset 1:1881935e5351 (2017-02-01)
Previous changeset 0:23b316fad2b0 (2014-09-26) Next changeset 2:8240d0714a97 (2017-10-06)
Commit message:
planemo upload for repository https://github.com/galaxyproteomics/tools-galaxyp/tools/bumbershoot/myrimatch commit 9eee537ea6d1e33b21596390767e7701f95a67d3
modified:
README.md
added:
macros.xml
myrimatch.xml
test-data/201208-378803-mm-15ppm-fully-tryptic.pepXML
test-data/201208-378803-mm.pepXML
test-data/input/201208-378803.mzML
test-data/input/cow.protein.PRG2012-subset.fasta
removed:
test-data/.gitkeep
tool-data/.gitkeep
tool-data/proteases.loc.sample
tools/myrimatch.xml
tools/repository_dependencies.xml
tools/tool_dependencies.xml
diff -r 23b316fad2b0 -r 1881935e5351 README.md
--- a/README.md Fri Sep 26 15:26:14 2014 -0400
+++ b/README.md Wed Feb 01 20:53:23 2017 -0500
@@ -1,7 +1,7 @@
 GalaxyP - MyriMatch
 ===================
 
-* Home: <https://bitbucket.org/galaxyp/myrimatch>
+* Home: <https://github.com/galaxyproteomics/tools-galaxyp/>
 * Galaxy Tool Shed: <http://toolshed.g2.bx.psu.edu/view/galaxyp/myrimatch>
 * Tool ID: `myrimatch`
 
@@ -13,15 +13,15 @@
 
 See:
 
-* <http://fenchurch.mc.vanderbilt.edu/bumbershoot/myrimatch/>
+* <http://fenchurch.mc.vanderbilt.edu/>
 
 
 GalaxyP Community
 -----------------
 
-Current governing community policies for [GalaxyP](https://bitbucket.org/galaxyp/) and other information can be found at:
+Current governing community policies for [GalaxyP](https://github.com/galaxyproteomics/) and other information can be found at:
 
-<https://bitbucket.org/galaxyp/galaxyp>
+<https://github.com/galaxyproteomics>
 
 
 License
@@ -39,7 +39,7 @@
 Contributing
 ------------
 
-Contributions to this repository are reviewed through pull requests. If you would like your work acknowledged, please also add yourself to the Authors section. If your pull request is accepted, you will also be acknowledged in <https://bitbucket.org/galaxyp/galaxyp/CONTRIBUTORS.md> unless you opt-out.
+Contributions to this repository are reviewed through pull requests. If you would like your work acknowledged, please also add yourself to the Authors section. If your pull request is accepted, you will also be acknowledged in <https://github.com/galaxyproteomics/tools-galaxyp/>
 
 
 Authors
@@ -47,5 +47,8 @@
 
 Authors and contributors:
 
+* Matt Chambers <matt.chambers42@gmail.com>
+  Vanderbilt University Medical Center
+
 * John Chilton <jmchilton@gmail.com>
-* Minnesota Supercomputing Institute, Univeristy of Minnesota
+  Minnesota Supercomputing Institute, Univeristy of Minnesota
diff -r 23b316fad2b0 -r 1881935e5351 macros.xml
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/macros.xml Wed Feb 01 20:53:23 2017 -0500
@@ -0,0 +1,9 @@
+<macros>
+    <token name="@VERSION@">3.0.10246</token>
+    <xml name="requirements">
+        <requirements>
+            <requirement type="package" version="3_0_10246">bumbershoot</requirement>
+            <yield/>
+        </requirements>
+    </xml>
+</macros>
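
Note on the `@VERSION@` token: `myrimatch.xml` declares `version="@VERSION@.0"` and imports `macros.xml`, so the effective tool version comes from substituting the token defined above. The following is a minimal Python sketch of that substitution, assuming plain string replacement. It is not Galaxy's actual macro loader, and the helper names `load_tokens` and `expand_tokens` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Token definition copied from the macros.xml hunk above (requirements omitted).
MACROS_XML = """<macros>
    <token name="@VERSION@">3.0.10246</token>
</macros>"""


def load_tokens(macros_xml: str) -> dict:
    """Collect <token name="...">value</token> pairs from a macros document."""
    root = ET.fromstring(macros_xml)
    return {tok.get("name"): (tok.text or "") for tok in root.iter("token")}


def expand_tokens(text: str, tokens: dict) -> str:
    """Replace every token name found in `text` with its value (simplified)."""
    for name, value in tokens.items():
        text = text.replace(name, value)
    return text


tokens = load_tokens(MACROS_XML)
# The tool tag in myrimatch.xml uses version="@VERSION@.0".
tool_version = expand_tokens("@VERSION@.0", tokens)
print(tool_version)
```

Under this assumption the tool version resolves to the bumbershoot version with a `.0` wrapper suffix appended.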
diff -r 23b316fad2b0 -r 1881935e5351 myrimatch.xml
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/myrimatch.xml Wed Feb 01 20:53:23 2017 -0500
b'@@ -0,0 +1,239 @@\n+<tool id="myrimatch" version="@VERSION@.0" name="MyriMatch">\n+    <description>Identify peptides in tandem mass spectra.</description>\n+    <macros>\n+        <import>macros.xml</import>\n+    </macros>\n+    <expand macro="requirements" />\n+    <stdio>\n+        <exit_code range="1:" level="fatal" description="Job Failed" />\n+        <regex match="^Could not find the default configuration file.*$"\n+          source="both"\n+          level="warning" />\n+    </stdio>\n+    <command>\n+<![CDATA[\n+        #set $db_name = $ProteinDatabase.display_name.replace(".fasta", "") + ".fasta"\n+        #if $OutputFormat.value == "mzIdentML"\n+            #set $output_ext="mzid"\n+        #else\n+            #set $output_ext="pepXML"\n+        #end if\n+        #set $input_name = $input.display_name\n+        #set $output_name = $input_name.split(".")[0] + "." + $output_ext\n+\n+        #set $static_mods_str = ""\n+        #for $static_mod in $StaticMods\n+          #set $static_mods_str = $static_mods_str + " " + str($static_mod.aa) + " " + str($static_mod.mass)\n+        #end for\n+        #set $static_mods_str = $static_mods_str.lstrip()\n+\n+        #set $dynamic_mods_str = ""\n+        #for $dynamic_mod in $DynamicMods\n+            #set $dynamic_mods_str = $dynamic_mods_str + " " + str($dynamic_mod.motif) + " * " + str($dynamic_mod.mass)\n+        #end for\n+        #set $dynamic_mods_str = $dynamic_mods_str.lstrip()\n+\n+        ln -s \'$input\' \'${input_name}\' &&\n+        ln -s \'$ProteinDatabase\' \'${db_name}\' &&\n+        myrimatch -DecoyPrefix \'${DecoyPrefix}\' -cpus \\${GALAXY_SLOTS:-1}\n+        -ProteinDatabase \'${db_name}\'\n+        -OutputFormat \'${OutputFormat}\'\n+        \'${input_name}\'\n+        -StaticMods \'${static_mods_str}\'\n+        -DynamicMods \'${dynamic_mods_str}\'\n+        -MaxDynamicMods $MaxDynamicMods\n+        -CleavageRules \'${CleavageRules}\'\n+        -MinTerminiCleavages $MinTerminiCleavages\n+    
    -MaxMissedCleavages $MaxMissedCleavages\n+        #set $precursor_type = $tolerance_options.precursor_tolerance.PrecursorMzToleranceRule\n+        -PrecursorMzToleranceRule \'${tolerance_options.precursor_tolerance.PrecursorMzToleranceRule}\'\n+        #if $tolerance_options.precursor_tolerance.PrecursorMzToleranceRule == "auto" or $tolerance_options.precursor_tolerance.PrecursorMzToleranceRule == "mono"\n+          -MonoPrecursorMzTolerance \'${tolerance_options.precursor_tolerance.MonoPrecursorMzTolerance}${tolerance_options.precursor_tolerance.mono_precursor_mz_tolerance_units}\'\n+        #end if\n+        #if $tolerance_options.precursor_tolerance.PrecursorMzToleranceRule == "auto" or $tolerance_options.precursor_tolerance.PrecursorMzToleranceRule == "avg"\n+          -AvgPrecursorMzTolerance \'${tolerance_options.precursor_tolerance.AvgPrecursorMzTolerance}${tolerance_options.precursor_tolerance.avg_precursor_mz_tolerance_units}\'\n+        #end if\n+        -FragmentMzTolerance \'${tolerance_options.FragmentMzTolerance}${tolerance_options.fragment_mz_tolerance_units}\'\n+        -UseSmartPlusThreeModel $advanced.UseSmartPlusThreeModel\n+        -MinPeptideLength $advanced.MinPeptideLength\n+        -MaxPeptideLength $advanced.MaxPeptideLength\n+        -NumChargeStates $advanced.NumChargeStates\n+        -MaxResultRank $advanced.MaxResultRank\n+        #set $set = str($advanced.MonoisotopeAdjustmentSet).replace("[[", "[").replace("]]","]")\n+        -MonoisotopeAdjustmentSet \'$set\'\n+        #if $advanced.MaxPeakCount\n+          -MaxPeakCount $advanced.MaxPeakCount\n+        #end if\n+        #if $advanced.fragmentation_rule.FragmentationAutoRule\n+          -FragmentationAutoRule false -FragmentationRule \'manual:${advanced.fragmentation_rule.FragmentationRule}\'\n+        #end if\n+        &&\n+        mv \'$output_name\' output\n+]]>\n+    </command>\n+    <inputs>\n+          <param name="input" type="data" format="mzml,mzxml,mgf,ms2,mz5" 
label="Input Raw MS File(s)"/>\n+          <param argument="-OutputFormat" type="select" label="Output Type" help="The file'..b': [+2,NumChargeStates]." />\n+              <param argument="-MaxResultRank" type="integer" min="1" value="2" label="Maximum Result Rank" help="The maximum rank of a search result (results with the same score occupy the same rank)." />\n+              <param argument="-MonoisotopeAdjustmentSet" type="text" value="[-1,2]" label="Monoisotope Adjustment Set" help="For monoisotopic precursors where the monoisotope may have been incorrectly assigned to a nearby isotope, this range of adjustments will be considered. Instead of trying a wide precursor tolerance for a spectrum, this tries multiple tight tolerances.">\n+                  <sanitizer invalid_char=""><valid initial="string.digits"><add value="["/><add value="]"/><add value=","/><add value="-"/></valid><mapping initial="none"></mapping></sanitizer>\n+              </param>\n+          </section>\n+    </inputs>\n+    <outputs>\n+        <data format="raw_pepxml" name="output" from_work_dir="output">\n+            <change_format>\n+                <when input="OutputFormat" value="mzIdentML" format="mzid" />\n+            </change_format>\n+        </data>\n+    </outputs>\n+    <tests>\n+        <test>\n+            <param name="input" value="input/201208-378803.mzML" />\n+            <param name="ProteinDatabase" value="input/cow.protein.PRG2012-subset.fasta" />\n+            <param name="DecoyPrefix" value="XXX_" />\n+            <param name="StaticMods_0|aa" value="C" />\n+            <param name="StaticMods_0|mass" value="58.00548" />\n+            <param name="DynamicMods_0|motif" value="M" />\n+            <param name="DynamicMods_0|mass" value="15.9949" />\n+            <param name="DynamicMods_1|motif" value="(Q" />\n+            <param name="DynamicMods_1|mass" value="-17.026" />\n+            <param name="DynamicMods_2|motif" value="[QN]" />\n+            <param 
name="DynamicMods_2|mass" value="0.984016" />\n+            <param name="PrecursorMzToleranceRule" value="mono" />\n+            <param name="MonoPrecursorMzTolerance" value="50" />\n+            <param name="MinTerminiCleavages" value="1" />\n+            <param name="MaxMissedCleavages" value="2" />\n+            <param name="MaxDynamicMods" value="2" />\n+            <param name="NumChargeStates" value="5" />\n+            <param name="MaxResultRank" value="3" />\n+            <param name="MonoisotopeAdjustmentSet" value="[-1,2]" />\n+            <param name="UseSmartPlusThreeModel" value="false" />\n+            <output name="output" file="201208-378803-mm.pepXML" lines_diff="14" />\n+        </test>\n+        <test>\n+            <param name="input" value="input/201208-378803.mzML" />\n+            <param name="ProteinDatabase" value="input/cow.protein.PRG2012-subset.fasta" />\n+            <param name="CleavageRules" value="Trypsin" />\n+            <param name="DecoyPrefix" value="XXX_" />\n+            <param name="PrecursorMzToleranceRule" value="mono" />\n+            <param name="MonoPrecursorMzTolerance" value="15" />\n+            <param name="MinTerminiCleavages" value="2" />\n+            <param name="MaxMissedCleavages" value="4" />\n+            <param name="NumChargeStates" value="3" />\n+            <param name="MaxResultRank" value="1" />\n+            <param name="MonoisotopeAdjustmentSet" value="0" />\n+            <output name="output" file="201208-378803-mm-15ppm-fully-tryptic.pepXML" lines_diff="14" />\n+        </test>\n+    </tests>\n+    <help>\n+<![CDATA[\n+**What it does**\n+\n+Performs protein identification via database search using MyriMatch.\n+]]>\n+    </help>\n+    <citations>\n+        <citation type="doi">10.1021/pr0604054</citation>\n+        <citation type="bibtex">@misc{toolsGalaxyP, author = {Chilton, J, Chambers MC, et al.}, title = {Galaxy Proteomics Tools}, publisher = {GitHub}, journal = {GitHub repository},\n+          
                            year = {2015}, url = {https://github.com/galaxyproteomics/tools-galaxyp}}</citation> <!-- TODO: fix substitution of commit ", commit = {$sha1$}" -->\n+    </citations>\n+</tool>\n'
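
Note on the command template: the Cheetah preamble in `myrimatch.xml` builds the `-StaticMods` and `-DynamicMods` arguments by string concatenation and derives the symlink and output file names from the dataset display names. The sketch below mirrors that logic in plain Python so it can be checked against the test parameters. It is an illustration only, not Cheetah and not part of the tool, and the function names are hypothetical.

```python
def static_mods_arg(mods):
    """Mirror the #for loop over $StaticMods: '<aa> <mass>' pairs, space separated."""
    out = ""
    for aa, mass in mods:
        out = out + " " + str(aa) + " " + str(mass)
    return out.lstrip()


def dynamic_mods_arg(mods):
    """Mirror the #for loop over $DynamicMods: '<motif> * <mass>' triples."""
    out = ""
    for motif, mass in mods:
        out = out + " " + str(motif) + " * " + str(mass)
    return out.lstrip()


def db_name(display_name):
    """Mirror $db_name: make sure the database link ends in exactly one '.fasta'."""
    return display_name.replace(".fasta", "") + ".fasta"


def output_name(input_name, output_format):
    """Mirror $output_name: text before the FIRST dot plus the format extension."""
    ext = "mzid" if output_format == "mzIdentML" else "pepXML"
    return input_name.split(".")[0] + "." + ext


# Values taken from the first <test> block in myrimatch.xml.
dynamic = dynamic_mods_arg([("M", "15.9949"), ("(Q", "-17.026"), ("[QN]", "0.984016")])
static = static_mods_arg([("C", "58.00548")])
print(dynamic)
print(static)
print(output_name("201208-378803.mzML", "pepXML"))
```

The dynamic-mods string produced here matches the `Config: DynamicMods` value recorded in `test-data/201208-378803-mm.pepXML`. Because the template splits on the first dot, an input whose display name contains additional dots is shortened to the text before that first dot.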
diff -r 23b316fad2b0 -r 1881935e5351 test-data/201208-378803-mm-15ppm-fully-tryptic.pepXML
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/test-data/201208-378803-mm-15ppm-fully-tryptic.pepXML Wed Feb 01 20:53:23 2017 -0500
b'@@ -0,0 +1,499 @@\n+<?xml version="1.0" encoding="ISO-8859-1"?>\n+<msms_pipeline_analysis date="2015-09-03T17:55:14" summary_xml="201208-378803.pepXML" xmlns="http://regis-web.systemsbiology.net/pepXML" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://sashimi.sourceforge.net/schema_revision/pepXML/pepXML_v117.xsd">\n+  <analysis_summary analysis="MyriMatch" version="2.2.8634" time="2015-09-03T17:55:14"/>\n+  <msms_run_summary base_name="201208-378803" raw_data_type="" raw_data="">\n+    <sample_enzyme name="Trypsin" independent="false" fidelity="specific">\n+      <specificity sense="C" cut="KR" no_cut="P" min_spacing="1"/>\n+    </sample_enzyme>\n+    <search_summary base_name="201208-378803" search_engine="MyriMatch" precursor_mass_type="monoisotopic" fragment_mass_type="monoisotopic" out_data_type="" out_data="">\n+      <search_database local_path="cow.protein.PRG2012-subset.fasta" database_name="SDB" type="AA"/>\n+      <enzymatic_search_constraint enzyme="Trypsin" max_num_internal_cleavages="4" min_number_termini="2"/>\n+      <parameter name="Config: AvgPrecursorMzTolerance" value="1.5mz"/>\n+      <parameter name="Config: ClassSizeMultiplier" value="2"/>\n+      <parameter name="Config: CleavageRules" value="Trypsin"/>\n+      <parameter name="Config: ComputeXCorr" value="1"/>\n+      <parameter name="Config: DecoyPrefix" value="XXX_"/>\n+      <parameter name="Config: DynamicMods" value=""/>\n+      <parameter name="Config: EstimateSearchTimeOnly" value="0"/>\n+      <parameter name="Config: FragmentMzTolerance" value="0.5mz"/>\n+      <parameter name="Config: FragmentationAutoRule" value="1"/>\n+      <parameter name="Config: FragmentationRule" value="cid"/>\n+      <parameter name="Config: KeepUnadjustedPrecursorMz" value="0"/>\n+      <parameter name="Config: MaxDynamicMods" value="4"/>\n+      <parameter name="Config: MaxFragmentChargeState" value="0"/>\n+      <parameter name="Config: MaxMissedCleavages" 
value="4"/>\n+      <parameter name="Config: MaxPeakCount" value="300"/>\n+      <parameter name="Config: MaxPeptideLength" value="75"/>\n+      <parameter name="Config: MaxPeptideMass" value="10000"/>\n+      <parameter name="Config: MaxPeptideVariants" value="1000000"/>\n+      <parameter name="Config: MaxResultRank" value="1"/>\n+      <parameter name="Config: MinMatchedFragments" value="5"/>\n+      <parameter name="Config: MinPeptideLength" value="5"/>\n+      <parameter name="Config: MinPeptideMass" value="0"/>\n+      <parameter name="Config: MinResultScore" value="9.9999999999999995e-08"/>\n+      <parameter name="Config: MinTerminiCleavages" value="2"/>\n+      <parameter name="Config: MonoPrecursorMzTolerance" value="15ppm"/>\n+      <parameter name="Config: MonoisotopeAdjustmentSet" value="[0,0] "/>\n+      <parameter name="Config: NumBatches" value="50"/>\n+      <parameter name="Config: NumChargeStates" value="3"/>\n+      <parameter name="Config: NumIntensityClasses" value="3"/>\n+      <parameter name="Config: NumMzFidelityClasses" value="3"/>\n+      <parameter name="Config: OutputFormat" value="pepXML"/>\n+      <parameter name="Config: OutputSuffix" value=""/>\n+      <parameter name="Config: PrecursorMzToleranceRule" value="mono"/>\n+      <parameter name="Config: PreferIntenseComplements" value="1"/>\n+      <parameter name="Config: ProteinDatabase" value="cow.protein.PRG2012-subset.fasta"/>\n+      <parameter name="Config: ProteinListFilters" value=""/>\n+      <parameter name="Config: ProteinSamplingTime" value="15"/>\n+      <parameter name="Config: ResultsPerBatch" value="200000"/>\n+      <parameter name="Config: SpectrumListFilters" value="peakPicking true 2-"/>\n+      <parameter name="Config: StaticMods" value=""/>\n+      <parameter name="Config: StatusUpdateFrequency" value="5"/>\n+      <parameter name="Config: TicCutoffPercentage" value="0.97999999999999998"/>\n+      <parameter name="Config: UseMultipleProcessors" value="1"/>\n+     
 <parameter name="Config: UseSmartPlusThreeModel" value="1"/>\n+      '..b'5219930.1|"/>\n+          <alternative_protein protein="gi|528993975|ref|XP_005219931.1|"/>\n+          <search_score name="mvh" value="12.259750257107"/>\n+          <search_score name="mzFidelity" value="34.970325461964"/>\n+          <search_score name="xcorr" value="1.9202226581428345"/>\n+        </search_hit>\n+        <search_hit hit_rank="2" peptide="ISELFDKLDSMSVDK" peptide_prev_aa="R" peptide_next_aa="I" protein="XXX_gi|528968108|ref|XP_005212569.1|" num_tot_proteins="1" calc_neutral_pep_mass="1725.8495025234" massdiff="-0.008717801883" num_tol_term="2" num_missed_cleavages="1" num_matched_ions="11" tot_num_ions="30">\n+          <search_score name="mvh" value="11.323857836871"/>\n+          <search_score name="mzFidelity" value="36.756835575306"/>\n+          <search_score name="xcorr" value="0.93184037338923231"/>\n+        </search_hit>\n+      </search_result>\n+    </spectrum_query>\n+    <spectrum_query spectrum="201208-378803.30.30.3" spectrumNativeID="sample=1 period=1 cycle=1231 experiment=2" start_scan="30" end_scan="30" precursor_neutral_mass="1467.72458061501" assumed_charge="3" index="33" retention_time_sec="839.28400000002">\n+      <search_result num_target_comparisons="8" num_decoy_comparisons="2">\n+        <search_hit hit_rank="1" peptide="GVTEHSNQQQGRK" peptide_prev_aa="R" peptide_next_aa="E" protein="gi|156523214|ref|NP_001096021.1|" num_tot_proteins="3" calc_neutral_pep_mass="1467.7178519195" massdiff="-0.00672869551" num_tol_term="2" num_missed_cleavages="1" num_matched_ions="5" tot_num_ions="25">\n+          <alternative_protein protein="gi|528961840|ref|XP_005211303.1|"/>\n+          <alternative_protein protein="gi|528961845|ref|XP_005211304.1|"/>\n+          <search_score name="mvh" value="6.653798251586"/>\n+          <search_score name="mzFidelity" value="14.932769947598"/>\n+          <search_score name="xcorr" value="0.028206144867315191"/>\n+        
</search_hit>\n+      </search_result>\n+    </spectrum_query>\n+    <spectrum_query spectrum="201208-378803.31.31.2" spectrumNativeID="sample=1 period=1 cycle=1234 experiment=2" start_scan="31" end_scan="31" precursor_neutral_mass="1030.473954852586" assumed_charge="2" index="34" retention_time_sec="842.98000000002">\n+      <search_result num_target_comparisons="0" num_decoy_comparisons="3">\n+        <search_hit hit_rank="1" peptide="TFAHQENGK" peptide_prev_aa="R" peptide_next_aa="R" protein="XXX_gi|528921178|ref|XP_592304.7|" num_tot_proteins="3" calc_neutral_pep_mass="1030.4832075229" massdiff="0.009252670314" num_tol_term="2" num_missed_cleavages="0" num_matched_ions="5" tot_num_ions="17">\n+          <alternative_protein protein="XXX_gi|528921180|ref|XP_005195747.1|"/>\n+          <alternative_protein protein="XXX_gi|528995523|ref|XP_002695913.2|"/>\n+          <search_score name="mvh" value="6.979786912131"/>\n+          <search_score name="mzFidelity" value="21.605021401289"/>\n+          <search_score name="xcorr" value="0.49320911207199097"/>\n+        </search_hit>\n+      </search_result>\n+    </spectrum_query>\n+    <spectrum_query spectrum="201208-378803.32.32.3" spectrumNativeID="sample=1 period=1 cycle=1283 experiment=2" start_scan="32" end_scan="32" precursor_neutral_mass="1770.954827198736" assumed_charge="3" index="35" retention_time_sec="883.49500000002">\n+      <search_result num_target_comparisons="1" num_decoy_comparisons="9">\n+        <search_hit hit_rank="1" peptide="TEPHSFQRLSLTEVK" peptide_prev_aa="K" peptide_next_aa="R" protein="XXX_gi|528944676|ref|XP_005204734.1|" num_tot_proteins="1" calc_neutral_pep_mass="1770.9264477115" massdiff="-0.028379487236" num_tol_term="2" num_missed_cleavages="1" num_matched_ions="5" tot_num_ions="30">\n+          <search_score name="mvh" value="6.109846868339"/>\n+          <search_score name="mzFidelity" value="13.900987096911"/>\n+          <search_score name="xcorr" value="0.73578462478668327"/>\n+  
      </search_hit>\n+      </search_result>\n+    </spectrum_query>\n+  </msms_run_summary>\n+</msms_pipeline_analysis>\n'
diff -r 23b316fad2b0 -r 1881935e5351 test-data/201208-378803-mm.pepXML
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/test-data/201208-378803-mm.pepXML Wed Feb 01 20:53:23 2017 -0500
b'@@ -0,0 +1,20531 @@\n+<?xml version="1.0" encoding="ISO-8859-1"?>\n+<msms_pipeline_analysis date="2015-09-03T15:49:50" summary_xml="201208-378803.pepXML" xmlns="http://regis-web.systemsbiology.net/pepXML" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://sashimi.sourceforge.net/schema_revision/pepXML/pepXML_v117.xsd">\n+  <analysis_summary analysis="MyriMatch" version="2.2.8634" time="2015-09-03T15:49:50"/>\n+  <msms_run_summary base_name="201208-378803" raw_data_type="" raw_data="">\n+    <sample_enzyme name="Trypsin/P" independent="false" fidelity="semispecific">\n+      <specificity sense="C" cut="KR" no_cut="" min_spacing="1"/>\n+    </sample_enzyme>\n+    <search_summary base_name="201208-378803" search_engine="MyriMatch" precursor_mass_type="monoisotopic" fragment_mass_type="monoisotopic" out_data_type="" out_data="">\n+      <search_database local_path="cow.protein.PRG2012-subset.fasta" database_name="SDB" type="AA"/>\n+      <enzymatic_search_constraint enzyme="Trypsin/P" max_num_internal_cleavages="2" min_number_termini="1"/>\n+      <aminoacid_modification aminoacid="M" massdiff="15.9949" mass="147.0353846062" variable="Y" description="Oxidation"/>\n+      <aminoacid_modification aminoacid="Q" massdiff="-17.026" mass="111.0325775114" peptide_terminus="n" variable="Y"/>\n+      <aminoacid_modification aminoacid="Q" massdiff="0.984016" mass="129.0425935114" variable="Y" description="Deamidated"/>\n+      <aminoacid_modification aminoacid="N" massdiff="0.984016" mass="115.0269434472" variable="Y" description="Deamidated"/>\n+      <aminoacid_modification aminoacid="C" massdiff="58.00548" mass="161.0146644778" variable="N" description="Carboxymethyl"/>\n+      <parameter name="Config: AvgPrecursorMzTolerance" value="1.5mz"/>\n+      <parameter name="Config: ClassSizeMultiplier" value="2"/>\n+      <parameter name="Config: CleavageRules" value="Trypsin/P"/>\n+      <parameter name="Config: ComputeXCorr" value="1"/>\n+      
<parameter name="Config: DecoyPrefix" value="XXX_"/>\n+      <parameter name="Config: DynamicMods" value="M * 15.9949 (Q * -17.026 [QN] * 0.984016"/>\n+      <parameter name="Config: EstimateSearchTimeOnly" value="0"/>\n+      <parameter name="Config: FragmentMzTolerance" value="0.5mz"/>\n+      <parameter name="Config: FragmentationAutoRule" value="1"/>\n+      <parameter name="Config: FragmentationRule" value="cid"/>\n+      <parameter name="Config: KeepUnadjustedPrecursorMz" value="0"/>\n+      <parameter name="Config: MaxDynamicMods" value="2"/>\n+      <parameter name="Config: MaxFragmentChargeState" value="0"/>\n+      <parameter name="Config: MaxMissedCleavages" value="2"/>\n+      <parameter name="Config: MaxPeakCount" value="300"/>\n+      <parameter name="Config: MaxPeptideLength" value="75"/>\n+      <parameter name="Config: MaxPeptideMass" value="10000"/>\n+      <parameter name="Config: MaxPeptideVariants" value="1000000"/>\n+      <parameter name="Config: MaxResultRank" value="3"/>\n+      <parameter name="Config: MinMatchedFragments" value="5"/>\n+      <parameter name="Config: MinPeptideLength" value="5"/>\n+      <parameter name="Config: MinPeptideMass" value="0"/>\n+      <parameter name="Config: MinResultScore" value="9.9999999999999995e-08"/>\n+      <parameter name="Config: MinTerminiCleavages" value="1"/>\n+      <parameter name="Config: MonoPrecursorMzTolerance" value="50ppm"/>\n+      <parameter name="Config: MonoisotopeAdjustmentSet" value="[-1,2] "/>\n+      <parameter name="Config: NumBatches" value="50"/>\n+      <parameter name="Config: NumChargeStates" value="5"/>\n+      <parameter name="Config: NumIntensityClasses" value="3"/>\n+      <parameter name="Config: NumMzFidelityClasses" value="3"/>\n+      <parameter name="Config: OutputFormat" value="pepXML"/>\n+      <parameter name="Config: OutputSuffix" value=""/>\n+      <parameter name="Config: PrecursorMzToleranceRule" value="mono"/>\n+      <parameter name="Config: 
PreferIntenseComplements" value="1"/>\n+      <parameter name="Config: Pr'..b'="180">\n+          <modification_info>\n+            <mod_aminoacid_mass position="12" mass="147.0353846062"/>\n+            <mod_aminoacid_mass position="14" mass="161.0146644778"/>\n+            <mod_aminoacid_mass position="24" mass="129.0425935114"/>\n+          </modification_info>\n+          <search_score name="mvh" value="21.662729207058"/>\n+          <search_score name="mzFidelity" value="10.946707360696"/>\n+          <search_score name="xcorr" value="2.2375717592793039"/>\n+        </search_hit>\n+        <search_hit hit_rank="2" peptide="EMLIAHSQPAEMSCGKGESEKLSQIE" peptide_prev_aa="K" peptide_next_aa="N" protein="gi|528944676|ref|XP_005204734.1|" num_tot_proteins="1" calc_neutral_pep_mass="2905.3143452141" massdiff="0.03407514139" num_tol_term="1" num_missed_cleavages="2" num_matched_ions="21" tot_num_ions="180">\n+          <modification_info>\n+            <mod_aminoacid_mass position="12" mass="147.0353846062"/>\n+            <mod_aminoacid_mass position="14" mass="161.0146644778"/>\n+          </modification_info>\n+          <search_score name="mvh" value="18.555164806005"/>\n+          <search_score name="mzFidelity" value="9.229647340988"/>\n+          <search_score name="xcorr" value="1.9522110076248838"/>\n+        </search_hit>\n+        <search_hit hit_rank="3" peptide="LLLNTSKRIMDDVETSSLHLDESFK" peptide_prev_aa="T" peptide_next_aa="L" protein="XXX_gi|115497338|ref|NP_001069884.1|" num_tot_proteins="4" calc_neutral_pep_mass="2906.4695232566" massdiff="0.18058826829" num_tol_term="1" num_missed_cleavages="2" num_matched_ions="20" tot_num_ions="171">\n+          <alternative_protein protein="XXX_gi|528968104|ref|XP_005212567.1|"/>\n+          <alternative_protein protein="XXX_gi|528968106|ref|XP_005212568.1|"/>\n+          <alternative_protein protein="XXX_gi|528968108|ref|XP_005212569.1|"/>\n+          <modification_info>\n+            <mod_aminoacid_mass 
position="10" mass="147.0353846062"/>\n+          </modification_info>\n+          <search_score name="mvh" value="17.95496665471"/>\n+          <search_score name="mzFidelity" value="6.815403335045"/>\n+          <search_score name="xcorr" value="2.564456180282789"/>\n+        </search_hit>\n+        <search_hit hit_rank="4" peptide="QIFVKTLTGKTITLEVEPSDTIENVK" peptide_prev_aa="M" peptide_next_aa="A" protein="gi|115496708|ref|NP_001069831.1|" num_tot_proteins="4" calc_neutral_pep_mass="2904.5583369414" massdiff="0.28673178429" num_tol_term="2" num_missed_cleavages="2" num_matched_ions="20" tot_num_ions="179">\n+          <alternative_protein protein="gi|27807503|ref|NP_777203.1|"/>\n+          <alternative_protein protein="gi|528968221|ref|XP_005212615.1|"/>\n+          <alternative_protein protein="gi|528995063|ref|XP_005220352.1|"/>\n+          <modification_info>\n+            <mod_aminoacid_mass position="1" mass="129.0425935114"/>\n+            <mod_aminoacid_mass position="24" mass="115.0269434472"/>\n+          </modification_info>\n+          <search_score name="mvh" value="16.196711830419"/>\n+          <search_score name="mzFidelity" value="8.384634792566"/>\n+          <search_score name="xcorr" value="1.9797285930138711"/>\n+        </search_hit>\n+        <search_hit hit_rank="5" peptide="ELNGSLQEMQGESSGVSTVWDLLADIK" peptide_prev_aa="K" peptide_next_aa="R" protein="XXX_gi|528944678|ref|XP_005204735.1|" num_tot_proteins="1" calc_neutral_pep_mass="2907.369549893" massdiff="0.07194998909" num_tol_term="2" num_missed_cleavages="0" num_matched_ions="18" tot_num_ions="188">\n+          <modification_info>\n+            <mod_aminoacid_mass position="7" mass="129.0425935114"/>\n+            <mod_aminoacid_mass position="10" mass="129.0425935114"/>\n+          </modification_info>\n+          <search_score name="mvh" value="13.019234366281"/>\n+          <search_score name="mzFidelity" value="6.610442990057"/>\n+          <search_score name="xcorr" 
value="1.709385507643441"/>\n+        </search_hit>\n+      </search_result>\n+    </spectrum_query>\n+  </msms_run_summary>\n+</msms_pipeline_analysis>\n'
diff -r 23b316fad2b0 -r 1881935e5351 test-data/input/201208-378803.mzML
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/test-data/input/201208-378803.mzML Wed Feb 01 20:53:23 2017 -0500
b'@@ -0,0 +1,5375 @@\n+<?xml version="1.0" encoding="utf-8"?>\n+<indexedmzML xmlns="http://psi.hupo.org/ms/mzml" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://psi.hupo.org/ms/mzml http://psidev.info/files/ms/mzML/xsd/mzML1.1.2_idx.xsd">\n+  <mzML xmlns="http://psi.hupo.org/ms/mzml" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://psi.hupo.org/ms/mzml http://psidev.info/files/ms/mzML/xsd/mzML1.1.0.xsd" id="201208-378803-ABRR-AUG-1" version="1.1.0">\n+    <cvList count="2">\n+      <cv id="MS" fullName="Proteomics Standards Initiative Mass Spectrometry Ontology" version="3.65.0" URI="http://psidev.cvs.sourceforge.net/*checkout*/psidev/psi/psi-ms/mzML/controlledVocabulary/psi-ms.obo"/>\n+      <cv id="UO" fullName="Unit Ontology" version="12:10:2011" URI="http://obo.cvs.sourceforge.net/*checkout*/obo/obo/ontology/phenotype/unit.obo"/>\n+    </cvList>\n+    <fileDescription>\n+      <fileContent>\n+        <cvParam cvRef="MS" accession="MS:1000579" name="MS1 spectrum" value=""/>\n+        <cvParam cvRef="MS" accession="MS:1000580" name="MSn spectrum" value=""/>\n+      </fileContent>\n+      <sourceFileList count="2">\n+        <sourceFile id="WIFF" name="201208-378803.wiff" location="file://.">\n+          <cvParam cvRef="MS" accession="MS:1000770" name="WIFF nativeID format" value=""/>\n+          <cvParam cvRef="MS" accession="MS:1000562" name="ABI WIFF format" value=""/>\n+          <cvParam cvRef="MS" accession="MS:1000569" name="SHA-1" value="794711d760f2db8a6a11fff2e277b47ce5576df3"/>\n+        </sourceFile>\n+        <sourceFile id="WIFFSCAN" name="201208-378803.wiff.scan" location="file://.">\n+          <cvParam cvRef="MS" accession="MS:1000770" name="WIFF nativeID format" value=""/>\n+          <cvParam cvRef="MS" accession="MS:1000562" name="ABI WIFF format" value=""/>\n+          <cvParam cvRef="MS" accession="MS:1000569" name="SHA-1" value="165a0af0b1763bbe371899814a9e1457151586b8"/>\n+    
    </sourceFile>\n+      </sourceFileList>\n+    </fileDescription>\n+    <softwareList count="2">\n+      <software id="Analyst" version="unknown">\n+        <cvParam cvRef="MS" accession="MS:1000551" name="Analyst" value=""/>\n+      </software>\n+      <software id="pwiz_Reader_ABI" version="3.0.6585">\n+        <cvParam cvRef="MS" accession="MS:1000615" name="ProteoWizard software" value=""/>\n+      </software>\n+    </softwareList>\n+    <instrumentConfigurationList count="1">\n+      <instrumentConfiguration id="IC1">\n+        <cvParam cvRef="MS" accession="MS:1000495" name="Applied Biosystems instrument model" value=""/>\n+        <softwareRef ref="Analyst"/>\n+      </instrumentConfiguration>\n+    </instrumentConfigurationList>\n+    <dataProcessingList count="1">\n+      <dataProcessing id="pwiz_Reader_ABI_conversion">\n+        <processingMethod order="0" softwareRef="pwiz_Reader_ABI">\n+          <cvParam cvRef="MS" accession="MS:1000544" name="Conversion to mzML" value=""/>\n+        </processingMethod>\n+        <processingMethod order="1" softwareRef="pwiz_Reader_ABI">\n+          <cvParam cvRef="MS" accession="MS:1000035" name="peak picking" value=""/>\n+        </processingMethod>\n+      </dataProcessing>\n+    </dataProcessingList>\n+    <run id="_x0032_01208-378803-ABRR-AUG-1" defaultInstrumentConfigurationRef="IC1" startTimeStamp="2012-08-08T14:40:01Z" defaultSourceFileRef="WIFF">\n+      <spectrumList count="108" defaultDataProcessingRef="pwiz_Reader_ABI_conversion">\n+        <spectrum index="0" id="sample=1 period=1 cycle=1181 experiment=2" defaultArrayLength="77" dataProcessingRef="pwiz_Reader_ABI_conversion">\n+          <cvParam cvRef="MS" accession="MS:1000511" name="ms level" value="2"/>\n+          <cvParam cvRef="MS" accession="MS:1000580" name="MSn spectrum" value=""/>\n+          <cvParam cvRef="MS" accession="MS:1000130" name="positive scan" value=""/>\n+          <cvParam cvRef="MS" accession="MS:1000127" name="centroid 
spectrum" value=""/>\n+          <cvParam cvRef="MS" accession="MS'..b'2">331255</offset>\n+      <offset idRef="sample=1 period=1 cycle=1215 experiment=3">336828</offset>\n+      <offset idRef="sample=1 period=1 cycle=1215 experiment=4">342736</offset>\n+      <offset idRef="sample=1 period=1 cycle=1216 experiment=2">347660</offset>\n+      <offset idRef="sample=1 period=1 cycle=1216 experiment=3">353081</offset>\n+      <offset idRef="sample=1 period=1 cycle=1216 experiment=4">358642</offset>\n+      <offset idRef="sample=1 period=1 cycle=1216 experiment=5">364615</offset>\n+      <offset idRef="sample=1 period=1 cycle=1217 experiment=2">371809</offset>\n+      <offset idRef="sample=1 period=1 cycle=1217 experiment=3">378218</offset>\n+      <offset idRef="sample=1 period=1 cycle=1217 experiment=4">383633</offset>\n+      <offset idRef="sample=1 period=1 cycle=1217 experiment=5">390049</offset>\n+      <offset idRef="sample=1 period=1 cycle=1218 experiment=2">395533</offset>\n+      <offset idRef="sample=1 period=1 cycle=1218 experiment=3">402094</offset>\n+      <offset idRef="sample=1 period=1 cycle=1218 experiment=4">407366</offset>\n+      <offset idRef="sample=1 period=1 cycle=1218 experiment=5">412645</offset>\n+      <offset idRef="sample=1 period=1 cycle=1219 experiment=2">417751</offset>\n+      <offset idRef="sample=1 period=1 cycle=1219 experiment=3">423165</offset>\n+      <offset idRef="sample=1 period=1 cycle=1219 experiment=4">428674</offset>\n+      <offset idRef="sample=1 period=1 cycle=1220 experiment=2">433753</offset>\n+      <offset idRef="sample=1 period=1 cycle=1221 experiment=2">438772</offset>\n+      <offset idRef="sample=1 period=1 cycle=1221 experiment=3">444113</offset>\n+      <offset idRef="sample=1 period=1 cycle=1222 experiment=2">449012</offset>\n+      <offset idRef="sample=1 period=1 cycle=1223 experiment=2">454745</offset>\n+      <offset idRef="sample=1 period=1 cycle=1223 experiment=3">460131</offset>\n+      <offset 
idRef="sample=1 period=1 cycle=1224 experiment=2">464728</offset>\n+      <offset idRef="sample=1 period=1 cycle=1225 experiment=2">470652</offset>\n+      <offset idRef="sample=1 period=1 cycle=1228 experiment=2">476088</offset>\n+      <offset idRef="sample=1 period=1 cycle=1228 experiment=3">481428</offset>\n+      <offset idRef="sample=1 period=1 cycle=1229 experiment=2">486412</offset>\n+      <offset idRef="sample=1 period=1 cycle=1229 experiment=3">491845</offset>\n+      <offset idRef="sample=1 period=1 cycle=1229 experiment=4">497057</offset>\n+      <offset idRef="sample=1 period=1 cycle=1229 experiment=5">501846</offset>\n+      <offset idRef="sample=1 period=1 cycle=1230 experiment=2">506593</offset>\n+      <offset idRef="sample=1 period=1 cycle=1230 experiment=3">511174</offset>\n+      <offset idRef="sample=1 period=1 cycle=1231 experiment=2">516081</offset>\n+      <offset idRef="sample=1 period=1 cycle=1234 experiment=2">521300</offset>\n+      <offset idRef="sample=1 period=1 cycle=1235 experiment=2">526428</offset>\n+      <offset idRef="sample=1 period=1 cycle=1236 experiment=2">532118</offset>\n+      <offset idRef="sample=1 period=1 cycle=1236 experiment=3">537438</offset>\n+      <offset idRef="sample=1 period=1 cycle=1238 experiment=2">542518</offset>\n+      <offset idRef="sample=1 period=1 cycle=1239 experiment=2">547274</offset>\n+      <offset idRef="sample=1 period=1 cycle=1281 experiment=2">551813</offset>\n+      <offset idRef="sample=1 period=1 cycle=1283 experiment=2">556796</offset>\n+      <offset idRef="sample=1 period=1 cycle=1428 experiment=2">562274</offset>\n+      <offset idRef="sample=1 period=1 cycle=1580 experiment=2">566840</offset>\n+      <offset idRef="sample=1 period=1 cycle=1583 experiment=2">571692</offset>\n+      <offset idRef="sample=1 period=1 cycle=1627 experiment=2">577069</offset>\n+    </index>\n+    <index name="chromatogram">\n+      <offset idRef="TIC">581634</offset>\n+    </index>\n+  </indexList>\n+  
<indexListOffset>634114</indexListOffset>\n+  <fileChecksum>7fac4bf3be88419c71e9806717db788f44a10a68</fileChecksum>\n+</indexedmzML>\n'
diff -r 23b316fad2b0 -r 1881935e5351 test-data/input/cow.protein.PRG2012-subset.fasta
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/test-data/input/cow.protein.PRG2012-subset.fasta Wed Feb 01 20:53:23 2017 -0500
b'@@ -0,0 +1,144 @@\n+>gi|528903801|ref|XP_871686.4| PREDICTED: cationic trypsin isoformX1 [Bos taurus]\n+MKTFIFLALLGAAVAFPVDDDDKIVGGYTCGANTVPYQVSLNSGYHFCGGSLINSQWVVSAAHCYKSGIQVRLGEDNINVVEGNEQFISASKSIVHPSYNSNTLNNDIMLIKLKSAASLNSRVASISLPTSCASAGTQCLISGWGNTKSSGTSYPDVLKCLKAPILSDSSCKSAYPGQITSNMFCAGYLEGGKDSCQGDSGGPVVCSGKLQGIVSWGSGCAQKNKPGVYTKVCNYVSWIKQTIASN\n+>gi|528908802|ref|XP_005199340.1| PREDICTED: zinc finger protein 169 isoform X1 [Bos taurus]\n+MRRVFSRKSTHQTKNMAPGLLTTRDEALMAFRDVAVAFTQKEWKLLSPAQRTLYRDVMLENYSHMVSLGIAFPKPKLIIQLEQGDEPWREESECLLDLCAAEGRKEFQPCLSCPVTFSSPQILHHYMLCGHALQIFPGSSAESHFLLDAPSCLNEKAKDGEREGSGTVFGRLQLSGTSRAFFSSSQGQPVDQGGSSSGRIDQGMISDEADAVLTETNISESGAVICENYRLGFSRKSSLFSLQKHHVCPECGRNFCQKSDLVKHQRTHSGEKPFSCRECGRGFGRRSSLTVHQRKHSGEKPYVCRECGRHFRYTSSLTNHKRIHSGERPFVCQQCGRGFRQKIALILHQRTHLEEKPFVCPECGRGFCQKASLLQHRSSHSGERPFVCLECGRGFRQQSLLLSHQVTHSGEKPYVCAECGHSFRQKVTLIRHQRTHTGEKPYLCSECGRGFSQKVSLMGHQRTHTGEKPYVCSECGRGFGQKVTLIRHQRTHTGEKPFLCPECGRTFGFKSLLTRHKRIHSGEEADVYRVCEQRLGLKIQLTSDQRTHSGEKPCVCDECGRGFGFKSALIRHQRTHSGEKPYVCRDCGRGFSQKSHLHRHRKTKSGHHLLPQELFS\n+>gi|528908804|ref|XP_005199341.1| PREDICTED: zinc finger protein 169 isoform X2 [Bos taurus]\n+MRRVFSRKSTHQTKNMAPGLLTTRDEALMAFRDVAVAFTQKEWKLLSPAQRTLYRDVMLENYSHMVSLGIAFPKPKLIIQLEQGDEPWREESECLLDLCAAEGRKEFQPCLSCPVTFSSPQILHHYMLCGHALQIFPGSSAESHFLLDAPSCLNEKAKDGEREGSGTVFGRLQLSGTSRAFFSSSQGQPVDQGGSSSGRIDQGMISDEADAVLTETNISESGAVICENYRLGFSRKSSLFSLQKHHVCPECGRNFCQKSDLVKHQRTHSGEKPFSCRECGRGFGRRSSLTVHQRKHSGEKPYVCRECGRHFRYTSSLTNHKRIHSGERPFVCQQCGRGFRQKIALILHQRTHLEEKPFVCPECGRGFCQKASLLQHRSSHSGERPFVCLECGRGFRQQSLLLSHQVTHSGEKPYVCAECGHSFRQKVTLIRHQRTHTGEKPYLCSECGRGFSQKVSLMGHQRTHTGEKPYVCSECGRGFGQKVTLIRHQRTHTGEKPFLCPECGRTFGFKSLLTRHKRIHSGEEADVYRVCEQRLGLKIQLTSDQRTHSGEKPCVCDECGRGFGFKSALIRHQRTHSGEKPYVCRDCGRGFSQKSHLHRHRKTKSGHHLLPQELFS\n+>gi|528908806|ref|XP_005199342.1| PREDICTED: zinc finger protein 169 isoform X3 [Bos 
taurus]\n+MRRVFSRKSTHQTKNMAPGLLTTRDEALMAFRDVAVAFTQKEWKLLSPAQRTLYRDVMLENYSHMVSLGIAFPKPKLIIQLEQGDEPWREESECLLDLCAAEGRKEFQPCLSCPVTFSSPQILHHYMLCGHALQIFPGSSAESHFLLDAPSCLNEKAKDGEREGSGTVFGRLQLSGTSRAFFSSSQGQPVDQGGSSSGRIDQGMISDEADAVLTETNISESGAVICENYRLGFSRKSSLFSLQKHHVCPECGRNFCQKSDLVKHQRTHSGEKPFSCRECGRGFGRRSSLTVHQRKHSGEKPYVCRECGRHFRYTSSLTNHKRIHSGERPFVCQQCGRGFRQKIALILHQRTHLEEKPFVCPECGRGFCQKASLLQHRSSHSGERPFVCLECGRGFRQQSLLLSHQVTHSGEKPYVCAECGHSFRQKVTLIRHQRTHTGEKPYLCSECGRGFSQKVSLMGHQRTHTGEKPYVCSECGRGFGQKVTLIRHQRTHTGEKPFLCPECGRTFGFKSLLTRHKRIHSGEEADVYRVCEQRLGLKIQLTSDQRTHSGEKPCVCDECGRGFGFKSALIRHQRTHSGEKPYVCRDCGRGFSQKSHLHRHRKTKSGHHLLPQELFS\n+>gi|528908808|ref|XP_005199343.1| PREDICTED: zinc finger protein 169 isoform X4 [Bos taurus]\n+MRRVFSRKSTHQTKNMAPGLLTTRDEALMAFRDVAVAFTQKEWKLLSPAQRTLYRDVMLENYSHMVSLGIAFPKPKLIIQLEQGDEPWREESECLLDLCAEGRKEFQPCLSCPVTFSSPQILHHYMLCGHALQIFPGSSAESHFLLDAPSCLNEKAKDGEREGSGTVFGRLQLSGTSRAFFSSSQGQPVDQGGSSSGRIDQGMISDEADAVLTETNISESGAVICENYRLGFSRKSSLFSLQKHHVCPECGRNFCQKSDLVKHQRTHSGEKPFSCRECGRGFGRRSSLTVHQRKHSGEKPYVCRECGRHFRYTSSLTNHKRIHSGERPFVCQQCGRGFRQKIALILHQRTHLEEKPFVCPECGRGFCQKASLLQHRSSHSGERPFVCLECGRGFRQQSLLLSHQVTHSGEKPYVCAECGHSFRQKVTLIRHQRTHTGEKPYLCSECGRGFSQKVSLMGHQRTHTGEKPYVCSECGRGFGQKVTLIRHQRTHTGEKPFLCPECGRTFGFKSLLTRHKRIHSGEEADVYRVCEQRLGLKIQLTSDQRTHSGEKPCVCDECGRGFGFKSALIRHQRTHSGEKPYVCRDCGRGFSQKSHLHRHRKTKSGHHLLPQELFS\n+>gi|528908810|ref|XP_005199344.1| PREDICTED: zinc finger protein 169 isoform X5 [Bos 
taurus]\n+MSVLDHALMAFRDVAVAFTQKEWKLLSPAQRTLYRDVMLENYSHMVSLGIAFPKPKLIIQLEQGDEPWREESECLLDLCAAEGRKEFQPCLSCPVTFSSPQILHHYMLCGHALQIFPGSSAESHFLLDAPSCLNEKAKDGEREGSGTVFGRLQLSGTSRAFFSSSQGQPVDQGGSSSGRIDQGMISDEADAVLTETNISESGAVICENYRLGFSRKSSLFSLQKHHVCPECGRNFCQKSDLVKHQRTHSGEKPFSCRECGRGFGRRSSLTVHQRKHSGEKPYVCRECGRHFRYTSSLTNHKRIHSGERPFVCQQCGRGFRQKIALILHQRTHLEEKPFVCPECGRGFCQKASLLQHRSSHSGERPFVCLECGRGFRQQSLLLSHQVTHSGEKPYVCAECGHSFRQKVTLIRHQRTHTGEKPYLCSECGRGFSQKVSLMGHQRTHTGEKPYVCSECGRGFGQKVTLIRHQRTHTGEKPFLCPECGRTFGFKSLLTRHKRIHSGEEADVYRVCEQRLGLKIQLTSDQRTHSGEKPCVCDECGRGFGFKSALIRHQRTHSGEKPYVCRDCGRGFSQKSHLHRHRKTKSGHHLLPQELFS\n+>gi|528908812|ref|XP_609847.5| PREDICTED: zinc finger protein 169 isoform X6 [Bos taurus]\n+MAFRDVAVAFTQKEWKLLS'..b'ily 12 member 1 isoform X3 [Bos taurus]\n+MSLNNSSNVFLDSTPSNTNRFQVNVINESHESSAAMNDNADPPHYEETSFGDEGQNRFRISFRPGNQECYDNFLQTGETAKTDASFHAYDSHTNTYYLQTFGHNTVDAVPKIEYYRNTGSVSGPKVNRPSLLDIHEQLAKNVSVAPGSADVVANGEGTPGDEQAENKGEDQAGAVKFGWVKGVLVRCMLNIWGVMLFIRLSWIVGEAGIGLGVIIIGLSVVVTTLTGISMSAICTNGVVRGGGAYYLISRSLGPEFGGSIGLIFAFANAVAVAMYVVGFAETVVDLLKETDSMMVDPTNDIRIIGSITVVILLGISVAGMEWEAKAQVILLIILLIAIANFFIGTVIPSNNEKRARGFFNYQASIFAENFGPSFTKGEGFFSVFAIFFPAATGILAGANISGDLEDPQDAIPKGTMLAIFITTVAYLGVAICVGACVVRDATGSVNDTIISGMNCNGSAACGLGYDFSRCRHEPCQYGLMNNFQVMSMVSGFGPLITAGIFSATLSSALASLVSAPKVFQALCKDNIYKALQFFAKGYGKNNEPLRGYFLTFVIAMAFILIAELNTIAPIISNFFLASYALINFSCFHASYAKSPGWRPAYGIYNMWVSLFGAVLCCAVMFVINWWAAVITYVIEFFLYIYVTYKKPDVNWGSSTQALSYMSALDNALELTTVEDHVKNFRPQCIVLTGGPMTRPALLDITHAFTKNSGLCICCEVFVGPRKLCVKEMNSGMAKKQAWLIKNKIKAFYAAVAADCFRDGVRSLLQASGLGRMKPNTLVIGYKKNWRKAPLTEIENYVGIIHDAFDFEIGVVIVRISQGFDISQVLQVREELEKLEQERLALEATIKDNESEEGNGGIRGLFKKAGKLNITKPTPKKDSSINTIQSMHVGEFNQKLVEASTQFKKKQGKGTIDVWWLFDDGGLILLIPYILTLRKKWKDCKLRIYVGGKINRIEEEKIAMASLLSKFRIKFADIHVIGDINVKPNKESWKVFEEMIEPYCLHESCKDLTTAEKLKRETPWKITDAELEAVKEKSYRQVRLNELLQEHSRAANLIVLSLPVARKGSISDWLYMAWLEILTKNLPPVLLVRGNHKNVLTFYS\n+>gi|297479727|ref|XP_002690982.1| PREDICTED: solute carrier family 12 member 1 isoform X1 [Bos 
taurus]\n+MSLNNSSNVFLDSTPSNTNRFQVNVINESHESSAAMNDNADPPHYEETSFGDEGQNRFRISFRPGNQECYDNFLQTGETAKTDASFHAYDSHTNTYYLQTFGHNTVDAVPKIEYYRNTGSVSGPKVNRPSLLDIHEQLAKNVSVAPGSADVVANGEGTPGDEQAENKGEDQAGAVKFGWVKGVLVRCMLNIWGVMLFIRLSWIVGEAGIGLGVLIILLSTMVTSITGLSTSAIATNGFVRGGGAYYLISRSLGPEFGGSIGLIFAFANAVAVAMYVVGFAETVVDLLKETDSMMVDPTNDIRIIGSITVVILLGISVAGMEWEAKAQVILLIILLIAIANFFIGTVIPSNNEKRARGFFNYQASIFAENFGPSFTKGEGFFSVFAIFFPAATGILAGANISGDLEDPQDAIPKGTMLAIFITTVAYLGVAICVGACVVRDATGSVNDTIISGMNCNGSAACGLGYDFSRCRHEPCQYGLMNNFQVMSMVSGFGPLITAGIFSATLSSALASLVSAPKVFQALCKDNIYKALQFFAKGYGKNNEPLRGYFLTFVIAMAFILIAELNTIAPIISNFFLASYALINFSCFHASYAKSPGWRPAYGIYNMWVSLFGAVLCCAVMFVINWWAAVITYVIEFFLYIYVTYKKPDVNWGSSTQALSYMSALDNALELTTVEDHVKNFRPQCIVLTGGPMTRPALLDITHAFTKNSGLCICCEVFVGPRKLCVKEMNSGMAKKQAWLIKNKIKAFYAAVAADCFRDGVRSLLQASGLGRMKPNTLVIGYKKNWRKAPLTEIENYVGIIHDAFDFEIGVVIVRISQGFDISQVLQVREELEKLEQERLALEATIKDNESEEGNGGIRGLFKKAGKLNITKPTPKKDSSINTIQSMHVGEFNQKLVEASTQFKKKQGKGTIDVWWLFDDGGLILLIPYILTLRKKWKDCKLRIYVGGKINRIEEEKIAMASLLSKFRIKFADIHVIGDINVKPNKESWKVFEEMIEPYCLHESCKDLTTAEKLKRETPWKITDAELEAVKEKSYRQVRLNELLQEHSRAANLIVLSLPVARKGSISDWLYMAWLEILTKNLPPVLLVRGNHKNVLTFYS\n+>gi|529012038|ref|XP_005226889.1| PREDICTED: pregnancy-associated glycoprotein 2 isoform X1 [Bos taurus]\n+MKWLVLLGLVALSECIVILPLKKMKTLRETLREKNLLNNFLEEQAYRLSKNDSKITIHPLRNYLDTAYVGNITIGTPPQEFRVVFDTGSANLWVPCITCTSPACYTHKTFNPQNSSSFREVGSPITIFYGSGIIQGFLGSDTVRIGNLVSPEQSFGLSLEEYGFDSLPFDGILGLAFPAMGIEDTIPIFDNLWSHGAFSEPVFAFYLNTISMNGTVTACSCGCEALLDTGTSMIYGPTKLVTNIHKLMNARLENSEYVVSCDAVKTLPPVIFNINGIDYPLRPQAYIIKIQNSCRSVFQGGTENSSLNTWILGDIFLRQYFSVFDRKNRRIGLAPAV\n+>gi|156523214|ref|NP_001096021.1| protein KHNYN [Bos 
taurus]\n+MPTWGAGSPSPDRFAVSAEAEDKVREQLPRVERIFRVGMSVLPKDCPENPHIWLQLEGPKENASRAKEYLKGLCSPELQNEIHYPPKLHCIFLGAQGFFLDCLTWSTSAHLVPGVPGSLMVSGLTEAFVMVQSRVEELVERLSWDFRLGPSPGASQCAGVLREFSALLQARGDAHTEALLQLPQAVQEELLSLVQEASRGQGPQAFPSWGWGGPGPLGAQQQGVRTPLGDGGVSLDTGPTGWQESRGERHAVEKEGTKQGGAREMDLGWKEWPGEEAWERQVAFRPQSGGGEASGGGEAGQAGPPKGKALGKEGVPQERGRLGVQGQPPSTQGPYQRASQLRGASLLQRLHNGEASPPRVPSPPPAPEPPWHCGDRGDRGDRADKQLVVARGRGSPWKRGTRGGNLVTGTQRFQEALQDPFTLCLANVPGKPDLRHIVIDGSNVAMVHGLQHYFSSRGIAIAVQYFWDRGHRDITVFVPQWRFSKDSKVREGHFLHKLYSLSLLSLTPSRVLDGKRISSYDDRFMVKLAEETDGIIVSNDQFRDLAEESEKWMAIIRERLLPFTFVGNLFMVPDDPLGRNGPTLDEFLKKPVRAPGSSKPQQSARGVTEHSNQQQGRKEEEKGNGGIRKTRETERLRRQLLEVFWGQDHKVDFILQREPYCRDINQLSEALLSLNF\n+>gi|28849951|ref|NP_788787.1| pregnancy-associated glycoprotein 2 precursor [Bos taurus]\n+MKWLVLLGLVALSECIVILPLKKMKTLRETLREKNLLNNFLEEQAYRLSKNDSKITIHPLRNYLDTAYVGNITIGTPPQEFRVVFDTGSANLWVPCITCTSPACYTHKTFNPQNSSSFREVGSPITIFYGSGIIQGFLGSDTVRIGNLVSPEQSFGLSLEEYGFDSLPFDGILGLAFPAMGIEDTIPIFDNLWSHGAFSEPVFAFYLNTNKPEGSVVMFGGVDHRYYKGELNWIPVSQTSHWQISMNNISMNGTVTACSCGCEALLDTGTSMIYGPTKLVTNIHKLMNARLENSEYVVSCDAVKTLPPVIFNINGIDYPLRPQAYIIKIQNSCRSVFQGGTENSSLNTWILGDIFLRQYFSVFDRKNRRIGLAPAV\n'
diff -r 23b316fad2b0 -r 1881935e5351 tool-data/proteases.loc.sample
--- a/tool-data/proteases.loc.sample Fri Sep 26 15:26:14 2014 -0400
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,22 +0,0 @@
-Trypsin Trypsin
-Trypsin/P Trypsin/P
-2-iodobenzoate 2-iodobenzoate
-Arg-C Arg-C
-Asp-N Asp-N
-Asp-N_ambic Asp-N_ambic
-CNBr CNBr
-Chymotrypsin Chymotrypsin
-Formic_acid Formic_acid
-Lys-C Lys-C
-Lys-C/P Lys-C/P
-NoEnzyme NoEnzyme
-PepsinA PepsinA
-TrypChymo TrypChymo
-V8-DE V8-DE
-V8-E V8-E
-glutamyl endopeptidase glutamyl endopeptidase
-leukocyte elastase leukocyte elastase
-no cleavage no cleavage
-proline endopeptidase proline endopeptidase
-unspecific cleavage unspecific cleavage
-
diff -r 23b316fad2b0 -r 1881935e5351 tools/myrimatch.xml
--- a/tools/myrimatch.xml Fri Sep 26 15:26:14 2014 -0400
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
b'@@ -1,192 +0,0 @@\n-<tool id="myrimatch" version="0.1.0" name="MyriMatch">\n-  <requirements>\n-    <requirement type="package" version="333">binaries_for_package_myrimatch</requirement>\n-  </requirements>\n-\n-  <description></description>\n-  <command>\n-  #set $mod_rep_chars = "*$^@.%!"\n-  #set $db_name = $input_database.display_name.replace(".fasta", "") + ".fasta"\n-  #if $output_type.value == "mzid"\n-  #set $output_ext="mzid"\n-  #set $output_format="mzIdentML"\n-  #else\n-  #set $output_ext="pepXML"\n-  #set $output_format="pepXML"\n-  #end if\n-  #set $input_name = $input.display_name\n-  #set $output_name = $input_name.split(".")[0] + "." + $output_ext\n-  #set $static_mods_str = ""\n-  #for $static_mod in $static_mods\n-  #set $static_mods_str = $static_mods_str + "" + str($static_mod.aa) + " " + str($static_mod.mass)\n-  #end for\n-  #set $dynamic_mods_str = ""\n-  #set $dynamic_mod_index = 0\n-  #for $dynamic_mod in $dynamic_mods\n-  #set $dynamic_mods_str = $dynamic_mods_str + "" + str($dynamic_mod.motif) + " " + $mod_rep_chars[$dynamic_mod_index] + " " + str($dynamic_mod.mass)\n-  #set $dynamic_mod_index = $dynamic_mod_index + 1\n-  #end for\n-  ln -s \'$input\' \'${input_name}\';\n-  ln -s \'$input_database\' \'${db_name}\';\n-  myrimatch -DecoyPrefix \'${decoy_prefix}\' \\\n-  -ProteinDatabase \'${db_name}\' \\\n-  -OutputFormat \'${$output_format}\' \\\n-  \'${input_name}\' \\\n-  -StaticMods \'${static_mods_str}\' \\\n-  -DynamicMods \'${dynamic_mods_str}\' \\\n-  -CleavageRules \'${protease}\' \\\n-  #set $percursor_type = $percursor_tolerance.percursor_type\n-  -PrecursorMzToleranceRule \'${percursor_type}\' \\\n-  #if $percursor_type == "auto" or $percursor_type == "mono"\n-  -MonoPrecursorMzTolerance \'${percursor_tolerance.mono_precursor_mz_tolerance}${percursor_tolerance.mono_precursor_mz_tolerance_units}\' \\\n-  #end if\n-  #if $percursor_type == "auto" or $percursor_type == "avg"\n-  -AvgPrecursorMzTolerance 
\'${percursor_tolerance.avg_precursor_mz_tolerance}${percursor_tolerance.avg_precursor_mz_tolerance_units}\' \\\n-  #end if\n-  -FragmentMzTolerance \'${fragment_mz_tolerance}${fragment_mz_tolerance_units}\' \\\n-  #if $advanced.use_advanced\n-  -UseSmartPlusThreeModel $advanced.use_smart_plus_three_model \\\n-  -MinPeptideLength $advanced.min_peptide_length \\\n-  -MaxPeptideLength $advanced.max_peptide_length \\\n-  #if $advanced.max_peak_count\n-  -MaxPeakCount $advanced.max_peak_count \\\n-  #end if\n-  #if $advanced.fragmentation_rule.override\n-  -FragmentationAutoRule false -FragmentationRule \'manual:${advanced.fragmentation_rule.fragmentation_rule}\' \\\n-  #end if\n-  #end if\n-  ;\n-  mv \'$output_name\' output\n-  </command>\n-  <stdio>\n-    <exit_code range="1:" level="fatal" description="Job Failed" />\n-    <regex match="^Could not find the default configuration file.*$"\n-      source="both"\n-      level="warning" />\n-  </stdio>\n-  <inputs>\n-    <conditional name="type">\n-      <param name="input_type" type="select" label="Input Type">\n-        <option value="mzml">mzML</option>\n-        <option value="mzxml">mzXML</option>\n-        <option value="mgf">mgf</option>\n-        <option value="ms2">ms2</option>\n-      </param>\n-      <when value="mzml">\n-        <param format="mzml" name="input" type="data" label="Input mzML"/>\n-      </when>\n-      <when value="mzxml">\n-        <param format="mzxml" name="input" type="data" label="Input mzXML"/>\n-      </when>\n-      <when value="mgf">\n-        <param format="mgf" name="input" type="data" label="Input mgf"/>\n-      </when>\n-      <when value="ms2">\n-        <param format="ms2" name="input" type="data" label="Input ms2"/>\n-      </when>\n-    </conditional>\n-    <param name="output_type" type="select" label="Output Type">\n-      <option value="raw_pepxml">pepXML</option>\n-      <option value="mzid">mzIdentML</option>\n-    </param>\n-    <param format="fasta" 
name="input_database" type="data" label="Protein Database"/>\n-    <param name="decoy_prefix" type="text" label="Decoy Prefix"/>\n-    <param name="protease" type="select" label="Protease">\n-'..b'cursor_mz_tolerance" type="float" value="1.5" label="Average Percursor m/z Tolerance" />\n-        <param name="avg_precursor_mz_tolerance_units" type="select" label="Average Percursor m/z Tolerance Units">\n-          <option value="ppm">Parts per million</option>\n-          <option value="daltons" selected="true">Daltons</option>\n-        </param>\n-      </when>\n-      <when value="mono">\n-        <param name="mono_precursor_mz_tolerance" type="float" value="10" label="Monoisotopic Percursor m/z Tolerance" />\n-        <param name="mono_precursor_mz_tolerance_units" type="select" label="Monoisotopic Percursor m/z Tolerance Units">\n-          <option value="ppm">Parts per million</option>\n-          <option value="daltons">Daltons</option>\n-        </param>\n-      </when>\n-    </conditional>\n-    <param name="fragment_mz_tolerance" type="float" value="1.5" label="Fragement m/z Tolerance" />\n-    <param name="fragment_mz_tolerance_units" type="select" label="Fragment m/z Tolerance Units">\n-      <option value="ppm">Parts per million</option>\n-      <option value="daltons" selected="true">Daltons</option>\n-    </param>\n-    <conditional name="advanced">\n-      <param name="use_advanced" type="boolean" label="Set Advanced Options" />\n-      <when value="false">\n-      </when>\n-      <when value="true">\n-        <param name="max_peak_count" type="integer" value="" optional="true" label="Use Max Peaks" help="Filter out all but the specified number of peaks, keep empty to use all peaks." 
/>\n-        <conditional name="fragmentation_rule">\n-          <param name="override" type="boolean" label="Override Fragmentation Rule (Ion Series)" />\n-          <when value="false" />\n-          <when value="true">\n-            <param type="text" name="fragmentation_rule" label="Fragmentation Rule" help="specify as a comma-separated list of a, b, c, x, y, z, or z* (z+1), e.g. b,y,z" />\n-          </when>\n-        </conditional>\n-        <param name="min_peptide_length" type="integer" value="5" label="Minimum Peptide Length" />\n-        <param name="max_peptide_length" type="integer" value="75" label="Maximum Peptide Length" />\n-        <param name="use_smart_plus_three_model" type="boolean" truevalue="true" falsevalue="false" label="Use Smart Plus 3 Model"  help="For +3 and higher precursors, the fragment ions predicted depend on the way this parameter is set. When this parameter is true, then for each peptide bond, an internal calculation is done to estimate the basicity of the b and y fragment sequence. When this parameter is false, however, ALL possible charge distributions for the fragment ions are generated for every peptide bond." checked="true" />\n-\n-      </when>\n-    </conditional>\n-    <!--\n-    <param name="max_peptide_length" value="75"\n-\n-    <param name="use_smart_plus_three_model" type="boolean" truevalue="true" falsevalue="false" label="Use Smart Plus 3 Model"  help="For +3 and higher precursors, the fragment ions predicted depend on the way this parameter is set. When this parameter is true, then for each peptide bond, an internal calculation is done to estimate the basicity of the b and y fragment sequence. When this parameter is false, however, ALL possible charge distributions for the fragment ions are generated for every peptide bond." 
checked="true" />\n-  -->\n-  </inputs>\n-  <outputs>\n-    <data format="raw_pepxml" name="output" from_work_dir="output">\n-      <change_format>\n-        <when input="output_type" value="mzid" format="mzid" />\n-      </change_format>\n-    </data>\n-  </outputs>\n-\n-  <help>\n-**What it does**\n-\n-Performs protein identification via database search using MyriMatch.\n-\n-------\n-\n-**Citation**\n-\n-For the underlying tool, please cite `MyriMatch: highly accurate tandem mass spectral peptide identification by multivariate hypergeometric analysis. Tabb DL, Fernando CG, Chambers MC. J Proteome Res. 6(2) 654-61. 2007 Feb. PMCID PMC2525619`\n-\n-If you use this tool in Galaxy, please cite TODO\n-  </help>\n-</tool>\n'
diff -r 23b316fad2b0 -r 1881935e5351 tools/repository_dependencies.xml
--- a/tools/repository_dependencies.xml Fri Sep 26 15:26:14 2014 -0400
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,4 +0,0 @@
-<?xml version="1.0"?>
-<repositories>
-  <repository changeset_revision="f66f8ca7b7b9" name="proteomics_datatypes" owner="iracooke" toolshed="https://toolshed.g2.bx.psu.edu" />
-</repositories>
diff -r 23b316fad2b0 -r 1881935e5351 tools/tool_dependencies.xml
--- a/tools/tool_dependencies.xml Fri Sep 26 15:26:14 2014 -0400
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,6 +0,0 @@
-<?xml version="1.0"?>
-<tool_dependency>
-  <package name="binaries_for_package_myrimatch" version="333">
-    <repository changeset_revision="fd3893bf3008" name="package_myrimatch" owner="galaxyp" toolshed="https://toolshed.g2.bx.psu.edu" />
-  </package>
-</tool_dependency>
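
Note on the removed tools/myrimatch.xml: its Cheetah preamble assembled the -StaticMods and -DynamicMods arguments by string concatenation inside `#for` loops. The Python below is only a sketch that mirrors those loops for readers who do not know Cheetah; the function names and the tuple-based inputs are assumptions made for illustration and are not part of the tool. It reproduces the template's behaviour as written, including the empty-string separator between consecutive entries.

```python
# Marker characters the removed template assigned to dynamic mods, in order.
MOD_REP_CHARS = "*$^@.%!"


def build_static_mods(static_mods):
    """Mirror the template loop: append '<aa> <mass>' for each static mod.

    static_mods is a hypothetical list of (aa, mass) pairs standing in for
    the Galaxy repeat elements.
    """
    out = ""
    for aa, mass in static_mods:
        # The template joins entries with "" (no separator), reproduced here.
        out = out + "" + str(aa) + " " + str(mass)
    return out


def build_dynamic_mods(dynamic_mods):
    """Mirror the template loop: append '<motif> <marker> <mass>' per mod.

    dynamic_mods is a hypothetical list of (motif, mass) pairs. The marker
    is taken from MOD_REP_CHARS by position, so an eighth entry would raise
    IndexError, just as the indexing in the template would fail.
    """
    out = ""
    for index, (motif, mass) in enumerate(dynamic_mods):
        out = out + "" + str(motif) + " " + MOD_REP_CHARS[index] + " " + str(mass)
    return out


if __name__ == "__main__":
    print(build_static_mods([("C", "57.021464")]))
    print(build_dynamic_mods([("M", "15.994915"), ("[Q", "-17.026549")]))
```

With more than one entry the pairs run together without a separator between the preceding mass and the next residue or motif. Whether MyriMatch accepts that form is not established by this diff.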