# HG changeset patch # User pieter.lukasse@wur.nl # Date 1389177556 -3600 # Node ID d50f079096ee1e381b939d7dcb7e858811946137 Push to main toolshed diff -r 000000000000 -r d50f079096ee Csv2Apml.jar Binary file Csv2Apml.jar has changed diff -r 000000000000 -r d50f079096ee IsoFix.jar Binary file IsoFix.jar has changed diff -r 000000000000 -r d50f079096ee LICENSE --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/LICENSE Wed Jan 08 11:39:16 2014 +0100 @@ -0,0 +1,202 @@ + + Apache License + Version 2.0, January 2004 + http://www.apache.org/licenses/ + + TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION + + 1. Definitions. + + "License" shall mean the terms and conditions for use, reproduction, + and distribution as defined by Sections 1 through 9 of this document. + + "Licensor" shall mean the copyright owner or entity authorized by + the copyright owner that is granting the License. + + "Legal Entity" shall mean the union of the acting entity and all + other entities that control, are controlled by, or are under common + control with that entity. For the purposes of this definition, + "control" means (i) the power, direct or indirect, to cause the + direction or management of such entity, whether by contract or + otherwise, or (ii) ownership of fifty percent (50%) or more of the + outstanding shares, or (iii) beneficial ownership of such entity. + + "You" (or "Your") shall mean an individual or Legal Entity + exercising permissions granted by this License. + + "Source" form shall mean the preferred form for making modifications, + including but not limited to software source code, documentation + source, and configuration files. + + "Object" form shall mean any form resulting from mechanical + transformation or translation of a Source form, including but + not limited to compiled object code, generated documentation, + and conversions to other media types. 
+ + "Work" shall mean the work of authorship, whether in Source or + Object form, made available under the License, as indicated by a + copyright notice that is included in or attached to the work + (an example is provided in the Appendix below). + + "Derivative Works" shall mean any work, whether in Source or Object + form, that is based on (or derived from) the Work and for which the + editorial revisions, annotations, elaborations, or other modifications + represent, as a whole, an original work of authorship. For the purposes + of this License, Derivative Works shall not include works that remain + separable from, or merely link (or bind by name) to the interfaces of, + the Work and Derivative Works thereof. + + "Contribution" shall mean any work of authorship, including + the original version of the Work and any modifications or additions + to that Work or Derivative Works thereof, that is intentionally + submitted to Licensor for inclusion in the Work by the copyright owner + or by an individual or Legal Entity authorized to submit on behalf of + the copyright owner. For the purposes of this definition, "submitted" + means any form of electronic, verbal, or written communication sent + to the Licensor or its representatives, including but not limited to + communication on electronic mailing lists, source code control systems, + and issue tracking systems that are managed by, or on behalf of, the + Licensor for the purpose of discussing and improving the Work, but + excluding communication that is conspicuously marked or otherwise + designated in writing by the copyright owner as "Not a Contribution." + + "Contributor" shall mean Licensor and any individual or Legal Entity + on behalf of whom a Contribution has been received by Licensor and + subsequently incorporated within the Work. + + 2. Grant of Copyright License. 
Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + copyright license to reproduce, prepare Derivative Works of, + publicly display, publicly perform, sublicense, and distribute the + Work and such Derivative Works in Source or Object form. + + 3. Grant of Patent License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + (except as stated in this section) patent license to make, have made, + use, offer to sell, sell, import, and otherwise transfer the Work, + where such license applies only to those patent claims licensable + by such Contributor that are necessarily infringed by their + Contribution(s) alone or by combination of their Contribution(s) + with the Work to which such Contribution(s) was submitted. If You + institute patent litigation against any entity (including a + cross-claim or counterclaim in a lawsuit) alleging that the Work + or a Contribution incorporated within the Work constitutes direct + or contributory patent infringement, then any patent licenses + granted to You under this License for that Work shall terminate + as of the date such litigation is filed. + + 4. Redistribution. 
You may reproduce and distribute copies of the + Work or Derivative Works thereof in any medium, with or without + modifications, and in Source or Object form, provided that You + meet the following conditions: + + (a) You must give any other recipients of the Work or + Derivative Works a copy of this License; and + + (b) You must cause any modified files to carry prominent notices + stating that You changed the files; and + + (c) You must retain, in the Source form of any Derivative Works + that You distribute, all copyright, patent, trademark, and + attribution notices from the Source form of the Work, + excluding those notices that do not pertain to any part of + the Derivative Works; and + + (d) If the Work includes a "NOTICE" text file as part of its + distribution, then any Derivative Works that You distribute must + include a readable copy of the attribution notices contained + within such NOTICE file, excluding those notices that do not + pertain to any part of the Derivative Works, in at least one + of the following places: within a NOTICE text file distributed + as part of the Derivative Works; within the Source form or + documentation, if provided along with the Derivative Works; or, + within a display generated by the Derivative Works, if and + wherever such third-party notices normally appear. The contents + of the NOTICE file are for informational purposes only and + do not modify the License. You may add Your own attribution + notices within Derivative Works that You distribute, alongside + or as an addendum to the NOTICE text from the Work, provided + that such additional attribution notices cannot be construed + as modifying the License. 
+ + You may add Your own copyright statement to Your modifications and + may provide additional or different license terms and conditions + for use, reproduction, or distribution of Your modifications, or + for any such Derivative Works as a whole, provided Your use, + reproduction, and distribution of the Work otherwise complies with + the conditions stated in this License. + + 5. Submission of Contributions. Unless You explicitly state otherwise, + any Contribution intentionally submitted for inclusion in the Work + by You to the Licensor shall be under the terms and conditions of + this License, without any additional terms or conditions. + Notwithstanding the above, nothing herein shall supersede or modify + the terms of any separate license agreement you may have executed + with Licensor regarding such Contributions. + + 6. Trademarks. This License does not grant permission to use the trade + names, trademarks, service marks, or product names of the Licensor, + except as required for reasonable and customary use in describing the + origin of the Work and reproducing the content of the NOTICE file. + + 7. Disclaimer of Warranty. Unless required by applicable law or + agreed to in writing, Licensor provides the Work (and each + Contributor provides its Contributions) on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or + implied, including, without limitation, any warranties or conditions + of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A + PARTICULAR PURPOSE. You are solely responsible for determining the + appropriateness of using or redistributing the Work and assume any + risks associated with Your exercise of permissions under this License. + + 8. Limitation of Liability. 
In no event and under no legal theory, + whether in tort (including negligence), contract, or otherwise, + unless required by applicable law (such as deliberate and grossly + negligent acts) or agreed to in writing, shall any Contributor be + liable to You for damages, including any direct, indirect, special, + incidental, or consequential damages of any character arising as a + result of this License or out of the use or inability to use the + Work (including but not limited to damages for loss of goodwill, + work stoppage, computer failure or malfunction, or any and all + other commercial damages or losses), even if such Contributor + has been advised of the possibility of such damages. + + 9. Accepting Warranty or Additional Liability. While redistributing + the Work or Derivative Works thereof, You may choose to offer, + and charge a fee for, acceptance of support, warranty, indemnity, + or other liability obligations and/or rights consistent with this + License. However, in accepting such obligations, You may act only + on Your own behalf and on Your sole responsibility, not on behalf + of any other Contributor, and only if You agree to indemnify, + defend, and hold each Contributor harmless for any liability + incurred by, or claims asserted against, such Contributor by reason + of your accepting any such warranty or additional liability. + + END OF TERMS AND CONDITIONS + + APPENDIX: How to apply the Apache License to your work. + + To apply the Apache License to your work, attach the following + boilerplate notice, with the fields enclosed by brackets "[]" + replaced with your own identifying information. (Don't include + the brackets!) The text should be enclosed in the appropriate + comment syntax for the file format. We also recommend that a + file or class name and description of purpose be included on the + same "printed page" as the copyright notice for easier + identification within third-party archives. 
+ + Copyright [yyyy] [name of copyright owner] + + Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. + You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. diff -r 000000000000 -r d50f079096ee MsFilt.jar Binary file MsFilt.jar has changed diff -r 000000000000 -r d50f079096ee NOTICE --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/NOTICE Wed Jan 08 11:39:16 2014 +0100 @@ -0,0 +1,13 @@ +PRIMS proteomics toolset & Galaxy wrappers +========================================== + +Tools and wrappers for the PRIMS proteomics toolset. +Suite of custom tools to enable data processing and +protein inference for labeled and label-free Mass Spectrometry proteomics data. +Can be used in combination with PRIMS MASSCOMB (prims_masscomb package). +Copyright 2010-2013 by Pieter Lukasse, Plant Research International (PRI), +Wageningen, The Netherlands. All rights reserved. See the license text below. 
+ +Galaxy wrappers and installation are available from the Galaxy Tool Shed at: +http://toolshed.g2.bx.psu.edu/view/pieterlukasse/prims_proteomics + diff -r 000000000000 -r d50f079096ee NapQ.jar Binary file NapQ.jar has changed diff -r 000000000000 -r d50f079096ee PRIMS.jar Binary file PRIMS.jar has changed diff -r 000000000000 -r d50f079096ee ProgenesisConv.jar Binary file ProgenesisConv.jar has changed diff -r 000000000000 -r d50f079096ee Quantifere.jar Binary file Quantifere.jar has changed diff -r 000000000000 -r d50f079096ee Quantiline.jar Binary file Quantiline.jar has changed diff -r 000000000000 -r d50f079096ee README.rst --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/README.rst Wed Jan 08 11:39:16 2014 +0100 @@ -0,0 +1,67 @@ +PRIMS-proteomics toolset & Galaxy wrappers +========================================== + +Proteomics module of Plant Research International's Mass Spectrometry (PRIMS) toolsuite. +This toolset consists of custom tools to enable data processing and +protein inference for labeled and label-free Mass Spectrometry proteomics data. + +Can be used in combination with PRIMS-MASSCOMB (prims_masscomb package) and +with PRIMV-visualization (primv_visualization package). + +Copyright 2010-2013 by Pieter Lukasse, Plant Research International (PRI), +Wageningen, The Netherlands. All rights reserved. See the license text below. 
+ +Galaxy wrappers and installation are available from the Galaxy Tool Shed at: +http://toolshed.g2.bx.psu.edu/view/pieterlukasse/prims_proteomics + +History +======= + +============== ====================================================================== +Date Changes +-------------- ---------------------------------------------------------------------- +January 2014 * first release via Tool Shed +November 2013 * multiple tools used internally at PRI +end 2011 * first tool +============== ====================================================================== + +Tool Versioning +=============== + +PRIMS tools will have versions of the form X.Y.Z. Versions +that differ only in Z (the number after the second dot) should be fully +compatible with each other. Breaking changes should result in an +increment of X and/or Y (the numbers before and after the first dot). All +tools with a version below 1.0.0 should be considered beta. + + +Bug Reports & other questions +============================= + +For the time being, issues can be reported via the contact form at: +http://www.wageningenur.nl/en/Persons/PNJ-Pieter-Lukasse.htm + +Developers, Contributions & Collaborations +========================================== + +If you wish to join forces and collaborate on some of the +tools, do not hesitate to contact Pieter Lukasse via the contact form above. + + +License (Apache, Version 2.0) +============================= + +Copyright 2013 Pieter Lukasse, Plant Research International (PRI). + +Licensed under the Apache License, Version 2.0 (the "License"); +you may not use this software except in compliance with the License. +You may obtain a copy of the License at + +http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and +limitations under the License. + \ No newline at end of file diff -r 000000000000 -r d50f079096ee SedMat_cli.jar Binary file SedMat_cli.jar has changed diff -r 000000000000 -r d50f079096ee csv2apml.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/csv2apml.xml Wed Jan 08 11:39:16 2014 +0100 @@ -0,0 +1,127 @@ + + Converts MS/MS data in CSV format to APML format + + + Csv2Apml.jar + -peptideAndProteinMatchListCSV $peptideAndProteinMatchListCSV + -attributesMappingCSV $attributesMappingCSV + -apmlFile $apmlFile + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Generic name,name in S1 table CSV +mz,${mz} +rt,${rt} +charge,${charge} +pepSequence,${pepSequence} +ppidScore,${ppidScore} +proteinAccession,${proteinAccession} +#if $ppidTheoreticalMz != "None" +ppidTheoreticalMz,${ppidTheoreticalMz} +#end if +#if $modifications != "None" +modifications,${modifications} +#end if +#if $scoringSchemeName != "None" +scoringSchemeName,${scoringSchemeName} +#end if +#if $statisticalMeasure != "None" +statisticalMeasure,${statisticalMeasure} +#end if +#if $protSequenceLength != "None" +protSequenceLength,${protSequenceLength} +#end if +#if $pepProtStart != "None" +pepProtStart,${pepProtStart} +#end if +#if $pepProtEnd != "None" +pepProtEnd,${pepProtEnd} +#end if +#if $sourceName != "None" +sourceName,${sourceName} +#end if + + + + + + + + + + +.. class:: infomark + +This tool converts a CSV file containing MS/MS peptide identifications and their respective protein matches +to the APML xml format. +The identifications in APML format can be used for example to annotate unidentified MS features via SEDMAT(*). +This format is also compatible with what is expected by other post-processing tools like Quantifere (for +protein inference). + +(*)SEDMAT can use MS2 identification data +and couple it to this MS1 data, thereby annotating the MS1 feature list with identifications. 
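The Cheetah template above writes an attributes-mapping CSV: the first column holds the generic attribute name and the second the matching column header in the user's CSV, and each optional attribute is skipped when its selection is "None". A minimal Python sketch of that same logic, assuming the user's selections arrive as a plain dict; the function name `build_attributes_mapping_csv` is hypothetical and is not part of Csv2Apml.jar:

```python
import csv
import io

# Attributes the template always writes, in the template's order.
REQUIRED = ["mz", "rt", "charge", "pepSequence", "ppidScore", "proteinAccession"]
# Optional attributes; the template omits any whose selected column is "None".
OPTIONAL = ["ppidTheoreticalMz", "modifications", "scoringSchemeName",
            "statisticalMeasure", "protSequenceLength", "pepProtStart",
            "pepProtEnd", "sourceName"]


def build_attributes_mapping_csv(columns: dict) -> str:
    """Map each generic attribute name to the column header used in the input CSV."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Generic name", "name in S1 table CSV"])
    for name in REQUIRED:
        writer.writerow([name, columns[name]])  # raises KeyError if a required column is unset
    for name in OPTIONAL:
        column = columns.get(name, "None")
        if column != "None":  # mirrors the template's #if ... != "None" guards
            writer.writerow([name, column])
    return out.getvalue()


if __name__ == "__main__":
    print(build_attributes_mapping_csv({
        "mz": "m/z", "rt": "RT", "charge": "z", "pepSequence": "Sequence",
        "ppidScore": "Score", "proteinAccession": "Accession",
        "modifications": "Mods",
    }), end="")
```

Keeping the required and optional names in two ordered lists makes the output order match the template, which keeps a generated file easy to diff against one written by Galaxy.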
+ +----- + +**Output** + +This tool returns the input data in APML XML format. + + + diff -r 000000000000 -r d50f079096ee datatypes_conf.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/datatypes_conf.xml Wed Jan 08 11:39:16 2014 +0100 @@ -0,0 +1,9 @@ + + + + + + + + + \ No newline at end of file diff -r 000000000000 -r d50f079096ee isofix.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/isofix.xml Wed Jan 08 11:39:16 2014 +0100 @@ -0,0 +1,66 @@ + + Identifies in-source decay peptides and corrects protein assignments + + + IsoFix.jar + -identificationsFile $identificationsFile + -outputFile $outputFile + -format apml + -rtTol $rtTol + -logFile $logFile + #if $useOriginalProteinSequences.useOriginalProteinSequencesFile == True + -fastaFile $useOriginalProteinSequences.fastaFile + #end if + + + + + + + + + + + + + + + + + + + + + + + ( createLogFile == True ) + + + + + + +.. class:: infomark + +This tool identifies in-source decay peptides and corrects protein assignments. + +----- + +**Output example** + +This tool returns the given input file, but with corrected protein assignments and with +in-source decay peptides marked (by a small modification of their sequence string). +E.g. if peptide TYNSIMK is found to be an in-source decay of HETTYNSIMK, then +its sequence is changed to HET}TYNSIMK (i.e. the decayed part + "}" + its own sequence). +E.g.
decay from both sides: YNSI, HETTYNSIMK = HET}TYNSI{MK + + + + diff -r 000000000000 -r d50f079096ee msfilt.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/msfilt.xml Wed Jan 08 11:39:16 2014 +0100 @@ -0,0 +1,229 @@ + + Filters annotations based on MS/MS peptide identification and annotation quality measures + + + MsFilt.jar + -apmlFile $apmlFile + -datasetCode $apmlFile.metadata.base_name + -rankingMetadataFile $rankingMetadataFile + -statisticalMeasuresConfigFile $statisticalMeasuresConfigFile + -annotationSourceConfigFile $annotationSourceConfigFile + -outApml $outputApml + -outNewIdsApml $outNewIdsApml + -outFullCSV $outputCSV + -outRankingTable $outRankingTable + -outProteinCoverageCSV $outProteinCoverageCSV + -fpCriteriaExpression "$fpCriteriaExpression" + -filterOutFPAnnotations $filterOutFPAnnotations + -fpCriteriaExpressionForIds "$fpCriteriaExpressionForIds" + -filterOutFPIds $filterOutFPIds + -filterOutUnannotatedAlignments $filterOutUnannotatedAlignments + -addRawRankingInfo $addRawRankingInfo + -addScaledIntensityInfo $addScaledIntensityInfo + -addRawIntensityInfo $addRawIntensityInfo + -outReport $htmlReportFile + -outReportPicturesPath $htmlReportFile.files_path + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + ${rankingWeightConfig} + ${statisticalMeasuresConfig} + ## start comment + ## iterate over the selected files and store their names in the config file + #for $i, $s in enumerate( $annotationSourceFiles ) + ${s.identificationsFile}|${s.spectraFile} + ## also print out the datatype in the next line, based on previously configured datatype + #if isinstance( $s.identificationsFile.datatype, $__app__.datatypes_registry.get_datatype_by_extension('apml').__class__): + apml + #else: + mzid + #end if + #end for + ## end comment + + + + + ( apmlFile != None ) + + + ( filterOutFPIds == True ) + + + ( apmlFile != None ) + + + ( apmlFile != None ) + + + + ( len(list(enumerate(annotationSourceFiles))) > 0 ) + + + + + + + +..
class:: infomark + +This tool takes in peptide quantification results (e.g. produced either by SEDMAT for label-free data or by Quantiline for labeled data) +and calculates a number of quality measures that can help in assessing the correctness of the quantification assignment and of the MS/MS peptide +identification itself. The user can use any combination of quality measures (QMs) and statistical measures (SMs) to filter out +low-scoring entries. + +.. class:: infomark + +In label-free data processed by SEDMAT, a feature quantification can be assigned to several different peptides, +i.e. the assignment is ambiguous. In such a case +this tool also ranks the different assignments according to their quality measures so that the best-scoring assignment +is ranked first. + +----- + +**List of abbreviations** + +QM: Quality Measure + +SM: Statistical Measure (e.g. p-value, e-value from MS/MS identification) + +PSM: "Peptide to Spectrum Match" (aka peptide identification) + +FP: False Positive + +----- + +**Filtering options details** + +The FP criteria will be applied to an annotation even if not all of the quality measures involved +in the expression can be determined. QMs that cannot be determined get the value 0 (zero), which is +equivalent to giving them the average value. + +The output report shows some plots that visualize the filtering done. These can help in fine-tuning the +filtering criteria. + +----- + +**Output details** + +*APML output* + +This tool returns the given APML alignment file, further annotated at the alignment level with the best-ranking +peptides of each respective alignment. This APML can be used in subsequent Galaxy tools such as the proteomics tools +from NBIC. + +The APML output can also be used for the Protein Inference step (see the Quantifere tool). + +*CSV output* + +It also returns a CSV output with the full quality measure, scoring and ranking details.
The user could use +this to manually determine new weights for some of the quality measures with techniques such as +linear regression. In other words, this CSV can be used to fine-tune the weights in a subsequent run. + +Many of the quality measures (QMs) are normalized to their Standard Score (aka z-score). +`See Standard Score for more details...`__ + +Besides giving insight into how the ranking was established, a more complete version of this CSV file is also +generated for tools that cannot or will not process the APML output format. + +Below is a brief overview of the CSV and an illustration of the ranking done in case of ambiguous peptide-to-feature assignments +(explained above; these can occur in label-free data processed by SEDMAT). + + +.. image:: $PATH_TO_IMAGES/msfilt_csv_out.png + + + +.. __: javascript:window.open('http://en.wikipedia.org/wiki/Standard_score','popUpWindow','height=700,width=800,left=10,top=10,resizable=yes,scrollbars=yes,toolbar=yes,menubar=no,location=no,directories=no,status=yes') + + + + + + diff -r 000000000000 -r d50f079096ee napq.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/napq.xml Wed Jan 08 11:39:16 2014 +0100 @@ -0,0 +1,93 @@ + + 'no alignment' (alignment-free) peptide quantification + + + NapQ.jar + -identificationsConfigFile $identificationsConfigFile + -namingConventionCodesForSamples $namingConventionCodesForSamples + #if $is2D_LC_MS.fractions == True + -namingConventionCodesForFractions $is2D_LC_MS.namingConventionCodesForFractions + #end if + -outputApml $outputApml + -outputTsv $outputTsv + -outReport $htmlReportFile + -outReportPicturesPath $htmlReportFile.files_path + + + + + + + + + + + + + + + + + + + + + + ## start comment + ## iterate over the selected files and store their names in the config file + #for $i, $s in enumerate( $identificationFileList ) + ${s.identificationsFile}|${s.spectraFile} + ## also print out the datatype in the next line, based on previously configured datatype + #if isinstance(
$s.identificationsFile.datatype, $__app__.datatypes_registry.get_datatype_by_extension('apml').__class__): + apml + #else: + mzid + #end if + #end for + ## end comment + + + + + + + + + + + + +.. class:: infomark + +This tool takes in multiple peptide identification result files that have peptide identifications +coupled to some quantification (e.g. precursor intensity information, or data coming +from MS^E acquisition, where peptide identification and quantification are done in the same run and reported together). +Then, based on the given experiment design parameters (i.e. how the result files relate back to +replicate runs and samples), it produces a new file in which the peptides are reported with +their calculated quantifications at the sample level. + +The figure below illustrates this: + +.. image:: $PATH_TO_IMAGES/napq_overview.png + + + + + + + + diff -r 000000000000 -r d50f079096ee prims_proteomics_datatypes.py --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/prims_proteomics_datatypes.py Wed Jan 08 11:39:16 2014 +0100 @@ -0,0 +1,42 @@ +""" +PRIMS proteomics classes for types defined in datatypes_conf.xml +""" +import logging +import re +from galaxy.datatypes.data import * +from galaxy.datatypes.xml import * +from galaxy.datatypes.sniff import * +from galaxy.datatypes.binary import * +from galaxy.datatypes.interval import * + +log = logging.getLogger(__name__) + + +class ProteomicsXml(GenericXml): + """ An enhanced XML datatype used to reuse code across several + proteomic/mass-spec datatypes. (this part of the code is taken from protk proteomics datatypes package) """ + + def sniff(self, filename): + """ Determines whether the file is the correct XML type.
""" + with open(filename, 'r') as contents: + while True: + line = contents.readline() + if line == None or not line.startswith(' + Converts Progenesis aligned feature lists in CSV format to APML + + + ProgenesisConv.jar + -progenesisFile $progenesisFile + -apmlFile $apmlFile + #if $multipleScoringSchemes.containsMultipleScoringSchemes == True + -scoringSchemeNameColumn $multipleScoringSchemes.scoringSchemeNameColumn + #end if + #if $statisticalMeasure.containsStatisticalMeasure == True + -statisticalMeasureColumn $statisticalMeasure.statisticalMeasureColumn + #end if + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +.. class:: infomark + +This tool converts a Progenesis CSV file to the APML xml format. +This format can be used to submit the data for annotation by SEDMAT. SEDMAT can use MS2 identification data +and couple it to this MS1 data, thereby annotating the MS1 feature list with identifications. + +----- + +**Output example** + +This tools returns APML output that can be used as input for the SEDMAT tool. 
+ + + diff -r 000000000000 -r d50f079096ee quantifere.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/quantifere.xml Wed Jan 08 11:39:16 2014 +0100 @@ -0,0 +1,206 @@ + + Protein Inference by Peptide Quantification patterns + + + Quantifere.jar + -annotatedQuantificationFilesList $annotatedQuantificationFilesList + -identificationFilesList $identificationFilesList + -statisticalMeasuresConfigFile $statisticalMeasuresConfigFile + -quantificationDataToUse $quantificationDataToUse + -minCorrel $minCorrel + -minProtCoverage $minProtCoverage + -minAboveAverageHits $minAboveAverageHits + -minNrIdsForInferencePeptide $minNrIdsForInferencePeptide + -refineModel $refineModel + -functionalAnnotationCSV $functionalAnnotationCSV + -outputCSV $outputCSV + -outputInferenceLogCSV $outputInferenceLogCSV + -outputSummaryAnnotationCSV $outputSummaryAnnotationCSV + -outReport $htmlReportFile + -outReportPicturesPath $htmlReportFile.files_path + #if $is2D_LC_MS.fractions == True + -namingConventionCodesForFractions $is2D_LC_MS.namingConventionCodesForFractions + #end if + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + ## start comment + ## iterate over the selected files and store their names in the config file + #for $i, $s in enumerate( $annotatedQuantificationFiles ) + ${s.annotatedQuantificationFile} + #end for + ## end comment + + ## start comment + ## iterate over the selected files and store their names in the config file + #for $i, $s in enumerate( $identificationFiles ) + ${s.identificationFile} + ## also print out the datatype in the next line, based on previously configured datatype + #if isinstance( $s.identificationFile.datatype, $__app__.datatypes_registry.get_datatype_by_extension('apml').__class__): + apml + #else: + mzid + #end if + #end for + ## end comment + ## start comment + ${statisticalMeasuresConfig} + + + + + + + + ( summaryReport == True ) + + + + ( functionalAnnotationCSV != None ) + + + + + + +.. 
class:: infomark + +This tool takes Peptide Quantification patterns and uses them to do Protein Inference of both Primary Protein +identifications and Secondary Protein identifications. The latter class of protein identifications +cannot be found by traditional protein inference methods that look only at peptide identifications and +their quality parameters. + + +----- + +**List of definitions** + +Primary Protein identification: protein identification belonging to the minimum set of proteins needed +to account for the observed peptides. + +Secondary Protein identification: extra protein identifications that do not belong to the minimum set +of proteins mentioned above. + +raw intensities: the intensity value resulting from the integration of the feature peak area + +apex intensities: the intensity value at the highest point of the feature peak + +normalized intensities: the intensity after normalization by some means + +----- + +**Minimum correlation in a cluster** + +TODO - add doc. + +----- + +**Output details** + +*Proteins list (CSV)* + +This is the list of primary and secondary proteins and their calculated inference score. Proteins +with exactly the same peptide hits are also grouped together and labeled as primary_group and secondary_group +instead of simply primary and secondary. + + +*Inference log (CSV)* + +This CSV table shows all data, including both inferred and ruled-out proteins. The user can use it to +troubleshoot the inference process and understand why certain proteins might have been ruled out. +The CSV is provided in a format that can easily be explored as a Cytoscape network. + +The figure below shows an example of the data being explored in Cytoscape, also using the +`Cytoscape chartplugin`_ to visualize the quantification data when selecting the peptide nodes. + +.. image:: $PATH_TO_IMAGES/quantifere_cyto_out.png + + +..
_Cytoscape chartplugin: http://apps.cytoscape.org/apps/chartplugin + + + + + diff -r 000000000000 -r d50f079096ee quantiline.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/quantiline.xml Wed Jan 08 11:39:16 2014 +0100 @@ -0,0 +1,62 @@ + + Labeled MS/MS data pre-processing for Protein Quantification (and Inference) pipelines + + + Quantiline.jar + -ppidsFileName $ppidsFileName + -spectraDataFile $spectraDataFile + -ppidsInputFormat MZID + -labelMzValues "$labelMzValues" + -labelmTol $labelmTol + -outputFile $outputFile + -outReport $outReport + + + + + + + + + + + + + + + + + + + +.. class:: infomark + +This tool reads spectra files (mzML) and their respective identification files (mzIdentML) and, based +on the configured label masses, produces a file that contains the merged information: +peptides and their quantification based on the label fragment intensity values read from the spectrum in which they +were identified. + +In other words, it produces the peptide (relative) quantification file. This file can subsequently be used +by other tools for protein inference and protein quantification (e.g. Quantifere). + + +----- + +**Output details** + +*Peptide quantification file (APML)* + +This is the list of peptides with their (relative) quantification based on the labels and their +intensities found in the label peaks of the corresponding spectrum.
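The help text does not say exactly how Quantiline picks a label peak, so the sketch below is only one plausible reading. It assumes the label tolerance (`labelmTol`) is an absolute m/z window in Da, takes the most intense fragment peak inside that window for each configured label m/z, and then scales the label intensities to fractions of their total. The helper names and the example peak values are hypothetical and purely illustrative:

```python
from typing import List, Sequence, Tuple

Peak = Tuple[float, float]  # (m/z, intensity) of one MS/MS fragment peak


def label_intensities(peaks: Sequence[Peak], label_mzs: Sequence[float],
                      tol: float) -> List[float]:
    """Most intense peak within +/- tol of each label m/z (0.0 when no peak is found)."""
    result = []
    for label_mz in label_mzs:
        in_window = [inten for mz, inten in peaks if abs(mz - label_mz) <= tol]
        result.append(max(in_window, default=0.0))
    return result


def relative_quantification(intensities: Sequence[float]) -> List[float]:
    """Scale label intensities to fractions of their total (all zeros if the total is 0)."""
    total = sum(intensities)
    return [i / total if total > 0 else 0.0 for i in intensities]


if __name__ == "__main__":
    # Made-up spectrum: three label peaks plus one unrelated fragment.
    peaks = [(114.11, 200.0), (115.11, 100.0), (116.12, 100.0), (300.2, 5000.0)]
    raw = label_intensities(peaks, [114.11, 115.11, 116.11, 117.12], tol=0.02)
    print(raw)                           # [200.0, 100.0, 100.0, 0.0]
    print(relative_quantification(raw))  # [0.5, 0.25, 0.25, 0.0]
```

Taking the maximum inside the window, rather than summing, avoids double-counting when a centroided spectrum has two close peaks near one label; whether Quantiline does the same is not stated here.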
+ + + + + + diff -r 000000000000 -r d50f079096ee repository_dependencies.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/repository_dependencies.xml Wed Jan 08 11:39:16 2014 +0100 @@ -0,0 +1,5 @@ + + + + + diff -r 000000000000 -r d50f079096ee sedmat.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/sedmat.xml Wed Jan 08 11:39:16 2014 +0100 @@ -0,0 +1,144 @@ + + Matches MS and MS/MS results + + + SedMat_cli.jar + -pl $inputMS + -plInputFormat apml + -ppids $fileType.inputFormatType.ppidsFile + -ppidsFileGrouping $fileType.type + -ppidsInputFormat $fileType.inputFormatType.ppidsInputFormat + -ppidsFileDescription $fileType.inputFormatType.ppidsFile.name + #if $fileType.inputFormatType.ppidsInputFormat == "mzid" + -spectraDataFile $fileType.inputFormatType.spectraDataFile + #end if + -out $outputData + -outUnmatchedMS2 $outUnmatchedMS2 + -mtol $mtol + -rttol $rttol + -rtShiftDetectionWindow $rtShiftDetectionWindow + -matchOnSameSourceOnly $matchOnSameSourceOnly + -chargeStatesToGenerate $chargeStatesToGenerate + -outReport $htmlReportFile + -outReportPicturesPath $htmlReportFile.files_path + #if $troubleshoot1.troubleshootPeakLocations == True + -troubleshootPeakLocations YES + -mStart $troubleshoot1.mStart + -mEnd $troubleshoot1.mEnd + -rtStart $troubleshoot1.rtStart + -rtEnd $troubleshoot1.rtEnd + -filterSourceName $troubleshoot1.filterSourceName + #end if + #if $matchOnNamingConvention.match == True + -matchOnNamingConvention YES + -namingConventionCodesForMatching $matchOnNamingConvention.namingConventionCodesForMatching + #end if + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + ( summaryReport == True ) + + + + + + + + + +.. class:: infomark + +This tool matches MS and MS/MS results. SEDMAT stands for "Single Experiment Data Matching Tool". +It can match peaks found in the MS spectra with the peptides found using the MS/MS spectra. 
+The result is the list of MS peaks annotated with peptides and proteins. + +----- + +**Output example** + +This tool returns APML output, a Cytoscape network (.xgmml) of the matches, and Retention Time plots (.pdf). + + + diff -r 000000000000 -r d50f079096ee static/images/msfilt_csv_out.png Binary file static/images/msfilt_csv_out.png has changed diff -r 000000000000 -r d50f079096ee static/images/napq_overview.png Binary file static/images/napq_overview.png has changed diff -r 000000000000 -r d50f079096ee static/images/quantifere_cyto_out.png Binary file static/images/quantifere_cyto_out.png has changed
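The SEDMAT wrapper above matches MS peaks to MS/MS peptides using a mass tolerance (`mtol`), a retention-time tolerance (`rttol`) and a set of charge states to generate. The sketch below shows one simplified way such matching can work. It assumes that `mtol` is in ppm, that `rttol` uses the same time unit as the retention times, and that the precursor m/z is computed as (M + z * 1.007276) / z from the neutral peptide mass M. It leaves out SEDMAT's RT-shift detection and naming-convention options. All names and numbers are hypothetical:

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple

PROTON_MASS = 1.007276  # Da, added once per charge to the neutral peptide mass


@dataclass
class Feature:
    """An MS1 peak/feature."""
    mz: float
    rt: float


@dataclass
class PeptideId:
    """An MS/MS peptide identification with its neutral monoisotopic mass."""
    sequence: str
    neutral_mass: float
    rt: float


def match(features: Iterable[Feature], ids: Iterable[PeptideId], mtol_ppm: float,
          rttol: float, charges: Iterable[int] = (1, 2, 3)
          ) -> List[Tuple[Feature, PeptideId, int]]:
    """Return (feature, identification, charge) triples that fall within both tolerances."""
    ids = list(ids)
    charges = list(charges)
    matches = []
    for feature in features:
        for pid in ids:
            if abs(feature.rt - pid.rt) > rttol:
                continue  # outside the retention-time window
            for z in charges:
                theoretical_mz = (pid.neutral_mass + z * PROTON_MASS) / z
                ppm_error = abs(feature.mz - theoretical_mz) / theoretical_mz * 1e6
                if ppm_error <= mtol_ppm:
                    matches.append((feature, pid, z))
    return matches


if __name__ == "__main__":
    ids = [PeptideId("PEPTIDEK", 1000.0, 10.0)]  # made-up mass and retention time
    features = [Feature(501.0073, 10.2), Feature(501.0073, 20.0)]
    for f, pid, z in match(features, ids, mtol_ppm=10.0, rttol=0.5):
        print(pid.sequence, z, f.rt)  # PEPTIDEK 2 10.2
```

The nested loops are quadratic in the number of features and identifications. A real implementation would more likely sort both lists by retention time or m/z and use a sliding window, but the brute-force form keeps the two tolerance checks easy to see.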