Prior to statistical analysis, it is necessary to perform three steps to transform the MaxQuant output for phosphoproteome-enriched samples.
The input dataset for this tool is the Phospho (STY)Sites.txt file that is produced:
- by the Galaxy "MaxQuant" (maxquant) tool
- or by the Galaxy "Maxquant (using mqpar.xml)" (maxquant_mqpar) tool
- or by the desktop version of MaxQuant.
The "MaxQuant Phosphopeptide ANOVA" tool (mqppep_anova) consumes the "preprocessed" output file preproc_tab that this tool produces.
This step applies a "localization-probability cut-off" to each phosphopeptide. Higher values may reduce the number of peptides in the output. The default value of 0.75 reflects the text of [Cheng 2018]:
"For phosphopeptide identification, a localization probability cutoff is applied. This filter is performed to select for phosphopeptides with a high confidence (i.e., greater than 0.75) in phosphoresidue identification [Hogrebe 2018; Olsen 2006]. In other words, the summed probability of all other residues that could potentially contain the phospho-group is less than 0.25. This cutoff could be raised to increase the stringency of the phosphopeptide selection. In regard to the number of identifications, the expected number of pY peptides is in the hundreds, while the expected number of pST peptides is in the high thousands. These values reflect previously observed phosphoproteome distribution where about 2%, 12%, and 86% of the phosphosites are pY, pT, and pS, respectively [Olsen 2006]."
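The following Python sketch illustrates the cut-off. It keeps only sites whose localization probability exceeds 0.75, then reports the fraction of retained sites on S, T, and Y for comparison with the distribution quoted above. It is not Larry Cheng's R script. The column names "Localization prob" and "Amino acid" are assumed from the MaxQuant Phospho (STY)Sites.txt layout. Treating the boundary as strict ("greater than") is an assumption taken from the quoted text.

```python
import csv
from collections import Counter


def load_sites(path):
    """Read a tab-separated MaxQuant sites table into a list of dicts."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))


def filter_localized(rows, cutoff=0.75, prob_col="Localization prob"):
    """Keep rows whose localization probability is strictly greater than cutoff.

    Rows with a blank probability are dropped (an assumption of this sketch).
    """
    kept = []
    for row in rows:
        value = (row.get(prob_col) or "").strip()
        if value and float(value) > cutoff:
            kept.append(row)
    return kept


def residue_fractions(rows, aa_col="Amino acid"):
    """Return the fraction of rows whose residue is S, T, or Y."""
    counts = Counter(row[aa_col] for row in rows)
    total = sum(counts.values()) or 1  # avoid dividing by zero on empty input
    return {aa: counts[aa] / total for aa in ("S", "T", "Y")}
```

For example, `filter_localized(load_sites("Phospho (STY)Sites.txt"))` applies the default cut-off. Passing `cutoff=0.9` raises the stringency, as the quote suggests.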
This tool wraps an R script, written by Larry Cheng, that performs the following (in order):
Note that the "ProTeomiX Quality Control Report" [Bielow 2016] (available at https://github.com/cbielow/PTXQC/) is run by the Galaxy wrappers for MaxQuant, so it is omitted here even though it was included in Larry Cheng's original script.
This step searches phosphopeptides against several databases for known or predicted sites.
To generate this file:
- Download the "precomputed data for all available kinase predictors against ENSEMBL" (available at the NetworKIN predictions link on the downloads page at https://web.archive.org/web/20200208000403/http://networkin.info/download/networkin_human_predictions_3.1.tsv.xz; N.B.: "Commercial users are requested to contact the authors before using the data on the networkin.info website");
- Decompress the .tsv.xz file with "unxz" (from XZ Utils https://tukaani.org/xz/);
- Filter out the rows having "networkin_score" less than 2.0.
The result should be a tab-separated file with the following columns:
- #substrate
- position
- id
- networkin_score
- tree
- netphorest_group
- netphorest_score
- string_identifier
- string_score
- substrate_name
- sequence
- string_path
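The decompression and filtering steps can also be done in Python without calling unxz, because the standard-library lzma module reads .xz files directly. This is a minimal sketch. It assumes the decompressed file has a single header row that includes the networkin_score column listed above, and it keeps rows with a score of at least 2.0. The function name filter_networkin is hypothetical.

```python
import lzma


def filter_networkin(src_path, dst_path, min_score=2.0):
    """Stream a NetworKIN .tsv.xz file, keeping the header and every row
    whose networkin_score is >= min_score. Returns the number of rows kept."""
    written = 0
    with lzma.open(src_path, "rt") as src, open(dst_path, "w") as dst:
        header = src.readline()
        dst.write(header)  # keep the "#substrate ..." header row unchanged
        score_idx = header.rstrip("\n").split("\t").index("networkin_score")
        for line in src:
            if not line.strip():
                continue  # ignore blank lines
            fields = line.rstrip("\n").split("\t")
            if float(fields[score_idx]) >= min_score:
                dst.write(line)
                written += 1
    return written
```

For example, `filter_networkin("networkin_human_predictions_3.1.tsv.xz", "networkin.tsv")` writes the filtered table and returns the number of rows kept. Streaming line by line avoids holding the whole decompressed file in memory.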
This database merges motif patterns from [Amanchy 2007] and Phosida [Gnad 2011].
The Amanchy data are adapted from https://web.archive.org/web/*/http://hprd.org/serine_motifs and https://web.archive.org/web/*/http://hprd.org/tyrosine_motifs (both links cite the reference where each motif was published), and the patterns are translated into Perl regular expression format (https://perldoc.perl.org/perlre).
The Phosida data are adapted (translated to Perl-formatted regular expressions) from http://pegasus.biochem.mpg.de/phosida/help/motifs.aspx (this link cites the reference where each motif was published).
This file has three tab-separated columns (and no header):
- column 1 is an (ignored) identifier
- column 2 is a Perl regular expression
- column 3 is a descriptor.
Two example rows:
2<TAB>R.R..(pS|pT)<TAB>Akt kinase substrate motif (HPRD)
10<TAB>R..(pS|pT)V<TAB>CAMK2_Phosida
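The Python sketch below shows one way to read the motif file and apply it: it parses the three-column rows and reports which descriptors match a peptide. It assumes phosphorylated residues are written inline as pS, pT, or pY, as the patterns imply. It also assumes that Python's re module interprets these simple patterns the same way Perl does. That holds for the constructs in the examples, but not for every Perl regular expression. The helper names and the test peptide are hypothetical.

```python
import re


def load_motifs(lines):
    """Parse headerless rows of: identifier <TAB> regular expression <TAB> descriptor."""
    motifs = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        _identifier, pattern, descriptor = line.split("\t", 2)  # column 1 is ignored
        motifs.append((re.compile(pattern), descriptor))
    return motifs


def matching_motifs(sequence, motifs):
    """Return the descriptors of all motifs found anywhere in the sequence."""
    return [descriptor for regex, descriptor in motifs if regex.search(sequence)]


# The two example rows from above, with <TAB> written as a real tab character.
EXAMPLE_ROWS = [
    "2\tR.R..(pS|pT)\tAkt kinase substrate motif (HPRD)",
    "10\tR..(pS|pT)V\tCAMK2_Phosida",
]
MOTIFS = load_motifs(EXAMPLE_ROWS)
```

For the hypothetical peptide "ARPRSApSVE", `matching_motifs("ARPRSApSVE", MOTIFS)` returns both descriptors. Because "." matches any character, it can also match the "p" of an adjacent phosphoresidue marker, and the sketch does not correct for this.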
'Kinase-substrate dataset: experimentally determined substrates, sequences, cognate kinases, and metadata curated from the literature' [Hornbeck 2011]. This tabular-formatted file may be downloaded for non-commercial purposes as 'Kinase_Substrate_Dataset.gz' from https://www.phosphosite.org/staticDownloads.action.
Data extracted from PhosphoSitePlus(R), created by Cell Signaling Technology Inc. PhosphoSitePlus is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License (https://creativecommons.org/licenses/by-nc-sa/3.0/). Attribution must be given in written, oral and digital presentations to PhosphoSitePlus, www.phosphosite.org. Written documents should additionally cite:
Hornbeck PV, Kornhauser JM, Tkachev S, Zhang B, Skrzypek E, Murray B, Latham V, Sullivan M (2012) PhosphoSitePlus: a comprehensive resource for investigating the structure and function of experimentally determined post-translational modifications in man and mouse. Nucleic Acids Res. 40, D261-D270.; www.phosphosite.org.
'Regulatory sites: information curated from the literature about modification sites shown to regulate molecular functions, biological processes, and molecular interactions including protein-protein interactions' [Hornbeck 2011]. This tabular-formatted file may be downloaded for non-commercial purposes as 'Regulatory_sites.gz' from https://www.phosphosite.org/staticDownloads.action.
Terms of use and citation are as for the psp_kinase_substrate file.
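The PhosphoSitePlus downloads are gzip-compressed, tab-separated tables. The following Python sketch reads one without first decompressing it to disk. It assumes any preamble lines (for example, license text) come before a single header row. It locates that header as the first line containing a caller-supplied column name, such as KINASE. Both that layout and the column name are assumptions to check against the actual download. The function name read_psp_table is hypothetical.

```python
import csv
import gzip


def read_psp_table(path, header_marker):
    """Read a gzipped, tab-separated PhosphoSitePlus table into a list of dicts.

    Lines before the first line that has header_marker as one of its
    tab-separated fields are treated as preamble and skipped.
    """
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as handle:
        lines = handle.read().splitlines()
    for start, line in enumerate(lines):
        if header_marker in line.split("\t"):
            break
    else:
        raise ValueError(f"no header row contains the field {header_marker!r}")
    # QUOTE_NONE: treat quote characters as ordinary text in this tab-separated file.
    reader = csv.DictReader(lines[start:], delimiter="\t", quoting=csv.QUOTE_NONE)
    return list(reader)
```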
Data table (in tabular format, consumed by the merge/filter step) presenting, for each phosphopeptide, the kinase mappings, the mass-spectral intensities for each sample, and the metadata from UniProtKB/SwissProt, phospho-sites, phospho-motifs, and regulatory sites. Data in the columns marked "Domain", "ON_...", or "..._PhosphoSite" are available subject to the following terms:
"PhosphoSitePlus® (PSP) was created by Cell Signaling Technology Inc. It is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License (https://creativecommons.org/licenses/by-nc-sa/3.0/). When using PSP data or analyses in printed publications or in online resources, the following acknowledgements must be included: (a) the words 'PhosphoSitePlus(R), www.phosphosite.org' must be included at appropriate places in the text or webpage, and (b) citation of [Hornbeck 2011 (PMID: 25514926)] must be included in the bibliography."
This step merges mapped metadata into metadata for phosphopeptides, filtering by species.
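The Python sketch below gives a rough picture of a merge with a species filter. It attaches annotation rows to phosphopeptide rows that share a key and discards annotations from other species. The key and species column names are hypothetical parameters. Keeping phosphopeptides that have no annotation (a left join) is an assumption, not a description of the actual script.

```python
def merge_and_filter(peptides, annotations, key, species, species_col):
    """Attach same-species annotation rows to phosphopeptide rows sharing `key`.

    Phosphopeptides without a matching annotation are kept unchanged
    (left-join behaviour, an assumption of this sketch).
    """
    by_key = {}
    for annotation in annotations:
        if annotation.get(species_col) == species:  # drop other species
            by_key.setdefault(annotation[key], []).append(annotation)
    merged = []
    for peptide in peptides:
        matches = by_key.get(peptide[key], [])
        if not matches:
            merged.append(dict(peptide))
        for annotation in matches:
            merged.append({**peptide, **annotation})  # one output row per match
    return merged
```

Emitting one output row per matching annotation keeps the sketch simple. A real implementation might instead collapse several annotations into one delimited field per phosphopeptide.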
Phosphopeptides annotated with SwissProt and phosphosite metadata, in tabular format. This file is designed to be consumed by the downstream ANOVA tool. Some data in the columns marked "PSP" are available subject to the following terms:
"PhosphoSitePlus® (PSP) was created by Cell Signaling Technology Inc. It is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License (https://creativecommons.org/licenses/by-nc-sa/3.0/). When using PSP data or analyses in printed publications or in online resources, the following acknowledgements must be included: (a) the words 'PhosphoSitePlus(R), www.phosphosite.org' must be included at appropriate places in the text or webpage, and (b) citation of [Hornbeck 2011 (PMID: 25514926)] must be included in the bibliography."