Mercurial > repos > galaxyp > pmd_fdr

<tool id="pmd_fdr" name="PMD FDR" version="1.4.0">
    <description>recalculate FDR fom precursor mass discrepancy</description>
    <requirements>
        <requirement type="package" version="3.5.1">r-base</requirement>
        <requirement type="package" version="1.4.0">r-stringr</requirement>
        <requirement type="package" version="0.4">r-argparser</requirement>
        <requirement type="package" version="0.2-16">r-codetools</requirement>
        <requirement type="package" version="0.4.32">r-runit</requirement>
    </requirements>
    <command detect_errors="exit_code"><![CDATA[
        Rscript '${__tool_directory__}/PMD_FDR_package_for_Galaxy.R'
        --psm_report '$psm_report'
        #if $psm_report_1_percent:
          --psm_report_1_percent '$psm_report_1_percent'
        #end if
        --input_file_type $input_file_type
        --output_g_fdr '$output_g_fdr'
        --output_i_fdr '$output_i_fdr'
        --output_densities '$output_densities'
    ]]></command>
    <inputs>
        <param argument="--psm_report" type="data" format="tabular" label="PSM report (Peptide Spectrum Match)"/>
        <param argument="--psm_report_1_percent" type="data" format="tabular" label="PSM report at 1% FDR (Optional)" optional="true"/>
        <param argument="--input_file_type" type="select" label="Input file type">
            <option value="PSM_Report" selected="true">PeptideShaker PSM_Report</option>
            <option value="PMD_FDR_input_file">PMD_FDR_input_file</option>
            <option value="MaxQuant_Evidence">MaxQuant_Evidence</option>
        </param>
    </inputs>
    <outputs>
        <data name="output_g_fdr" format="tabular" label="${tool.name} on ${on_string} output_g_fdr"/>
        <data name="output_i_fdr" format="tabular" label="${tool.name} on ${on_string} output_i_fdr"/>
        <data name="output_densities" format="tabular" label="${tool.name} on ${on_string} output_densities"/>
    </outputs>
    <tests>
        <test>
            <param name="psm_report" ftype="tabular" value="test_PSM_Report.tabular"/>
            <output name="output_g_fdr" file="output_g_fdr.tabular" />
            <output name="output_i_fdr" file="output_i_fdr.tabular" />
            <output name="output_densities" file="output_densities.tabular" />
        </test>
    </tests>
    <help><![CDATA[
=======
PMD FDR
=======
PMD FDR calculates individual and global False Discovery Rate (FDR) using Precursor Mass Discrepancy (PMD) from a Peptide Spectrum Match (PSM) report.  ee the `PMD-FDR-for-Galaxy-P home page <https://github.com/slhubler/PMD-FDR-for-Galaxy-P/blob/master/README.md#pmd-fdr-for-galaxy-p>`_ for details.

----------
**Inputs**
----------

  **Primary Input file** is a PSM report, 3 formats are currently accepted:

    * PeptideShaker PSM report (--input_file_type PSM_Report)
      This format is the output of PeptideShaker. We expect it to be a tab-delimited file with the first row being the column labels.
      The following fields are required for a file to be correctly processed:

        - Confidence [%]
        - Precursor m/z Error [ppm]
        - Spectrum File
        - Protein(s)
        - Spectrum Title
        - Sequence
        - Decoy

    * Maxquant evidence.txt (--input_file_type MaxQuant_Evidence)
      The following fields are required for correct processing (others can exist):

        - PEP
        - Mass error [ppm]
        - Proteins
        - Retention time
        - Sequence
        - Reverse

    * Custom generated (--input_file_type PMD_FDR_input_file)
      having columns with header names:

        - PMD_FDR_input_score
        - PMD_FDR_pmd
        - PMD_FDR_spectrum_file
        - PMD_FDR_proteins
        - PMD_FDR_spectrum_title
        - PMD_FDR_sequence
        - PMD_FDR_decoy

  **Optional PSM 1% file**

    At present, this only supports PSM_Report format. This is simply the PSM_Report from PeptideShaker using a 1% FDR. This file is matched against the Primary Input file, using the spectrum file and title to match records. If they agree, the record (in the original input) is marked by setting is_one_percent_FDR to TRUE


-----------
**Outputs**
-----------

  Score ranges are used in all three files. An example score range: 060_099_ge_lt

    The structure of score range field is *aaa_bbb_cc_dd* where:

      * *aaa* is the lower bound of the score range
      * *bbb* is the upper bound of the score range
      * *cc* and *dd* describe the lower bound and upper bound comparison operator:

        - eq - "equal to"
        - ge - "greater than or equal to"
        - gt - "greater than"
        - le - "less than or equal to"
        - lt - "less than"


  **output_densities**
    File contains a normalized version of the density function applied to up to 13 subsets of the data. All but "x" refers to the subsetting variable. As such, each column, except x, should sum to 1:

     - x *center of a range of normalized PMD interval*
     - t *(estimated) relative abundance of True Hits*
     - f *(estimated) relative abundance of False Hits*
     - aaa_bbb_cc_dd *relative abundance of score range; see above for definition of score*
     - decoy *relative abundance of decoys (superset of f)*

    Example:

    ::

       x       t       f       000_060_ge_lt   060_099_ge_lt   099_100_ge_lt   100_100_eq_eq   decoy
       -16.9133785470534       4.16622917525113e-18    8.93652665157669e-05    0.000309818912433651    0       3.74804907374386e-18    4.16622917525113e-18    8.93652665157669e-05
       -16.8519045481908       4.23773371666637e-18    9.74050812738847e-05    0.000317480777369488    0       5.16725222326666e-18    4.23773371666637e-18    9.74050812738847e-05
       ...

  **output_g_fdr**

    File contains the groupwise FDR (gFDR) for each score range:

      - *group_of_interest* name of score group
      - *alpha* gFDR

      *Alpha hould be a number between 0 and 1 However, it is generated after excluding the training data and based on the resulting (random) peak height of each distribution. This means that the gFDR can be greater than 1 in practice.*

    Example::

        group_of_interest       alpha
        000_060_ge_lt   0.953202547009365
        060_099_ge_lt   0.477777669534645
        099_100_ge_lt   0.381269718674806
        100_100_eq_eq   0
        decoy   1


  **output_i_fdr**

    This file contains the individual FDR (iFDR) for each PSM, with supporting evidence:

      - PMD_FDR_spectrum_title *Unique identifier concatenating PMD_FDR_spectrum_file and PMD_FDR_spectrum_index*
      - value *Identical to PMD_FDR_pmd. Implemented this way to allow future alterations that would use input variables other than PMD*
      - PMD_FDR_decoy *Input variable - 1 for decoy, 0 for other*
      - median_of_group_index *median PMD for good-training records with the same group_index as the current record*
      - value_norm *normalized value (value minus median_of_group_index)*
      - used_to_find_middle *logical variable reflecting the following statement: was this record used to identify the median_of_group_index? (These records MUST be excluded in any summary statistics.)*
      - PMD_FDR_input_score *The score used to separate data*
      - PMD_FDR_pmd *Precursor Mass Discrepancy*
      - PMD_FDR_peptide_length *Peptide length of identified peptide*
      - PMD_FDR_spectrum_file *Name of file containing spectrum*
      - PMD_FDR_spectrum_index *Spectrum number within file*
      - PMD_FDR_proteins *Protein name*
      - group_input_score *Grouping by score*
      - group_pmd *Grouping by PMD (approx 20 groups)*
      - group_peptide_length *Grouping by peptide length*
      - group_training_class *Grouping by Training class, (see notes)*
      - group_proteins *Grouping by Protein groups (see notes)*
      - group_spectrum_file *Same as PMD_FDR_spectrum_file*
      - group_spectrum_index *Contiguous groups of spectra (see notes)*
      - group_proteins *Grouping by species (however, see notes)*
      - group_decoy_input_score *decoy version of group_input_score*
      - group_decoy_pmd *decoy version of group_pmd*
      - group_decoy_peptide_length *decoy version of group_peptide_length*
      - group_decoy_spectrum_file *decoy version of group_spectrum_file*
      - group_decoy_spectrum_index *decoy version of group_spectrum_index*
      - group_decoy_proteins *decoy version of group_proteins*
      - is_in_1percent *PSM in 1% FDR file (if it exists)*
      - value_of_interest *Defunct column (used during processing)*
      - group_of_interest *Defunct column (used during processing)*
      - interpolated_groupwise_FDR *estimated gFDR, interpolated from gFDR derived from group_decoy_input_score*
      - t *density of t at PMD of record*
      - f *density of f at PMD of record*
      - alpha *same as interpolated_groupwise_FDR*
      - i_fdr *iFDR (alphaf / (alphaf + (1-alpha)*t))*


    ]]></help>
</tool>
author	galaxyp
date	Thu, 10 Oct 2019 17:29:37 -0400
parents	5cc0c32d05a2
children