
SearchToLib (version 1.12.34+galaxy0)

Spectrum files in mzML format:

mzML conversion from RAW requires special options: msconvert --zlib --64 --mzML --simAsSpectra --filter "peakPicking true 1-" --filter "demultiplex optimization=overlap_only" *.raw

Library: Chromatogram .ELIB or Spectrum .DLIB:

Use a Prosit .dlib spectral library to make a chromatogram .elib with EncyclopeDIA, or leave blank to make a chromatogram library from the FASTA alone with Walnut.

Background proteome protein fasta database:

Provides the peptide-to-protein links that are not specified in the spectrum library.

Target fasta database:

Optional: only analyze this subset of the background FASTA proteome.

Target FASTA file contains peptides:

The target FASTA lists peptide sequences rather than full proteins.
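The peptide-to-protein links provided by the background FASTA amount to finding, for each peptide, the proteins whose sequence contains it. The Python below is a minimal sketch under stated assumptions: plain substring matching, accessions taken from the first token of each FASTA header, and no handling of isoleucine/leucine ambiguity. It is not how EncyclopeDIA or Walnut implement protein inference, and the function names and the `target` argument (a stand-in for the optional target subset) are hypothetical.

```python
from collections import defaultdict


def read_fasta(text):
    """Parse FASTA text into {accession: sequence}.

    Assumption: the accession is the first whitespace-delimited token after '>'.
    """
    proteins = {}
    accession = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if accession is not None:
                proteins[accession] = "".join(chunks)
            accession = line[1:].split()[0]
            chunks = []
        else:
            chunks.append(line)  # sequences may wrap over several lines
    if accession is not None:
        proteins[accession] = "".join(chunks)
    return proteins


def map_peptides_to_proteins(peptides, proteins, target=None):
    """Return {peptide: sorted accessions whose sequence contains it}.

    Hypothetical `target`: a set of accessions; if given, only those
    proteins are searched.
    """
    mapping = defaultdict(list)
    for accession, sequence in proteins.items():
        if target is not None and accession not in target:
            continue
        for peptide in peptides:
            if peptide in sequence:
                mapping[peptide].append(accession)
    return {pep: sorted(accs) for pep, accs in mapping.items()}
```

A peptide shared by several proteins maps to all of them, which is exactly the information a spectrum library entry alone may not carry.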

Parameter Settings


align between files:

Retention-time alignment of peptides is generally not needed when building a library from narrow-window spectra.

Select outputs:

SearchToLIB

EncyclopeDIA is a library search engine composed of several algorithms for DIA data analysis. It can search for peptides using either DDA-based spectrum libraries or DIA-based chromatogram libraries. See: https://bitbucket.org/searleb/encyclopedia/wiki/Home

SearchToLIB uses the EncyclopeDIA algorithm, or the Walnut (Pecan) algorithm, to search Data-Independent Acquisition (DIA) MS/MS spectrum files, and creates an elib chromatogram library for EncyclopeDIA DIA quantitation searches.

Inputs

Spectrum files in mzML format
A protein database in FASTA format
An optional DDA spectral library (.dlib), which can be generated by Prosit

SearchToLIB uses EncyclopeDIA if the Prosit dlib is provided; otherwise it uses Walnut with just a FASTA.
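The selection rule above can be expressed as a small Python function. This is a hedged sketch of the documented behavior, not the Galaxy wrapper's actual code; `choose_search_engine` is a hypothetical name, and treating a .elib input like a .dlib (both searched with EncyclopeDIA) is an assumption based on the Library parameter accepting either type.

```python
from pathlib import Path
from typing import Optional


def choose_search_engine(library: Optional[str]) -> str:
    """Return the algorithm used for a given library input (sketch).

    No library -> Walnut (FASTA-only search).
    .dlib or .elib library -> EncyclopeDIA (the .elib case is an assumption).
    """
    if not library:
        return "Walnut"
    suffix = Path(library).suffix.lower()  # accept .DLIB / .ELIB as well
    if suffix in {".dlib", ".elib"}:
        return "EncyclopeDIA"
    raise ValueError(f"unsupported library type: {library!r}")
```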

The MSConvert command can be used to convert DIA RAW files to mzML format and demultiplex them. Use these options:
msconvert --zlib --64 --mzML --simAsSpectra --filter "peakPicking true 1-" --filter "demultiplex optimization=overlap_only" *.raw
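When converting many files from a script, the same options can be passed to `msconvert` as an argument list. Because no shell is involved, the filter strings need no extra quoting and the `*.raw` glob is expanded in Python instead. This is a minimal sketch assuming `msconvert` is on the PATH; `msconvert_args` is a hypothetical helper name.

```python
import glob
import subprocess


def msconvert_args(raw_files):
    """Build the msconvert argument list recommended above for SearchToLib input."""
    return [
        "msconvert",
        "--zlib",
        "--64",
        "--mzML",
        "--simAsSpectra",
        # Each --filter value is a single argument, so its spaces are preserved.
        "--filter", "peakPicking true 1-",
        "--filter", "demultiplex optimization=overlap_only",
        *raw_files,
    ]


if __name__ == "__main__":
    raw = sorted(glob.glob("*.raw"))  # same files the shell glob would match
    if raw:
        subprocess.run(msconvert_args(raw), check=True)
```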

Outputs

A log file

A Chromatogram Library (.elib)

The identified features, in tabular format: the feature values of scans that Percolator uses to determine matches.

The identified peptide-spectrum matches (PSMs), in tabular format. Columns: PSMId, score, q-value, posterior_error_prob, peptide, proteinIds

The identified peptides, in tabular format, with the normalized intensity of each peptide in each scan file. Columns: Peptide, Protein, numFragments, intensity_in_file1, intensity_in_file2, ...

The identified proteins, in tabular format, with the normalized intensity of each protein in each scan file. Columns: Protein, NumPeptides, PeptideSequences, intensity_in_file1, intensity_in_file2, ...
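The tabular outputs can be post-processed with the standard `csv` module. The Python below is a sketch that assumes each file is tab-separated with a header row using the column names listed above; the function names, the 0.01 q-value cutoff (a common 1% FDR choice), and the handling of extra protein-ID columns (Percolator-style files may spill additional IDs into unnamed trailing columns) are assumptions, not documented behavior of this tool.

```python
import csv
import io


def read_psms(text, max_q=0.01):
    """Return PSM rows with q-value <= max_q, with proteinIds as a list.

    Extra tab-separated fields beyond the header are collected by csv under
    `restkey` and merged into proteinIds (an assumption about the format).
    """
    reader = csv.DictReader(io.StringIO(text), delimiter="\t", restkey="_extra")
    kept = []
    for row in reader:
        if float(row["q-value"]) <= max_q:
            ids = [row["proteinIds"], *row.pop("_extra", [])]
            row["proteinIds"] = [p for p in ids if p]  # drop empty/missing IDs
            kept.append(row)
    return kept


def read_quant_table(text, n_meta=3):
    """Split a peptide or protein quant table into metadata and intensities.

    The first `n_meta` columns are metadata (Peptide, Protein, numFragments
    or Protein, NumPeptides, PeptideSequences); every remaining column is one
    normalized intensity per scan file. Empty cells become None.
    Returns (file_columns, rows), where each row is (metadata, intensities).
    """
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader)
    meta_names, file_columns = header[:n_meta], header[n_meta:]
    rows = []
    for fields in reader:
        if not fields:
            continue  # skip blank lines
        meta = dict(zip(meta_names, fields[:n_meta]))
        values = [float(x) if x else None for x in fields[n_meta:]]
        rows.append((meta, values))
    return file_columns, rows
```

Both functions take the file contents as a string, for example `open(path).read()`, which keeps them easy to test on small snippets.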

Typical DIA Workflow

Two sets of MS/MS DIA data are collected for the experiment. In addition to wide-window DIA runs on each quantitative replicate, a pool containing peptides from every condition is measured with several staggered narrow-window DIA runs.

SearchToLib is first run on the pooled narrow-window mzML files to create a combined DIA elib chromatogram library. If a spectral library is provided, for example from Prosit, SearchToLIB uses EncyclopeDIA to search each input mzML file. Otherwise, SearchToLIB uses Walnut, a FASTA database search engine for DIA data that uses PECAN-style scoring.

Prosit generates a predicted spectrum library of fragmentation patterns and retention times for every +2H and +3H tryptic peptide in a FASTA database, with up to one missed cleavage.
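The peptide enumeration described above can be sketched in Python. This assumes the common trypsin rule (cleave after K or R unless the next residue is P); the length limits and function names are hypothetical illustrations, not Prosit's actual settings. The sketch only lists peptide sequences and precursor charges; it does not predict fragmentation patterns or retention times.

```python
import re


def tryptic_peptides(sequence, max_missed=1, min_len=7, max_len=30):
    """Enumerate tryptic peptides with up to `max_missed` missed cleavages.

    Cleavage rule (assumed): after every K or R not followed by P.
    Length limits are illustrative defaults only.
    """
    # Cut positions: protein start, after each cleavable K/R, protein end.
    cuts = [0] + [m.end() for m in re.finditer(r"[KR](?!P)", sequence)]
    if cuts[-1] != len(sequence):
        cuts.append(len(sequence))
    peptides = []
    for i in range(len(cuts) - 1):
        # j - i - 1 is the number of missed cleavages inside the peptide.
        for j in range(i + 1, min(i + 2 + max_missed, len(cuts))):
            peptide = sequence[cuts[i]:cuts[j]]
            if min_len <= len(peptide) <= max_len:
                peptides.append(peptide)
    return peptides


def precursors(sequence, charges=(2, 3), **kwargs):
    """Pair every tryptic peptide with each precursor charge (+2H, +3H)."""
    return [(pep, z) for pep in tryptic_peptides(sequence, **kwargs) for z in charges]
```

For example, `tryptic_peptides("MKPEPTIDERAAKGGR")` keeps only `MKPEPTIDER` and the one-missed-cleavage peptide `MKPEPTIDERAAK`, because the K before P is not cleaved and the shorter fragments fall below the assumed minimum length.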

EncyclopeDIA Quantify is then run on the wide-window quantitative replicate mzML files using that chromatogram library to produce quantification results.

[Figure: SearchToLib workflow]