view text_exporter.xml @ 6:42b843627623 draft default tip

Uploaded
author galaxyp
date Fri, 21 Jun 2013 17:01:53 -0400
parents cf0d72c7b482

<tool id="openms_text_exporter" version="0.1.0" name="Text Exporter">
  <description>    
  </description>
  <macros>
    <import>macros.xml</import>
  </macros>
  <expand macro="stdio" />
  <expand macro="requires" />
  <command interpreter="python">
    openms_wrapper.py \
      --executable '__SHELL__' --config $link \
      --executable 'TextExporter' --config $config
  </command>
  <configfiles>
    <configfile name="link">ln -s '${type.input}' 'input.${type.input.ext}'</configfile>
    <configfile name="config">[simple_options]
in=input.${type.input.ext}
out=${out}
#set $input_type = str($type.input_type)
#if $input_type == "featurexml"
feature!minimal=${type.minimal}
#end if
no_ids=${no_ids}
</configfile>
  </configfiles>
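  <!--
    Illustrative rendering of the two config files above, assuming a
    FeatureXML input whose datatype extension is "featurexml", with
    "Minimal Output" checked and "Suppress IDs" unchecked. The dataset
    paths are placeholders chosen for this sketch; Galaxy fills in the
    real ones at run time.

    link:
      ln -s '/path/to/input_dataset.dat' 'input.featurexml'

    config:
      [simple_options]
      in=input.featurexml
      out=/path/to/output_dataset.dat
      feature!minimal=true
      no_ids=false

    The feature!minimal key is emitted only when the input type is
    featurexml. It presumably corresponds to the feature:minimal flag
    described in the help, though how openms_wrapper.py translates the
    key is not shown in this file.
  -->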
  <inputs>
    <conditional name="type">
      <param name="input_type" type="select" label="Input Type">
        <option value="featurexml">Features (FeatureXML)</option>
        <option value="consensusxml">Consensus (ConsensusXML)</option>
        <option value="idxml">Identifications (IdXML)</option>
        <option value="mzml">Peak List (mzML)</option>
      </param>
      <when value="mzml">
        <param format="mzml" name="input" type="data" label="Input Peak List"/>
      </when>
      <when value="featurexml">
        <param format="featurexml" name="input" type="data" label="Input Features"/>
        <param name="minimal" type="boolean" label="Minimal Output" help="Set this flag to write only three attributes: RT, m/z, and intensity." truevalue="true" falsevalue="false" />
      </when>
      <when value="consensusxml">
        <param format="consensusxml" name="input" type="data" label="Input Consensus"/>
      </when>
      <when value="idxml">
        <param format="idxml" name="input" type="data" label="Input Identifications"/>
      </when>
    </conditional>
    <param name="no_ids" type="boolean" label="Suppress IDs" help="Suppresses output of identification data." truevalue="true" falsevalue="false" />
  </inputs>
  <outputs>
    <data format="txt" name="out" />
  </outputs>
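  <!--
    A minimal sketch of a functional test for the FeatureXML branch, not
    part of the original wrapper. The file name example.featurexml is
    hypothetical: it stands for any small FeatureXML file placed in the
    tool's test-data directory. The assertion relies only on the help
    text's statement that output files begin with comment lines starting
    with "#", and assumes this still holds when both flags are set.
  -->
  <tests>
    
  </tests>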
  <help>
**What it does**

The goal of this tool is to create output in a table format that is easily readable in Excel or OpenOffice. Lines in the output correspond to rows in the table.

Output files begin with comment lines, starting with the special character "#". The last such line(s) will be a header with column names, but this may be preceded by more general comments.

Because the OpenMS XML formats contain different kinds of data in a hierarchical structure, TextExporter produces somewhat unusual TSV/CSV files for many inputs: Different lines in the output may belong to different types of data, and the number of columns and the meanings of the individual fields depend on the type. In such cases, the first column always contains an indicator (in capital letters) for the data type of the current line. In addition, some lines have to be understood relative to a previous line, if there is a hierarchical relationship in the data. (See below for details and examples.)

Missing values are represented by "-1" or "nan" in numeric fields and by blanks in character/text fields.

Depending on the input and the parameters, the output contains the following columns:

featureXML input:

- first column: RUN / PROTEIN / UNASSIGNEDPEPTIDE / FEATURE / PEPTIDE (indicator for the type of data in the current row)
- a RUN line contains information about a protein identification run; further columns: run_id, score_type, score_direction, date_time, search_engine_version, parameters
- a PROTEIN line contains data of a protein identified in the previously listed run; further columns: score, rank, accession, coverage, sequence
- an UNASSIGNEDPEPTIDE line contains data of a peptide hit that was not assigned to any feature; further columns: rt, mz, score, rank, sequence, charge, aa_before, aa_after, score_type, search_identifier, accessions
- a FEATURE line contains data of a single feature; further columns: rt, mz, intensity, charge, width, quality, rt_quality, mz_quality, rt_start, rt_end
- a PEPTIDE line contains data of a peptide hit annotated to the previous feature; further columns: same as for UNASSIGNEDPEPTIDE

With the no_ids flag, only FEATURE lines (without the FEATURE indicator) are written.

With the feature:minimal flag, only the rt, mz, and intensity columns of FEATURE lines are written.

consensusXML input:

Output format produced for the out parameter:

- first column: MAP / RUN / PROTEIN / UNASSIGNEDPEPTIDE / CONSENSUS / PEPTIDE (indicator for the type of data in the current row)
- a MAP line contains information about a sub-map; further columns: id, filename, label, size (potentially followed by further columns containing meta data, depending on the input)
- a CONSENSUS line contains data of a single consensus feature; further columns: rt_cf, mz_cf, intensity_cf, charge_cf, width_cf, quality_cf, rt_X0, mz_X0, ..., rt_X1, mz_X1, ...
- "..._cf" columns refer to the consensus feature itself, "..._Xi" columns refer to a sub-feature from the map with ID "Xi" (no quality column in this case); missing sub-features are indicated by "nan" values
- see above for the formats of RUN, PROTEIN, UNASSIGNEDPEPTIDE, PEPTIDE lines

With the no_ids flag, only MAP and CONSENSUS lines are written.

Output format produced for the consensus_centroids parameter:

- one line per consensus centroid
- columns: rt, mz, intensity, charge, width, quality

Output format produced for the consensus_elements parameter:

- one line per sub-feature (element) of a consensus feature
- first column: H / L (indicator for new/repeated element)
- H indicates a new element, L indicates the replication of the first element of the current consensus feature (for plotting)
- further columns: rt, mz, intensity, charge, width, rt_cf, mz_cf, intensity_cf, charge_cf, width_cf, quality_cf
- "..._cf" columns refer to the consensus feature, the other columns refer to the sub-feature

Output format produced for the consensus_features parameter:

- one line per consensus feature (suitable for processing with e.g. R)
- columns: same as for a CONSENSUS line above, followed by additional columns for identification data
- additional columns: peptide_N0, n_diff_peptides_N0, protein_N0, n_diff_proteins_N0, peptide_N1, ...
- "..._Ni" columns refer to the identification run with index "Ni", ``n_diff_...`` stands for "number of different ..."; different peptides/proteins in one column are separated by "/"

With the no_ids flag, the additional columns are not included.

idXML input:

- first column: RUN / PROTEIN / PEPTIDE (indicator for the type of data in the current row)
- see above for the formats of RUN, PROTEIN, PEPTIDE lines
- additional column for PEPTIDE lines: predicted_rt

With the id:proteins_only flag, only RUN and PROTEIN lines are written.

With the id:peptides_only flag, only PEPTIDE lines (without the PEPTIDE indicator) are written.

With the id:first_dim_rt flag, the additional columns rt_first_dim and predicted_rt_first_dim are included for PEPTIDE lines.

**Citation**

For the underlying tool, please cite ``Marc Sturm, Andreas Bertsch, Clemens Gröpl, Andreas Hildebrandt, Rene Hussong, Eva Lange, Nico Pfeifer, Ole Schulz-Trieglaff, Alexandra Zerck, Knut Reinert, and Oliver Kohlbacher, 2008. OpenMS – an Open-Source Software Framework for Mass Spectrometry. BMC Bioinformatics 9: 163. doi:10.1186/1471-2105-9-163.``

If you use this tool in Galaxy, please cite Chilton J, et al. https://bitbucket.org/galaxyp/galaxyp-toolshed-openms
  </help>
</tool>