changeset 0:948bac693947 draft
planemo upload for repository https://github.com/HegemanLab/w4mjoinpn_galaxy_wrapper/tree/master commit cedf2e01903099ef5f1bbe624afe4c2845d6bf23
author | eschen42 |
---|---|
date | Sun, 29 Oct 2017 10:05:05 -0400 |
parents | |
children | dcfaffec48c8 |
files | LICENSE README README.md w4mjoinpn.sh w4mjoinpn.xml |
diffstat | 5 files changed, 342 insertions(+), 0 deletions(-) |
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/LICENSE Sun Oct 29 10:05:05 2017 -0400
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2017 Hegeman Lab
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/README Sun Oct 29 10:05:05 2017 -0400
@@ -0,0 +1,5 @@
+This tool joins two sets of post-XCMS post-CAMERA Workflow4Metabolomics datasets
+(i.e., sampleMetadata, variableMetadata, dataMatrix),
+one gathered in negative ionization-mode and the other in positive ionization-mode.
+
+Please see https://github.com/HegemanLab/w4mjoinpn_galaxy_wrapper for details.
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/README.md Sun Oct 29 10:05:05 2017 -0400
@@ -0,0 +1,33 @@
+[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.1038289.svg)](https://doi.org/10.5281/zenodo.1038289)
+
+# w4mjoinpn_galaxy_wrapper
+
+This tool joins two sets of MS1 datasets for **exactly** the same set of samples, where one was gathered in positive
+ionization-mode and the other in negative ionization-mode, for reasons set forth below.
+
+Workflow4Metabolomics (W4M, Giacomoni *et al.*, 2014, http://dx.doi.org/10.1093/bioinformatics/btu813; http://workflow4metabolomics.org;
+https://github.com/workflow4metabolomics) provides a suite of Galaxy tools for processing and analyzing metabolomics data.
+
+W4M uses the XCMS package (Smith *et al.*, 2006 http://dx.doi.org/10.1021/ac051437y) to extract features and align
+their retention times among multiple samples.
+
+After peak extraction and alignment, W4M uses the CAMERA package (Kuhl *et al.*, 2012, http://dx.doi.org/10.1021/ac202450g)
+"to postprocess XCMS feature lists and to collect all features related to a compound into a compound spectrum."
+
+Both of these steps are done using data collected in a single ionization mode (i.e., only negative or only positive)
+because it would not make sense to attempt to use CAMERA otherwise.
+
+However, multivariate analysis in general, and particularly the "False Discovery Rate" adjustment in hypothesis testing,
+would both benefit from having all variables (features), negative and positive, combined for one analysis. It is also
+cumbersome to be forced to do an analysis twice, once for each ionization mode.
+
+This tool will fail:
+  * when the samples are not listed in exactly the same order in the negative-mode dataMatrix and the positive-mode dataMatrix
+  * when the samples are not listed in exactly the same order in the negative-mode sampleMetadata and the positive-mode sampleMetadata
+
+Otherwise
+  * the two dataMatrix files are concatenated, and the names of features identified from positive ionization-mode data
+are prefixed with "P"; negative, with "N".
+  * the two variableMetadata files are concatenated, and the names of features are prefixed in the same way.
+  * if sampleMetadata has a polarity column, its value is set to "posneg" in the output.
+  * Technically, the sampleMetadata file in the output is derived from the negative ionization-mode sampleMetadata.
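The concatenation rule described in README.md can be illustrated on toy data. This is a minimal sketch, not the committed script: the directory `w4mjoinpn_demo`, the file names, the sample names `s1`/`s2`, and the feature names are all hypothetical stand-ins, and the files are tiny tab-separated tables invented for the illustration.

```shell
#!/bin/bash
# Hypothetical toy inputs: one header row of sample names, one feature row each.
mkdir -p w4mjoinpn_demo
printf 'dataMatrix\ts1\ts2\nM100T50\t1\t2\n' > w4mjoinpn_demo/dm_neg.tsv
printf 'dataMatrix\ts1\ts2\nM200T60\t3\t4\n' > w4mjoinpn_demo/dm_pos.tsv

# Refuse to join when the header rows (sample names and their order) differ.
if [ "$(head -n 1 w4mjoinpn_demo/dm_pos.tsv)" != "$(head -n 1 w4mjoinpn_demo/dm_neg.tsv)" ]; then
  echo "sample names in dataMatrix files differ" >&2
  exit 1
fi

# Keep a single header, then prefix negative-mode feature names with N
# and positive-mode feature names with P.
{
  head -n 1 w4mjoinpn_demo/dm_neg.tsv
  sed -e '1d' -e 's/^/N/' w4mjoinpn_demo/dm_neg.tsv
  sed -e '1d' -e 's/^/P/' w4mjoinpn_demo/dm_pos.tsv
} > w4mjoinpn_demo/dm_out.tsv

cat w4mjoinpn_demo/dm_out.tsv
```

The prefix keeps feature names unique after the join, since the same name could otherwise appear in both ionization modes.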
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/w4mjoinpn.sh Sun Oct 29 10:05:05 2017 -0400
@@ -0,0 +1,140 @@
+#!/bin/bash
+# join positive and negative ionization-mode XCMS datasets for a common set of samples
+# summary:
+# - parse and validate arguments (or abort)
+# - check that the same samples are present in the same order in both the positive and negative mode data matrices (or abort)
+
+# Parse arguments
+# ref: https://stackoverflow.com/questions/192249/how-do-i-parse-command-line-arguments-in-bash/14203146#14203146
+POSITIONAL=()
+while [[ $# -gt 0 ]]; do
+  key="$1"
+  case $key in
+    dmpos)
+      DMPOS="$2"
+      shift # past argument
+      shift # past value
+      ;;
+    dmneg)
+      DMNEG="$2"
+      shift # past argument
+      shift # past value
+      ;;
+    dmout)
+      DMOUT="$2"
+      shift # past argument
+      shift # past value
+      ;;
+    smpos)
+      SMPOS="$2"
+      shift # past argument
+      shift # past value
+      ;;
+    smneg)
+      SMNEG="$2"
+      shift # past argument
+      shift # past value
+      ;;
+    smout)
+      SMOUT="$2"
+      shift # past argument
+      shift # past value
+      ;;
+    vmpos)
+      VMPOS="$2"
+      shift # past argument
+      shift # past value
+      ;;
+    vmneg)
+      VMNEG="$2"
+      shift # past argument
+      shift # past value
+      ;;
+    vmout)
+      VMOUT="$2"
+      shift # past argument
+      shift # past value
+      ;;
+    *) # unknown option
+      POSITIONAL+=("$1") # save it in an array for later
+      shift # past argument
+      ;;
+  esac
+done
+set -- "${POSITIONAL[@]}" # restore positional parameters
+if [[ -n $1 ]]; then
+  echo "unexpected argument $1"
+  echo "arguments supplied: $@"
+  exit 1
+fi
+
+# Validate that we got the expected args
+set -- ${DMPOS} ${DMNEG} ${DMOUT} ${SMPOS} ${SMNEG} ${SMOUT} ${VMPOS} ${VMNEG} ${VMOUT}
+if [[ ! -n $9 ]]; then
+  echo "expecting nine arguments"
+  echo "parsed arguments: $@"
+  exit 1
+fi
+
+# Show them what we got
+echo "dataMatrix positive_mode ${DMPOS}"
+echo "dataMatrix negative_mode ${DMNEG}"
+echo "dataMatrix joined_modes ${DMOUT}"
+echo "sampleMetadata positive_mode ${SMPOS}"
+echo "sampleMetadata negative_mode ${SMNEG}"
+echo "sampleMetadata joined_modes ${SMOUT}"
+echo "variableMetadata positive_mode ${VMPOS}"
+echo "variableMetadata negative_mode ${VMNEG}"
+echo "variableMetadata joined_modes ${VMOUT}"
+
+# Check that sample names are the same, in the same order, for the dataMatrix in both datasets
+if [ "$( head -n 1 ${DMPOS} )" != "$( head -n 1 ${DMNEG} )" ]; then echo sample names in dataMatrix files differ; exit 1; fi
+# Check that sample names are the same, in the same order, for the sampleMetadata in both datasets
+if [ "$( cut -f 1 ${SMPOS} )" != "$( cut -f 1 ${SMNEG} )" ]; then echo sample names in sampleMetadata files differ; exit 1; fi
+
+# Concatenate variableMetadata datasets to respective output file
+cat <( head -n 1 ${VMNEG} ) <( sed -n -e '1 d; s/^/N/; p;' ${VMNEG} ) <( sed -n -e '1 d; s/^/P/; p;' ${VMPOS} ) > ${VMOUT}
+
+# Concatenate dataMatrix datasets to respective output file
+cat <( head -n 1 ${DMNEG} ) <( sed -n -e '1 d; s/^/N/; p;' ${DMNEG} ) <( sed -n -e '1 d; s/^/P/; p;' ${DMPOS} ) > ${DMOUT}
+
+# Determine whether negative ionization-mode sampleMetadata file's column three is titled "polarity"
+
+# find the ordinal number of the first column named "polarity" of the negative ionization-mode sampleMetadata file, if any
+set -- `head -n 1 ${SMNEG}`
+POLARITY=0
+MAXCOUNT=0
+while [[ $# -gt 0 ]]; do
+  MAXCOUNT=$(( MAXCOUNT + 1 ))
+  key="$1"
+  case $key in
+    polarity)
+      if [ $POLARITY -eq 0 ]; then POLARITY=${MAXCOUNT}; fi
+      shift # past argument
+      ;;
+    *) # unknown option
+      shift # past argument
+      ;;
+  esac
+done
+echo "Polarity is in column $POLARITY of ${SMNEG}"
+echo "There are $MAXCOUNT columns in ${SMNEG}"
+
+# Copy sampleMetadata from negative ionization-mode to output file, replacing polarity if possible
+if [ ${POLARITY} -gt 1 ]; then
+  COLBEFORE=$(( POLARITY - 1 ))
+  COLAFTER=$(( POLARITY + 1 ))
+  # Replace all entries in column three of negative ionization-mode sampleMetadata file with "posneg" in respective output file
+  if [ ${POLARITY} -lt ${MAXCOUNT} ]; then
+    # Handle the case where polarity is not in the last column
+    paste <( cut -f 1-${COLBEFORE} ${SMNEG} ) <( cut -f ${POLARITY} ${SMNEG} | sed -n -e '2,$ s/.*/posneg/; p;' ) <( cut -f ${COLAFTER}- ${SMNEG} ) > ${SMOUT}
+  else
+    # Handle the case where polarity is in the last column
+    paste <( cut -f 1-${COLBEFORE} ${SMNEG} ) <( cut -f ${POLARITY} ${SMNEG} | sed -n -e '2,$ s/.*/posneg/; p;' ) > ${SMOUT}
+  fi
+else
+  # Handle the case where polarity was not found: Copy negative ionization-mode sampleMetadata file to the respective output file
+  cp ${SMNEG} ${SMOUT}
+fi
+
+exit 0
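The polarity rewrite at the end of w4mjoinpn.sh is built from `cut`, `paste`, and `sed`. As a point of comparison only, the same effect can be sketched with `awk`, which is a swapped-in technique and not what the commit uses. The directory `w4mjoinpn_demo`, the file names, and the sample rows below are hypothetical.

```shell
#!/bin/bash
# Hypothetical toy sampleMetadata with a "polarity" column in position two.
mkdir -p w4mjoinpn_demo
printf 'sampleMetadata\tpolarity\tbatch\ns1\tnegative\t1\ns2\tnegative\t1\n' \
  > w4mjoinpn_demo/sm_neg.tsv

# Find the first header column named "polarity"; if it is past column one
# (mirroring the script's "-gt 1" test), overwrite it with "posneg" in data rows.
awk 'BEGIN { FS = OFS = "\t" }
     NR == 1 {
       for (i = 1; i <= NF; i++) if ($i == "polarity" && col == 0) col = i
       print
       next
     }
     { if (col > 1) $col = "posneg"; print }' \
  w4mjoinpn_demo/sm_neg.tsv > w4mjoinpn_demo/sm_out.tsv

cat w4mjoinpn_demo/sm_out.tsv
```

Unlike the header scan in the script, which splits the header on any whitespace, this sketch splits strictly on tabs, so a column title containing a space would still count as one column.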
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/w4mjoinpn.xml Sun Oct 29 10:05:05 2017 -0400
@@ -0,0 +1,143 @@
+<tool id="w4mjoinpn" name="Join +/- Ions" version="0.98.1">
+  <description>Join positive and negative ionization-mode W4M datasets for the same samples</description>
+  <requirements>
+    <requirement type="package" version="8.25">coreutils</requirement>
+    <requirement type="package" version="4.2.3.dev0">sed</requirement>
+  </requirements>
+  <stdio>
+    <exit_code range="1:" level="fatal" />
+  </stdio>
+  <command><![CDATA[
+    echo "These are the the paths to the tools used by this script:" 1>&2 ;
+    which cut sed head paste cat cp bash test 1>&2 ;
+    $__tool_directory__/w4mjoinpn.sh
+    dmneg $dmneg
+    dmpos $dmpos
+    dmout $dmout
+    smneg $smneg
+    smpos $smpos
+    smout $smout
+    vmneg $vmneg
+    vmpos $vmpos
+    vmout $vmout
+  ]]></command>
+  <inputs>
+    <param name="dmpos" label="Data matrix positive" type="data" format="tabular" help="Positive ionization-mode: Features x samples (tabular data - decimal: '.'; missing: NA; mode: numerical; separator: tab character)" />
+    <param name="smpos" label="Sample metadata positive" type="data" format="tabular" help="Positive ionization-mode: Samples x metadata (tabular data - decimal: '.'; missing: NA; mode: character or numerical; separator: tab character)" />
+    <param name="vmpos" label="Variable metadata positive" type="data" format="tabular" help="Positive ionization-mode: Features x metadata (tabular data - decimal: '.'; missing: NA; mode: character or numerical; separator: tab character)" />
+    <param name="dmneg" label="Data matrix negative" type="data" format="tabular" help="Negative ionization-mode: Features x samples (tabular data - decimal: '.'; missing: NA; mode: numerical; separator: tab character)" />
+    <param name="smneg" label="Sample metadata negative" type="data" format="tabular" help="Negative ionization-mode: Samples x metadata (tabular data - decimal: '.'; missing: NA; mode: character or numerical; separator: tab character)" />
+    <param name="vmneg" label="Variable metadata negative" type="data" format="tabular" help="Negative ionization-mode: Features x metadata (tabular data - decimal: '.'; missing: NA; mode: character or numerical; separator: tab character)" />
+  </inputs>
+  <outputs>
+    <data name="dmout" label="${dmneg.name}.posneg" format="tabular" ></data>
+    <data name="smout" label="${smneg.name}.posneg" format="tabular" ></data>
+    <data name="vmout" label="${vmneg.name}.posneg" format="tabular" ></data>
+  </outputs>
+  <help><![CDATA[
+**Join positive and negative ionization-mode W4M datasets for the same samples**
+--------------------------------------------------------------------------------
+
+**Author** - Arthur Eschenlauer (University of Minnesota, esch0041@umn.edu)
+
+
+Motivation
+----------
+
+Workflow4Metabolomics (W4M, Giacomoni *et al.*, 2014; http://workflow4metabolomics.org; https://github.com/workflow4metabolomics)
+provides a suite of Galaxy tools for processing and analyzing metabolomics data.
+
+W4M uses the XCMS package (Smith *et al.*, 2006) to extract features and align
+their retention times among multiple samples.
+
+After peak extraction and alignment, W4M uses the CAMERA package (Kuhl *et al.*, 2012)
+"to postprocess XCMS feature lists and to collect all features related to a compound into a compound spectrum."
+
+Both of these steps are done using data collected in a single ionization mode (i.e., only negative or only positive)
+because it would not make sense to attempt to use CAMERA otherwise.
+
+However, performing and interpreting statistical analysis would be more convenient and statistically powerful
+with all variables (features), negative and positive, combined for one analysis.
+
+
+Description
+-----------
+
+This tool joins two sets of MS1 datasets for **exactly** the same set of samples, where one was gathered in positive ionization
+mode and the other in negative ionization-mode, for reasons set forth above. These datasets must be post-XCMS and post-CAMERA.
+
+This tool will fail:
+
+* when the samples are not listed in exactly the same order in the negative-mode dataMatrix and the positive-mode dataMatrix
+* when the samples are not listed in exactly the same order in the negative-mode sampleMetadata and the positive-mode sampleMetadata
+
+Otherwise:
+
+* the two dataMatrix files are concatenated, and the names of features identified from positive ionization-mode data are prefixed with "P"; negative, with "N".
+* the two variableMetadata files are concatenated, and the names of features are prefixed in the same way.
+* if sampleMetadata has a polarity column, its value is set to "posneg" in the output. (In fact, the sampleMetadata file in the output is copied from the negative ionization-mode sampleMetadata, with the polarity replaced.)
+
+Workflow Position
+-----------------
+
+* Upstream tool category: Preprocessing
+* Downstream tool categories: Normalisation, Statistical Analysis, Quality Control, Filter and Sort
+
+Working example
+---------------
+
+**Input files**
+
+  +---------------------------------------------+-----------------------------------------------------------------------------------------------------------------------+
+  | Input File                                  | Download from URL                                                                                                     |
+  +=============================================+=======================================================================================================================+
+  | Data matrix, negative ionization-mode       | https://raw.githubusercontent.com/HegemanLab/w4mjoinpn_galaxy_wrapper/master/test-data/input_dataMatrix_neg.tsv       |
+  +---------------------------------------------+-----------------------------------------------------------------------------------------------------------------------+
+  | Sample metadata, negative ionization-mode   | https://raw.githubusercontent.com/HegemanLab/w4mjoinpn_galaxy_wrapper/master/test-data/input_sampleMetadata_neg.tsv   |
+  +---------------------------------------------+-----------------------------------------------------------------------------------------------------------------------+
+  | Variable metadata, negative ionization-mode | https://raw.githubusercontent.com/HegemanLab/w4mjoinpn_galaxy_wrapper/master/test-data/input_variableMetadata_neg.tsv |
+  +---------------------------------------------+-----------------------------------------------------------------------------------------------------------------------+
+  | Data matrix, positive ionization-mode       | https://raw.githubusercontent.com/HegemanLab/w4mjoinpn_galaxy_wrapper/master/test-data/input_dataMatrix_pos.tsv       |
+  +---------------------------------------------+-----------------------------------------------------------------------------------------------------------------------+
+  | Sample metadata, positive ionization-mode   | https://raw.githubusercontent.com/HegemanLab/w4mjoinpn_galaxy_wrapper/master/test-data/input_sampleMetadata_pos.tsv   |
+  +---------------------------------------------+-----------------------------------------------------------------------------------------------------------------------+
+  | Variable metadata, positive ionization-mode | https://raw.githubusercontent.com/HegemanLab/w4mjoinpn_galaxy_wrapper/master/test-data/input_variableMetadata_pos.tsv |
+  +---------------------------------------------+-----------------------------------------------------------------------------------------------------------------------+
+
+**Output files**
+
+  +---------------------------------------------+-----------------------------------------------------------------------------------------------------------------------+
+  | Output File                                 | Download from URL                                                                                                     |
+  +=============================================+=======================================================================================================================+
+  | Data matrix                                 | https://raw.githubusercontent.com/HegemanLab/w4mjoinpn_galaxy_wrapper/master/test-data/output_dataMatrix.tsv          |
+  +---------------------------------------------+-----------------------------------------------------------------------------------------------------------------------+
+  | Sample metadata                             | https://raw.githubusercontent.com/HegemanLab/w4mjoinpn_galaxy_wrapper/master/test-data/output_sampleMetadata.tsv      |
+  +---------------------------------------------+-----------------------------------------------------------------------------------------------------------------------+
+  | Variable metadata                           | https://raw.githubusercontent.com/HegemanLab/w4mjoinpn_galaxy_wrapper/master/test-data/output_variableMetadata.tsv    |
+  +---------------------------------------------+-----------------------------------------------------------------------------------------------------------------------+
+
+  ]]></help>
+  <citations>
+    <citation type="doi">10.5281/zenodo.1038289</citation>
+    <!--
+    <citation type="bibtex">
+      @misc{
+        w4mjoinpn_galaxy_wrapper,
+        author = {Eschenlauer, Arthur},
+        year = {2017},
+        title = {w4mjoinpn_galaxy_wrapper},
+        publisher = {GitHub},
+        journal = {GitHub repository},
+        url = {https://github.com/HegemanLab/w4mjoinpn_galaxy_wrapper},
+        doi = 10.5281/zenodo.1038290
+      }
+    </citation>
+    -->
+    <!-- Giacomoni, 2014 Workflow4Metabolomics: a collaborative research infrastructure for computational metabolomics -->
+    <citation type="doi">10.1093/bioinformatics/btu813</citation>
+    <!-- Kuhl et al., 2012 -->
+    <citation type="doi">10.1021/ac202450g</citation>
+    <!-- Smith, 2006 XCMS: Processing Mass Spectrometry Data for Metabolite Profiling Using Nonlinear Peak Alignment, Matching, and Identification. -->
+    <citation type="doi">10.1021/ac051437y</citation>
+  </citations>
+</tool>