# HG changeset patch
# User ethevenot
# Date 1470501762 14400
# Node ID 09799fc16bc636384620c8450cc2ddf30ce33751
# Parent fdefbc780d2edb9a66c47227db3a873676c34b06
planemo upload for repository https://github.com/workflow4metabolomics/univariate.git commit 2c0d4d97c208edca1ada2035a7b7af9c4eb31afe

diff -r fdefbc780d2e -r 09799fc16bc6 README.md
--- a/README.md Sat Jul 30 12:38:02 2016 -0400
+++ b/README.md Sat Aug 06 12:42:42 2016 -0400
@@ -1,13 +1,14 @@
-# Univariate parametric and non-parametric hypothesis testing with correction for multiple testing
+Univariate parametric and non-parametric hypothesis testing with correction for multiple testing
+================================================================================================
-A Galaxy module from the [Workflow4metabolomics](http://workflow4metabolomics.org) project.
+A Galaxy module from the [Workflow4metabolomics](http://workflow4metabolomics.org) infrastructure
 Status: [![Build Status](https://travis-ci.org/workflow4metabolomics/univariate.svg?branch=master)](https://travis-ci.org/workflow4metabolomics/univariate).
-## Description
+### Description
-**Version:** 2.1.2
-**Date:** 2016-07-30
+**Version:** 2.1.4
+**Date:** 2016-08-05
 **Author:** Marie Tremblay-Franco (INRA, MetaToul, MetaboHUB, W4M Core Development Team) and Etienne A. Thevenot (CEA, LIST, MetaboHUB, W4M Core Development Team)
 **Email:** [marie.tremblay-franco(at)toulouse.inra.fr](mailto:marie.tremblay-franco@toulouse.inra.fr); [etienne.thevenot(at)cea.fr](mailto:etienne.thevenot@cea.fr)
 **Citation:** Thevenot E.A., Roux A., Xu Y., Ezan E. and Junot C. (2015). Analysis of the human adult urinary metabolome variations with age, body mass index and gender by implementing a comprehensive workflow for univariate and OPLS statistical analyses. *Journal of Proteome Research*, **14**:3322-3335. [doi:10.1021/acs.jproteome.5b00354](http://dx.doi.org/10.1021/acs.jproteome.5b00354)
@@ -15,27 +16,58 @@
 **Licence:** CeCILL
 **Funding:** Agence Nationale de la Recherche ([MetaboHUB](http://www.metabohub.fr/index.php?lang=en&Itemid=473) national infrastructure for metabolomics and fluxomics, ANR-11-INBS-0010 grant)
-## Installation
+### Installation
- * Configuration file: **univariate_config.xml**
+ * Configuration file: `univariate_config.xml`
  * Image file:
-    + **static/images/univariate_workflowPositionImage.png**
- * Wrapper file: **univariate_wrapper.R**
- * Script file: **univariate_script.R**
+    + `static/images/univariate_workflowPositionImage.png`
+ * Wrapper file: `univariate_wrapper.R`
+ * Script file: `univariate_script.R`
  * R packages
-    + **batch** from CRAN: `install.packages("batch", dep=TRUE)`.
-    + **PMCMR** from Bioconductor: `install.packages("PMCMR", dep=TRUE)`.
+    + **batch** from CRAN
+
+    ```r
+    install.packages("batch", dep=TRUE)
+    ```
+    + **PMCMR** from CRAN
+
+    ```r
+    install.packages("PMCMR", dep=TRUE)
+    ```
+
+### Tests
-## Tests
+The code in the wrapper can be tested by running the `runit/univariate_runtests.R` R file
+
+You will need to install **RUnit** package in order to make it run:
+```r
+install.packages('RUnit', dependencies = TRUE)
+```
-The code in the wrapper can be tested by running the **tests/univariate_tests.R** in R
+### Working example
+
+See the **W4M00001a_sacurine-subset-statistics**, **W4M00001b_sacurine-subset-complete**, **W4M00002_mtbls2**, **W4M00003_diaplasma** shared histories in the **Shared Data/Published Histories** menu (https://galaxy.workflow4metabolomics.org/history/list_published)
+
+### News
+
+###### CHANGES IN VERSION 2.1.4
+
+NEW FEATURE
-## News
+Level names are now separated by '.' instead of '-' previously in the column names of the output variableMetadata table (e.g., 'jour_ttest_J3.J10_fdr' instead of 'jour_ttest_J3-J10_fdr' previously)
+
+INTERNAL MODIFICATION
-## CHANGES IN VERSION 2.1.2
+ * Minor internal changes
+
+###### CHANGES IN VERSION 2.1.2
+
+INTERNAL MODIFICATION
  * Minor internal changes in .shed.yml for toolshed export
-## CHANGES IN VERSION 2.1.1
+###### CHANGES IN VERSION 2.1.1
+
+INTERNAL MODIFICATION
  * Internal handling of 'NA' p-values (e.g. when intensities are identical in all samples).
diff -r fdefbc780d2e -r 09799fc16bc6 docker/Dockerfile
--- a/docker/Dockerfile Sat Jul 30 12:38:02 2016 -0400
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,28 +0,0 @@
-FROM ubuntu:14.04
-
-MAINTAINER Etienne Thevenot (etienne.thevenot@cea.fr)
-
-# Setup package repos
-RUN echo "deb http://mirrors.ebi.ac.uk/CRAN/bin/linux/ubuntu trusty/" >> /etc/apt/sources.list
-RUN apt-key adv --keyserver keyserver.ubuntu.com --recv-keys E084DAB9
-
-# Update and upgrade system
-RUN apt-get update
-RUN apt-get -y upgrade
-
-# Install R and other needed packages
-RUN apt-get -y install r-base
-RUN R -e "install.packages('batch', lib='/usr/lib/R/library', dependencies = TRUE, repos='http://mirrors.ebi.ac.uk/CRAN')"
-
-# Clone tool
-RUN apt-get -y install git
-RUN git clone -b docker https://github.com/workflow4metabolomics/univariate /files/univariate
-
-# Put univariate folder into PATH
-ENV PATH=$PATH:/files/univariate
-
-# Clean up
-RUN apt-get clean && apt-get autoremove -y && rm -rf /var/lib/{apt,dpkg,cache,log}/ /tmp/* /var/tmp/*
-
-# Define Entry point script
-ENTRYPOINT ["/files/univariate/univariate_wrapper.R"]
diff -r fdefbc780d2e -r 09799fc16bc6 runit/example1/dataMatrix.tsv
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/runit/example1/dataMatrix.tsv Sat Aug 06 12:42:42 2016 -0400
@@ -0,0 +1,4 @@
+dataMatrix Ech10 Ech11 Ech12 Ech13 Ech14 Ech15
+MT1 3.439956551 3.399847335 3.335401704 3.4201777 3.24585851 3.401256321
+MT2 5.008458405 4.461291924 4.068043169 4.42768414 4.406640829 4.500370048
+MT3 3.99527636 4.051758488 4.332552332 4.348474118 4.253679544 4.26823853
diff -r fdefbc780d2e -r 09799fc16bc6 runit/example1/sampleMetadata.tsv
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/runit/example1/sampleMetadata.tsv Sat Aug 06 12:42:42 2016 -0400
@@ -0,0 +1,7 @@
+sampleMetadata jour
+Ech10 J10
+Ech11 J10
+Ech12 J10
+Ech13 J3
+Ech14 J3
+Ech15 J3
diff -r fdefbc780d2e -r 09799fc16bc6 runit/example1/variableMetadata.tsv
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/runit/example1/variableMetadata.tsv Sat Aug 06 12:42:42 2016 -0400
@@ -0,0 +1,4 @@
+variableMetadata PCA_XLOAD-h1 PCA_XLOAD-h2
+MT1 -0.048723936 0.05648187
+MT2 -0.067609139 0.084300327
+MT3 0.080335733 -0.0215397
diff -r fdefbc780d2e -r 09799fc16bc6 runit/input/dataMatrix.tsv
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/runit/input/dataMatrix.tsv Sat Aug 06 12:42:42 2016 -0400
@@ -0,0 +1,4 @@
+profile HU_017 HU_021 HU_027 HU_032 HU_041 HU_048 HU_049 HU_050 HU_052 HU_059 HU_060 HU_066 HU_072 HU_077 HU_090 HU_109 HU_110 HU_125 HU_126 HU_131 HU_134 HU_149 HU_150 HU_173 HU_179 HU_180 HU_182 HU_202 HU_204 HU_209
+HMDB01032 2569204.92420381 6222035.77434915 17070707.9912636 1258838.24348419 13039543.0754619 1909391.77026598 3495.09386434063 2293521.90928998 128503.275117713 81872.5276382213 8103557.56578035 149574887.036181 1544036.41049333 7103429.53933206 14138796.50382 4970265.57952158 263054.73056162 1671332.30008058 88433.1944958815 23602331.2894815 18648126.5206986 1554657.98756878 34152.3646391152 209372.71275317 33187733.370626 202438.591636003 13581070.0886437 354170.810678102 9120781.48986975 43419175.4051586
+HMDB03072 3628416.30251025 65626.9834353751 112170.118946651 3261804.34422417 42228.2787747563 343254.201250707 1958217.69317664 11983270.0435677 5932111.41638028 5511385.83359531 9154521.47755199 2632133.21209418 9500411.14556502 6551644.51726592 7204319.80891836 1273412.04795188 3260583.81592376 8932005.5351622 8340827.52597275 9256460.69197759 11217839.169041 5919262.81433556 11790077.0657915 9567977.80797097 73717.5811684739 9991787.29074293 4208098.14739633 623970.649925847 10904221.2642849 2171793.93621067
+HMDB00792 429568.609438384 3887629.50527037 1330692.11658995 1367446.73023821 844197.447472453 2948090.71886592 1614157.90566884 3740009.19379795 3292251.66531919 2310688.79492013 4404239.59008605 3043289.12780863 825736.467181043 2523241.91730649 6030501.02648005 474901.604069803 2885792.42617652 2955990.64049134 1917716.3427982 1767962.67737699 5926203.40397675 1639065.69474684 346810.763557826 1054776.22313737 2390258.27543894 1831346.37315857 1026696.36904362 7079792.50047866 4368341.01359769 3495986.87280275
diff -r fdefbc780d2e -r 09799fc16bc6 runit/input/sampleMetadata.tsv
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/runit/input/sampleMetadata.tsv Sat Aug 06 12:42:42 2016 -0400
@@ -0,0 +1,31 @@
+sample age ageGroup
+HU_017 41 experienced
+HU_021 34 junior
+HU_027 37 experienced
+HU_032 38 experienced
+HU_041 28 junior
+HU_048 39 experienced
+HU_049 50 senior
+HU_050 30 junior
+HU_052 51 senior
+HU_059 81 senior
+HU_060 55 senior
+HU_066 25 junior
+HU_072 47 experienced
+HU_077 27 junior
+HU_090 46 experienced
+HU_109 32 junior
+HU_110 50 senior
+HU_125 58 senior
+HU_126 45 experienced
+HU_131 42 experienced
+HU_134 48 experienced
+HU_149 35 experienced
+HU_150 49 experienced
+HU_173 55 senior
+HU_179 33 junior
+HU_180 53 senior
+HU_182 43 experienced
+HU_202 42 experienced
+HU_204 31 junior
+HU_209 17.5 junior
diff -r fdefbc780d2e -r 09799fc16bc6 runit/input/variableMetadata.tsv
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/runit/input/variableMetadata.tsv Sat Aug 06 12:42:42 2016 -0400
@@ -0,0 +1,4 @@
+variable name
+HMDB01032 Dehydroepiandrosterone sulfate
+HMDB03072 Quinic acid
+HMDB00792 Sebacic acid
diff -r fdefbc780d2e -r 09799fc16bc6 runit/output/information.txt
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/runit/output/information.txt Sat Aug 06 12:42:42 2016 -0400
@@ -0,0 +1,9 @@
+
+Start of the 'Univariate' Galaxy module call: Sat 06 Aug 2016 06:22:18 PM
+
+Performing 'kruskal'
+
+The following 1 variable (33%) was found significant at the 0.05 level:
+HMDB01032
+
+End of 'Univariate' Galaxy module call: 2016-08-06 18:22:18
diff -r fdefbc780d2e -r 09799fc16bc6 runit/output/variableMetadata.tsv
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/runit/output/variableMetadata.tsv Sat Aug 06 12:42:42 2016 -0400
@@ -0,0 +1,4 @@
+variableMetadata name ageGroup_kruskal_fdr ageGroup_kruskal_sig ageGroup_kruskal_junior.experienced_dif ageGroup_kruskal_senior.experienced_dif ageGroup_kruskal_senior.junior_dif ageGroup_kruskal_junior.experienced_pva ageGroup_kruskal_senior.experienced_pva ageGroup_kruskal_senior.junior_pva ageGroup_kruskal_junior.experienced_sig ageGroup_kruskal_senior.experienced_sig ageGroup_kruskal_senior.junior_sig
+HMDB01032 Dehydroepiandrosterone sulfate 0.0117826825222329 1 7211389.71960377 -1703486.11807139 -8914875.83767516 0.204550960009346 0.123124593762726 0.00251932966039092 0 0 1
+HMDB03072 Quinic acid 0.461634758626427 0 -3747468.87812489 1512795.66143568 5260264.53956057 NA NA NA NA NA NA
+HMDB00792 Sebacic acid 0.469555338459932 0 1404223.43306179 959174.915801485 -445048.517260305 NA NA NA NA NA NA
diff -r fdefbc780d2e -r 09799fc16bc6 runit/univariate_runtests.R
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/runit/univariate_runtests.R Sat Aug 06 12:42:42 2016 -0400
@@ -0,0 +1,102 @@
+#!/usr/bin/env Rscript
+
+## Package
+##--------
+
+library(RUnit)
+
+## Constants
+##----------
+
+testOutDirC <- "output"
+argVc <- commandArgs(trailingOnly = FALSE)
+scriptPathC <- sub("--file=", "", argVc[grep("--file=", argVc)])
+
+
+## Functions
+##-----------
+
+## Reading tables (matrix or data frame)
+readTableF <- function(fileC, typeC = c("matrix", "dataframe")[1]) {
+
+    file.exists(fileC) || stop(paste0("No output file \"", fileC ,"\"."))
+
+    switch(typeC,
+           matrix = return(t(as.matrix(read.table(file = fileC,
+               header = TRUE,
+               row.names = 1,
+               sep = "\t",
+               stringsAsFactors = FALSE)))),
+           dataframe = return(read.table(file = fileC,
+               header = TRUE,
+               row.names = 1,
+               sep = "\t",
+               stringsAsFactors = FALSE)))
+
+}
+
+## Call wrapper
+wrapperCallF <- function(paramLs) {
+
+    ## Set program path
+    wrapperPathC <- file.path(dirname(scriptPathC), "..", "univariate_wrapper.R")
+
+    ## Set arguments
+    argLs <- NULL
+    for (parC in names(paramLs))
+        argLs <- c(argLs, parC, paramLs[[parC]])
+
+    ## Call
+    wrapperCallC <- paste(c(wrapperPathC, argLs), collapse = " ")
+
+    if(.Platform$OS.type == "windows")
+        wrapperCallC <- paste("Rscript", wrapperCallC)
+
+    wrapperCodeN <- system(wrapperCallC)
+
+    if (wrapperCodeN != 0)
+        stop("Error when running univariate_wrapper.R.")
+
+    ## Get output
+    outLs <- list()
+    if ("dataMatrix_out" %in% names(paramLs))
+        outLs[["datMN"]] <- readTableF(paramLs[["dataMatrix_out"]], "matrix")
+    if ("sampleMetadata_out" %in% names(paramLs))
+        outLs[["samDF"]] <- readTableF(paramLs[["sampleMetadata_out"]], "dataframe")
+    if ("variableMetadata_out" %in% names(paramLs))
+        outLs[["varDF"]] <- readTableF(paramLs[["variableMetadata_out"]], "dataframe")
+    if("information" %in% names(paramLs))
+        outLs[["infVc"]] <- readLines(paramLs[["information"]])
+
+    return(outLs)
+}
+
+## Setting default parameters
+defaultArgF <- function(testInDirC) {
+
+    defaultArgLs <- list()
+    if(file.exists(file.path(dirname(scriptPathC), testInDirC, "dataMatrix.tsv")))
+        defaultArgLs[["dataMatrix_in"]] <- file.path(dirname(scriptPathC), testInDirC, "dataMatrix.tsv")
+    if(file.exists(file.path(dirname(scriptPathC), testInDirC, "sampleMetadata.tsv")))
+        defaultArgLs[["sampleMetadata_in"]] <- file.path(dirname(scriptPathC), testInDirC, "sampleMetadata.tsv")
+    if(file.exists(file.path(dirname(scriptPathC), testInDirC, "variableMetadata.tsv")))
+        defaultArgLs[["variableMetadata_in"]] <- file.path(dirname(scriptPathC), testInDirC, "variableMetadata.tsv")
+
+    defaultArgLs[["variableMetadata_out"]] <- file.path(dirname(scriptPathC), testOutDirC, "variableMetadata.tsv")
+    defaultArgLs[["information"]] <- file.path(dirname(scriptPathC), testOutDirC, "information.txt")
+
+    defaultArgLs
+
+}
+
+## Main
+##-----
+
+## Create output folder
+file.exists(testOutDirC) || dir.create(testOutDirC)
+
+## Run tests
+test.suite <- defineTestSuite('tests', dirname(scriptPathC), testFileRegexp = paste0('^.*_tests\\.R$'), testFuncRegexp = '^.*$')
+isValidTestSuite(test.suite)
+test.results <- runTestSuite(test.suite)
+print(test.results)
diff -r fdefbc780d2e -r 09799fc16bc6 runit/univariate_tests.R
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/runit/univariate_tests.R Sat Aug 06 12:42:42 2016 -0400
@@ -0,0 +1,44 @@
+test_input_kruskal <- function() {
+
+    testDirC <- "input"
+    argLs <- list(facC = "ageGroup",
+                  tesC = "kruskal",
+                  adjC = "fdr",
+                  thrN = "0.05")
+
+    argLs <- c(defaultArgF(testDirC), argLs)
+    outLs <- wrapperCallF(argLs)
+
+    checkEqualsNumeric(outLs[["varDF"]]["HMDB01032", "ageGroup_kruskal_senior.experienced_pva"], 0.1231246, tolerance = 1e-6)
+
+}
+
+test_example1_wilcoxDif <- function() {
+
+    testDirC <- "example1"
+    argLs <- list(facC = "jour",
+                  tesC = "wilcoxon",
+                  adjC = "fdr",
+                  thrN = "0.05")
+
+    argLs <- c(defaultArgF(testDirC), argLs)
+    outLs <- wrapperCallF(argLs)
+
+    checkEqualsNumeric(outLs[["varDF"]]["MT3", "jour_wilcoxon_J3.J10_dif"], 0.216480042, tolerance = 1e-8)
+
+}
+
+test_example1_ttestFdr <- function() {
+
+    testDirC <- "example1"
+    argLs <- list(facC = "jour",
+                  tesC = "ttest",
+                  adjC = "fdr",
+                  thrN = "0.05")
+
+    argLs <- c(defaultArgF(testDirC), argLs)
+    outLs <- wrapperCallF(argLs)
+
+    checkEqualsNumeric(outLs[["varDF"]]["MT3", "jour_ttest_J3.J10_fdr"], 0.7605966, tolerance = 1e-6)
+
+}
diff -r fdefbc780d2e -r 09799fc16bc6 test-data/output-variableMetadata.tsv
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/test-data/output-variableMetadata.tsv Sat Aug 06 12:42:42 2016 -0400
@@ -0,0 +1,4 @@
+variableMetadata name ageGroup_kruskal_fdr ageGroup_kruskal_sig ageGroup_kruskal_junior.experienced_dif ageGroup_kruskal_senior.experienced_dif ageGroup_kruskal_senior.junior_dif ageGroup_kruskal_junior.experienced_pva ageGroup_kruskal_senior.experienced_pva ageGroup_kruskal_senior.junior_pva ageGroup_kruskal_junior.experienced_sig ageGroup_kruskal_senior.experienced_sig ageGroup_kruskal_senior.junior_sig
+HMDB01032 Dehydroepiandrosterone sulfate 0.0117826825222329 1 7211389.71960377 -1703486.11807139 -8914875.83767516 0.204550960009346 0.123124593762726 0.00251932966039092 0 0 1
+HMDB03072 Quinic acid 0.461634758626427 0 -3747468.87812489 1512795.66143568 5260264.53956057 NA NA NA NA NA NA
+HMDB00792 Sebacic acid 0.469555338459932 0 1404223.43306179 959174.915801485 -445048.517260305 NA NA NA NA NA NA
diff -r fdefbc780d2e -r 09799fc16bc6 test-data/variableMetadata-output.tsv
--- a/test-data/variableMetadata-output.tsv Sat Jul 30 12:38:02 2016 -0400
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,4 +0,0 @@
-variableMetadata name ageGroup_kruskal_fdr ageGroup_kruskal_sig ageGroup_kruskal_junior-experienced_dif ageGroup_kruskal_senior-experienced_dif ageGroup_kruskal_senior-junior_dif ageGroup_kruskal_junior-experienced_pva ageGroup_kruskal_senior-experienced_pva ageGroup_kruskal_senior-junior_pva ageGroup_kruskal_junior-experienced_sig ageGroup_kruskal_senior-experienced_sig ageGroup_kruskal_senior-junior_sig
-HMDB01032 Dehydroepiandrosterone sulfate 0.0117826825222329 1 7211389.71960377 -1703486.11807139 -8914875.83767516 0.204550960009346 0.123124593762726 0.00251932966039092 0 0 1
-HMDB03072 Quinic acid 0.461634758626427 0 -3747468.87812489 1512795.66143568 5260264.53956057 NA NA NA NA NA NA
-HMDB00792 Sebacic acid 0.469555338459932 0 1404223.43306179 959174.915801485 -445048.517260305 NA NA NA NA NA NA
diff -r fdefbc780d2e -r 09799fc16bc6 tests/example1/dataMatrix.tsv
--- a/tests/example1/dataMatrix.tsv Sat Jul 30 12:38:02 2016 -0400
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,4 +0,0 @@
-dataMatrix Ech10 Ech11 Ech12 Ech13 Ech14 Ech15
-MT1 3.439956551 3.399847335 3.335401704 3.4201777 3.24585851 3.401256321
-MT2 5.008458405 4.461291924 4.068043169 4.42768414 4.406640829 4.500370048
-MT3 3.99527636 4.051758488 4.332552332 4.348474118 4.253679544 4.26823853
diff -r fdefbc780d2e -r 09799fc16bc6 tests/example1/sampleMetadata.tsv
--- a/tests/example1/sampleMetadata.tsv Sat Jul 30 12:38:02 2016 -0400
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,7 +0,0 @@
-sampleMetadata jour
-Ech10 J10
-Ech11 J10
-Ech12 J10
-Ech13 J3
-Ech14 J3
-Ech15 J3
diff -r fdefbc780d2e -r 09799fc16bc6 tests/example1/variableMetadata.tsv
--- a/tests/example1/variableMetadata.tsv Sat Jul 30 12:38:02 2016 -0400
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,4 +0,0 @@
-variableMetadata PCA_XLOAD-h1 PCA_XLOAD-h2
-MT1 -0.048723936 0.05648187
-MT2 -0.067609139 0.084300327
-MT3 0.080335733 -0.0215397
diff -r fdefbc780d2e -r 09799fc16bc6 tests/input/dataMatrix.tsv
--- a/tests/input/dataMatrix.tsv Sat Jul 30 12:38:02 2016 -0400
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,4 +0,0 @@
-profile HU_017 HU_021 HU_027 HU_032 HU_041 HU_048 HU_049 HU_050 HU_052 HU_059 HU_060 HU_066 HU_072 HU_077 HU_090 HU_109 HU_110 HU_125 HU_126 HU_131 HU_134 HU_149 HU_150 HU_173 HU_179 HU_180 HU_182 HU_202 HU_204 HU_209
-HMDB01032 2569204.92420381 6222035.77434915 17070707.9912636 1258838.24348419 13039543.0754619 1909391.77026598 3495.09386434063 2293521.90928998 128503.275117713 81872.5276382213 8103557.56578035 149574887.036181 1544036.41049333 7103429.53933206 14138796.50382 4970265.57952158 263054.73056162 1671332.30008058 88433.1944958815 23602331.2894815 18648126.5206986 1554657.98756878 34152.3646391152 209372.71275317 33187733.370626 202438.591636003 13581070.0886437 354170.810678102 9120781.48986975 43419175.4051586
-HMDB03072 3628416.30251025 65626.9834353751 112170.118946651 3261804.34422417 42228.2787747563 343254.201250707 1958217.69317664 11983270.0435677 5932111.41638028 5511385.83359531 9154521.47755199 2632133.21209418 9500411.14556502 6551644.51726592 7204319.80891836 1273412.04795188 3260583.81592376 8932005.5351622 8340827.52597275 9256460.69197759 11217839.169041 5919262.81433556 11790077.0657915 9567977.80797097 73717.5811684739 9991787.29074293 4208098.14739633 623970.649925847 10904221.2642849 2171793.93621067
-HMDB00792 429568.609438384 3887629.50527037 1330692.11658995 1367446.73023821 844197.447472453 2948090.71886592 1614157.90566884 3740009.19379795 3292251.66531919 2310688.79492013 4404239.59008605 3043289.12780863 825736.467181043 2523241.91730649 6030501.02648005 474901.604069803 2885792.42617652 2955990.64049134 1917716.3427982 1767962.67737699 5926203.40397675 1639065.69474684 346810.763557826 1054776.22313737 2390258.27543894 1831346.37315857 1026696.36904362 7079792.50047866 4368341.01359769 3495986.87280275
diff -r fdefbc780d2e -r 09799fc16bc6 tests/input/sampleMetadata.tsv
--- a/tests/input/sampleMetadata.tsv Sat Jul 30 12:38:02 2016 -0400
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,31 +0,0 @@
-sample age ageGroup
-HU_017 41 experienced
-HU_021 34 junior
-HU_027 37 experienced
-HU_032 38 experienced
-HU_041 28 junior
-HU_048 39 experienced
-HU_049 50 senior
-HU_050 30 junior
-HU_052 51 senior
-HU_059 81 senior
-HU_060 55 senior
-HU_066 25 junior
-HU_072 47 experienced
-HU_077 27 junior
-HU_090 46 experienced
-HU_109 32 junior
-HU_110 50 senior
-HU_125 58 senior
-HU_126 45 experienced
-HU_131 42 experienced
-HU_134 48 experienced
-HU_149 35 experienced
-HU_150 49 experienced
-HU_173 55 senior
-HU_179 33 junior
-HU_180 53 senior
-HU_182 43 experienced
-HU_202 42 experienced
-HU_204 31 junior
-HU_209 17.5 junior
diff -r fdefbc780d2e -r 09799fc16bc6 tests/input/variableMetadata.tsv
--- a/tests/input/variableMetadata.tsv Sat Jul 30 12:38:02 2016 -0400
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,4 +0,0 @@
-variable name
-HMDB01032 Dehydroepiandrosterone sulfate
-HMDB03072 Quinic acid
-HMDB00792 Sebacic acid
diff -r fdefbc780d2e -r 09799fc16bc6 tests/output/information.txt
--- a/tests/output/information.txt Sat Jul 30 12:38:02 2016 -0400
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,8 +0,0 @@
-
-Start of the 'Univariate' Galaxy module call: sam. 21 mai 2016 20:25:53
-
-Performing 'ttest'
-
-No significant variable found at the selected 0.05 level
-
-End of 'Univariate' Galaxy module call: 2016-05-21 20:25:53
diff -r fdefbc780d2e -r 09799fc16bc6 tests/output/variableMetadata.tsv
--- a/tests/output/variableMetadata.tsv Sat Jul 30 12:38:02 2016 -0400
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,4 +0,0 @@
-variableMetadata PCA_XLOAD-h1 PCA_XLOAD-h2 jour_ttest_J3-J10_dif jour_ttest_J3-J10_fdr jour_ttest_J3-J10_sig
-MT1 -0.048723936 0.05648187 -0.0359710196666665 0.827558403950534 0
-MT2 -0.067609139 0.084300327 -0.0676994936666668 0.827558403950534 0
-MT3 0.080335733 -0.0215397 0.163601670666666 0.760596565270778 0
diff -r fdefbc780d2e -r 09799fc16bc6 tests/univariate_tests.R
--- a/tests/univariate_tests.R Sat Jul 30 12:38:02 2016 -0400
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,194 +0,0 @@
-library(RUnit)
-
-wrapperF <- function(argVc) {
-
-
-    source("../univariate_script.R")
-
-
-#### Start_of_testing_code <- function() {}
-
-
-    ##------------------------------
-    ## Initializing
-    ##------------------------------
-
-    ## options
-    ##--------
-
-    strAsFacL <- options()$stringsAsFactors
-    options(stringsAsFactors = FALSE)
-
-    ## packages
-    ##---------
-
-    library(PMCMR)
-
-    ## constants
-    ##----------
-
-    modNamC <- "Univariate" ## module name
-
-    topEnvC <- environment()
-    flagC <- "\n"
-
-    ## functions
-    ##----------
-
-    flgF <- function(tesC,
-                     envC = topEnvC,
-                     txtC = NA) { ## management of warning and error messages
-
-        tesL <- eval(parse(text = tesC), envir = envC)
-
-        if(!tesL) {
-
-            sink(NULL)
-            stpTxtC <- ifelse(is.na(txtC),
-                              paste0(tesC, " is FALSE"),
-                              txtC)
-
-            stop(stpTxtC,
-                 call. = FALSE)
-
-        }
-
-    } ## flgF
-
-    ## log file
-    ##---------
-
-    sink(argVc["information"])
-
-    cat("\nStart of the '", modNamC, "' Galaxy module call: ",
-        format(Sys.time(), "%a %d %b %Y %X"), "\n", sep="")
-
-    ## loading
-    ##--------
-
-    datMN <- t(as.matrix(read.table(argVc["dataMatrix_in"],
-                                    check.names = FALSE,
-                                    header = TRUE,
-                                    row.names = 1,
-                                    sep = "\t")))
-
-    samDF <- read.table(argVc["sampleMetadata_in"],
-                        check.names = FALSE,
-                        header = TRUE,
-                        row.names = 1,
-                        sep = "\t")
-
-    varDF <- read.table(argVc["variableMetadata_in"],
-                        check.names = FALSE,
-                        header = TRUE,
-                        row.names = 1,
-                        sep = "\t")
-
-    tesC <- argVc["tesC"]
-
-    ## checking
-    ##---------
-
-    flgF("identical(rownames(datMN), rownames(samDF))", txtC = "Column names of the dataMatrix are not identical to the row names of the sampleMetadata; check your data with the 'Check Format' module in the 'Quality Control' section")
-    flgF("identical(colnames(datMN), rownames(varDF))", txtC = "Row names of the dataMatrix are not identical to the row names of the variableMetadata; check your data with the 'Check Format' module in the 'Quality Control' section")
-
-    flgF("argVc['facC'] %in% colnames(samDF)", txtC = paste0("Required factor of interest '", argVc['facC'], "' could not be found in the column names of the sampleMetadata"))
-    flgF("mode(samDF[, argVc['facC']]) %in% c('character', 'numeric')", txtC = paste0("The '", argVc['facC'], "' column of the sampleMetadata should contain either number only, or character only"))
-
-    flgF("!(tesC %in% c('ttest', 'wilcoxon')) || (mode(samDF[, argVc['facC']]) == 'character' && length(unique(samDF[, argVc['facC']])) == 2)", txtC = paste0("For 'ttest' and 'wilcoxon', the chosen factor column ('", argVc['facC'], "') of the sampleMetadata should contain characters with only two different classes"))
-    flgF("!(tesC %in% c('anova', 'kruskal')) || (mode(samDF[, argVc['facC']]) == 'character' && length(unique(samDF[, argVc['facC']])) > 2)", txtC = paste0("For 'anova' and 'kruskal', the chosen factor column ('", argVc['facC'], "') of the sampleMetadata should contain characters with at least three different classes"))
-    flgF("!(tesC %in% c('pearson', 'spearman')) || mode(samDF[, argVc['facC']]) == 'numeric'", txtC = paste0("For 'pearson' and 'spearman', the chosen factor column ('", argVc['facC'], "') of the sampleMetadata should contain numbers only"))
-
-    flgF("argVc['adjC'] %in% c('holm', 'hochberg', 'hommel', 'bonferroni', 'BH', 'BY', 'fdr', 'none')")
-
-    flgF("0 <= as.numeric(argVc['thrN']) && as.numeric(argVc['thrN']) <= 1",
-         txtC = "(corrected) p-value threshold must be between 0 and 1")
-
-
-    ##------------------------------
-    ## Computation
-    ##------------------------------
-
-
-    varDF <- univariateF(datMN = datMN,
-                         samDF = samDF,
-                         varDF = varDF,
-                         facC = argVc["facC"],
-                         tesC = tesC,
-                         adjC = argVc["adjC"],
-                         thrN = as.numeric(argVc["thrN"]))
-
-
-    ##------------------------------
-    ## Ending
-    ##------------------------------
-
-
-    ## saving
-    ##--------
-
-    varDF <- cbind.data.frame(variableMetadata = rownames(varDF),
-                              varDF)
-    write.table(varDF,
-                file = argVc["variableMetadata_out"],
-                quote = FALSE,
-                row.names = FALSE,
-                sep = "\t")
-
-    ## closing
-    ##--------
-
-    cat("\nEnd of '", modNamC, "' Galaxy module call: ",
-        as.character(Sys.time()), "\n", sep = "")
-
-    sink()
-
-    options(stringsAsFactors = strAsFacL)
-
-
-
-#### End_of_testing_code <- function() {}
-
-
-    return(list(varDF = varDF))
-
-    rm(list = ls())
-
-}
-
-exaDirOutC <- "output"
-if(!file.exists(exaDirOutC))
-    stop("Please create an 'output' subfolder into the (current) 'tests' folder")
-
-tesArgLs <- list(input_kruskal = c(facC = "ageGroup",
-                     tesC = "kruskal",
-                     adjC = "fdr",
-                     thrN = "0.05",
-                     .chkC = "checkEqualsNumeric(outLs[['varDF']]['HMDB01032', 'ageGroup_kruskal_senior-experienced_pva'], 0.1231246, tolerance = 1e-6)"),
-                 example1_wilcoxDif = c(facC = "jour",
-                     tesC = "wilcoxon",
-                     adjC = "fdr",
-                     thrN = "0.05",
-                     .chkC = "checkEqualsNumeric(outLs[['varDF']]['MT3', 'jour_wilcoxon_J3-J10_dif'], 0.216480042, tolerance = 1e-8)"),
-                 example1_ttestFdr = c(facC = "jour",
-                     tesC = "ttest",
-                     adjC = "fdr",
-                     thrN = "0.05",
-                     .chkC = "checkEqualsNumeric(outLs[['varDF']]['MT3', 'jour_ttest_J3-J10_fdr'], 0.7605966, tolerance = 1e-6)"))
-
-for(tesC in names(tesArgLs))
-    tesArgLs[[tesC]] <- c(tesArgLs[[tesC]],
-                          dataMatrix_in = file.path(unlist(strsplit(tesC, "_"))[1], "dataMatrix.tsv"),
-                          sampleMetadata_in = file.path(unlist(strsplit(tesC, "_"))[1], "sampleMetadata.tsv"),
-                          variableMetadata_in = file.path(unlist(strsplit(tesC, "_"))[1], "variableMetadata.tsv"),
-                          variableMetadata_out = file.path(exaDirOutC, "variableMetadata.tsv"),
-                          information = file.path(exaDirOutC, "information.txt"))
-
-for(tesC in names(tesArgLs)) {
-    print(tesC)
-    outLs <- wrapperF(tesArgLs[[tesC]])
-    if(".chkC" %in% names(tesArgLs[[tesC]]))
-        stopifnot(eval(parse(text = tesArgLs[[tesC]][[".chkC"]])))
-}
-
-message("Checks successfully completed")
diff -r fdefbc780d2e -r 09799fc16bc6 univariate_config.xml
--- a/univariate_config.xml Sat Jul 30 12:38:02 2016 -0400
+++ b/univariate_config.xml Sat Aug 06 12:42:42 2016 -0400
@@ -1,71 +1,75 @@
- + Univariate statistics + + + R + r-batch + r-PMCMR + - - R - r-batch - r-pmcmr - - + + + + - - - - + + + - - - - - - + + + + + + - - - - - - - - + + + + + + + + - - + + - + - + - - - - - - - - - - + + + + + + + + + + - +
 .. class:: infomark
@@ -200,6 +204,78 @@
 Working example
 ---------------
+.. class:: infomark
+
+See the **W4M00001a_sacurine-subset-statistics**, **W4M00001b_sacurine-subset-complete**, **W4M00002_mtbls2**, **W4M00003_diaplasma** shared histories in the **Shared Data/Published Histories** menu (https://galaxy.workflow4metabolomics.org/history/list_published)
+
+---------------------------------------------------
+
+----
+NEWS
+----
+
+CHANGES IN VERSION 2.1.4
+========================
+
+NEW FEATURE
+
+Level names are now separated by '.' instead of '-' previously in the column names of the output variableMetadata table (e.g., 'jour_ttest_J3.J10_fdr' instead of 'jour_ttest_J3-J10_fdr' previously)
+
+INTERNAL MODIFICATIONS
+
+Minor internal changes for toolshed export
+
+CHANGES IN VERSION 2.1.2
+========================
+
+INTERNAL MODIFICATIONS
+
+Minor internal changes for toolshed export
+
+CHANGES IN VERSION 2.1.1
+========================
+
+INTERNAL MODIFICATIONS
+
+Internal handling of 'NA' p-values (e.g. when intensities are identical in all samples)
+
+CHANGES IN VERSION 2.0.1
+========================
+
+NEW FEATURE
+
+(corrected) p-value threshold can be set to any value between 0 and 1
+
+
+
+
+
+ @Manual{,
+ title = {R: A Language and Environment for Statistical Computing},
+ author = {{R Core Team}},
+ organization = {R Foundation for Statistical Computing},
+ address = {Vienna, Austria},
+ year = {2016},
+ url = {https://www.R-project.org/},
+ }
+ @Article{Thevenot2015,
+ Title = {Analysis of the human adult urinary metabolome variations with age, body mass index and gender by implementing a comprehensive workflow for univariate and OPLS statistical analyses},
+ Author = {Thevenot, Etienne A. and Roux, Aurelie and Xu, Ying and Ezan, Eric and Junot, Christophe},
+ Journal = {Journal of Proteome Research},
+ Year = {2015},
+ Note = {PMID: 26088811},
+ Number = {8},
+ Pages = {3322-3335},
+ Volume = {14},
+
+ Doi = {10.1021/acs.jproteome.5b00354},
+ Url = {http://pubs.acs.org/doi/full/10.1021/acs.jproteome.5b00354}
+ }
+ 10.1093/bioinformatics/btu813
+
+
+
+
 Input files
 ===========
@@ -291,28 +367,5 @@
 ---------------------------------------------------
-----
-NEWS
-----
-
-CHANGES IN VERSION 2.1.2
-========================
-
-Minor internal changes for toolshed export
-
-CHANGES IN VERSION 2.1.1
-========================
-
-Internal handling of 'NA' p-values (e.g. when intensities are identical in all samples)
-
-CHANGES IN VERSION 2.0.1
-========================
-
-(corrected) p-value threshold can be set to any value between 0 and 1
-
-
-
-
-
diff -r fdefbc780d2e -r 09799fc16bc6 univariate_script.R
--- a/univariate_script.R Sat Jul 30 12:38:02 2016 -0400
+++ b/univariate_script.R Sat Aug 06 12:42:42 2016 -0400
@@ -54,7 +54,7 @@
     sigVn <- as.numeric(fdrVn < thrN)
     if(tesC %in% c("ttest", "wilcoxon"))
-        varPfxC <- paste0(varPfxC, paste(rev(facLevVc), collapse = "-"), "_")
+        varPfxC <- paste0(varPfxC, paste(rev(facLevVc), collapse = "."), "_")
     varDF[, paste0(varPfxC, ifelse(tesC %in% c("ttest", "wilcoxon"), "dif", "cor"))] <- staVn
@@ -67,6 +67,8 @@
     ## getting the names of the pairwise comparisons 'class1Vclass2'
     prwVc <- rownames(TukeyHSD(aov(datMN[, 1] ~ facFcVn))[["facFcVn"]])
+    prwVc <- gsub("-", ".", prwVc, fixed = TRUE) ## 2016-08-05: '-' character in dataframe column names seems not to be converted to "." by write.table on ubuntu R-3.3.1
+
     aovMN <- t(apply(datMN, 2, function(varVn) {
         aovMod <- aov(varVn ~ facFcVn)
@@ -97,9 +99,9 @@
     nemVl <- c(lower.tri(nemMN, diag = TRUE))
     nemClaMC <- cbind(rownames(nemMN)[c(row(nemMN))][nemVl],
                       colnames(nemMN)[c(col(nemMN))][nemVl])
-    nemNamVc <- paste0(nemClaMC[, 1], "-", nemClaMC[, 2])
+    nemNamVc <- paste0(nemClaMC[, 1], ".", nemClaMC[, 2])
     nemNamVc <- paste0(varPfxC, nemNamVc)
-
+
     nemMN <- t(apply(datMN, 2, function(varVn) {
         pvaN <- kruskal.test(varVn ~ facFcVn)[["p.value"]]
diff -r fdefbc780d2e -r 09799fc16bc6 univariate_wrapper.R
--- a/univariate_wrapper.R Sat Jul 30 12:38:02 2016 -0400
+++ b/univariate_wrapper.R Sat Aug 06 12:42:42 2016 -0400
@@ -12,10 +12,6 @@
 argVc <- unlist(parseCommandArgs(evaluate=FALSE))
-
-#### Start_of_tested_code <- function() {}
-
-
 ##------------------------------
 ## Initializing
 ##------------------------------
@@ -136,6 +132,7 @@
 varDF <- cbind.data.frame(variableMetadata = rownames(varDF),
                           varDF)
+
 write.table(varDF,
             file = argVc["variableMetadata_out"],
             quote = FALSE,
@@ -152,8 +149,4 @@
 options(stringsAsFactors = strAsFacL)
-
-#### End_of_tested_code <- function() {}
-
-
 rm(list = ls())
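
Note on the 2.1.4 renaming: the sketch below illustrates, outside the tool, how the '.' separator introduced in `univariate_script.R` yields the column names described in the NEWS entry. It is a minimal illustration, not the module's code: the level vector, the prefix, and the `_fdr` suffix are assumed values chosen to mirror the 'jour' example, and only base R functions (`paste`, `rev`, `gsub`, `make.names`) are used.

```r
## Assumed inputs mirroring the 'jour' example (hypothetical values, not read from the tool)
facLevVc <- c("J10", "J3")
varPfxC <- "jour_ttest_"

## Two-level tests: join the reversed level names with '.' instead of '-'
colNamC <- paste0(varPfxC, paste(rev(facLevVc), collapse = "."), "_fdr")
print(colNamC)

## Pairwise comparisons: replace the '-' separator found in TukeyHSD-style row names
prwVc <- c("junior-experienced", "senior-experienced", "senior-junior")
prwVc <- gsub("-", ".", prwVc, fixed = TRUE)
print(prwVc)

## A '.'-separated name is already syntactically valid, so make.names() leaves it unchanged,
## whereas the former '-'-separated name would be rewritten to the same '.' form
stopifnot(identical(make.names(colNamC), colNamC))
stopifnot(identical(make.names("jour_ttest_J3-J10_fdr"), colNamC))
```

One plausible benefit of writing '.' directly is that the names in the output variableMetadata table then match what `read.table` returns with its default `check.names = TRUE`, which is how `readTableF` in `runit/univariate_runtests.R` reads the file back.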