
Changeset 3:d1a0b7ded7e3 (2019-11-22)

Previous changeset 2:9e8788803adc (2019-11-22) Next changeset 4:b14e4bf568b0 (2019-11-25)

Commit message:
Uploaded

added:
aurora_wgcna_trait.Rmd

diff -r 9e8788803adc -r d1a0b7ded7e3 aurora_wgcna_trait.Rmd
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/aurora_wgcna_trait.Rmd Fri Nov 22 19:45:51 2019 -0500

@@ -0,0 +1,237 @@
+---
+title: 'Aurora Galaxy WGCNA Tool: Gene Co-Expression Network Construction & Analysis. Part 2'
+output:
+  pdf_document:
+    number_sections: false
+---
+
+```{r setup, include=FALSE, warning=FALSE, message=FALSE}
+knitr::opts_chunk$set(error = FALSE, echo = FALSE)
+```
+```{r}
+# Load the data from the previous step.
+load(file=opt$r_data)
+```
+# Introduction
+This report is part two of step-by-step results from use of the [Aurora Galaxy](https://github.com/statonlab/aurora-galaxy-tools) Weighted Gene Co-expression Network Analysis [WGCNA](https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-9-559) tool. It is generated when trait or phenotype data is provided.
+
+This report was generated on:
+```{r}
+format(Sys.time(), "%a %b %d %X %Y")
+```
+
+# Trait/Phenotype Data
+
+The contents below show the first 10 rows and 6 columns of the trait/phenotype data provided. However, any columns marked for removal were removed, and any categorical columns specified were converted to a one-hot encoding (e.g. 1 when present, 0 when not present). The updated trait/phenotype data matrix has been saved into a comma-separated file named `updated_trait_matrix.csv`.
+
+```{r}
+# Load the trait data file.
+trait_data = data.frame()
+trait_data = read.csv(opt$trait_data, header = TRUE, row.names = opt$sname_col, na.strings = opt$missing_value2)
+sample_names = rownames(gemt)
+trait_rows = match(sample_names, rownames(trait_data))
+trait_data = trait_data[trait_rows, ]
+
+# Determine the column types within the trait annotation data.
+trait_types = sapply(trait_data, class)
+
+# If the type is character we should convert it to a factor manually.
+character_fields = colnames(trait_data)[which(trait_types == "character")]
+if (length(character_fields) > 0) {
+  for (field in character_fields) {
+    trait_data[[field]] = as.factor(trait_data[[field]])
+  }
+}
+
+# Remove ignored columns.
+ignore_cols = strsplit(opt$ignore_cols, ',')[[1]]
+if (length(ignore_cols) > 0) {
+  print('You chose to ignore the following fields:')
+  print(ignore_cols)
+  trait_data = trait_data[, colnames(trait_data)[!(colnames(trait_data) %in% ignore_cols)]]
+}
+
+# Make sure we don't one-hot-encode any columns that were also ignored.
+one_hot_cols = strsplit(opt$one_hot_cols, ',')[[1]]
+one_hot_cols = one_hot_cols[which(!(one_hot_cols %in% ignore_cols))]
+
+# Change any categorical fields to 1-hot encoding as requested by the caller.
+if (length(one_hot_cols) > 0) {
+  print('You chose to treat the following fields as categorical:')
+  print(one_hot_cols)
+
+  # Make sure we have enough levels for 1-hot encoding. We must have at least two.
+  hkeep = c()
+  hignore = c()
+  # Iterate over the field names directly (the original indexed with an undefined `i`).
+  for (field in one_hot_cols) {
+
+    # Make sure the field is categorical. If it came in as integer it must be switched.
+    if (trait_types[[field]] == "integer") {
+      trait_data[[field]] = as.factor(trait_data[[field]])
+    }
+    if (trait_types[[field]] == "numeric") {
+      print('The following quantitative field will be treated as numeric instead.')
+      print(field)
+      next
+    }
+
+    # Now make sure we have enough factors.
+    if (nlevels(trait_data[[field]]) > 1) {
+      hkeep[length(hkeep)+1] = field
+    } else {
+      hignore[length(hignore)+1] = field
+    }
+  }
+
+  if (length(hignore) > 0) {
+    print('These fields were ignored due to too few factors:')
+    print(hignore)
+  }
+
+  # Perform the 1-hot encoding for specified and valid fields.
+  if (length(hkeep) > 0) {
+    print('These fields were 1-hot encoded:')
+    print(hkeep)
+
+    swap_cols = colnames(trait_data)[(colnames(trait_data) %in% hkeep)]
+    temp = as.data.frame(trait_data[, swap_cols])
+    colnames(temp) = swap_cols
+    temp = apply(temp, 2, make.names)
+    dmy <- dummyVars(" ~ .", data = temp)
+    encoded <- data.frame(predict(dmy, newdata = temp))
+    encoded = sapply(encoded, as.integer)
+
+    # Ma
..
 colnames(trait_data)[colnames(trait_data) %in% colnames(trait_colors)]
+trait_colors = trait_colors[,trait_order]
+trait_data = trait_data[,trait_order]
+plotSampleDendroTraits <- function() {
+  plotDendroAndColors(sampleTree, trait_colors,
+                      groupLabels = names(trait_data),
+                      main = "Sample Dendrogram and Annotation Heatmap",
+                      cex.dendroLabels = 0.5)
+}
+
+png('figures/07-sample_trait_dendrogram.png', width=6 ,height=10, units="in", res=300)
+plotSampleDendroTraits()
+invisible(dev.off())
+plotSampleDendroTraits()
+```
+
+To statistically identify the associations, correlation tests are performed between the eigengene of each module and the annotation data. The following heatmap shows the results for each annotation feature and each module. Modules with a significant positive association have a correlation value near 1. Modules with a significant negative association have a correlation value near -1. Modules with no correlation have a value near 0.
+
+```{r fig.align='center', fig.width=15, fig.height=15}
+MEs = orderMEs(MEs)
+moduleTraitCor = cor(MEs, trait_data, use = "p");
+moduleTraitPvalue = corPvalueStudent(moduleTraitCor, n_samples);
+
+plotModuleTraitHeatmap <- function() {
+  # The WGCNA labeledHeatmap function is too overloaded with detail, we'll create a simpler plot.
+  plotData = melt(moduleTraitCor)
+  # We want to make sure the order is the same as in the
+  # labeledHeatmap function (example above)
+  plotData$Var1 = factor(plotData$Var1, levels = rev(colnames(MEs)), ordered=TRUE)
+  # Now use ggplot2 to make a nicer image.
+  p <- ggplot(plotData, aes(Var2, Var1, fill=value)) +
+    geom_tile() + xlab('Experimental Conditions') + ylab('WGCNA Modules') +
+    scale_fill_gradient2(low = "#0072B2", high = "#D55E00",
+                         mid = "white", midpoint = 0,
+                         limit = c(-1,1), name="PCC") +
+    theme_bw() +
+    theme(axis.text.x = element_text(angle = 45, hjust=1, vjust=1, size=15),
+          axis.text.y = element_text(angle = 0, hjust=1, vjust=0.5, size=15),
+          legend.text=element_text(size=15),
+          panel.border = element_blank(),
+          panel.grid.major = element_blank(),
+          panel.grid.minor = element_blank(),
+          axis.line = element_blank())
+  print(p)
+}
+png('figures/08-module_trait_dendrogram.png', width=12 ,height=12, units="in", res=300)
+plotModuleTraitHeatmap()
+invisible(dev.off())
+plotModuleTraitHeatmap()
+```
+
+```{r}
+output = cbind(moduleTraitCor, moduleTraitPvalue)
+write.csv(output, file = opt$module_association_file, quote=FALSE, row.names=TRUE)
+```
+A file named `module_association.csv` has been generated. It contains the list of modules, their correlation values, and p-values indicating the strength of the associations.
+```{r}
+# names (colors) of the modules
+modNames = substring(names(MEs), 3)
+geneModuleMembership = as.data.frame(cor(gemt, MEs, use = "p"));
+MMPvalue = as.data.frame(corPvalueStudent(as.matrix(geneModuleMembership), n_samples));
+names(geneModuleMembership) = paste("MM", modNames, sep="");
+names(MMPvalue) = paste("p.MM", modNames, sep="");
+
+# Calculate the gene trait significance as a Pearson's R and p-value.
+gts = as.data.frame(cor(gemt, trait_data, use = "p"));
+gtsp = as.data.frame(corPvalueStudent(as.matrix(gts), n_samples));
+colnames(gtsp) = c(paste("p", names(trait_data), sep="."))
+colnames(gts) = c(paste("GS", names(trait_data), sep="."))
+
+# Write out the gene information.
+output = cbind(Module = module_labels, gts, gtsp)
+write.csv(output, file = opt$gene_association_file, quote=FALSE, row.names=TRUE)
+
+```
+Genes themselves can also be associated with traits. This is calculated via a traditional correlation test as well. Another file named `gene_association.csv` has been generated. It provides the list of genes, the modules they belong to, and the association of each gene with the trait features.
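
The p-values in this changeset come from WGCNA's `corPvalueStudent`. As a rough, base-R-only sketch of what a Student test on a Pearson correlation computes (assuming the usual two-sided t-test with n - 2 degrees of freedom), the calculation can be reproduced by hand and checked against `cor.test`. The toy vectors `eigengene` and `trait` are made up for illustration and do not come from the tool.

```r
# Hypothetical toy data: six samples, one eigengene-like vector and one trait.
eigengene <- c(0.2, -0.1, 0.4, 0.6, -0.3, 0.5)
trait     <- c(1.0,  0.3, 1.8, 2.1,  0.1, 1.9)
n_samples <- length(trait)

# Pearson correlation using pairwise-complete observations
# (the report's `use = "p"` partially matches this option).
r <- cor(eigengene, trait, use = "pairwise.complete.obs")

# Two-sided Student t-test p-value for r with n - 2 degrees of freedom.
t_stat   <- r * sqrt((n_samples - 2) / (1 - r^2))
p_manual <- 2 * pt(abs(t_stat), df = n_samples - 2, lower.tail = FALSE)

# Base R's cor.test() runs the same test, so the two p-values should agree.
p_reference <- cor.test(eigengene, trait, method = "pearson")$p.value
stopifnot(abs(p_manual - p_reference) < 1e-10)

print(c(r = r, p = p_manual))
```

The test's degrees of freedom depend on the sample count, so `n_samples` should be the number of samples actually used in the correlation. With many missing trait values and pairwise-complete correlations, a single global `n_samples` may overstate significance.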