# HG changeset patch
# User mir-bioinf
# Date 1429558013 14400
# Node ID 3797463c65f84fc49305f4996fcfedcb7afcb0d2
Initial upload

diff -r 000000000000 -r 3797463c65f8 heatmap_colormanipulation/.ruby-version
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/heatmap_colormanipulation/.ruby-version	Mon Apr 20 15:26:53 2015 -0400
@@ -0,0 +1,1 @@
+2.0.0-dev
diff -r 000000000000 -r 3797463c65f8 heatmap_colormanipulation/heatmap_extra_v2beta_2.xml
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/heatmap_colormanipulation/heatmap_extra_v2beta_2.xml	Mon Apr 20 15:26:53 2015 -0400
@@ -0,0 +1,267 @@
+
+    based on R's heatmap.2 function.
+    R --quiet --slave --file=heatmap_extra_v2beta_VERSION.R --args $input $rowvar.rowcorr $rowvar.rowlink $colvar.colcorr $colvar.collink $var_cols $scale $na_remove $header $rowheader $grad_style $col_min $col_max $out_file1 $main@$xlab@$ylab@ ZZZZ_END $ColorManip_outer.ColorManip
+    #if $ColorManip_outer.ColorManip=="InnerClip" or $ColorManip_outer.ColorManip=="OuterClip":
+        $ColorManip_outer.clipValLow
+        $ColorManip_outer.clipValHigh
+        1>NUL 2>$err_out
+    #else:
+        $ColorManip_outer.clipVal 1>NUL 2>$err_out
+    #end if
+
+.. class:: infomark
+
+**What it does**
+
+This tool uses the 'heatmap.2' function from the R statistical package to draw a heatmap from the numeric data values contained in columns of a dataset. Clustering with Euclidean distance and complete linkage is equivalent to using the basic Heatmap tool. This tool adds configurable row and column clustering in terms of distance measure and linkage method.
+The recommended clustering and linkage methods are set as defaults, assuming rows are genes and columns are samples. For more information on linkage types, see below.
+
+
+*R Development Core Team (2009). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, URL http://www.R-project.org.*
+
+-----
+
+.. class:: warningmark
+
+If any row has zero standard deviation (all values identical), its Pearson correlation is NA, and the heatmap output will be a red error dataset.
+
+If the "Remove NA" option is not set to "yes", this tool skips entire rows/columns that contain non-numeric data.
+
+-----
+
+**Color Manipulation Options**
+
+*No color manipulation*: leave all values at the default (0).
+
+*Choose inner colors' clip points (best for bimodal dataset value distributions)*: The color mapped to the lowest data value (low color) takes on a gradient whose darkest hue is mapped to the minimum value in the dataset (determined automatically) and whose lightest hue is mapped to the value entered at the "Stop low color gradient at value" prompt. Likewise, the color mapped to the highest data value (high color) takes on a gradient whose darkest hue is mapped to the value entered at the "Start high color gradient at value" prompt and whose lightest hue is mapped to the maximum value in the dataset (determined automatically).
+Example bimodal dataset (values close to 0 or 1, nothing in between)::
+
+    sample  S1    S2    S3    S4    S5    S6
+    S1      1     0.08  0.06  0.05  0.08  0.09
+    S2      0.08  1     1     1     0.97  1
+    S3      0.06  1     1     1     0.97  1
+    S4      0.05  1     1     1     0.98  1
+    S5      0.08  0.97  0.97  0.98  1     0.97
+    S6      0.09  1     1     1     0.97  1
+
+For the above dataset, values of 0.1 and 0.95 at the "Stop low color" and "Start high color" prompts, respectively, bring out the subtle differences. If this is confusingly worded, please let Christy know!
+
+
+*Clip color at min value point only (best for outliers at the low end of the dataset)*: The color mapped to the lowest data value (low color) takes on a gradient whose darkest hue is mapped to the specified "Min value to clip low color"; all values below it take that same color. The color transition occurs at 0 if the range between this minimum value and the maximum value in the dataset (determined automatically) includes 0, and halfway between the two otherwise. The color mapped to the highest data value (high color) takes on a gradient from the transition point (darkest hue) up to the maximum value in the dataset (lightest hue). Example dataset for which this is a good visualization choice (some outliers AT THE LOW END ONLY, while most of the remaining data is close together)::
+
+    GeneID   log2_FC(S2/S1)  log2_FC(S3/S1)  log2_FC(S4/S1)  log2_FC(S5/S1)  log2_FC(S6/S1)
+    ASNS     -1093.001       1.824679717     1.575430565     0.970889        2.104598893
+    BEST1    3.341922966     3.25087179      3.961852285     3.429484142     3.717432789
+    BHLHE41  -1.936238732    2.145753785     2.44525769      -1000.123       2.07475321
+    C8orf46  4.334222947     -4.30902017     3.981405448     3.161135243     4.251538767
+    CCDC64   2.516662746     2.540500932     3.842305595     4.617812421     2.365768433
+
+A good display value for the above dataset, which shows the differences among lower-magnitude values without letting the -1000-range values dominate the color scheme, is a "Min value to clip low color" of -5.
+
+
+*Clip color at max value point only (best for outliers at the high end of the dataset)*: The color mapped to the lowest data value (low color) takes on a gradient from the minimum data value (determined automatically) to the transition point. The transition point is 0 if the range between the minimum and the chosen "Max value to clip high color" includes 0, and halfway between the two otherwise. The high color continues from the transition point to the specified max value. All values above this take the same color.
+Example dataset for which this is a good visualization choice (some outliers AT THE HIGH END ONLY, while most of the remaining data is close together)::
+
+    GeneID   log2_FC(S2/S1)  log2_FC(S3/S1)  log2_FC(S4/S1)  log2_FC(S5/S1)  log2_FC(S6/S1)
+    ASNS     1093.001        1.824679717     1.575430565     0.970889        2.104598893
+    BEST1    3.341922966     3.25087179      3.961852285     3.429484142     3.717432789
+    BHLHE41  -1.936238732    2.145753785     2.44525769      1000.123        2.07475321
+    C8orf46  4.334222947     -4.30902017     3.981405448     3.161135243     4.251538767
+    CCDC64   2.516662746     2.540500932     3.842305595     4.617812421     2.365768433
+
+A good display value for the above dataset, which shows the differences among lower-magnitude values without letting the +1000-range values dominate the color scheme, is a "Max value to clip high color" of 5.
+
+
+*Clip colors at max and min points (best for outliers at both ends of the dataset)*: This scheme combines the previous two. It is best used when a dataset has outliers at both the high and low ends of the value distribution, such as the following example::
+
+    GeneID   log2_FC(S2/S1)  log2_FC(S3/S1)  log2_FC(S4/S1)  log2_FC(S5/S1)  log2_FC(S6/S1)
+    ASNS     1093.001        1.824679717     1.575430565     0.970889        2.104598893
+    BEST1    -2000.111       3.25087179      3.961852285     3.429484142     3.717432789
+    BHLHE41  -1.936238732    2.145753785     2.44525769      1000.123        2.07475321
+    C8orf46  4.334222947     -4.30902017     3.981405448     3.161135243     4.251538767
+    CCDC64   2.516662746     2.540500932     -12345.6        4.617812421     2.365768433
+
+Good max and min clip values for displaying the above data are 5 and -5, respectively.
+
+
+
+**Linkage Types**
+
+*Average linkage:* the distance between clusters is defined as the average distance between all members of one cluster and all members of another cluster (the default method; suitable for most cases).
+
+*Complete linkage:* the distance between clusters is defined as the maximum distance between members of one cluster and members of another cluster.
+
+*Single linkage:* the distance between clusters is defined as the minimum distance between the members of one cluster and members of another cluster.
+
+
+
+
+
diff -r 000000000000 -r 3797463c65f8 heatmap_colormanipulation/heatmap_extra_v2beta_VERSION.R
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/heatmap_colormanipulation/heatmap_extra_v2beta_VERSION.R	Mon Apr 20 15:26:53 2015 -0400
@@ -0,0 +1,244 @@
+sink(file="/tmp/none")
+sink("/dev/null")
+options(warn=-1)
+options(echo=F)
+
+args <- commandArgs(trailingOnly = T)
+#title <- args[17]
+Rowcorr <- args[2]
+Rowlink <- args[3]
+Colcorr <- args[4]
+Collink <- args[5]
+#Xlab <- args[18]
+#Ylab <- args[19]
+inputfile <- args[1]
+Var_cols <- args[6]
+Scale_var <- args[7]
+Remove_na <- args[8]
+header_yes <- args[9]
+rowhead_yes <- args[10]
+color_grad <- args[11]
+color_min <- args[12]
+#max_val_mincol <- args[13]
+color_max <- args[13]
+#min_val_maxcol <- args[15]
+out_file <- args[14]
+#logfile <- args[15]
+
+##Now for title, xlabel, and ylabel (spaces are hard to deal with here):
+#title <- args[17]
+#Xlab <- args[18]
+#Ylab <- args[19]
+
+stoptime = 0
+argIndex = 15 ##title tokens start right after out_file (args[14])
+everything = args[argIndex]
+
+debugcounter = 0
+
+suppressMessages(library(gplots))
+Rinfo = sessionInfo()
+Rinfo_pkg = sessionInfo(package="gplots")
+gplots_info = Rinfo_pkg$otherPkgs
+#sink(logfile)
+Rinfo
+gplots_info
+sink(file="/tmp/none")
+sink("/dev/null")
+
+#cat(paste("arg value is ",args[argIndex],".\n"),file=logfile,append="TRUE")
+
+#while (stoptime < 1){
+while ((stoptime < 1)&&(debugcounter<50)){
+    argIndex=argIndex+1
+#    cat(paste("in while loop now, arg index is ",argIndex,".\n"),file=logfile,append="TRUE")
+#    cat(paste("arg value is ",args[argIndex],".\n"),file=logfile,append="TRUE")
+    everything = paste(everything,args[argIndex])
+    if (args[argIndex]=="ZZZZ_END") {
+        stoptime = 1
+    }
+    debugcounter=debugcounter+1
+}
+
+argIndex=argIndex+1
+#cat(paste("Out of while loop. arg index value is now ",argIndex,".\n"),file=logfile,append="TRUE")
+
+splitThese = strsplit(everything,"[@]")
+title = splitThese[[1]][1]
+Xlab = splitThese[[1]][2]
+Ylab = splitThese[[1]][3]
+
+##Now grab the rest of the arguments passed in:
+colorManip = args[argIndex]
+argIndex = argIndex+1
+
+#cat(paste("Color manip value is ",colorManip,".\n"),file=logfile,append="TRUE")
+
+if ((colorManip == "InnerClip") || (colorManip == "OuterClip")) {
+    LowClipVal = as.numeric(args[argIndex])
+    HighClipVal = as.numeric(args[argIndex+1])
+#    cat(paste("Two vals to clip: ",LowClipVal,HighClipVal,".\n"),file=logfile,append="TRUE")
+
+} else {
+    ClipVal = as.numeric(args[argIndex])
+#    cat(paste("One val to clip: ",ClipVal,".\n"),file=logfile,append="TRUE")
+}
+
+if (header_yes == "yes") {
+    inp = read.table(inputfile,stringsAsFactors=F, header=T, sep="\t")
+} else {
+    inp = read.table(inputfile,stringsAsFactors=F, sep="\t")
+}
+
+
+these_cols = read.csv(text=Var_cols,header=F)
+
+if (ncol(these_cols)<2) {
+    x = data.frame(cbind(inp[, c(as.matrix(these_cols))],inp[, c(as.matrix(these_cols))]))
+    currentColNames=colnames(x)
+    labColVar = c(currentColNames[1],"")
+} else {
+    x = inp[, c(as.matrix(these_cols))]
+    labColVar = colnames(x)
+}
+
+
+genemat = do.call(cbind,x)
+x = apply(genemat,2,as.numeric)
+
+scale_value = Scale_var
+na_rm_value = FALSE
+
+if (Remove_na == "yes") {
+    na_rm_value = TRUE
+}
+
+
+if (rowhead_yes == "yes") {
+    rownames(x)=inp[[1]]
+}
+
+pdf(out_file)
+
+
+if ((Rowcorr=="none") && (Colcorr!="none")) {
+    dendro_val = "column"
+} else if ((Rowcorr!="none") && (Colcorr=="none")) {
+    dendro_val = "row"
+}
+
+if ((Rowcorr=="none") && (Colcorr=="none")) {
+    dendro_val = "none"
+}
+
+if ((Rowcorr!="none") && (Colcorr!="none")) {
+    dendro_val = "both"
+}
+
+if (Rowcorr == "none") {
+    Rowv_val = FALSE
+} else {
+    Rcor = cor(t(x),method=Rowcorr)
+    R_clust = hclust(as.dist(1-Rcor),method=Rowlink)
+    R_dendro = as.dendrogram(R_clust)
+    Rowv_val = R_dendro
+}
+
+##Column clustering (if any) set up:
+if (Colcorr == "none") {
+    Colv_val = FALSE
+} else {
+    Ccor = cor(x,method=Colcorr)
+    C_clust = hclust(as.dist(1-Ccor),method=Collink)
+    C_dendro = as.dendrogram(C_clust)
+    Colv_val = C_dendro
+}
+
+par(cex.main=0.8) ##font size for title
+##Estimate good guesses for font sizes of rows and columns:
+font_r1 = 0.2 + 1/log10(nrow(x)) ##default done in heatmap, based on number of rows
+font_size_r = min(0.8,font_r1)
+
+font_c1 = 0.2 + 1/log10(ncol(x)) ##default done in heatmap, based on number of columns
+font_size_c = min(0.8,font_c1)
+
+#min_value = min(x) ##x should be the original data matrix
+#max_value = max(x)
+
+if (colorManip == "InnerClip") {
+    min_value = min(x)
+    max_value = max(x)
+    max_val_mincol = LowClipVal
+    min_val_maxcol = HighClipVal
+
+} else if (colorManip == "OuterClip") {
+    min_value = LowClipVal
+    max_value = HighClipVal
+    ##How do we set the other values if 0 isn't included in the range? Probably want central color to be center value:
+    if ((min_value<=0)&&(max_value>=0)) {
+        max_val_mincol = 0 ##will be reset later to account for slight tolerance (so black is included)
+        min_val_maxcol = 0
+    } else {
+        ##0 is not in range, center around halfway point (low side first so breaks stay ascending)
+        max_val_mincol = (min_value+max_value)/2 - 0.00005
+        min_val_maxcol = (min_value+max_value)/2 + 0.00005
+    }
+} else if (colorManip == "ClipMax") {
+    min_value = min(x)
+    max_value = ClipVal
+    if ((min_value<=0)&&(max_value>=0)) {
+        max_val_mincol = 0 ##will be reset later to account for slight tolerance (so black is included)
+        min_val_maxcol = 0
+    } else {
+        ##0 is not in range, center around halfway point (low side first so breaks stay ascending)
+        max_val_mincol = (min_value+max_value)/2 - 0.00005
+        min_val_maxcol = (min_value+max_value)/2 + 0.00005
+    }
+} else {
+    min_value = ClipVal
+    max_value = max(x)
+    if ((min_value<=0)&&(max_value>=0)) {
+        max_val_mincol = 0 ##will be reset later to account for slight tolerance (so black is included)
+        min_val_maxcol = 0
+    } else {
+        ##0 is not in range, center around halfway point (low side first so breaks stay ascending)
+        max_val_mincol = (min_value+max_value)/2 - 0.00005
+        min_val_maxcol = (min_value+max_value)/2 + 0.00005
+    }
+}
+
+
+##is 0 included in the data range? if so we want it centered
+if ((min_value <= 0) && (max_value >=0)) {
+    sym_breaks_value = "TRUE"
+    sym_key_value = "TRUE"
+} else {
+    sym_breaks_value = "FALSE"
+    sym_key_value = "FALSE"
+}
+
+if (color_grad == "double") {
+    my_palette = colorRampPalette(c(color_min,"black",color_max))(n=299)
+} else {
+    my_palette = colorRampPalette(c(color_min,color_max))(n=299)
+}
+
+##Need some tolerance otherwise black won't be included for 0
+if ((max_val_mincol==0) && (min_val_maxcol==0)) {
+    max_val_mincol = -0.00005
+    min_val_maxcol = 0.00005
+}
+
+#cat(paste("max_val_mincol value is ",max_val_mincol,".\n"),file=logfile,append="TRUE")
+#cat(paste("min_val_maxcol value is ",min_val_maxcol,".\n"),file=logfile,append="TRUE")
+
+colors = c(seq(min_value,max_val_mincol,length=100),seq(max_val_mincol,min_val_maxcol,length=100),seq(min_val_maxcol,max_value,length=100))
+
+##Call heatmap.2. cexCol is capped at 0.8 (font_size_c) to account for long sample names
+heatmap.2(x, margins=c(9,10), main=title, xlab=Xlab, ylab=Ylab, cexCol=font_size_c, cexRow=font_size_r, scale=scale_value, symbreaks=sym_breaks_value, symm=F, symkey=sym_key_value, na.rm=na_rm_value, trace="none", col=my_palette, breaks=colors, dendrogram=dendro_val, Rowv=Rowv_val, Colv=Colv_val, labCol=labColVar)
+
+
+## Close the PDF file
+devname = dev.off()
+
+
diff -r 000000000000 -r 3797463c65f8 heatmap_colormanipulation/test-data/heatmap_extracolors_in1.tab
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/heatmap_colormanipulation/test-data/heatmap_extracolors_in1.tab	Mon Apr 20 15:26:53 2015 -0400
@@ -0,0 +1,51 @@
+	MES_R1	MES_R2	CP_R1	CP_R2	CM_R1	CM_R2
+9930023K05Rik	1.04	3.24	0.01	2.06	2.52	1.72
+A4galt	64.07	64.13	56.77	46.42	9.47	14.77
+Ace	5.21	7.56	8.11	10.28	58.68	77.41
+Actr3	94.84	76.57	85.83	74.3	16.12	99.82
+Actr3b	34.33	43.16	44.62	24.68	98.26	64.69
+Adam34	0.01	0	0	0	0	0
+Adnp	8.17	0.36	27.6	21.28	22.32	72.76
+Amac1	2.08	5.4	2.02	1.03	3.35	4.3
+Asb3	0.77	44.11	58.32	11.61	88.08	64.64
+Asf1a	57.33	26.32	32.35	27.59	88.08	57.44
+AU021092	0	0	3.04	6.17	1.67	7.73
+Bahcc1	50.46	67.97	80.31	78.86	30.8	84.1
+Bcl2a1d	0	0	1	0	1	0
+Ccdc40	95.58	34.39	14.58	49.98	62.8	58.42
+Chst9	0	0	1.01	1.03	0	0
+Cphx	1.04	0	3.04	0	0	0
+Cyp26a1	75.25	44.44	15.9	89.6	2.52	0.86
+Dcaf4	57.32	55.04	46.66	0.98	20.44	22.59
+Dnajc13	31.99	31.77	56.95	34.65	36.6	77.76
+Dusp2	80.1	83.17	99.08	26.02	41.19	52.09
+Elf4	45.78	40.01	1.98	69	55.22	52.68
+Fdxacb1	43.56	55.54	20.66	12.1	56.21	84.21
+Ganab	12.71	15.48	81.09	32.53	27.24	57.27
+Gm7616	1.17	1.15	0.71	2.13	4.51	2.78
+Grlf1	1.4	49.15	36.68	95.45	56.72	85.49
+Grwd1	77.35	3.83	32.4	1.09	25.6	44.04
+Il1rapl2	0	0.08	0	0	0	0
+Ippk	90.45	92.1	47.79	31.16	65.81	21.68
+Luc7l	89.97	20.02	76.71	68.3	89.65	73.67
+Med1	66.69	94.87	46.13	32.17	63.8	66.91
+Mlf2	30.27	79.06	20.31	86.77	25.24	10.88
+Mrgprb8	0	0	0.01	0	0	0
+Ncoa1	57.05	45.13	55.27	51.49	24.64	12.05
+Ndc80	5.52	24.72	39.69	94.19	69.5	47.13
+Nkpd1	16.64	28.09	55.78	51.42	88.74	75.61
+Nr2c2	78.94	1.63	90.58	11.88	36.04	85.89
+Pif1	81.99	92.94	51.59	70.76	6.01	54.13
+Raet1c	0.98	0	0	0	0.53	0
+Rnf7	55.49	30.38	16.26	10.74	90.98	40.75
+Rom1	32.12	52.3	4.44	24.44	44.04	40.53
+Rtf1	1.17	15.78	25.88	11.4	64.28	33.68
+Sec16b	4.16	1.08	2.02	5.15	18.42	12.88
+Slamf9	0	0	3.04	1.03	4.19	4.3
+Suv39h1	7.65	71.36	7.38	24.62	13.75	1.01
+Sys1	74.12	70.77	27.66	4.72	33.22	46.68
+Tmem159	90.51	81.01	96.32	100.79	80.15	88.17
+Tubb6	79.94	69.93	40.96	85.84	50.38	78.96
+Ube2i	0.74	26.57	71.35	75.41	92.69	107.4
+Wdr54	9.23	103.7	42.97	102.84	13.05	27.94
+Wipf2	25.14	28.88	57.58	32.75	56.6	24.07
diff -r 000000000000 -r 3797463c65f8 heatmap_colormanipulation/tool_dependencies.xml
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/heatmap_colormanipulation/tool_dependencies.xml	Mon Apr 20 15:26:53 2015 -0400
@@ -0,0 +1,18 @@
+
+
+
+
+
+
+
+
+
+
+
+
+                https://depot.galaxyproject.org/package/noarch/gtools_3.4.1.tar.gz
+
+
+
+
+
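The help text above describes how each color-manipulation mode places the color breaks. The R sketch below restates that break logic from heatmap_extra_v2beta_VERSION.R as a standalone function so the modes can be compared directly. It is an illustration under stated assumptions, not part of the tool:

- `color_breaks` is a hypothetical helper.
- The mode name "ClipMin" is assumed, because the script sends every value other than InnerClip, OuterClip, and ClipMax through its min-clip branch.
- The midpoint tolerance is applied so that the 300 breaks stay in increasing order. That is what `image()` (which heatmap.2 draws with) expects: one more break than the 299 palette colors.

```r
# Hypothetical helper (not part of the patch) restating how
# heatmap_extra_v2beta_VERSION.R builds the color breaks for heatmap.2.
# Returns 300 breaks, i.e. one more than the 299-color palette.
color_breaks <- function(x, mode, low = NA, high = NA, clip = NA,
                         tol = 0.00005) {
  if (mode == "InnerClip") {
    # Keep the data range; the user chooses where the low gradient stops
    # and where the high gradient starts.
    min_value <- min(x)
    max_value <- max(x)
    max_val_mincol <- low
    min_val_maxcol <- high
  } else {
    if (mode == "OuterClip") {
      min_value <- low
      max_value <- high
    } else if (mode == "ClipMax") {
      min_value <- min(x)
      max_value <- clip
    } else {
      # Assumed "ClipMin": any other mode clips only the low end.
      min_value <- clip
      max_value <- max(x)
    }
    if (min_value <= 0 && max_value >= 0) {
      # 0 lies inside the range, so the color transition sits at 0.
      max_val_mincol <- 0
      min_val_maxcol <- 0
    } else {
      # 0 is outside the range: transition at the midpoint, kept ascending.
      mid <- (min_value + max_value) / 2
      max_val_mincol <- mid - tol
      min_val_maxcol <- mid + tol
    }
  }
  # Widen a zero-width center slightly so the central color is still used.
  if (max_val_mincol == 0 && min_val_maxcol == 0) {
    max_val_mincol <- -tol
    min_val_maxcol <- tol
  }
  c(seq(min_value, max_val_mincol, length.out = 100),
    seq(max_val_mincol, min_val_maxcol, length.out = 100),
    seq(min_val_maxcol, max_value, length.out = 100))
}

# Bimodal example from the help text: the low gradient stops at 0.1 and
# the high gradient starts at 0.95.
b <- color_breaks(c(0.05, 0.5, 1), "InnerClip", low = 0.1, high = 0.95)
length(b)  # 300
```

Each mode builds three 100-point segments: low color, a narrow central band, and high color. This is why a single `colorRampPalette(...)(n=299)` palette serves all four modes; only the segment end points move.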