view heatmap_colormanipulation/heatmap_extra_v2beta_2.xml @ 4:1b362d998797 draft

Tried to fix output tag in tool test (possibly why "missing tool tests")
author mir-bioinf
date Fri, 24 Apr 2015 07:47:03 -0400
parents 746b5af0a9a2
children 77e753fa27dd

<tool id="heatmap_extra_v2_2" name="Heatmap with extra color options" version="1.0.1">
  <description>based on R's heatmap.2 function.</description>
  <requirements>
	<requirement type="package" version="3.0.0">R</requirement>
	<requirement type="package" version="2.12.1">gplots</requirement>
  </requirements>
  <command>R --quiet --slave --file=heatmap_extra_v2beta_VERSION.R --args $input $rowvar.rowcorr $rowvar.rowlink $colvar.colcorr $colvar.collink $var_cols $scale $na_remove $header $rowheader $grad_style $col_min $col_max $out_file1 $main@$xlab@$ylab@ ZZZZ_END $ColorManip_outer.ColorManip
	#if $ColorManip_outer.ColorManip=="InnerClip" or $ColorManip_outer.ColorManip=="OuterClip":
		$ColorManip_outer.clipValLow
		$ColorManip_outer.clipValHigh
		1>/dev/null 2>$err_out
	#else:
		$ColorManip_outer.clipVal 1>/dev/null 2>$err_out
	#end if
  </command>
  <inputs>
    <param name="main" type="text" value="title" size="30" label="Plot Title" help="Must not be blank, must be under 45 characters so that the title fits on the plot, and must not contain certain special characters (^, !, ?, and * are OK; parentheses, brackets, @, %, and ' are not). Christy will work on allowing more characters here."/>
    <conditional name="rowvar">
      <param name="what" type="select" label="Select whether rows are genes or samples.">
        <option value="genes">Genes</option>
        <option value="samples">Samples</option>
      </param>
      <when value="genes">
        <param name="rowcorr" type="select" label="Distance metric for row clustering" help="The default is the recommended method for gene clustering. If No clustering is selected, the linkage type below will be ignored.">
          <option value="pearson" selected="true">Pearson (1-r)</option>
          <option value="spearman">Spearman rank (1-rho)</option>
          <option value="euclidean">Euclidean</option>
          <option value="none">No clustering</option>
        </param>
        <param name="rowlink" type="select" label="Type of linkage for row clustering" help="The default is recommended for most cases. See below for more information on linkage types.">
          <option value="average" selected="true">Average</option>
          <option value="complete">Complete</option>
          <option value="single">Single</option>
        </param>
      </when>
      <when value="samples">
        <param name="rowcorr" type="select" label="Distance metric for row clustering" help="The default is the recommended method for sample clustering. If No clustering is selected, the linkage type below will be ignored.">
          <option value="spearman" selected="true">Spearman rank (1-rho)</option>
          <option value="pearson">Pearson (1-r)</option>
          <option value="euclidean">Euclidean</option>
          <option value="none">No clustering</option>
        </param>
        <param name="rowlink" type="select" label="Type of linkage for row clustering" help="The default is recommended for most cases. See below for more information on linkage types.">
          <option value="average" selected="true">Average</option>
          <option value="complete">Complete</option>
          <option value="single">Single</option>
        </param>
      </when>
    </conditional>
    <conditional name="colvar">
      <param name="colwhat" type="select" label="Select whether columns are genes or samples.">
        <option value="samples">Samples</option>
        <option value="genes">Genes</option>
      </param>
      <when value="genes">
        <param name="colcorr" type="select" label="Distance metric for column clustering" help="The default is the recommended method for gene clustering.">
          <option value="pearson" selected="true">Pearson (1-r)</option>
          <option value="spearman">Spearman rank (1-rho)</option>
          <option value="euclidean">Euclidean</option>
          <option value="none">No clustering</option>
        </param>
        <param name="collink" type="select" label="Type of linkage for column clustering" help="The default is recommended for most cases. See below for more information on linkage types.">
          <option value="average" selected="true">Average</option>
          <option value="complete">Complete</option>
          <option value="single">Single</option>
        </param>
      </when>
      <when value="samples">
        <param name="colcorr" type="select" label="Distance metric for column clustering" help="The default is the recommended method for sample clustering.">
          <option value="spearman" selected="true">Spearman rank (1-rho)</option>
          <option value="pearson">Pearson (1-r)</option>
          <option value="euclidean">Euclidean</option>
          <option value="none">No clustering</option>
        </param>
        <param name="collink" type="select" label="Type of linkage for column clustering" help="The default is recommended for most cases. See below for more information on linkage types.">
          <option value="average" selected="true">Average</option>
          <option value="complete">Complete</option>
          <option value="single">Single</option>
        </param>
      </when>
    </conditional>
    
    <param name="xlab" type="text" value="x" size="30" label="Label for x axis" help="Cannot include characters ', @, %, or parentheses"/>
    <param name="ylab" type="text" value="y" size="30" label="Label for y axis" help="Cannot include characters ', @, %, or parentheses"/>
    <param name="input" type="data" format="tabular" label="Dataset"/>
    <param name="var_cols" label="Select columns containing input variables " type="data_column" data_ref="input" numerical="True" multiple="true" >
        <validator type="no_options" message="Please select at least one column."/>
    </param>
    <param name="scale" type="select" label="Center and Scale variables?">
        <option value="none" selected="true">No</option>
        <option value="column">Yes, by column</option>
        <option value="row">Yes, by row</option>
    </param>
    
    <param name="na_remove" type="select" label="Remove NA?">
        <option value="yes" selected="true">Yes</option>
        <option value="no">No</option>
    </param>
    
    <param name="header" type="select" label="Treat first line as header?" help="If the header line starts with #, it will NOT be read as a header, so set this field to No. Otherwise, set it to Yes if the first line is a header.">
        <option value="yes" selected="true">Yes</option>
        <option value="no">No</option>
    </param>
    <param name="rowheader" type="select" label="Treat first column as row names?" help="Set this to Yes if the first column of your data contains the row names.">
      <option value="yes" selected="true">Yes</option>
      <option value="no">No</option>
    </param>
    <param name="grad_style" type="select" label="Gradient Style" help="Double: color 1 to black to color 2; Single: color 1 to color 2">
      <option value="double" selected="true">Double</option>
      <option value="single">Single</option>
    </param>
    <param name="col_min" type="select" label="Color at the smallest value" help="If your smallest value is negative, we recommend blue. If your smallest value is zero, we recommend black.">
        <option value="4">Blue</option>
        <option value="1">Black</option>
        <option value="2">Red</option>
        <option value="3">Green</option>
        <option value="5">Cyan</option>
        <option value="6">Magenta</option>
        <option value="7">Yellow</option>
        <option value="8">Gray</option>
    </param>
    <param name="col_max" type="select" label="Color at the largest value">
        <option value="2">Red</option>
        <option value="7">Yellow</option>
        <option value="1">Black</option>
        <option value="3">Green</option>
        <option value="4">Blue</option>
        <option value="5">Cyan</option>
        <option value="6">Magenta</option>
        <option value="8">Gray</option>
    </param>
    <conditional name="ColorManip_outer">
      <param name="ColorManip" type="select" label="Select type of color manipulation for heatmap display" help="This choice should depend on the distribution of data values. To view the distribution, run the Calculate summary statistics tool on the entire dataset. For more information on the options, see the help section below.">
        <option value="InnerClip" selected="true">Choose inner colors' clip points</option>
        <option value="ClipMin">Clip color at min value point only</option>
        <option value="ClipMax">Clip color at max value point only</option>
        <option value="OuterClip">Clip colors at max and min points</option>
      </param>
      <when value="InnerClip">
        <param name="clipValLow" size="4" type="float" value="0.0" label="Stop low color gradient at value" help="The color gradient will go from the data's minimum to this value." />
        <param name="clipValHigh" size="4" type="float" value="0.0" label="Start high color gradient at value" help="The color gradient will go from this value to the data's maximum." />
      </when>
      <when value="OuterClip">
        <param name="clipValLow" size="4" type="float" value="0.0" label="Min value to clip low color" help="The low color gradient will stop at this value; values less than this will all be the same color." />
        <param name="clipValHigh" size="4" type="float" value="1.0" label="Max value to clip high color" help="The high color gradient will stop at this value; values greater than this will all be the same color." />
      </when>
      <when value="ClipMin">
        <param name="clipVal" size="4" type="float" value="0.0" label="Min value to clip low color" help="The low color gradient will stop at this value; values less than this will all be the same color. The high color gradient will stop at the maximum value in the data." />
      </when>
      <when value="ClipMax">
        <param name="clipVal" size="4" type="float" value="0.0" label="Max value to clip high color" help="The high color gradient will stop at this value; values greater than this will all be the same color. The low color gradient will stop at the minimum value in the data." />
      </when>
    </conditional>
  </inputs>
  <outputs>
    <data format="pdf" name="out_file1" label="Heatmap"/>
    <data format="txt" name="err_out" label="error_out"/>
  </outputs>
<tests>
  <test>
    <param name="main" value="Test Heatmap"/>
    <param name="rowcorr" value="pearson"/>
    <param name="rowlink" value="average"/>
    <param name="colcorr" value="spearman"/>
    <param name="collink" value="average"/>
    <param name="xlab" value="Sample name"/>
    <param name="ylab" value="Gene symbol"/>
    <param name="input" value="heatmap_extracolors_in1.tab" ftype="tabular"/>
    <param name="var_cols" value="2,3,4,5,6"/>
    <param name="scale" value="none"/>
    <param name="na_remove" value="yes"/>
    <param name="header" value="yes"/>
    <param name="rowheader" value="yes"/>
    <param name="grad_style" value="double"/>
    <param name="col_min" value="4"/>
    <param name="col_max" value="2"/>
    <param name="ColorManip" value="InnerClip"/>
    <param name="clipValLow" value="0.0"/>
    <param name="clipValHigh" value="0.0"/>
    <output name="out_file1" file="heatmap_extracolors_out1.pdf"/>
    <output name="err_out" file="heatmap_err_out1.txt"/>
  </test>
</tests>
<help>

.. class:: infomark

**What it does**

This tool uses the 'heatmap.2' function from the R package gplots to draw a heatmap from the numeric values in selected columns of a dataset. Using Euclidean distance with complete linkage is equivalent to using the basic Heatmap tool. This tool adds configurable distance measures and linkage methods for both row and column clustering. The recommended distance and linkage methods are set as defaults, assuming rows are genes and columns are samples. For more information on linkage types, see below.


*R Development Core Team (2009). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, URL http://www.R-project.org.*
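
The three distance metrics offered for clustering (Pearson (1-r), Spearman rank (1-rho), and Euclidean) can be illustrated with the minimal Python sketch below. It only restates the definitions named in the form; the tool itself does this in R, and the function names here are hypothetical. Spearman distance is computed as the Pearson distance of average ranks, which is the usual definition of Spearman's rho when ties are present.

```python
import math


def pearson_distance(x, y):
    """1 - Pearson r; NaN when either vector has zero variance (r is undefined)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    sxx = sum(a * a for a in dx)
    syy = sum(b * b for b in dy)
    if sxx == 0 or syy == 0:
        return float("nan")
    r = sum(a * b for a, b in zip(dx, dy)) / math.sqrt(sxx * syy)
    return 1 - r


def average_ranks(v):
    """1-based ranks with ties given their mean rank (R's rank() default)."""
    # rank = (number of values strictly below x) + (number of ties + 1) / 2
    return [sum(1 for w in v if x > w) + (v.count(x) + 1) / 2 for x in v]


def spearman_distance(x, y):
    """1 - Spearman rho, i.e. the Pearson distance computed on ranks."""
    return pearson_distance(average_ranks(x), average_ranks(y))


def euclidean_distance(x, y):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
```

Note that a constant vector makes the Pearson (and Spearman) distance undefined, which is the situation described in the warning below.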

-----

.. class:: warningmark

If any row has zero deviation (all of its values are the same), its Pearson correlation will be NA, and the heatmap output will be a red error dataset.

If the "Remove NA" option is not set to "yes", this tool skips entire rows/columns that contain non-numeric data.
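
Rows that trigger the zero-deviation error can be found before running the tool. The sketch below is a hypothetical helper, not part of this tool: it assumes a tab-separated file whose first column holds the row names, and it takes 1-based column numbers like those chosen for "Select columns containing input variables".

```python
import csv
import io


def constant_rows(tsv_text, value_cols, header=True):
    """Return the names of rows whose selected columns all hold the same value.

    Such rows have zero deviation, so Pearson correlation with them is NA.
    value_cols are 1-based column numbers (as in Galaxy's column picker), and
    column 1 is assumed to hold the row names.
    """
    rows = list(csv.reader(io.StringIO(tsv_text), delimiter="\t"))
    if header:
        rows = rows[1:]
    flagged = []
    for row in rows:
        values = {float(row[c - 1]) for c in value_cols}
        if len(values) == 1:  # every selected value is identical
            flagged.append(row[0])
    return flagged


example = "gene\tS1\tS2\tS3\nG1\t1.5\t2.0\t0.7\nG2\t3\t3\t3\n"
print(constant_rows(example, [2, 3, 4]))  # ['G2']
```

Flagged rows can then be removed or edited before they reach the clustering step.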

-----

**Color Manipulation Options**

*No color manipulation*: leave the option and all values at their defaults (0).

*Choose inner colors' clip points (best for bimodal dataset value distributions)*: The color mapped to the lowest data value (low color) takes on a gradient whose darkest hue is mapped to the minimum value in the dataset (determined automatically) and whose lightest hue is mapped to the value entered at the "Stop low color gradient at value" prompt. Likewise, the color mapped to the highest data value (high color) takes on a gradient whose darkest hue is mapped to the value entered at the "Start high color gradient at value" prompt and whose lightest hue is mapped to the maximum value in the dataset (determined automatically).

Example bimodal dataset (values close to 0 or 1, nothing in between)::

	sample	S1	S2	S3	S4	S5	S6
	S1	1	0.08	0.06	0.05	0.08	0.09
	S2	0.08	1	1	1	0.97	1
	S3	0.06	1	1	1	0.97	1
	S4	0.05	1	1	1	0.98	1
	S5	0.08	0.97	0.97	0.98	1	0.97
	S6	0.09	1	1	1	0.97	1

For the dataset above, example display values that reveal the subtle differences are 0.1 for "Stop low color gradient at value" and 0.95 for "Start high color gradient at value". If this is confusingly worded, please let Christy know!
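
A minimal sketch of where a value lands on the inner-clip scale, using the example values above. The helper name is hypothetical and this is not the tool's R code; in particular, how values between the two clip points are colored is not described in this help, so the "middle" band is an assumption.

```python
def inner_clip_band(value, data_min, data_max, stop_low, start_high):
    """Place a value on the inner-clip color scale (illustrative sketch).

    Returns (band, fraction). "low" covers data_min..stop_low and "high" covers
    start_high..data_max, as in the help text; "middle" (between the two clip
    points) is an assumption. fraction is the relative position in the band.
    """
    if stop_low >= value:
        span = stop_low - data_min
        return "low", (value - data_min) / span if span else 0.0
    if value >= start_high:
        span = data_max - start_high
        return "high", (value - start_high) / span if span else 1.0
    return "middle", (value - stop_low) / (start_high - stop_low)


# Bimodal example above: minimum 0.05, maximum 1, clip points 0.1 and 0.95
for v in (0.05, 0.08, 0.97, 0.98, 1.0):
    print(v, inner_clip_band(v, 0.05, 1.0, 0.1, 0.95))
```

With these settings 0.97, 0.98, and 1 sit at 40%, 60%, and 100% of the high gradient, which is why the near-1 entries become distinguishable.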


*Clip color at min value point only (best for outliers at the low end of the dataset)*: The color mapped to the lowest data value (low color) takes on a gradient whose darkest hue is mapped to the specified "Min value to clip low color"; values below it all share that color. The color transition occurs halfway between this minimum value and the maximum value in the dataset (determined automatically), and the color mapped to the highest data value (high color) takes on a gradient from that halfway point (darkest hue) up to the maximum value in the dataset (lightest hue). Example dataset for which this is a good visualization choice (a few outliers AT THE LOW END ONLY, with most of the remaining data close together)::

        GeneID  log2_FC(S2/S1)  log2_FC(S3/S1)  log2_FC(S4/S1)  log2_FC(S5/S1)  log2_FC(S6/S1)
        ASNS    -1093.001       1.824679717     1.575430565     0.970889        2.104598893
        BEST1   3.341922966     3.25087179      3.961852285     3.429484142     3.717432789
        BHLHE41 -1.936238732    2.145753785     2.44525769      -1000.123       2.07475321
        C8orf46 4.334222947     -4.30902017     3.981405448     3.161135243     4.251538767
        CCDC64  2.516662746     2.540500932     3.842305595     4.617812421     2.365768433

A good display value for the above dataset to visualize the differences in lower magnitude values without the -1000-range values dominating the color scheme is a "Min value to clip low color" of -5.
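
The clip-at-min scheme can be sketched as a clamp plus a midpoint calculation. This illustrates the description above rather than the tool's implementation, and the function name is hypothetical.

```python
def clip_min_scale(values, clip_low):
    """Sketch of the clip-at-min scheme.

    Returns (clamped_values, midpoint, data_max). Values below clip_low are
    raised to clip_low so they share its color; the low gradient spans
    clip_low..midpoint and the high gradient spans midpoint..data_max.
    """
    data_max = max(values)
    midpoint = (clip_low + data_max) / 2
    clamped = [max(v, clip_low) for v in values]
    return clamped, midpoint, data_max


# First column (S2/S1) of the example dataset above
column = [-1093.001, 3.341922966, -1.936238732, 4.334222947, 2.516662746]
print(clip_min_scale(column, -5))
```

Applied to the whole example dataset with a clip value of -5, the data maximum is 4.617812421, so the transition point falls at about -0.19, and both the -1093.001 and -1000.123 entries are raised to -5.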


*Clip color at max value point only (best for outliers at the high end of the dataset)*: The color mapped to the lowest data value (low color) takes on a gradient whose darkest hue is mapped to the minimum data value (determined automatically) and which extends to the value halfway between that minimum and the chosen "Max value to clip high color". The color mapped to the highest data value (high color) continues from the halfway point to the specified max value, and all values above the max value share the same color. Example dataset for which this is a good visualization choice (a few outliers AT THE HIGH END ONLY, with most of the remaining data close together)::

        GeneID  log2_FC(S2/S1)  log2_FC(S3/S1)  log2_FC(S4/S1)  log2_FC(S5/S1)  log2_FC(S6/S1)
        ASNS    1093.001        1.824679717     1.575430565     0.970889        2.104598893
        BEST1   3.341922966     3.25087179      3.961852285     3.429484142     3.717432789
        BHLHE41 -1.936238732    2.145753785     2.44525769      1000.123        2.07475321
        C8orf46 4.334222947     -4.30902017     3.981405448     3.161135243     4.251538767
        CCDC64  2.516662746     2.540500932     3.842305595     4.617812421     2.365768433

A good display value for the above dataset to visualize the differences in lower magnitude values without the +1000-range values dominating the color scheme is a "Max value to clip high color" of 5.
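
One way to express the clip-at-max scale is as an increasing vector of breakpoints, the form in which R's image() and gplots' heatmap.2 accept custom color bins (one more breakpoint than colors). The Python sketch below builds such a vector; the function names and the choice of five breakpoints per gradient are assumptions, and the tool's R script may construct its scale differently.

```python
def evenly_spaced(start, stop, n):
    """n evenly spaced values from start to stop inclusive (like R's seq(length.out = n))."""
    step = (stop - start) / (n - 1)
    return [start + i * step for i in range(n)]


def clip_max_breaks(data_min, clip_high, n_per_gradient=5):
    """Increasing breakpoints for the clip-at-max scheme (illustrative sketch).

    Low gradient: data_min to the midpoint; high gradient: midpoint to clip_high.
    Values above clip_high would be lowered to clip_high before plotting, so
    they all fall into the last color bin.
    """
    midpoint = (data_min + clip_high) / 2
    low = evenly_spaced(data_min, midpoint, n_per_gradient)
    high = evenly_spaced(midpoint, clip_high, n_per_gradient)
    return low + high[1:]  # the midpoint appears in both lists; keep it once


print(clip_max_breaks(-4.30902017, 5))
```

For the example dataset above (minimum -4.30902017) with a clip value of 5, the midpoint is about 0.345, and the 1093.001 and 1000.123 entries would be clamped to 5 so they share the top color.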


*Clip colors at max and min points (best for outliers at both ends of the dataset)*: This scheme is a combination of the previous two visualization schemes. It is best used when a dataset has outliers at both high and low ends of the value distribution, such as the following example::

        GeneID  log2_FC(S2/S1)  log2_FC(S3/S1)  log2_FC(S4/S1)  log2_FC(S5/S1)  log2_FC(S6/S1)
        ASNS    1093.001        1.824679717     1.575430565     0.970889        2.104598893
        BEST1   -2000.111       3.25087179      3.961852285     3.429484142     3.717432789
        BHLHE41 -1.936238732    2.145753785     2.44525769      1000.123        2.07475321
        C8orf46 4.334222947     -4.30902017     3.981405448     3.161135243     4.251538767
        CCDC64  2.516662746     2.540500932     -12345.6        4.617812421     2.365768433

Good max and min clip values to display the above data are 5 and -5, respectively.
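
Clipping at both ends amounts to clamping every value into the range between the two clip points before colors are assigned. The sketch below assumes the data are already parsed into lists of numbers; the helper name is hypothetical, and the tool's R code may achieve the same effect differently.

```python
def outer_clip(rows, clip_low, clip_high):
    """Clamp every value into clip_low..clip_high (illustrative sketch).

    Values below clip_low take the low clip color and values above clip_high
    take the high clip color, so outliers no longer stretch the color scale.
    """
    return [[max(clip_low, min(v, clip_high)) for v in row] for row in rows]


# ASNS and BEST1 rows from the example above
example = [
    [1093.001, 1.824679717, 1.575430565, 0.970889, 2.104598893],
    [-2000.111, 3.25087179, 3.961852285, 3.429484142, 3.717432789],
]
print(outer_clip(example, -5, 5))
```

Only the outliers change; every value already inside -5..5 is passed through untouched.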



**Linkage Types**

*Average linkage:* the distance between clusters is defined as the average distance between all members of one cluster and all members of another cluster (default method, good to use for most cases).

*Complete linkage:* the distance between clusters is defined as the maximum distance between members of one cluster and members of another cluster.

*Single linkage:* the distance between clusters is defined as the minimum distance between members of one cluster and members of another cluster.
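
The three linkage rules differ only in how they summarize the pairwise distances between members of two clusters. The sketch below takes any pairwise distance function; the name cluster_distance is hypothetical. R's hclust() offers the same three rules under the method names "average", "complete", and "single".

```python
from itertools import product


def cluster_distance(a, b, dist, method="average"):
    """Distance between clusters a and b under average, complete, or single linkage."""
    pair_dists = [dist(x, y) for x, y in product(a, b)]
    if method == "average":
        return sum(pair_dists) / len(pair_dists)
    if method == "complete":
        return max(pair_dists)
    if method == "single":
        return min(pair_dists)
    raise ValueError(f"unknown linkage: {method}")


def gap(x, y):
    """Distance between two one-dimensional points."""
    return abs(x - y)


a, b = [0.0, 1.0], [4.0, 6.0]
for m in ("average", "complete", "single"):
    print(m, cluster_distance(a, b, gap, m))  # 4.5, 6.0, 3.0
```

Average linkage sits between the other two, which is one reason it is a reasonable default.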


</help>
</tool>