changeset 12:be08c88b353e draft

Added Sample names input parameter to be appended to the output file names
author saharlcc
date Sun, 26 Mar 2017 22:27:30 -0400
parents 630d5a01ef13
children cff0a5f324d4
files isoem2_isode2/isoDE.xml isoem2_isode2/isoem_wrapper.xml
diffstat 2 files changed, 76 insertions(+), 38 deletions(-)
--- a/isoem2_isode2/isoDE.xml	Fri Mar 17 06:47:46 2017 -0400
+++ b/isoem2_isode2/isoDE.xml	Sun Mar 26 22:27:30 2017 -0400
@@ -1,5 +1,5 @@
 <tool id="isoDE" name="IsoDE2">
-  <description>Compute gene Differential Expression based on IsoEM2 output </description>
+  <description>Computes differentially expressed isoforms and genes based on bootstrap samples generated by IsoEM2 </description>
   <command interpreter="bash">isoDE2.sh
       -c1
       $condition1
@@ -19,7 +19,9 @@
  </command>
 
   <inputs>
-  <param name="condition1" type="data" label="Select data for Condition 1" format="gz" help="Condition 1 isoEM2 compressed output file"/>
+    <param name="sampleName1"  size="10" type="text"  label="Name for Condition 1" value="Condition1"/>
+
+    <param name="condition1" type="data" label="Select data for Condition 1" format="gz" help="Condition 1 isoEM2 compressed output file"/>
 <!--
     <param format="toolshed.gz" name="condition1" type="data" label="Select data for Condition 1" help="Condition 1 isoEM2 compressed output file"/>
 -->
@@ -27,6 +29,8 @@
       <param name="c1Rep" label="Add replicate" type="data" format="gz" data_ref="condtion1" />
     </repeat>
 
+    <param name="sampleName2"  size="10" type="text"  label="Name for Condition 2" value="Condition2"/>
+
     <param format="gz" name="condition2" type="data" label="Select data for Condition 2"  help="Condition 2 isoEM2 compressed output file"/>
 <!--
     <param  format="toolshed.gz" name="condition2" type="data" label="Select data for Condition 2" help="Condition 2 IsoEM2 compressed output file"/>
@@ -40,22 +44,23 @@
 
   </inputs>
   <outputs>
-    <data format="tabular" name="geneFPKM" label="isoDE gene fpkm"  />
-    <data format="tabular" name="isoformFPKM" label="isoDE isoform fpkm"  />
-    <data format="tabular" name="geneTPM" label="isoDE gene tpm"  />
-    <data format="tabular" name="isoformTPM" label="isoDE isoform tpm"  />
+    <data format="tabular" name="geneFPKM" label="${sampleName1}.vs.${sampleName2} isoDE gene fpkm "  />
+    <data format="tabular" name="isoformFPKM" label="${sampleName1}.vs.${sampleName2} isoDE isoform fpkm "  />
+    <data format="tabular" name="geneTPM" label="${sampleName1}.vs.${sampleName2} isoDE gene tpm "  />
+    <data format="tabular" name="isoformTPM" label="${sampleName1}.vs.${sampleName2} isoDE isoform tpm "  />
   </outputs>
 
 <help>
 **What it does**
 
-Computes gene and isoform differential expression between two conditions (example tumor and normal) for both Fragment per Kilobase of transcript length per Million 
-bases (FPKM) and Transcripts per Million (TPM) values. The computation is based on the boostraping output generated by IsoEM2. 
+IsoDE2 computes isoforms and genes that are differentially expressed between two conditions (e.g., treated vs. control). 
+The computation is based on the bootstrap samples generated by IsoEM2.
 
 **Input**
 
-* - One or more IsoEM output files (compressed tar files) for each of the two conditions. More than one file can be used if there are replicated for either condition
-* - Desired Significance level for which a reliable fold change level will be reported
+* 1- Names for the two conditions (the default values 'Condition1' and 'Condition2' will be used in the output file names if these are not specified)
+* 2- One or more IsoEM2 output files (compressed tar files) for each of the two conditions. Multiple files can be used when IsoEM2 is run independently on replicates for each condition.
+* 3- Desired significance level for which a reliable fold change level will be reported.
 *
 
 
@@ -63,16 +68,13 @@
 
 **Output**
 
-* four output files containinag results for Gene FPKM DE, Gene TPM DE, Isoform FPKM DE, and Isoform TPM DE. The four files have identical format with the following fields
-* 1- Gene/isoform ID
-* 2- Conservative log_2(FC) : conservative estimate of fold change in log base 2. 
-*               For the confidence level as input, fold change of gene/isoform abundance (FPKM/TPM) in condition 2 compared condition 1 is 
-*               at least 2 ^ absoulte value of this field.The sign indicates the direction, +ve means over expressed in condition 2, -ve means underexpressed in
-*               condition 1. 0 indicates that no change was detected.
-* 3- log_2(condition 2 FPKM (or TPM)/condition 1 FPKM(or TPM)) based on IsoEM2 run without bootstrapping (average FPKM or TPM if replicates used)
-* 4- condition 1 FPKM (or TPM) based on IsoEM2 run without bootstrapping (average if replicates used)
-* 5- condition 2 FPKM (or TPM) based on IsoEM2 run without bootstrapping (average if replicates used)
- 
+* Four output files containing the results of isoform and gene differential expression analysis based on both Fragments per Kilobase per Million (FPKM) and Transcripts per Million (TPM) expression levels. The four tab-delimited files have an identical format with the following fields:
+* 1- Isoform/Gene ID
+* 2- Confident log_2(FC): the base 2 logarithm of the largest fold change of isoform/gene FPKM/TPM estimates of Condition 2 vs. Condition 1 that is supported by the provided bootstrap samples at the specified significance level. Positive values represent over-expression in Condition 2, while negative values represent over-expression in Condition 1. A zero value in this field indicates that no significant change was detected.
+* 3- Single run log_2(FC): the base 2 logarithm of the ratio between expression levels estimated by IsoEM2 for Condition 2 and Condition 1 (ratio between mean estimates in case replicates are provided for the two conditions).
+* 4- Condition 1 FPKM or TPM: the IsoEM2 expression level estimated for Condition 1 (mean value in case of replicates).
+* 5- Condition 2 FPKM or TPM: the IsoEM2 expression level estimated for Condition 2 (mean value in case of replicates).
+*
 
 </help>
 </tool>
--- a/isoem2_isode2/isoem_wrapper.xml	Fri Mar 17 06:47:46 2017 -0400
+++ b/isoem2_isode2/isoem_wrapper.xml	Sun Mar 26 22:27:30 2017 -0400
@@ -1,5 +1,5 @@
 <tool id="isoem" name="IsoEM2" version="1.0.0">
-    <description> Infers isoform and gene expression levels from high-throughput transcriptome sequencing (RNA-Seq) data</description>
+    <description> Infers isoform and gene expression levels with bootstrap-based confidence intervals from RNA-Seq data</description>
     <requirements>
         
     </requirements>
@@ -40,6 +40,8 @@
                           
     </command>
     <inputs>
+	    <param name="sampleName"  size="10" type="text"  label="Sample name" value="Sample1"/>
+
         <conditional name="referenceSource">
           <param name="CCDSsource" type="select" label="Will you upload a reference transcriptome fasta file from your history or use a built-in reference?" help="Built-ins were indexed using default options">
             <option value="indexed">Use a built-in reference</option>
@@ -94,37 +96,71 @@
 -->
     </inputs>
     <outputs>
-        <data name="out_gene_fpkm" format="tabular" label="Gene_fpkm"/>
-    	<data name="out_gene_tpm" format="tabular" label="Gene_tpm"/>
-    	<data name="out_iso_fpkm" format="tabular" label="Iso_fpkm"/>
-    	<data name="out_iso_tpm" format="tabular" label="Iso_tpm"/>
-	<data name="out_bootstrap" format="toolshed.gz" label="Bootstrap.tar.gz"/>
-        <data name="Run" format="log"  label="isoem_wrapper: The log file" />
+        <data name="out_gene_fpkm" format="tabular" label="${sampleName}-Gene_fpkm"/>
+    	<data name="out_gene_tpm" format="tabular" label="${sampleName}-Gene_tpm"/>
+    	<data name="out_iso_fpkm" format="tabular" label="${sampleName}-Iso_fpkm"/>
+    	<data name="out_iso_tpm" format="tabular" label="${sampleName}-Iso_tpm"/>
+	<data name="out_bootstrap" format="toolshed.gz" label="${sampleName}-Bootstrap.tar.gz"/>
+        <data name="Run" format="log"  label="${sampleName}: The log file" />
     </outputs>
 <help>
 **What it does**
 
-* The IsoEM can be used to infer isoform and gene expression levels from high-throughput transcriptome sequencing (RNA-Seq) data. 
+* IsoEM2 infers isoform and gene expression levels (along with bootstrap-based confidence intervals) from high-throughput transcriptome sequencing (RNA-Seq) data.
+*
 
 **Input Format**
 
-* The tool accept the fastq, fastq.gz, bam formats. Extension must be specified at the end of the file names.
-* RNA-seq data must be Ion Torrent Proton or Illumina sequncing data.
+* The IsoEM2 tool can process RNA-Seq reads generated by both Ion Torrent and Illumina platforms. RNA-Seq reads must be provided in fastq, fastq.gz, or bam format.
+
+**Output Format**
+
+* IsoEM2 generates four output files containing results for **Isoform FPKM**, **Isoform TPM**, **Gene FPKM**, and **Gene TPM**. The four tab-delimited files have an identical format, including the following fields:
+
+
+* 1- Isoform/Gene ID 
+* 2- Isoform/Gene FPKM (Fragments Per Kilobase per Million reads) or TPM (Transcripts per Million reads) 
+* 3- Lower-bound for the 95% confidence interval of the Isoform/Gene FPKM/TPM estimate determined by bootstrapping
+* 4- Upper-bound for the 95% confidence interval of the Isoform/Gene FPKM/TPM estimate determined by bootstrapping
+* In addition to the four tabular files, IsoEM2 produces a compressed tar archive containing the bootstrap samples used to determine the confidence intervals. This archive can be used as input to the IsoDE2 tool for computing differentially expressed isoforms/genes.
+*
 
 -----
 
 
-**Output Format**
+**BUILT-IN REFERENCES**
+
+**mm10_C57BL/6:** 
+
+* GTF file: /import1/CCDS/Mm38.1/CCDS_nucleotide.20140407.fna.GTF
+* TMAP_index: /import1/tmap-index/tmap3.4.1/mm10/CCDS_nucleotide.20140407.fna
+* HISAT2_index: /import1/hisat2-index/mm10_CCDS/mm10_CCDS_nucleotide.20140407
+* Cluster file: /import1/CCDS/Mm38.1/CCDS_nucleotide.20140407.fna_transcriptID_geneName.txt
+
+**mm10_BALB/c:**
+
+* GTF file: /import1/CCDS/Mm38.1/CCDS_nucleotide.20140407.fna.GTF
+* TMAP_index: /import1/tmap-index/tmap3.4.1/mm10/mm10_CCDS_nucleotide.20140407_BALBc.fna
+* HISAT2_index: /import1/hisat2-index/mm10_CCDS/mm10_CCDS_nucleotide.20140407_BALBc
+* Cluster file: /import1/CCDS/Mm38.1/CCDS_nucleotide.20140407.fna_transcriptID_geneName.txt
 
-* Four output files containinag results for **Gene FPKM**, **Gene TPM**, **Isoform FPKM**, and **Isoform TPM**. The four files have identical format with the following fields.
+**hg19**
+
+* GTF file: /import1/CCDS/HsGRCh37.1/HsGRCh37.1_CCDS_nucleotide.20131129.fa.GTF
+* TMAP_index: /import1/tmap-index/tmap3.4.1/hg19/hg19_CCDS_nucleotide.20131129.fa
+* HISAT2_index: /import1/hisat2-index/hg19/hg19_CCDS_nucleotide.20131129.fna
+* Cluster file: /import1/CCDS/HsGRCh37.1/HsGRCh37.1_CCDS.20131129_transcriptID_geneName.txt
+
+**hg38**
+
+* GTF file: /import1/CCDS/GRCh38.p2/GRCh38.p2_CCDS_nucleotide.20150512.fna.GTF
+* TMAP_index: /import1/tmap-index/tmap3.4.1/hg38/hg38_CCDS_nucleotide.20150512.fna
+* HISAT2_index: /import1/hisat2-index/hg38_CCDS_downloadedRef/h19_CCDS_nucleotide.20150512.fna
+* Cluster file: /import1/CCDS/GRCh38.p2/GRCh38.p2_CCDS.20150512_transcriptID_geneName.txt
+	
+-----
 
 
-* 1 Gene/Isoform ID 
-* 2 Gene/Isoform FPKM (Fragments Per Kilobase per Million reads) or TPM (Transcripts per Million reads) 
-* 3 Min FPKM/TPM
-* 4 Max FPKM/TPM
-
-* And one compressed **Bootstrap.tar** file will be used in IsoDE2 to compute gene differential expression.
 </help>
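
The output labels introduced in this changeset embed the new text parameters as `${...}` placeholders. The sketch below illustrates how those labels would read once the default parameter values are filled in. It is only an approximation: Galaxy performs this substitution itself with its own template engine, and Python's `string.Template` is used here purely as a stand-in because it accepts the same `${name}` syntax. The dictionaries and the `expand_labels` helper are hypothetical names made up for this illustration, and the trailing spaces present in the isoDE labels are omitted.

```python
from string import Template

# Label templates copied from the <outputs> sections in this changeset
# (trailing spaces omitted). The dictionary names are illustrative only.
ISODE_LABELS = {
    "geneFPKM": "${sampleName1}.vs.${sampleName2} isoDE gene fpkm",
    "isoformFPKM": "${sampleName1}.vs.${sampleName2} isoDE isoform fpkm",
    "geneTPM": "${sampleName1}.vs.${sampleName2} isoDE gene tpm",
    "isoformTPM": "${sampleName1}.vs.${sampleName2} isoDE isoform tpm",
}

ISOEM_LABELS = {
    "out_gene_fpkm": "${sampleName}-Gene_fpkm",
    "out_gene_tpm": "${sampleName}-Gene_tpm",
    "out_iso_fpkm": "${sampleName}-Iso_fpkm",
    "out_iso_tpm": "${sampleName}-Iso_tpm",
    "out_bootstrap": "${sampleName}-Bootstrap.tar.gz",
    "Run": "${sampleName}: The log file",
}


def expand_labels(templates, **params):
    """Fill the ${...} placeholders of every label with the given values.

    Hypothetical helper: string.Template stands in for Galaxy's own
    template handling, which is not reproduced here.
    """
    return {
        name: Template(text).substitute(**params)
        for name, text in templates.items()
    }


if __name__ == "__main__":
    # Default values taken from the value="..." attributes in the diff.
    isode = expand_labels(
        ISODE_LABELS, sampleName1="Condition1", sampleName2="Condition2"
    )
    isoem = expand_labels(ISOEM_LABELS, sampleName="Sample1")
    for name, label in {**isode, **isoem}.items():
        print(f"{name}: {label}")
```

With the defaults, the IsoDE2 gene FPKM dataset would be labelled `Condition1.vs.Condition2 isoDE gene fpkm`, and the IsoEM2 bootstrap archive `Sample1-Bootstrap.tar.gz`.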