changeset 2:9d363eb081b5 draft

Uploaded
author iarc
date Thu, 28 Apr 2016 03:43:25 -0400
parents 748b7a8b634c
children 14fe7238c6d7
files README.txt mutspecAnnot.pl mutspecFilter.xml mutspecStat.xml
diffstat 4 files changed, 41 insertions(+), 34 deletions(-)
--- a/README.txt	Thu Apr 21 09:36:32 2016 -0400
+++ b/README.txt	Thu Apr 28 03:43:25 2016 -0400
@@ -2,17 +2,21 @@
           MutSpec-Suite        
 ==============================
 
-Created by Maude Ardin and Vincent Cahais (Mechanisms of Carcinogenesis Section, International Agency for Research on Cancer F69372 Lyon France, http://www.iarc.fr/)
+Created by Maude Ardin and Vincent Cahais (Mechanisms of Carcinogenesis Section, International Agency for Research on Cancer F69372 Lyon France,
+http://www.iarc.fr/)
 
 Version 1.0
 
 Released under GNU public license version 2 (GPL v2)
 
-Package description: Ardin et al. - 2016 - MutSpec: a Galaxy toolbox for streamlined analyses of somatic mutation spectra in human and mouse cancer genomes - BMC Bioinformatics
+Package description: Ardin et al. - 2016 - MutSpec: a Galaxy toolbox for streamlined analyses of somatic mutation spectra in human and mouse
+cancer genomes - BMC Bioinformatics
+http://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-016-1011-z
 
 Test data: https://usegalaxy.org/u/maude-ardin/p/mutspectestdata
 
 
+
 ### Requirements
 
 	# python-dev
@@ -23,7 +27,8 @@
 	# Annovar
 If you do not have ANNOVAR installed, you can download it here: http://www.openbioinformatics.org/annovar/annovar_download_form.php
 
-1) Once downloaded, install annovar per the installation instructions and edit the PATH variable in galaxy deamon (/etc/init.d/galaxy) to reflect the location of directory containing perl scripts.
+1) Once downloaded, install annovar per the installation instructions and edit the PATH variable in galaxy deamon (/etc/init.d/galaxy)
+to reflect the location of directory containing perl scripts.
 
 2) Create directories for saving Annovar databases
 	2-a Create a folder (annovardb) for saving all Annovar databases, e.g. hg19db
@@ -48,7 +53,8 @@
 The list of all available databases can be found here: http://annovar.openbioinformatics.org/en/latest/user-guide/download/
 
 
-5) Edit the annovar_index.loc file (in the folder galaxy-dist/tool-data/toolshed/repos/iarc/mutspec/revision/) to reflect the location of annovardb folder (containing all the databases files downloaded from Annovar).
+5) Edit the annovar_index.loc file (in the folder galaxy-dist/tool-data/toolshed/repos/iarc/mutspec/revision/) to reflect the location
+of annovardb folder (containing all the databases files downloaded from Annovar).
 Restart galaxy instance for changes in .loc file to take effect or reload it into the admin interface.
 
 6) Edit the file build_listAVDB.txt in the mutspec install directory to reflect the name and the type of the databases installed
@@ -57,20 +63,24 @@
 ### Installation
 
 	# MutSpec-Stat and MutSpec-NMF
-By default 1 CPU is used by these tools, but you may edit mutspecStat_wrapper.sh and mutspecNmf_wrapper.sh to change this number to the maximum number of CPU available on your server.
+By default 8 CPUs are used by these tools, but you may edit mutspecStat_wrapper.sh and mutspecNmf_wrapper.sh to change this number
+to the maximum number of CPU available on your server.
 
 MutSpec-Stat and MutSpec-NMF tools allow parallel computations that are time consuming.
 It is recommended to use the highest number of cores available on the Galaxy server to reduce the computation time of these tools.
 
 
 
+
 	# MutSpec-Annot
-The maximum CPU value needs to be specified when installing MutSpec package by editing the file mutspecAnnot.pl to reflect the maximum number of CPU available on your server (by default 1 CPU is used).
+The maximum CPU value needs to be specified when installing MutSpec package by editing the file mutspecAnnot.pl to reflect the maximum number
+of CPU available on your server.
 
-This tool may be time consuming for large files. For example, annotating a file with more than 25,000 variants takes 1 hour using 1 CPU (2.6 GHz), while annotating this file using 8 CPUs takes only 5 minutes.
+This tool may be time consuming for large files. For example, annotating a file of more than 25,000 variants takes 1 hour using 1 CPU (2.6 GHz),
+while annotating this file using 8 CPUs takes only 5 minutes.
 We have optimized MutSpec-Annot so that the tool uses more CPUs, if available, as follows:
 -files with less than 5,000 lines: 1 CPU is used
 -files with more than 5,000 and less than 25,000 lines: 2 CPUs are used
--files with more than 25,000 and less than 100,000 lines: 8 (or maximum CPUs, if less than 8 CPUs are available) are used (our benchmark results didn't show any time saving using more than 8 cores for files with more than 25,000 
-but less than 100,000 lines)
+-files with more than 25,000 and less than 100,000 lines: 8 (or maximum CPUs, if less than 8 CPUs are available) are used (our benchmark
+results didn't show any time saving using more than 8 cores for files with more than 25,000 but less than 100,000 lines)
 -files with more than 100,000: maximum CPUs are used 
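
Reviewer note: the CPU tiers listed in the README hunk above can be summarized as a small lookup. The sketch below is an illustration in Python, not the Perl code in mutspecAnnot.pl. The function name `cpus_for_annotation` is hypothetical. The README does not say which tier a file with exactly 5,000, 25,000 or 100,000 lines falls into, so assigning those values to the higher tier is an assumption, as is capping the 2-CPU tier when fewer CPUs are configured.

```python
def cpus_for_annotation(n_lines: int, max_cpu: int) -> int:
    """Pick a worker count from the input size, never exceeding max_cpu.

    Hypothetical helper mirroring the tiers described in the README;
    boundary handling (exactly 5,000 / 25,000 / 100,000 lines) is assumed.
    """
    if max_cpu < 1:
        raise ValueError("max_cpu must be at least 1")
    if n_lines < 5_000:
        wanted = 1
    elif n_lines < 25_000:
        wanted = 2
    elif n_lines < 100_000:
        # README: 8 CPUs, or the maximum if fewer than 8 are available
        wanted = 8
    else:
        # README: the largest files use every configured CPU
        wanted = max_cpu
    # Assumption: no tier may ask for more CPUs than are configured
    return min(wanted, max_cpu)
```

With `max_cpu` set to 12, as in this changeset, a 60,000-line file would get 8 workers under these assumptions and a 250,000-line file would get all 12.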
--- a/mutspecAnnot.pl	Thu Apr 21 09:36:32 2016 -0400
+++ b/mutspecAnnot.pl	Thu Apr 28 03:43:25 2016 -0400
@@ -3,7 +3,7 @@
 #-----------------------------------#
 # Author: Maude                     #
 # Script: mutspecAnnot.pl           #
-# Last update: 17/02/16             #
+# Last update: 26/04/16             #
 #-----------------------------------#
 
 use strict;
@@ -38,7 +38,7 @@
 #########################################
 ###     SPECIFY THE NUMBER OF CPU     ###
 #########################################
-our $max_cpu = 1; # Max number of CPU to use for the annotation
+our $max_cpu = 12; # Max number of CPU to use for the annotation
 
 
 # Recover the current path
@@ -524,21 +524,19 @@
 			if($fullAVDB eq "yes") { AnnotateAV("$folder_temp/$outFilenameTemp-AVInput", "$folder_temp/$outFilenameTemp"); }
 			else { annotateAV_min("$folder_temp/$outFilenameTemp-AVInput", "$folder_temp/$outFilenameTemp"); }
 
-
 			# Check if the annotations worked
-			open(F1, "$folderMutAnalysis/log_annovar.txt") or die "$!: $folderMutAnalysis/log_annovar.txt\n";
-			while(<F1>)
-			{
-				if($_ =~ /ERROR/i)
+				open(F1, "$folderMutAnalysis/log_annovar.txt") or die "$!: $folderMutAnalysis/log_annovar.txt\n";
+				while(<F1>)
 				{
-					print STDERR "\n\n\t\tANNOVAR LOG FILE\n\n";
-					print STDERR $_;
-					print STDERR "\n\n\t\tANNOVAR LOG FILE\n\n\n";
-					exit;
+					if($_ =~ /ERROR/i)
+					{
+						print STDERR "\n\n\t\tANNOVAR LOG FILE\n\n";
+						print STDERR $_;
+						print STDERR "\n\n\t\tANNOVAR LOG FILE\n\n\n";
+						exit;
+					}
 				}
-			}
-			close F1;
-
+				close F1;
 
 			# Recover the strand orientation
 			my $length_AVheader = 0;
@@ -552,11 +550,9 @@
 		# Wait all the child process
 		$pm->wait_all_children;
 
-
-		#### Paste the file together
+		# Paste the file together
 		CombinedTempFile("$folder_temp/$filenameO", "$folderAnnovar/$filenameO".".".${refGenome}."_multianno.txt");
 	}
-
 	# Remove the temporary directory
 	rmtree($folder_temp);
 }
@@ -669,7 +665,7 @@
 			my @tab = split("\t", $_);
 
 			# db name like refGenome_dbName.txt
-			if( ($tab[0] =~ /\w+_(\w+)\.txt/) && ($tab[0] !~ /sites/) && ($tab[0] !~ /esp/) && ($tab[0] !~ /sift/) && ($tab[0] !~ /pp2/) )
+			if( ($tab[0] =~ /\w+_(\w+)\.txt/) && ($tab[0] !~ /sites/) && ($tab[0] !~ /esp/) && ($tab[0] !~ /ljb26/) )
 			{
 				$$refS_protocol .= $1.","; $$refS_operation .= $tab[1].",";
 			}
@@ -687,7 +683,7 @@
 				$$refS_protocol .=$AVdbName_final.","; $$refS_operation .= $tab[1].",";
 			}
 			# ESP
-			if( ($tab[0] =~ /esp/) || ($tab[0] =~ /sift/) || ($tab[0] =~ /pp2/) )
+			if( ($tab[0] =~ /esp/) || ($tab[0] =~ /ljb26/) )
 			{
 				$tab[0] =~ /\w+_(\w+)_(\w+)\.txt/;
 				my $AVdbName_final = $1."_".$2;
@@ -1150,7 +1146,7 @@
           mutspecannot.pl --refGenome hg19 --interval 10 --outfile output_directory --pathAnnovarDB path_to_annovar_database --pathAVDBList path_to_the_list_of_annovar_DB --temp path_to_temporary_directory --fullAnnotation yes|no input
 
 
- Version: 02-2016 (Feb 2016)
+ Version: 04-2016 (Apr 2016)
 
 
 =head1 OPTIONS
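
Reviewer note: the two regex hunks above replace the `sift` and `pp2` tests with a single `ljb26` test, so that database files whose names contain `esp` or `ljb26` are named from two captured fields instead of one. The Python sketch below reproduces only the two conditions visible in the diff. The function name `protocol_name` and the example file names are assumptions for illustration, and the branch for names containing `sites` is not shown in the hunk, so it is left unhandled here.

```python
import re

# Same patterns as in the Perl hunks, used unanchored as Perl's =~ does
SIMPLE = re.compile(r"\w+_(\w+)\.txt")
COMPOUND = re.compile(r"\w+_(\w+)_(\w+)\.txt")


def protocol_name(db_file):
    """Return the protocol name derived from a database file name.

    Hypothetical helper; returns None when the name is handled by a
    branch that the diff does not show, or when nothing matches.
    """
    if "esp" in db_file or "ljb26" in db_file:
        # refGenome_name_suffix.txt: keep "name_suffix"
        m = COMPOUND.search(db_file)
        return f"{m.group(1)}_{m.group(2)}" if m else None
    if "sites" in db_file:
        # Handled elsewhere in the script (not visible in this hunk)
        return None
    # refGenome_dbName.txt: keep "dbName"
    m = SIMPLE.search(db_file)
    return m.group(1) if m else None
```

For example, under these assumptions `hg19_refGene.txt` maps to `refGene`, while `hg19_ljb26_all.txt` maps to `ljb26_all`.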
--- a/mutspecFilter.xml	Thu Apr 21 09:36:32 2016 -0400
+++ b/mutspecFilter.xml	Thu Apr 28 03:43:25 2016 -0400
@@ -12,14 +12,14 @@
         $segDup
         $esp
         $thG
-	    #if $FilterdbSNP.dbSNP == True:
+        #if str($FilterdbSNP.dbSNP) == "true" or $FilterdbSNP.dbSNP == True:
            --dbSNP ${FilterdbSNP.column}
         #else
            --dbSNP 0
         #end if
-	--refGenome ${refGenome}
+        --refGenome ${refGenome} 
         --outfile $output
-	$input
+		$input
 </command>
 
 <inputs>
@@ -94,6 +94,7 @@
 
 </help>
 
+
 <citations>
     <citation type="bibtex">
         @ARTICLE{ardin_mutspec:_2016,
--- a/mutspecStat.xml	Thu Apr 21 09:36:32 2016 -0400
+++ b/mutspecStat.xml	Thu Apr 28 03:43:25 2016 -0400
@@ -14,7 +14,7 @@
         mutspecStat_wrapper.sh
         $html
         ${GALAXY_DATA_INDEX_DIR}/shared/ucsc/chrom/
-        #if $estimateSignature.estimSign == True:
+        #if str($estimateSignature.estimSign) == "true" or $estimateSignature.estimSign == True:
               ${estimateSignature.estimT}
         #else
             0
@@ -43,7 +43,7 @@
 	<param name="reportSample" type="boolean" checked="false" truevalue="--reportSample" falsevalue="" label="Generate one output file for each sample" help="By default, one output Excel file will be generated with statistics of each sample shown in different data sheets. Setting this option to true will generate one Excel file for each sample instead. It is recommended to use this option if your dataset list contains more than 250 files as the Excel output file may be too heavy to open easily on a computer with limited RAM"/>
 
     <conditional name="estimateSignature">
-        <param name="estimSign" type="boolean" label="Compute statistics for estimating the number of signatures" help="This option gererates different statistics that can be used to estimate the number of signatures to extract with NMF (this number should be used in the MutSpec-NMF tool"/>
+        <param name="estimSign" type="boolean" checked="false" truevalue="true" label="Compute statistics for estimating the number of signatures" help="This option gererates different statistics that can be used to estimate the number of signatures to extract with NMF (this number should be used in the MutSpec-NMF tool"/>
         <when value="true">
             <param name="estimT" type="text" value="8" label="Maximum number of signatures to compute" help="Warning: Selecting a number above 8 may not work on small datasets"/>
         </when>
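
Reviewer note: both XML wrappers in this changeset switch the Cheetah test from `$param == True` to `str($param) == "true" or $param == True`, and mutspecStat.xml adds `truevalue="true"` to the boolean parameter. A plausible reading is that the parameter may reach the template as a wrapper object whose string form is the configured truevalue, rather than as a plain Python bool. The Python sketch below illustrates that reading only. The `BoolParam` class is hypothetical and is not Galaxy's actual wrapper type.

```python
class BoolParam:
    """Hypothetical stand-in for a templated boolean parameter.

    Its string form is the configured truevalue or falsevalue; it does
    not define equality with Python's True.
    """

    def __init__(self, checked, truevalue="true", falsevalue=""):
        self.checked = checked
        self.truevalue = truevalue
        self.falsevalue = falsevalue

    def __str__(self):
        return self.truevalue if self.checked else self.falsevalue


def is_enabled(param):
    # Same shape as the condition introduced in the XML wrappers
    return str(param) == "true" or param == True  # noqa: E712


def old_test(param):
    # Shape of the condition before this changeset
    return param == True  # noqa: E712
```

In this sketch the old test misses a checked wrapper object, because the object is not equal to `True`, while the new test accepts both a checked wrapper and a plain `True`. The string comparison only succeeds when the truevalue is the literal `true`, which is consistent with the attribute added to `estimSign`.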