annotate README.txt @ 0:8c682b3a7c5b draft

Uploaded
author iarc
date Tue, 19 Apr 2016 03:07:11 -0400
parents
children 9d363eb081b5
==============================
MutSpec-Suite
==============================

Created by Maude Ardin and Vincent Cahais (Mechanisms of Carcinogenesis Section, International Agency for Research on Cancer, F69372 Lyon, France, http://www.iarc.fr/)

Version 1.0

Released under the GNU General Public License version 2 (GPL v2)

Package description: Ardin et al. - 2016 - MutSpec: a Galaxy toolbox for streamlined analyses of somatic mutation spectra in human and mouse cancer genomes - BMC Bioinformatics

Test data: https://usegalaxy.org/u/maude-ardin/p/mutspectestdata


### Requirements

# python-dev
The build-essential and python-dev packages must be installed on your machine before you install the MutSpec tools:
$ sudo apt-get install build-essential python-dev


# ANNOVAR
If you do not have ANNOVAR installed, you can download it here: http://www.openbioinformatics.org/annovar/annovar_download_form.php

1) Once downloaded, install ANNOVAR following its installation instructions, then edit the PATH variable in the Galaxy daemon script (/etc/init.d/galaxy) so that it includes the directory containing the ANNOVAR Perl scripts.

2) Create directories for the ANNOVAR databases:
2-a Create a folder (annovardb) to hold all ANNOVAR databases, e.g. hg19db
2-b Create a subfolder (seqFolder) to hold the reference genome, e.g. hg19db/hg19_seq

3) Download the reference genome (by chromosome) from UCSC for each desired build as follows:
$ annotate_variation.pl -buildver <build> -downdb seq <seqFolder>

where <build> can be hg18, hg19 or hg38 for the human genome, or mm9 or mm10 for the mouse genome,
and <seqFolder> is the location where the sequences (by chromosome) should be stored, e.g. hg19db/hg19_seq


4) Download all desired databases for each desired build as follows:
$ annotate_variation.pl -buildver <build> [-webfrom annovar] -downdb <database> <annovardb>

/!\ At least the refGene database must be downloaded /!\

where <build> can be hg18, hg19 or hg38 for the human genome, or mm9 or mm10 for the mouse genome,
<database> is the database file to download, e.g. refGene,
and <annovardb> is the location where all database files should be stored, e.g. hg19db

The list of all available databases can be found here: http://annovar.openbioinformatics.org/en/latest/user-guide/download/


5) Edit the annovar_index.loc file (in the folder galaxy-dist/tool-data/toolshed/repos/iarc/mutspec/revision/) to reflect the location of the annovardb folder (containing all the database files downloaded from ANNOVAR).
Restart the Galaxy instance for the changes to the .loc file to take effect, or reload the file from the admin interface.

6) Edit the file build_listAVDB.txt in the mutspec install directory to reflect the name and type of each installed database.
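
As an illustration only, steps 2 to 4 above could be scripted for a single build roughly as follows. The values hg19, hg19db and hg19db/hg19_seq are the examples used in the text; the DRY_RUN switch and the run helper are hypothetical additions for this sketch and are not part of ANNOVAR or MutSpec. With the default DRY_RUN=1 the ANNOVAR commands are only printed, so the script can be checked before anything is downloaded.

```shell
#!/usr/bin/env bash
# Sketch of steps 2 to 4 for one build (assumes annotate_variation.pl is on PATH, see step 1).
set -e

BUILD="${BUILD:-hg19}"               # example build from the text
ANNOVARDB="${ANNOVARDB:-hg19db}"     # example database folder from the text
SEQFOLDER="$ANNOVARDB/${BUILD}_seq"  # subfolder for the reference genome
DRY_RUN="${DRY_RUN:-1}"              # hypothetical switch: 1 = print only, 0 = execute

run() {
    # Print the command; execute it only when DRY_RUN is 0.
    echo "$*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

# Step 2: create the database folder and the reference genome subfolder.
mkdir -p "$SEQFOLDER"

# Step 3: download the reference genome, by chromosome.
run annotate_variation.pl -buildver "$BUILD" -downdb seq "$SEQFOLDER"

# Step 4: download refGene, the one database that must be present.
run annotate_variation.pl -buildver "$BUILD" -webfrom annovar -downdb refGene "$ANNOVARDB"
```

Note that the folders are created relative to the current directory; set ANNOVARDB to an absolute path if the databases should live elsewhere, and run with DRY_RUN=0 to perform the downloads.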


### Installation

# MutSpec-Stat and MutSpec-NMF
By default these tools use 1 CPU, but you may edit mutspecStat_wrapper.sh and mutspecNmf_wrapper.sh to raise this number, up to the maximum number of CPUs available on your server.

The MutSpec-Stat and MutSpec-NMF tools run time-consuming computations that can be parallelized.
It is recommended to use the highest number of cores available on the Galaxy server to reduce the computation time of these tools.
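
If you are unsure how many cores the server has, one way to find out on a Linux machine is the nproc command from GNU coreutils. This is only a convenience for choosing the value; how the number is written into the wrapper scripts is not shown here, and the variable name MAX_CPU is an assumption of this sketch.

```shell
# Number of processing units available to the current process (GNU coreutils).
MAX_CPU="$(nproc)"
echo "CPUs available: $MAX_CPU"
```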



# MutSpec-Annot
The maximum number of CPUs must be specified when installing the MutSpec package, by editing the file mutspecAnnot.pl to reflect the maximum number of CPUs available on your server (by default 1 CPU is used).

This tool may be time consuming for large files. For example, annotating a file with more than 25,000 variants takes 1 hour using 1 CPU (2.6 GHz), while annotating the same file using 8 CPUs takes only 5 minutes.
We have optimized MutSpec-Annot so that the tool uses more CPUs, if available, as follows:
-files with fewer than 5,000 lines: 1 CPU is used
-files with more than 5,000 and fewer than 25,000 lines: 2 CPUs are used
-files with more than 25,000 and fewer than 100,000 lines: 8 CPUs are used, or the maximum if fewer than 8 are available (our benchmarks showed no time saving from using more than 8 cores for files of this size)
-files with more than 100,000 lines: the maximum number of CPUs is used
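
The tiers above can be summarized in a small shell function. This is a sketch of the rule as described, not the code in mutspecAnnot.pl: the function name choose_cpus is hypothetical, and the treatment of files with exactly 5,000, 25,000 or 100,000 lines (placed in the higher tier here) is an assumption, since the text does not say which tier they fall into.

```shell
# choose_cpus LINES MAX_CPU
# Print the number of CPUs for a file of LINES lines on a server with MAX_CPU CPUs.
choose_cpus() {
    local lines="$1" max_cpu="$2" cpus
    if [ "$lines" -lt 5000 ]; then
        cpus=1
    elif [ "$lines" -lt 25000 ]; then
        cpus=2
    elif [ "$lines" -lt 100000 ]; then
        cpus=8
    else
        cpus="$max_cpu"
    fi
    # Never request more CPUs than the server has.
    if [ "$cpus" -gt "$max_cpu" ]; then
        cpus="$max_cpu"
    fi
    echo "$cpus"
}
```

For example, choose_cpus 50000 16 prints 8, while choose_cpus 50000 4 prints 4 because only 4 CPUs are available.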