==============================
MutSpec-Suite
==============================

Created by Maude Ardin and Vincent Cahais (Mechanisms of Carcinogenesis Section, International Agency for Research on Cancer, F69372 Lyon, France, http://www.iarc.fr/)

Version 1.0

Released under the GNU General Public License version 2 (GPL v2)

Package description: Ardin et al. (2016) MutSpec: a Galaxy toolbox for streamlined analyses of somatic mutation spectra in human and mouse cancer genomes. BMC Bioinformatics.

Test data: https://usegalaxy.org/u/maude-ardin/p/mutspectestdata

### Requirements

# python-dev
The build-essential and python-dev packages must be installed on your machine before installing the MutSpec tools:
$ sudo apt-get install build-essential python-dev

# Annovar
If you do not have ANNOVAR installed, you can download it here: http://www.openbioinformatics.org/annovar/annovar_download_form.php

1) Once downloaded, install ANNOVAR following its installation instructions, and edit the PATH variable in the Galaxy daemon (/etc/init.d/galaxy) to reflect the location of the directory containing the ANNOVAR Perl scripts.

2) Create directories for saving the ANNOVAR databases:
2-a Create a folder (annovardb) for saving all ANNOVAR databases, e.g. hg19db
2-b Create a subfolder (seqFolder) for saving the reference genome, e.g. hg19db/hg19_seq

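Both directories from step 2 can be created in a single command (the folder names below are the examples given above; adjust them for your build):

```shell
# Create the ANNOVAR database folder (annovardb) and the reference
# sequence subfolder (seqFolder) for hg19 in one step
mkdir -p hg19db/hg19_seq
```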
3) Download the reference genome (by chromosome) from UCSC for all desired builds as follows:
$ annotate_variation.pl -buildver <build> -downdb seq <seqFolder>

where <build> can be hg18, hg19 or hg38 for the human genome, or mm9, mm10 for the mouse genome,
and <seqFolder> is the location where the sequences (by chromosome) should be stored, e.g. hg19db/hg19_seq

4) Download all desired databases for all desired builds as follows:
$ annotate_variation.pl -buildver <build> [-webfrom annovar] -downdb <database> <annovardb>

/!\ At least the refGene database must be downloaded /!\

where <build> can be hg18, hg19 or hg38 for the human genome, or mm9, mm10 for the mouse genome,
and <database> is the database file to download, e.g. refGene,
and <annovardb> is the location where all database files should be stored, e.g. hg19db

The list of all available databases can be found here: http://annovar.openbioinformatics.org/en/latest/user-guide/download/

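As a concrete illustration of steps 3 and 4, a hg19 setup using the example folder names above would look like this (requires annotate_variation.pl on your PATH and a network connection; refGene is the minimum database):

```shell
$ annotate_variation.pl -buildver hg19 -downdb seq hg19db/hg19_seq
$ annotate_variation.pl -buildver hg19 -webfrom annovar -downdb refGene hg19db
```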
5) Edit the annovar_index.loc file (in the folder galaxy-dist/tool-data/toolshed/repos/iarc/mutspec/revision/) to reflect the location of the annovardb folder (containing all the database files downloaded from ANNOVAR).
Restart the Galaxy instance for the changes in the .loc file to take effect, or reload the file from the admin interface.

6) Edit the file build_listAVDB.txt in the MutSpec install directory to reflect the names and types of the databases installed.

### Installation

# MutSpec-Stat and MutSpec-NMF
MutSpec-Stat and MutSpec-NMF perform time-consuming computations that can be run in parallel.
By default these tools use 1 CPU, but you may edit mutspecStat_wrapper.sh and mutspecNmf_wrapper.sh to change this number, up to the maximum number of CPUs available on your server.
It is recommended to use the highest number of cores available on the Galaxy server to reduce the computation time of these tools.

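A minimal sketch of that kind of edit, assuming (hypothetically) that the wrapper stores its CPU count on a line such as "cpu=1" -- the actual variable name may differ, so inspect the wrapper first. The demo below operates on a throwaway stand-in file rather than the real wrapper:

```shell
# Hypothetical stand-in for mutspecStat_wrapper.sh; the real wrapper's
# variable name may differ -- check the file before editing it.
printf 'cpu=1\n' > wrapper_demo.sh
# Raise the CPU count from 1 to 8 in place
sed -i 's/^cpu=1$/cpu=8/' wrapper_demo.sh
cat wrapper_demo.sh
```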
# MutSpec-Annot
The maximum CPU value must be specified when installing the MutSpec package, by editing the file mutspecAnnot.pl to reflect the maximum number of CPUs available on your server (by default 1 CPU is used).

This tool may be time consuming for large files. For example, annotating a file with more than 25,000 variants takes 1 hour using 1 CPU (2.6 GHz), while annotating the same file using 8 CPUs takes only 5 minutes.
We have optimized MutSpec-Annot so that the tool uses more CPUs, if available, as follows:
-files with less than 5,000 lines: 1 CPU is used
-files with more than 5,000 and less than 25,000 lines: 2 CPUs are used
-files with more than 25,000 and less than 100,000 lines: 8 CPUs are used (or the maximum available, if fewer than 8); our benchmarks showed no time saving beyond 8 cores for files in this range
-files with more than 100,000 lines: the maximum number of CPUs is used
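The scaling rule above can be sketched as a small function. This is an illustrative sketch only -- the function name and signature are hypothetical, and the real tool implements this logic internally in Perl (mutspecAnnot.pl):

```python
# Illustrative sketch of the MutSpec-Annot CPU-scaling rule described above.
# Function name and signature are hypothetical; the actual tool is Perl.
def cpus_for_file(n_lines: int, max_cpus: int) -> int:
    if n_lines < 5_000:
        return 1                 # small files: 1 CPU
    if n_lines < 25_000:
        return min(2, max_cpus)  # medium files: 2 CPUs
    if n_lines < 100_000:
        return min(8, max_cpus)  # large files: up to 8 CPUs
    return max_cpus              # very large files: all available CPUs

print(cpus_for_file(30_000, 16))  # -> 8
```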