<tool id="mutSpecStat" name="MutSpec Stat" version="0.1" hidden="false">
    <description>Calculate various statistics on mutations</description>

    <requirements>
        <requirement type="set_environment">SCRIPT_PATH</requirement>
        <requirement type="package" version="5.18.1">perl</requirement>
        <requirement type="package" version="3.3">weblogo</requirement>
        <requirement type="package" version="1.7.1">numpy</requirement>
        <requirement type="package" version="3.1.2">R</requirement>
        <requirement type="package" version="0.1">mutspec</requirement>
    </requirements>

    <command interpreter="bash">
        mutspecStat_wrapper.sh
        $html
        ${GALAXY_DATA_INDEX_DIR}/shared/ucsc/chrom/
        #if str($estimateSignature.estimSign) == "true" or $estimateSignature.estimSign == True:
            ${estimateSignature.estimT}
        #else
            0
        #end if

        "--refGenome ${refGenome} --pathSeqRefGenome ${refGenome.fields.path} $pooldata $reportSample"
        #import re
        ## Compile the pattern once; it captures the text between parentheses in a dataset name
        #set $regexp = $re.compile("\((.*)\)")
        #for $f in $dataset_list
            #set $name_match = $regexp.search($f.name)
            #if $name_match
                "$f=${name_match.group(1)}"
            #else
                "$f=${f.name}"
            #end if
        #end for
    </command>


    <inputs>
        <param name="dataset_list" type="data_collection" format="tabular" collection_type="list" label="Annotated Dataset List" help="Select a dataset list/collection from your history" />
        <param name="refGenome" type="select" label="Reference genome" help="All data in your dataset list should have been generated with the selected genome">
            <options from_data_table="annovar_index" />
        </param>

        <param name="pooldata" type="boolean" checked="true" truevalue="--pooldata" falsevalue="" label="Include statistics on the pooled samples" />
        <param name="reportSample" type="boolean" checked="false" truevalue="--reportSample" falsevalue="" label="Generate one output file for each sample" help="By default, a single Excel output file is generated, with the statistics of each sample shown in a separate data sheet. Setting this option to true generates one Excel file per sample instead. This option is recommended if your dataset list contains more than 250 files, as a single Excel file may then be too large to open easily on a computer with limited RAM." />

        <conditional name="estimateSignature">
            <param name="estimSign" type="boolean" checked="false" truevalue="true" label="Compute statistics for estimating the number of signatures" help="This option generates different statistics that can be used to estimate the number of signatures to extract with NMF (this number should then be used in the MutSpec-NMF tool)." />
            <when value="true">
                <param name="estimT" type="text" value="8" label="Maximum number of signatures to compute" help="Warning: selecting a number above 8 may not work on small datasets." />
            </when>
        </conditional>

    </inputs>

    <outputs>
        <data name="html" type="data" format="html" label="mutation spectra report on ${dataset_list.name}" />
    </outputs>

    <stdio>
        <regex match="FutureWarning"
               source="both"
               level="warning"
               description="FutureWarning" />
    </stdio>

    <help>

**What it does**

MutSpec-Stat calculates various statistics describing the characteristics of mutations extracted from a dataset collection and, optionally, estimates the number of signatures present in the dataset.
The statistics include the overall distribution of mutations and the distribution of single base substitutions (SBS) by functional region, by chromosome and within their trinucleotide sequence context (see details below).

--------------------------------------------------------------------------------------------------------------------------------------------------

**Input formats**

The tool accepts a dataset list (collection) as input.

.. class:: infomark

You must therefore create a dataset list even when analysing a single file (see the Galaxy help to learn `how to create a dataset list`__).

.. __: https://wiki.galaxyproject.org/Histories#Dataset_Collections

.. class:: warningmark

The input files must have been generated by the MutSpec-Annot tool, so that they contain the required annotations.

--------------------------------------------------------------------------------------------------------------------------------------------------

**Output**

MutSpec-Stat generates an HTML page with links to:

- an Excel file that includes all computed statistics, shown in tabular and graphical formats, for each sample (one per data sheet) and, optionally, for the pooled samples,
- HTML pages for individual sample results,
- the input matrix for the MutSpec-NMF tool,
- the result of the estimation of the number of signatures (if the option "Compute statistics for estimating the number of signatures" was selected).

The following statistics are generated:

**Graph 1. SBS distribution**
Proportion (percent of all SBS) of each type of single base substitution (SBS).
All SBS are considered, including those without strand orientation annotation.

**Table 1. Frequency and counts of all SBS**
Values corresponding to Graph 1.
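
As a rough illustration of how the Graph 1 proportions can be derived, the Python sketch below collapses each substitution to one of the six pyrimidine-based SBS types (purine references are complemented, so G>A is counted as C>T) and reports counts and percentages. The collapsing convention and the helper names ``sbs_class`` and ``sbs_distribution`` are assumptions for illustration, not MutSpec's own code, and the input pairs are toy values.

```python
from collections import Counter

# Complement table used to express every SBS relative to its pyrimidine base.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def sbs_class(ref: str, alt: str) -> str:
    """Return one of the six SBS types, e.g. G>A is reported as C>T."""
    if ref in ("A", "G"):  # purine reference: describe the opposite strand
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}>{alt}"


def sbs_distribution(variants):
    """Map each SBS type to (count, percent of all SBS) for (ref, alt) pairs."""
    counts = Counter(sbs_class(ref, alt) for ref, alt in variants)
    total = sum(counts.values())
    return {sbs: (n, 100.0 * n / total) for sbs, n in sorted(counts.items())}


# Toy input, not real data.
print(sbs_distribution([("C", "T"), ("G", "A"), ("T", "G"), ("A", "C")]))
# {'C>T': (2, 50.0), 'T>G': (2, 50.0)}
```
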

**Graph 2. Impact on protein sequence**
Impact of all mutations (SBS and indels) on the protein sequence, based on the ExonicFunc.refGene annotation.
For more details about the annotation, please visit the `Annovar web page`__.

.. __: http://www.openbioinformatics.org/annovar/annovar_gene.html#output1

**Table 2. Frequency and counts of functional impacts**
Values corresponding to Graph 2.

**Graph 3. Stranded distribution of SBS**
Proportion (percent of all SBS with strand annotation) of the six substitution types on the transcribed and non-transcribed strands.
Only regions with strand annotation are considered.
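
The strand assignment behind Graph 3 can be sketched as follows, assuming the common convention that an SBS is attributed to the strand carrying the pyrimidine (C or T) of the mutated base pair, and that the non-transcribed (coding) strand is the strand on which the gene is annotated. The function name ``strand_of_sbs`` and this convention are assumptions, not taken from the MutSpec scripts.

```python
def strand_of_sbs(ref: str, gene_strand: str) -> str:
    """Return "NT" (non-transcribed) or "T" (transcribed) for one SBS.

    ref is the reference base on the genomic + strand and gene_strand is the
    "+" or "-" orientation of the overlapping gene. Assumed convention: the
    SBS belongs to the strand that carries the pyrimidine (C or T).
    """
    if gene_strand not in ("+", "-"):
        # Mirrors the help text: SBS without strand annotation are skipped.
        raise ValueError("no strand annotation for this SBS")
    pyrimidine_on_plus = ref in ("C", "T")
    gene_on_plus = gene_strand == "+"
    # The coding (non-transcribed) strand is the strand the gene is annotated on.
    return "NT" if pyrimidine_on_plus == gene_on_plus else "T"


for ref, strand in [("C", "+"), ("G", "+"), ("C", "-"), ("G", "-")]:
    print(ref, strand, strand_of_sbs(ref, strand))
```

For example, a C on the + strand inside a + strand gene sits on the coding strand and is counted as non-transcribed, whereas the same C inside a - strand gene is counted as transcribed.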

**Table 3. Significance of the strand biases**
The strand bias for each SBS type is calculated as the ratio of SBS on the non-transcribed (coding) strand to SBS on the transcribed (non-coding) strand.
The statistical significance of the difference between the mutation frequencies on the non-transcribed and transcribed strands (compared with the proportion of 0.5 expected by chance) is assessed using a chi-squared test followed by the Benjamini-Hochberg procedure for multiple testing correction (only samples with at least one mutation on the non-transcribed or the transcribed strand are considered).
Two tables are shown to display the six SBS types in both orientations.
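
A minimal sketch of the Table 3 computation, assuming a one-sample chi-squared goodness-of-fit test of the non-transcribed proportion against 0.5 (one degree of freedom, no continuity correction), followed by a Benjamini-Hochberg adjustment across the tested SBS types. The exact test variant used by MutSpec may differ, and the function names ``strand_bias_test`` and ``benjamini_hochberg`` are hypothetical.

```python
import math


def strand_bias_test(nt: int, t: int):
    """Return (NT/T ratio, chi-squared statistic, p-value) for one SBS type."""
    n = nt + t
    if n == 0:
        raise ValueError("need at least one SBS on either strand")
    ratio = nt / t if t else math.inf
    # Sum of (observed - n/2)^2 / (n/2) over both strands simplifies to:
    chi2 = (nt - t) ** 2 / n
    # Upper tail of a chi-squared distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return ratio, chi2, p_value


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(p_values)
    # Walk from the largest p-value down, keeping a running minimum.
    order = sorted(range(m), key=lambda i: p_values[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for k, i in enumerate(order):
        rank = m - k  # 1-based rank of p_values[i] in ascending order
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy NT/T counts for two SBS types (not real data).
results = [strand_bias_test(30, 10), strand_bias_test(12, 10)]
print(results)
print(benjamini_hochberg([p for _, _, p in results]))
```

The running minimum enforces the monotonicity of the adjusted values and, because it starts at 1.0, also caps them at 1.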

**Table 4. SBS distribution by functional region**
Counts and percentages of SBS in genomic regions, based on the Func.refGene annotation.

**Table 5. Strand bias by functional region**
Strand bias counts for the six SBS types in the different functional regions.

**Table 6. SBS distribution per chromosome**
Counts of SBS per chromosome for the six SBS types.
The correlation between SBS counts and chromosome size is assessed with a Pearson correlation test.
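
The Table 6 correlation can be illustrated with a plain Pearson coefficient. This sketch computes only r; the usual test statistic, t = r sqrt((n - 2) / (1 - r^2)) with n - 2 degrees of freedom, is not shown, and whether MutSpec computes it this way is an assumption. The function name ``pearson_r`` and the chromosome sizes and counts below are made-up values for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n in (0, 1):
        raise ValueError("need two sequences of the same length (n of at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)  # undefined if either input is constant


# Hypothetical chromosome sizes (Mb) and SBS counts, for illustration only.
sizes = [250, 240, 200, 190, 180]
sbs_counts = [130, 118, 101, 99, 85]
print(round(pearson_r(sizes, sbs_counts), 3))
```
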

**Panel 1. Trinucleotide sequence context of SBS on the genomic sequence**
The trinucleotide sequence context takes into account the bases flanking the SBS on its 5' and 3' sides.
SBS counts and frequencies are shown as tables, heatmaps or bar graphs. The heatmap colors are scaled to the maximum value of the corresponding table. The bar graph is scaled to the maximum frequency value (the total number of mutations per SBS type is shown in parentheses).
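
The trinucleotide labels used in Panel 1 can be sketched as below. Each SBS is paired with its 5' and 3' flanking bases, purine references are reverse-complemented so that every context is written on the pyrimidine (an assumed convention giving 6 x 16 = 96 classes), and counts are divided by the table maximum as described for the heatmap. The helper names ``trinucleotide_context`` and ``scale_to_max`` and the toy sequences are hypothetical.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def trinucleotide_context(seq: str, pos: int, alt: str) -> str:
    """Label the SBS at 0-based pos in seq, e.g. "A[C>T]G".

    Purine references are reverse-complemented so that every context is
    written relative to the pyrimidine (an assumed convention).
    """
    if pos == 0 or pos + 1 >= len(seq):
        raise ValueError("both flanking bases are needed")
    five, ref, three = seq[pos - 1], seq[pos], seq[pos + 1]
    if ref in ("A", "G"):
        # Reverse complement: the complemented 3' flank becomes the new 5' flank.
        five, ref, three = COMPLEMENT[three], COMPLEMENT[ref], COMPLEMENT[five]
        alt = COMPLEMENT[alt]
    return f"{five}[{ref}>{alt}]{three}"


def scale_to_max(counts):
    """Divide every value by the table maximum, as for the heatmap colors."""
    top = max(counts.values())
    return {key: value / top for key, value in counts.items()}


# Toy SBS as (sequence window, position, alternate base); not real data.
toy = [("TACGA", 2, "T"), ("TACGA", 2, "T"), ("TGCA", 1, "A")]
counts = Counter(trinucleotide_context(s, p, a) for s, p, a in toy)
print(counts)
print(scale_to_max(counts))
```
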

**Panel 2. Stranded analysis of trinucleotide sequence context of SBS**
SBS within their trinucleotide sequence context are counted on the non-transcribed and transcribed strands of the gene region in which they are located. Counts and frequencies are shown as tables or bar graphs.
Only SBS with strand orientation annotation are considered in this analysis (strand annotation retrieved from the RefSeq database).

    </help>

    <citations>
        <citation type="bibtex">
@article{ardin_mutspec:_2016,
    title = {{MutSpec}: a Galaxy toolbox for streamlined analyses of somatic mutation spectra in human and mouse cancer genomes},
    volume = {17},
    issn = {1471-2105},
    doi = {10.1186/s12859-016-1011-z},
    shorttitle = {{MutSpec}},
    abstract = {{BACKGROUND}: The nature of somatic mutations observed in human tumors at single gene or genome-wide levels can reveal information on past carcinogenic exposures and mutational processes contributing to tumor development. While large amounts of sequencing data are being generated, the associated analysis and interpretation of mutation patterns that may reveal clues about the natural history of cancer present complex and challenging tasks that require advanced bioinformatics skills. To make such analyses accessible to a wider community of researchers with no programming expertise, we have developed within the web-based user-friendly platform Galaxy a first-of-its-kind package called {MutSpec}.
{RESULTS}: {MutSpec} includes a set of tools that perform variant annotation and use advanced statistics for the identification of mutation signatures present in cancer genomes and for comparing the obtained signatures with those published in the {COSMIC} database and other sources. {MutSpec} offers an accessible framework for building reproducible analysis pipelines, integrating existing methods and scripts developed in-house with publicly available R packages. {MutSpec} may be used to analyse data from whole-exome, whole-genome or targeted sequencing experiments performed on human or mouse genomes. Results are provided in various formats including rich graphical outputs. An example is presented to illustrate the package functionalities, the straightforward workflow analysis and the richness of the statistics and publication-grade graphics produced by the tool.
{CONCLUSIONS}: {MutSpec} offers an easy-to-use graphical interface embedded in the popular Galaxy platform that can be used by researchers with limited programming or bioinformatics expertise to analyse mutation signatures present in cancer genomes. {MutSpec} can thus effectively assist in the discovery of complex mutational processes resulting from exogenous and endogenous carcinogenic insults.},
    pages = {170},
    number = {1},
    journal = {{BMC} Bioinformatics},
    author = {Ardin, Maude and Cahais, Vincent and Castells, Xavier and Bouaoun, Liacine and Byrnes, Graham and Herceg, Zdenko and Zavadil, Jiri and Olivier, Magali},
    year = {2016},
    pmid = {27091472},
    keywords = {Galaxy, Mutation signatures, Mutation spectra, Single base substitutions}
}
        </citation>
    </citations>

</tool>