view mutspecFilter.xml @ 4:916846f73e25 draft

Uploaded
author iarc
date Fri, 29 Apr 2016 05:11:28 -0400
parents 9d363eb081b5
children 46a10309dfe2

<tool id="MutSpecfilter" name="MutSpec Filter" version="0.1" hidden="false">
<description>Filter out variants present in public databases</description>

<requirements>
    <requirement type="set_environment">SCRIPT_PATH</requirement>
    <requirement type="package" version="5.18.1">perl</requirement>
</requirements>

<command interpreter="perl">
        mutspecFilter.pl 
        --dir \$SCRIPT_PATH 
        $segDup
        $esp
        $thG
        #if str($FilterdbSNP.dbSNP) == "true" or $FilterdbSNP.dbSNP == True:
           --dbSNP ${FilterdbSNP.column}
        #else
           --dbSNP 0
        #end if
        --refGenome ${refGenome} 
        --outfile $output
        $input
</command>

<inputs>
    <param name="input" type="data" format="txt" label="Input file"/>

    <param name="refGenome" type="select" label="Reference genome" help="All your data should have been annotated with the selected genome">
        <options from_data_table="annovar_index" />
    </param>

    <conditional name="FilterdbSNP">
        <param name="dbSNP" type="boolean" checked="true" truevalue="true" falsevalue="false" label="Filter against dbSNP database" help="Remove variants that have an RS number" />
        <when value="true">
            <param name="column" type="data_column" data_ref="input" label="Select the dbSNP column for filtering" use_header_names="true" help="Select a dbSNP annotation column (for example snp138 or snp138NonFlagged)" />
        </when>
        <when value="false" />
    </conditional>


    <param name="segDup" type="boolean" checked="true" truevalue="--segDup" falsevalue="" label="Filter against SegDup database" help="Remove variants present at &#62;= 0.9 frequency in the genomic duplicate segments database" />
    <param name="esp" type="boolean" checked="true" truevalue="--esp" falsevalue="" label="Filter against the ESP database" help="Remove variants present at frequency &#62; 0.001 in the Exome Sequencing Project database (only valid for human genomes)" />
    <param name="thG" type="boolean" checked="true" truevalue="--thG" falsevalue="" label="Filter against the 1000 Genomes Project database" help="Remove variants present at frequency &#62; 0.001 in the 1000 Genomes database (only valid for human genomes)" />
</inputs>

<outputs>
    <data type="data" name="output" format="tabular" label="${input.name.split(' ')[0]} filtered" />
</outputs>
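
<!--
    A minimal, hypothetical functional test, sketched under assumptions and not
    taken from the tool's repository. It assumes the five example rows shown in
    the help section are saved as test-data/annotated_variants.txt (an invented
    file name) and that the annovar_index data table contains an entry with the
    value hg19 (also an assumption). The assertions only restate what the help
    example shows: the AGT row is kept and the rs147476318 row is removed.
-->
<tests>
    
</tests>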

<help>

**What it does**

Filters a file annotated with the MutSpec-Annot tool. Variants present in public databases (dbSNP, SegDup, ESP and 1000 Genomes, as obtained from ANNOVAR) are removed from the input file, using the frequency thresholds given in the parameter descriptions above.

.. class:: warningmark

The ESP and 1000 Genomes databases can only be used for human genomes.

--------------------------------------------------------------------------------------------------------------------------------------------------

**Input**

.. class:: warningmark

Tab-delimited text file generated by the MutSpec-Annot tool.

--------------------------------------------------------------------------------------------------------------------------------------------------

**Output**

Tab-delimited text file from which variants considered to be neutral polymorphisms have been removed.

--------------------------------------------------------------------------------------------------------------------------------------------------

**Example**

Filtering the following file::

     Chr    Start      End        Ref  Alt  Func.refGene  Gene.refGene  ExonicFunc.refGene    AAChange.refGene                           genomicSuperDups                   snp138       1000g2014oct_all  esp6500si_all  Strand  context  Chromosome  Start_Position  End_Position  Reference_Allele  Tumor_Seq_Allele2
     chr7   121717919  121717920  -    G    exonic        AASS          frameshift insertion  AASS:NM_005763:exon23:c.2634dupC:p.A879fs  NA                                 rs147476318  NA                NA             -       GCG      chr7        121717919       121717920     -                 G
     chr1   230846235  230846235  T    A    exonic        AGT           nonsynonymous SNV     AGT:NM_000029:exon2:c.A362T:p.H121L        NA                                 NA           NA                NA             -       GTG      chr1        230846235       230846235     T                 A
     chr14  33290999   33290999   A    G    exonic        AKAP6         nonsynonymous SNV     AKAP6:NM_004274:exon13:c.A3980G:p.D1327G   NA                                 NA           NA                NA             +       GAC      chr14       33290999        33290999      A                 G
     chr12  8082458    8082458    C    T    exonic        SLC2A3        nonsynonymous SNV     SLC2A3:NM_006931:exon6:c.G683A:p.R228Q     NA                                 rs200481428  0.000199681       NA             -       CCG      chr12       8082458         8082458       C                 T
     chr4   70156391   70156391   T    C    exonic        UGT2B28       nonsynonymous SNV     UGT2B28:NM_053039:exon5:c.T1172C:p.V391A   score=0.949699;Name=chr4:70035680  NA           0.000199681       NA             +       GTA      chr4        70156391        70156391      T                 C

will produce::

     Chr    Start      End        Ref  Alt  Func.refGene  Gene.refGene  ExonicFunc.refGene    AAChange.refGene                           genomicSuperDups                   snp138       1000g2014oct_all  esp6500si_all  Strand  context  Chromosome  Start_Position  End_Position  Reference_Allele  Tumor_Seq_Allele2
     chr1   230846235  230846235  T    A    exonic        AGT           nonsynonymous SNV     AGT:NM_000029:exon2:c.A362T:p.H121L        NA                                 NA           NA                NA             -       GTG      chr1        230846235       230846235     T                 A
     chr14  33290999   33290999   A    G    exonic        AKAP6         nonsynonymous SNV     AKAP6:NM_004274:exon13:c.A3980G:p.D1327G   NA                                 NA           NA                NA             +       GAC      chr14       33290999        33290999      A                 G
     chr4   70156391   70156391   T    C    exonic        UGT2B28       nonsynonymous SNV     UGT2B28:NM_053039:exon5:c.T1172C:p.V391A   score=0.949699;Name=chr4:70035680  NA           0.000199681       NA             +       GTA      chr4        70156391        70156391      T                 C



</help>


<citations>
    <citation type="bibtex">
        @article{ardin_mutspec:_2016,
            title = {{MutSpec}: a Galaxy toolbox for streamlined analyses of somatic mutation spectra in human and mouse cancer genomes},
            volume = {17},
            issn = {1471-2105},
            doi = {10.1186/s12859-016-1011-z},
            shorttitle = {{MutSpec}},
            abstract = {{BACKGROUND}: The nature of somatic mutations observed in human tumors at single gene or genome-wide levels can reveal information on past carcinogenic exposures and mutational processes contributing to tumor development. While large amounts of sequencing data are being generated, the associated analysis and interpretation of mutation patterns that may reveal clues about the natural history of cancer present complex and challenging tasks that require advanced bioinformatics skills. To make such analyses accessible to a wider community of researchers with no programming expertise, we have developed within the web-based user-friendly platform Galaxy a first-of-its-kind package called {MutSpec}.
        {RESULTS}: {MutSpec} includes a set of tools that perform variant annotation and use advanced statistics for the identification of mutation signatures present in cancer genomes and for comparing the obtained signatures with those published in the {COSMIC} database and other sources. {MutSpec} offers an accessible framework for building reproducible analysis pipelines, integrating existing methods and scripts developed in-house with publicly available R packages. {MutSpec} may be used to analyse data from whole-exome, whole-genome or targeted sequencing experiments performed on human or mouse genomes. Results are provided in various formats including rich graphical outputs. An example is presented to illustrate the package functionalities, the straightforward workflow analysis and the richness of the statistics and publication-grade graphics produced by the tool.
        {CONCLUSIONS}: {MutSpec} offers an easy-to-use graphical interface embedded in the popular Galaxy platform that can be used by researchers with limited programming or bioinformatics expertise to analyse mutation signatures present in cancer genomes. {MutSpec} can thus effectively assist in the discovery of complex mutational processes resulting from exogenous and endogenous carcinogenic insults.},
            pages = {170},
            number = {1},
            journal = {{BMC} Bioinformatics},
            author = {Ardin, Maude and Cahais, Vincent and Castells, Xavier and Bouaoun, Liacine and Byrnes, Graham and Herceg, Zdenko and Zavadil, Jiri and Olivier, Magali},
            year = {2016},
            pmid = {27091472},
            keywords = {Galaxy, Mutation signatures, Mutation spectra, Single base substitutions}
        }
    </citation>
</citations>

</tool>