mutspecFilter.xml @ 7:eda59b985b1c (draft, default, tip)
description: Uploaded
author: iarc
date: Mon, 13 Mar 2017 08:21:19 -0400
parents: 46a10309dfe2
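Galaxy renders the `<command>` template of the tool below into a single `mutspecFilter.pl` call. The Python sketch here mirrors the order of that argument list. `FilterParams` and `build_argv` are hypothetical names used only for illustration; they are not Galaxy APIs. The sketch rests on three assumptions:

* Each boolean parameter emits its `truevalue` flag when checked and an empty string otherwise.
* The dbSNP column is the 1-based index that Galaxy supplies for a `data_column` parameter.
* `$SCRIPT_PATH` has already been expanded by the shell.

When `interpreter="perl"` is set, Galaxy also resolves the script path against the tool directory. The sketch does not model that step.

```python
from dataclasses import dataclass, field


@dataclass
class FilterParams:
    """Hypothetical stand-in for the form values; not a Galaxy object."""
    script_path: str            # value of $SCRIPT_PATH after shell expansion
    input_path: str
    output_path: str
    ref_genome: str             # an entry from the annovar_index data table
    dbsnp: bool = True          # FilterdbSNP.dbSNP
    dbsnp_column: int = 0       # FilterdbSNP.column (assumed 1-based)
    seg_dup: bool = True
    esp: bool = True
    thg: bool = True
    exac: bool = True
    extra_filters: list[str] = field(default_factory=list)  # "filters" repeat


def build_argv(p: FilterParams) -> list[str]:
    """Mirror the argument order of the <command> template."""
    argv = ["perl", "mutspecFilter.pl", "--dir", p.script_path]
    # Each boolean renders as its truevalue flag, or as "" (omitted here).
    for enabled, flag in ((p.seg_dup, "--segDup"), (p.esp, "--esp"),
                          (p.thg, "--thG"), (p.exac, "--exac")):
        if enabled:
            argv.append(flag)
    # The conditional passes the chosen column, or 0 when dbSNP filtering is off.
    argv += ["--dbSNP", str(p.dbsnp_column if p.dbsnp else 0)]
    argv += ["--refGenome", p.ref_genome, "--outfile", p.output_path]
    # One --filter per block in the "Additional filters" repeat.
    for reference in p.extra_filters:
        argv += ["--filter", reference]
    argv.append(p.input_path)
    return argv
```

Building a list rather than a string keeps each argument separate, so the list could be passed to `subprocess.run` without shell quoting. The Galaxy template instead relies on the shell to split its rendered text.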
<tool id="MutSpecfilter" name="MutSpec Filter" version="0.1" hidden="false">
    <description>Filter out variants present in public databases</description>
    <requirements>
        <requirement type="set_environment">SCRIPT_PATH</requirement>
        <requirement type="package" version="5.18.1">perl</requirement>
    </requirements>
    <command interpreter="perl">
        mutspecFilter.pl
        --dir \$SCRIPT_PATH
        $segDup $esp $thG $exac
        #if str($FilterdbSNP.dbSNP) == "true":
            --dbSNP ${FilterdbSNP.column}
        #else
            --dbSNP 0
        #end if
        --refGenome ${refGenome}
        --outfile $output
        #for $i, $filter in enumerate( $filters )
            --filter $filter.reference
        #end for
        $input;
    </command>
    <inputs>
        <param name="input" type="data" format="txt" label="Input file"/>
        <param name="refGenome" type="select" label="Reference genome" help="Your data must have been annotated with the selected genome">
            <options from_data_table="annovar_index" />
        </param>
        <conditional name="FilterdbSNP">
            <param name="dbSNP" type="boolean" checked="true" truevalue="true" label="Filter against the dbSNP database" help="Remove variants with an RS number" />
            <when value="true">
                <param name="column" type="data_column" data_ref="input" label="Select the dbSNP column for filtering" use_header_names="true" help="Select the snp or snpNonFlagged column (e.g. snp138)" />
            </when>
        </conditional>
        <param name="segDup" type="boolean" checked="true" truevalue="--segDup" falsevalue="" label="Filter against the SegDup database" help="Remove variants with a score >= 0.9 in the genomic segmental duplications database (use only for human and mouse genomes)" />
        <param name="esp" type="boolean" checked="true" truevalue="--esp" falsevalue="" label="Filter against the ESP database" help="Remove variants present at frequency > 0.001 in the Exome Sequencing Project database (use only for the human genome)" />
        <param name="thG" type="boolean" checked="true" truevalue="--thG" falsevalue="" label="Filter against the 1000 Genomes Project database" help="Remove variants present at frequency > 0.001 in the 1000 Genomes Project database (use only for the human genome)" />
        <param name="exac" type="boolean" checked="true" truevalue="--exac" falsevalue="" label="Filter against the ExAC database" help="Remove variants present at frequency > 0.001 in the Exome Aggregation Consortium database (use only for the human genome)" />
        <repeat name="filters" title="Additional filters">
            <param name="reference" type="data" format="bed" label="Reference file (BED or VCF)" help="Remove variants present in the reference file"/>
        </repeat>
    </inputs>
    <outputs>
        <data type="data" name="output" format="tabular" label="${input.name.split(' ')[0]} filtered" />
    </outputs>
    <stdio>
        <regex match="Error message:" source="stderr" level="fatal" description="See the error message for details" />
        <regex match="Warning message:" source="stdout" level="warning" description="" />
    </stdio>
    <help>

**What it does**

Filters a file annotated with the MutSpec-Annot tool. Variants present in the public databases provided by Annovar are removed from the input file. The thresholds used are those given in the help text of each database parameter.

.. class:: warningmark

The genomic segmental duplications database can be used only for human and mouse genomes.

.. class:: warningmark

The ESP, 1000 Genomes and ExAC databases can be used only for the human genome.

--------------------------------------------------------------------------------------------------------------------------------------------------

**Input**

.. class:: warningmark

Tab-delimited text files generated by the MutSpec-Annot tool.

--------------------------------------------------------------------------------------------------------------------------------------------------

**Additional Filters**

.. class:: warningmark

You may also want to filter out additional features, such as repeats and tandem repeats. To do so, provide the reference file in VCF or BED format.

.. class:: infomark

Reference files are available in the IARC Galaxy Shared Data. On the top panel, click "Shared Data" and select "Data Libraries".
The category "BED annotations" contains reference files for different genomes.

--------------------------------------------------------------------------------------------------------------------------------------------------

**Output**

A tab-delimited text file from which variants considered to be neutral polymorphisms have been removed.

--------------------------------------------------------------------------------------------------------------------------------------------------

**Example**

Filter the following file::

    Chr	Start	End	Ref	Alt	Func.refGene	Gene.refGene	ExonicFunc.refGene	AAChange.refGene	genomicSuperDups	snp138	1000g2014oct_all	esp6500si_all	Strand	context	Chromosome	Start_Position	End_Position	Reference_Allele	Tumor_Seq_Allele2
    chr7	121717919	121717920	-	G	exonic	AASS	frameshift insertion	AASS:NM_005763:exon23:c.2634dupC:p.A879fs	NA	rs147476318	NA	NA	-	GCG	chr7	121717919	121717920	-	G
    chr1	230846235	230846235	T	A	exonic	AGT	nonsynonymous SNV	AGT:NM_000029:exon2:c.A362T:p.H121L	NA	NA	NA	NA	-	GTG	chr1	230846235	230846235	T	A
    chr14	33290999	33290999	A	G	exonic	AKAP6	nonsynonymous SNV	AKAP6:NM_004274:exon13:c.A3980G:p.D1327G	NA	NA	NA	NA	+	GAC	chr14	33290999	33290999	A	G
    chr12	8082458	8082458	C	T	exonic	SLC2A3	nonsynonymous SNV	SLC2A3:NM_006931:exon6:c.G683A:p.R228Q	NA	rs200481428	0.000199681	NA	-	CCG	chr12	8082458	8082458	C	T
    chr4	70156391	70156391	T	C	exonic	UGT2B28	nonsynonymous SNV	UGT2B28:NM_053039:exon5:c.T1172C:p.V391A	score=0.949699;Name=chr4:70035680	NA	0.000199681	NA	+	GTA	chr4	70156391	70156391	T	C

The output is::

    Chr	Start	End	Ref	Alt	Func.refGene	Gene.refGene	ExonicFunc.refGene	AAChange.refGene	genomicSuperDups	snp138	1000g2014oct_all	esp6500si_all	Strand	context	Chromosome	Start_Position	End_Position	Reference_Allele	Tumor_Seq_Allele2
    chr1	230846235	230846235	T	A	exonic	AGT	nonsynonymous SNV	AGT:NM_000029:exon2:c.A362T:p.H121L	NA	NA	NA	NA	-	GTG	chr1	230846235	230846235	T	A
    chr14	33290999	33290999	A	G	exonic	AKAP6	nonsynonymous SNV	AKAP6:NM_004274:exon13:c.A3980G:p.D1327G	NA	NA	NA	NA	+	GAC	chr14	33290999	33290999	A	G
    chr4	70156391	70156391	T	C	exonic	UGT2B28	nonsynonymous SNV	UGT2B28:NM_053039:exon5:c.T1172C:p.V391A	score=0.949699;Name=chr4:70035680	NA	0.000199681	NA	+	GTA	chr4	70156391	70156391	T	C

--------------------------------------------------------------------------------------------------------------------------------------------------

**Contact**

ardinm@fellows.iarc.fr; cahaisv@iarc.fr

--------------------------------------------------------------------------------------------------------------------------------------------------

**Code**

The source code is available on `GitHub`__

.. __: https://github.com/IARCbioinfo/mutspec.git

    </help>
    <citations>
        <citation type="bibtex">
@article{ardin_mutspec:_2016,
    title = {{MutSpec}: a Galaxy toolbox for streamlined analyses of somatic mutation spectra in human and mouse cancer genomes},
    volume = {17},
    issn = {1471-2105},
    doi = {10.1186/s12859-016-1011-z},
    shorttitle = {{MutSpec}},
    abstract = {{BACKGROUND}: The nature of somatic mutations observed in human tumors at single gene or genome-wide levels can reveal information on past carcinogenic exposures and mutational processes contributing to tumor development. While large amounts of sequencing data are being generated, the associated analysis and interpretation of mutation patterns that may reveal clues about the natural history of cancer present complex and challenging tasks that require advanced bioinformatics skills. To make such analyses accessible to a wider community of researchers with no programming expertise, we have developed within the web-based user-friendly platform Galaxy a first-of-its-kind package called {MutSpec}.
{RESULTS}: {MutSpec} includes a set of tools that perform variant annotation and use advanced statistics for the identification of mutation signatures present in cancer genomes and for comparing the obtained signatures with those published in the {COSMIC} database and other sources. {MutSpec} offers an accessible framework for building reproducible analysis pipelines, integrating existing methods and scripts developed in-house with publicly available R packages. {MutSpec} may be used to analyse data from whole-exome, whole-genome or targeted sequencing experiments performed on human or mouse genomes. Results are provided in various formats including rich graphical outputs. An example is presented to illustrate the package functionalities, the straightforward workflow analysis and the richness of the statistics and publication-grade graphics produced by the tool. {CONCLUSIONS}: {MutSpec} offers an easy-to-use graphical interface embedded in the popular Galaxy platform that can be used by researchers with limited programming or bioinformatics expertise to analyse mutation signatures present in cancer genomes. {MutSpec} can thus effectively assist in the discovery of complex mutational processes resulting from exogenous and endogenous carcinogenic insults.},
    pages = {170},
    number = {1},
    journaltitle = {{BMC} Bioinformatics},
    author = {Ardin, Maude and Cahais, Vincent and Castells, Xavier and Bouaoun, Liacine and Byrnes, Graham and Herceg, Zdenko and Zavadil, Jiri and Olivier, Magali},
    date = {2016},
    pmid = {27091472},
    keywords = {Galaxy, Mutation signatures, Mutation spectra, Single base substitutions}
}
        </citation>
    </citations>
</tool>
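The filtering rules described in the parameter help can be sketched as follows. This is a minimal Python illustration, not the logic of `mutspecFilter.pl`.

* The thresholds come from the help text: frequency > 0.001 for ESP, 1000 Genomes and ExAC, and score >= 0.9 for SegDup.
* The column names are taken from the example header.
* The function names `keep_row` and `filter_lines` are hypothetical.
* Parsing `score=` out of the `genomicSuperDups` cell is an assumption based on the example rows.

```python
import re
from typing import Optional

MAX_FREQ = 0.001   # ESP / 1000 Genomes / ExAC cut-off from the help text
MIN_SEGDUP = 0.9   # SegDup cut-off from the help text


def as_float(value: str) -> Optional[float]:
    """Parse a numeric cell; 'NA' and other non-numeric values give None."""
    try:
        return float(value)
    except ValueError:
        return None


def keep_row(row: dict[str, str], dbsnp_col: Optional[str],
             freq_cols: tuple[str, ...], segdup_col: Optional[str]) -> bool:
    """Return False as soon as any enabled filter flags the variant."""
    # dbSNP: drop variants that carry an RS identifier.
    if dbsnp_col is not None and row[dbsnp_col].startswith("rs"):
        return False
    # Population databases: drop variants above the frequency cut-off.
    for col in freq_cols:
        freq = as_float(row[col])
        if freq is not None and freq > MAX_FREQ:
            return False
    # Segmental duplications: example cells look like "score=0.949699;Name=...".
    if segdup_col is not None:
        match = re.search(r"score=([0-9.]+)", row[segdup_col], re.IGNORECASE)
        if match and float(match.group(1)) >= MIN_SEGDUP:
            return False
    return True


def filter_lines(lines: list[str], dbsnp_col: Optional[str] = None,
                 freq_cols: tuple[str, ...] = (),
                 segdup_col: Optional[str] = None) -> list[str]:
    """Keep the header plus every tab-delimited data line that passes."""
    header = lines[0].split("\t")
    kept = [lines[0]]
    for line in lines[1:]:
        row = dict(zip(header, line.split("\t")))
        if keep_row(row, dbsnp_col, freq_cols, segdup_col):
            kept.append(line)
    return kept
```

Run on the five example rows, `filter_lines(lines, dbsnp_col="snp138", freq_cols=("1000g2014oct_all", "esp6500si_all"))` removes the two rows that carry an RS number and keeps AGT, AKAP6 and UGT2B28, which matches the documented output. Adding `segdup_col="genomicSuperDups"` makes this sketch also drop UGT2B28, whose score is 0.949699; the documented example output keeps that row.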