snpsift: snpSift_extractFields.xml comparison

comparison snpSift_extractFields.xml @ 1:98708b88af9f draft

planemo upload for repository https://github.com/galaxyproject/tools-iuc/tree/master/tool_collections/snpsift/snpsift commit 21b46ae2c90ba7e569b2b3a9eaf938f8dedb2c31

author	iuc
date	Tue, 07 Jun 2016 10:04:09 -0400
parents
children	bf8c1526871b

comparison

equal deleted inserted replaced

-:9e8280e19338
+:98708b88af9f
+<tool id="snpSift_extractFields" name="SnpSift Extract Fields" version="@WRAPPER_VERSION@.0">
+<options sanitize="False" />
+<description>from a VCF file inot a tabular file</description>
+<macros>
+<import>snpSift_macros.xml</import>
+</macros>
+<expand macro="requirements" />
+<expand macro="stdio" />
+<expand macro="version_command" />
+<command><![CDATA[
+cat "$input"
+#if $one_effect_per_line:
+| "\$SNPEFF_JAR_PATH/scripts/vcfEffOnePerLine.pl"
+#end if
+| java -Xmx6G -jar "\$SNPEFF_JAR_PATH/SnpSift.jar" extractFields
+#if $separator:
+-s '$separator'
+#end if
+#if $empty_text:
+-e '$empty_text'
+#end if
+-
+#echo ' '.join(['"%s"' % x for x in $extract.split()])
+> "$output"
+]]>
+</command>
+<inputs>
+<param format="vcf" name="input" type="data" label="Variant input file in VCF format"/>
+<param name="extract" type="text" label="Extract" help="Need help? See below a few examples." />
+<param name="one_effect_per_line" type="boolean" truevalue="yes" falsevalue="no" checked="false" label="One effect per line" help="When variants have more than one effect, lists one effect per line, while all other parameters in the line are repeated across mutiple lines" />
+<param name="separator" type="text" value="" optional="true" label="multiple field separator" help="Separate multiple fields in one column with this character, e.g. a comma, rather than a column for each of the multiple values">
+</param>
+<param name="empty_text" type="text" value="" optional="true" label="empty field text" help="Represent empty fields with this value, rather than leaving them blank" >
+</param>
+</inputs>
+<outputs>
+<data format="tabular" name="output" />
+</outputs>
+<tests>
+<test>
+<param name="input" ftype="vcf" value="test_rmInfo.vcf"/>
+<param name="extract" value="CHROM POS REF ALT EFF[*].EFFECT"/>
+<output name="output">
+<assert_contents>
+<has_text text="INTRAGENIC" />
+<not_has_text text="DOWNSTREAM,INTRAGENIC,INTRON,UTR_3_PRIME" />
+</assert_contents>
+</output>
+</test>
+<test>
+<param name="input" ftype="vcf" value="test_rmInfo.vcf"/>
+<param name="extract" value="CHROM POS REF ALT EFF[*].EFFECT"/>
+<param name="separator" value=","/>
+<output name="output">
+<assert_contents>
+		<has_text text="DOWNSTREAM,INTRAGENIC,INTRON,UTR_3_PRIME" />
+</assert_contents>
+</output>
+</test>
+</tests>
+<help><![CDATA[
+**SnpSift Extract Fields**
+Extract fields from a VCF file to a TXT, tab separated format, that you can easily load in R, XLS, etc.
+http://snpeff.sourceforge.net/SnpSift.html#Extract
+You can also use sub-fields and genotype fields / sub-fields such as:
+::
+Standard VCF fields:
+CHROM
+POS
+ID
+REF
+ALT
+FILTER
+INFO fields:
+AF
+AC
+DP
+MQ
+etc. (any info field available)
+SnpEff 'ANN' fields:
+"ANN[*].ALLELE" (alias GENOTYPE)
+"ANN[*].EFFECT" (alias ANNOTATION): Effect in Sequence ontology terms (e.g. 'missense_variant', 'synonymous_variant', 'stop_gained', etc.)
+"ANN[*].IMPACT" { HIGH, MODERATE, LOW, MODIFIER }
+"ANN[*].GENE" Gene name (e.g. 'PSD3')
+"ANN[*].GENEID" Gene ID
+"ANN[*].FEATURE"
+"ANN[*].FEATUREID" (alias TRID: Transcript ID)
+"ANN[*].BIOTYPE" Biotype, as described by the annotations (e.g. 'protein_coding')
+"ANN[*].RANK" Exon or Intron rank (i.e. exon number in a transcript)
+"ANN[*].HGVS_C" (alias HGVS_DNA, CODON): Variant in HGVS (DNA) notation
+"ANN[*].HGVS_P" (alias HGVS, HGVS_PROT, AA): Variant in HGVS (protein) notation
+"ANN[*].CDNA_POS" (alias POS_CDNA)
+"ANN[*].CDNA_LEN" (alias LEN_CDNA)
+"ANN[*].CDS_POS" (alias POS_CDS)
+"ANN[*].CDS_LEN" (alias LEN_CDS)
+"ANN[*].AA_POS" (alias POS_AA)
+"ANN[*].AA_LEN" (alias LEN_AA)
+"ANN[*].DISTANCE"
+"ANN[*].ERRORS" (alias WARNING, INFOS)
+SnpEff 'EFF' fields (this is for older SnpEff/SnpSift versions, new version use 'ANN' field):
+"EFF[*].EFFECT"
+"EFF[*].IMPACT"
+"EFF[*].FUNCLASS"
+"EFF[*].CODON"
+"EFF[*].AA"
+"EFF[*].AA_LEN"
+"EFF[*].GENE"
+"EFF[*].BIOTYPE"
+"EFF[*].CODING"
+"EFF[*].TRID"
+"EFF[*].RANK"
+SnpEff 'LOF' fields:
+"LOF[*].GENE"
+"LOF[*].GENEID"
+"LOF[*].NUMTR"
+"LOF[*].PERC"
+SnpEff' NMD' fields:
+"NMD[*].GENE"
+"NMD[*].GENEID"
+"NMD[*].NUMTR"
+"NMD[*].PERC"
+Some examples:
+- *Extracting chromosome, position, ID and allele frequency from a VCF file:*
+**CHROM POS ID AF**
+The result will look something like:
+::
+#CHROM        POS        ID            AF
+1             69134                    0.086
+1             69496      rs150690004   0.001
+- *Extracting genotype fields:*
+**CHROM POS ID THETA GEN[0].GL[1] GEN[1].GL GEN[3].GL[*] GEN[*].GT**
+This means to extract:
+- CHROM POS ID: regular fields (as in the previous example)
+- THETA : This one is from INFO
+- GEN[0].GL[1] : Second likelihood from first genotype
+- GEN[1].GL : The whole GL fiels (all entries without separating them)
+- GEN[3].GL[*] : All likelihoods form genotype 3 (this time they will be tab separated, as opposed to the previous one).
+- GEN[*].GT : Genotype subfields (GT) from ALL samples (tab separated).
+The result will look something like:
+::
+#CHROM  POS     ID              THETA   GEN[0].GL[1]    GEN[1].GL               GEN[3].GL[*]            GEN[*].GT
+1       10583   rs58108140      0.0046  -0.47           -0.24,-0.44,-1.16       -0.48   -0.48   -0.48   0|0     0|0     0|0     0|1     0|0     0|1     0|0     0|0     0|1
+1       10611   rs189107123     0.0077  -0.48           -0.24,-0.44,-1.16       -0.48   -0.48   -0.48   0|0     0|1     0|0     0|0     0|0     0|0     0|0     0|0     0|0
+1       13302   rs180734498     0.0048  -0.58           -2.45,-0.00,-5.00       -0.48   -0.48   -0.48   0|0     0|1     0|0     0|0     0|0     1|0     0|0     0|1     0|0
+- *Extracting fields with multiple values:*
+(notice that there are multiple effect columns per line because there are mutiple effects per variant)
+**CHROM POS REF ALT ANN[*].EFFECT**
+The result will look something like:
+::
+#CHROM	POS	REF	ALT	ANN[*].EFFECT
+22	17071756	T	C	3_prime_UTR_variant	downstream_gene_variant
+22	17072035	C	T	missense_variant	downstream_gene_variant
+22	17072258	C	A	missense_variant	downstream_gene_variant
+- *Extracting fields with multiple values using a comma as a multipe field separator:*
+**CHROM POS REF ALT ANN[*].EFFECT ANN[*].HGVS_P**
+The result will look something like:
+::
+#CHROM	POS	REF	ALT	ANN[*].EFFECT	ANN[*].HGVS_P
+22	17071756	T	C	3_prime_UTR_variant,downstream_gene_variant	.,.
+22	17072035	C	T	missense_variant,downstream_gene_variant	p.Gly469Glu,.
+22	17072258	C	A	missense_variant,downstream_gene_variant	p.Gly395Cys,.
+- *Extracting fields with multiple values, one effect per line:*
+**CHROM POS REF ALT ANN[*].EFFECT**
+The result will look something like:
+::
+#CHROM	POS	REF	ALT	ANN[*].EFFECT
+22	17071756	T	C	3_prime_UTR_variant
+22	17071756	T	C	downstream_gene_variant
+22	17072035	C	T	missense_variant
+22	17072035	C	T	downstream_gene_variant
+22	17072258	C	A	missense_variant
+22	17072258	C	A	downstream_gene_variant
+@EXTERNAL_DOCUMENTATION@
+	http://snpeff.sourceforge.net/SnpSift.html#Extract
+]]>
+</help>
+<expand macro="citations" />
+</tool>

Mercurial > repos > iuc > snpsift

comparison snpSift_extractFields.xml @ 1:98708b88af9f draft