
Get Haplotypes From Phased VCF (version 2.0.0)

Authors Dereeper Alexis (alexis.dereeper@ird.fr), IRD, South Green platform

Please cite "SNiPlay3: a web-based application for exploration and large scale analyses of genomic variations", Dereeper A. et al., Nucl. Acids Res. (1 July 2015) 43 (W1).

Galaxy integration Provided by South Green & Andres Gwendoline (Institut Français de Bioinformatique) & Marcon Valentin (IFB & INRA)

Support For any questions about Galaxy integration, please send an e-mail to alexis.dereeper@ird.fr


Get Haplotypes From Phased VCF

Description

Extracts the haplotypes of each sample from a phased VCF file.

Input file

VCF file
Phased VCF file

Parameter

Output file basename
Prefix for the output files

Output files

Distinct Haplotypes text file
File describing distinct haplotypes
Fasta file
Fasta file with haplotypes
Distinct Haplotypes fasta file
Fasta file with distinct haplotypes
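
As a rough illustration of the idea, the Python sketch below shows one way haplotypes could be assembled from phased genotypes: for each chromosome and sample, the alleles on the left and right side of the `|` separator are concatenated into two strings, and identical strings are then grouped. This is not the tool's actual source code. The function names `haplotypes_from_phased_vcf` and `distinct_haplotypes` are hypothetical, and the sketch assumes a tab-delimited VCF containing SNPs only, with `N` written for missing or unphased calls.

```python
from collections import defaultdict


def haplotypes_from_phased_vcf(lines):
    """Build one allele string per (chromosome, sample copy) from phased GT fields.

    Hypothetical helper: assumes tab-delimited records and single-base alleles.
    """
    samples = []
    haplos = defaultdict(str)  # (chrom, "SAMPLE_1") -> concatenated alleles
    for raw in lines:
        line = raw.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the FORMAT column
            continue
        chrom, ref, alt, fmt = fields[0], fields[3], fields[4], fields[8]
        alleles = [ref] + alt.split(",")
        gt_index = fmt.split(":").index("GT")
        for sample, call in zip(samples, fields[9:]):
            gt = call.split(":")[gt_index]
            # Unphased calls are treated as unknown on both copies (assumption).
            codes = gt.split("|") if "|" in gt else [".", "."]
            for copy, code in enumerate(codes, start=1):
                base = alleles[int(code)] if code != "." else "N"
                haplos[(chrom, f"{sample}_{copy}")] += base
    return dict(haplos)


def distinct_haplotypes(haplos):
    """Group copy names by identical sequence, per chromosome."""
    groups = defaultdict(lambda: defaultdict(list))
    for (chrom, name), seq in haplos.items():
        groups[chrom][seq].append(name)
    return {chrom: dict(by_seq) for chrom, by_seq in groups.items()}
```

Each phased copy is named `SAMPLE_1` or `SAMPLE_2`, which mirrors the naming seen in the example output files; whether the tool derives the names in exactly this way is an assumption.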

Working example

Input file

VCF file

##fileformat=VCFv4.1
##FILTER=<ID=LowQual,Description="Low quality">
##FORMAT=<ID=AD,Number=.,Type=Integer,Description="Allelic depths for the ref and alt alleles in the order listed">
[...]
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  AZUCENA
Chr1    4299    .       G       A       .       PASS    AR2=1;DR2=1;AF=0.168    GT:DS:GP        0|0:0:1,0,0

Parameter

Output file basename -> haplotypes

Output files

haplotypes.distinct_haplotypes.txt

===Chr10===
haplo1:2:CIRAD403_1,CIRAD403_2,
TTTAAGAAATTCCTATATAGGTCTTCTAAGCGTATCTATTAACAT
haplo2:2:MAHAE_1,MAHAE_2,
TAAATCTTGGTGCTGATCTGATATTTAATGCGT
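
For downstream scripting, the distinct haplotypes text file can be read back with a few lines of Python. The sketch below is based only on the layout visible in the example above (a `===chromosome===` line, then a `name:count:members,` line followed by the sequence); the function name `parse_distinct_haplotypes` is hypothetical, and the real file may contain variations not covered here.

```python
def parse_distinct_haplotypes(lines):
    """Parse the distinct haplotypes text layout into {chrom: [records]}.

    Hypothetical helper: the layout is inferred from the example output only.
    """
    result = {}
    chrom = None
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("===") and line.endswith("==="):
            # Chromosome header, e.g. "===Chr10==="
            chrom = line.strip("=")
            result.setdefault(chrom, [])
            current = None
        elif ":" in line:
            # Haplotype header, e.g. "haplo1:2:CIRAD403_1,CIRAD403_2,"
            name, count, members = line.split(":", 2)
            current = {
                "name": name,
                "count": int(count),
                # The trailing comma leaves an empty item, which is dropped.
                "members": [m for m in members.split(",") if m],
                "sequence": "",
            }
            result.setdefault(chrom, []).append(current)
        elif current is not None:
            # Sequence line(s) belonging to the most recent haplotype header
            current["sequence"] += line
    return result
```

Sequence lines are appended rather than assigned so that a haplotype wrapped over several lines would still be read as one string.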

haplotypes.haplo.fas

>Chr10_AZUCENA_1
TTTAAGAAATTCCTATATAGGTCTTCTAAGCGTATCTATTAACAT
>Chr10_AZUCENA_2
TAAATCTTGGTGCTGATCTGATATTTAATGCGT

haplotypes.distinct_haplotypes.fas

>haplo1|2
CAATTTATATATACTTGTATATAACCACAACGAGAGAGTTTTACCT
TTTATAAAAAATAAATAATGTATTACGGCTAATATAGCAATCTTTT
AAAATAAATCTATATTTAAATGACTATGGAATTACTAATCACAATA
ACAGGATCTTGTTATTTTTAGCTTGTGTACTTATAATGATCCGATG
>haplo2|2
GCTACTTAAATATCTAGCATTAATCCACAACGAGAGGCTCTTACCT
TTAAAAAAGGGTCATCGCCTATAGGTTAGATAATCGACACATATAA
TTATAAGAAATTATATATAATTTTTAATCTAGTTCATTCTTGTGCA
TCATTATGTTATATAATAATAAACGTAACAAATATTGATACTACTC