Galaxy | Tool Preview

Genbank to GFF3 (version 1.1)

What it does:

This tool uses Bio::SeqFeature::Tools::Unflattener and Bio::Tools::GFF to convert GenBank flatfiles to GFF3, with gene containment hierarchies mapped for optimal display in GBrowse.

The input files are assumed to be gzipped GenBank flatfiles for RefSeq contigs. The files may contain multiple GenBank records.
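To illustrate what "gzipped flatfiles with multiple records" means in practice, here is a minimal Python sketch that splits such a file into individual records. It is not part of the tool, which is written in Perl. The function names are hypothetical, and the sketch relies only on the GenBank convention that each record ends with a line containing just `//`.

```python
import gzip
import io


def iter_genbank_records(handle):
    """Yield each GenBank record as one string, split on the '//' terminator line."""
    lines = []
    for line in handle:
        lines.append(line)
        if line.rstrip() == "//":
            yield "".join(lines)
            lines = []
    # Keep any trailing text that lacks a terminator, unless it is only whitespace.
    if any(l.strip() for l in lines):
        yield "".join(lines)


def read_gzipped_genbank(path_or_fileobj):
    """Open a gzipped flatfile in text mode and yield its records one by one."""
    with gzip.open(path_or_fileobj, "rt") as handle:
        yield from iter_genbank_records(handle)
```

Because the records are yielded lazily, a large contig file does not need to be held in memory all at once.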

Designed for RefSeq

This script is designed for RefSeq genomic sequence entries. It may work for third-party annotations, but this has not been tested. That said, Uniprot/Swissprot input works, as does EMBL and possibly EMBL/Ensembl, if you don't mind some gene model unflattener errors (dgg).

G-R-P-E Gene Model

Don Gilbert reworked this script to produce GFF3 suited to loading into GMOD Chado databases.

This writes GFF with an alternate but useful gene model, instead of the consensus model for GFF3:

[ gene > mRNA > (exon, CDS, UTR) ]

The alternate model is

gene > mRNA > polypeptide > exon

which means that the only feature with DNA bases is the exon. The other features specify only location ranges on a genome. The exon is a child of both the mRNA and the protein/polypeptide.
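The containment hierarchy can be pictured as GFF3 lines linked by ID and Parent attributes. The Python sketch below is only an illustration, not the tool's actual output: the ID naming scheme, the `example` source column, the use of the gene span for the mRNA, and the choice of `Parent` for every link are all assumptions. GFF3 itself does allow a feature to list several parents separated by commas, which is how an exon can belong to both the mRNA and the polypeptide.

```python
def gff3_line(seqid, ftype, start, end, strand, feature_id, parents=()):
    """Format one nine-column GFF3 line with ID and optional Parent attributes."""
    attrs = [f"ID={feature_id}"]
    if parents:
        attrs.append("Parent=" + ",".join(parents))
    columns = [seqid, "example", ftype, str(start), str(end),
               ".", strand, ".", ";".join(attrs)]
    return "\t".join(columns)


def build_grpe_lines(seqid, strand, gene_id, gene_span, cds_span, exon_spans):
    """Emit gene > mRNA > polypeptide > exon lines for a single-transcript gene."""
    mrna_id = f"{gene_id}.mRNA"   # hypothetical ID scheme
    pep_id = f"{gene_id}.pep"     # hypothetical ID scheme
    lines = [
        gff3_line(seqid, "gene", *gene_span, strand, gene_id),
        # Assumption: the mRNA covers the same range as the gene.
        gff3_line(seqid, "mRNA", *gene_span, strand, mrna_id, [gene_id]),
        gff3_line(seqid, "polypeptide", *cds_span, strand, pep_id, [mrna_id]),
    ]
    for n, span in enumerate(exon_spans, start=1):
        # Each exon is a child of both the mRNA and the polypeptide.
        lines.append(gff3_line(seqid, "exon", *span, strand,
                               f"{gene_id}.exon{n}", [mrna_id, pep_id]))
    return lines
```

For example, `build_grpe_lines("contig1", "+", "gene1", (100, 900), (200, 800), [(100, 300), (500, 900)])` returns five lines: one gene, one mRNA, one polypeptide, and two exons.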

The protein/polypeptide feature is an important one: it carries all the annotations of the GenBank CDS feature, including protein ID, translation, GO terms, and Dbxrefs to other proteins.

UTRs, introns, CDS-exons are all inferred from the primary exon bases inside/outside appropriate higher feature ranges. Other special gene model features remain the same.
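The inference described above amounts to interval arithmetic on exon ranges and the polypeptide range. The following Python sketch shows one way it could be done, assuming 1-based inclusive coordinates and a single transcript. The function name is hypothetical, and this is not taken from the tool's own code. It does not label UTRs as 5' or 3', since that depends on the strand.

```python
def infer_parts(exons, cds_start, cds_end):
    """Split exon ranges into coding parts, UTR parts, and the introns between exons."""
    exons = sorted(exons)
    cds_exons, utrs, introns = [], [], []
    for start, end in exons:
        # Coding part: overlap of the exon with the polypeptide range.
        lo, hi = max(start, cds_start), min(end, cds_end)
        if lo <= hi:
            cds_exons.append((lo, hi))
        # UTR parts: exon bases lying outside the polypeptide range.
        if start < cds_start:
            utrs.append((start, min(end, cds_start - 1)))
        if end > cds_end:
            utrs.append((max(start, cds_end + 1), end))
    # Introns: gaps between consecutive exons.
    for (_, prev_end), (next_start, _) in zip(exons, exons[1:]):
        if next_start - prev_end > 1:
            introns.append((prev_end + 1, next_start - 1))
    return {"cds_exons": cds_exons, "utrs": utrs, "introns": introns}
```

With exons at 100-300 and 500-900 and a polypeptide range of 200-800, the coding parts are 200-300 and 500-800, the UTR parts are 100-199 and 801-900, and the single intron is 301-499.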

Authors

Sheldon McKay (mckays@cshl.edu)

Copyright (c) 2004 Cold Spring Harbor Laboratory.

Author of hacks for GFF2Chado loading

Don Gilbert (gilbertd@indiana.edu)