
What it does

This tool converts data from 12-column UCSC Wiggle BED format to GFF3 (scroll down for format descriptions).

Example

The following data in UCSC Wiggle BED format:

chr1    11873   14409   uc001aaa.3      0       +       11873   11873   0       3       354,109,1189,   0,739,1347,

will be converted to GFF3:

chr1    bed2gff gene    11874   14409   0       +       .       ID=Gene:uc001aaa.3;Name=Gene:uc001aaa.3
chr1    bed2gff transcript      11874   14409   0       +       .       ID=uc001aaa.3;Name=uc001aaa.3;Parent=Gene:uc001aaa.3
chr1    bed2gff exon    11874   12227   0       +       .       Parent=uc001aaa.3
chr1    bed2gff exon    12613   12721   0       +       .       Parent=uc001aaa.3
chr1    bed2gff exon    13221   14409   0       +       .       Parent=uc001aaa.3
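The coordinate arithmetic behind this example can be sketched in Python. This is a minimal illustration that reproduces the output above, not the tool's actual implementation; the function name `bed12_to_gff3` and its `source` parameter are hypothetical. It assumes a single tab-separated 12-column BED line, shifts each 0-based BED start by one to get the 1-based GFF3 start, and leaves end positions unchanged.

```python
def bed12_to_gff3(bed_line, source="bed2gff"):
    """Convert one 12-column BED line into a list of GFF3 lines (illustrative sketch)."""
    fields = bed_line.rstrip("\n").split("\t")
    if len(fields) != 12:
        raise ValueError("expected 12 tab-separated BED fields")
    chrom, name, score, strand = fields[0], fields[3], fields[4], fields[5]
    chrom_start, chrom_end = int(fields[1]), int(fields[2])
    # Block lists may end with a trailing comma, which leaves an empty item; drop it.
    sizes = [int(x) for x in fields[10].split(",") if x]
    starts = [int(x) for x in fields[11].split(",") if x]

    def row(ftype, start, end, attrs):
        return "\t".join(
            [chrom, source, ftype, str(start), str(end), score, strand, ".", attrs]
        )

    gene_id = "Gene:" + name
    # BED starts are 0-based, GFF3 starts are 1-based: add 1. Ends stay as they are.
    lines = [
        row("gene", chrom_start + 1, chrom_end,
            "ID=%s;Name=%s" % (gene_id, gene_id)),
        row("transcript", chrom_start + 1, chrom_end,
            "ID=%s;Name=%s;Parent=%s" % (name, name, gene_id)),
    ]
    # Each block becomes one exon; blockStarts are relative to chromStart.
    for size, rel_start in zip(sizes, starts):
        exon_start = chrom_start + rel_start + 1
        exon_end = chrom_start + rel_start + size
        lines.append(row("exon", exon_start, exon_end, "Parent=" + name))
    return lines


bed = "\t".join([
    "chr1", "11873", "14409", "uc001aaa.3", "0", "+",
    "11873", "11873", "0", "3", "354,109,1189,", "0,739,1347,",
])
for gff_line in bed12_to_gff3(bed):
    print(gff_line)
```

For the example line, the second block gives 11873 + 739 + 1 = 12613 as the exon start and 11873 + 739 + 109 = 12721 as the exon end, matching the GFF3 output shown above.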

Reference

BED-to-GFF is part of the Oqtans package and is cited as [1].

[1] Sreedharan VT, Schultheiss SJ, Jean G, et al. Oqtans: the RNA-seq workbench in the cloud for complete and reproducible quantitative transcriptome analysis. Bioinformatics (2014). doi:10.1093/bioinformatics/btt731

About file formats

BED format: The Browser Extensible Data (BED) format was designed at UCSC for displaying data tracks in the Genome Browser. It has three required fields and several optional ones.

The first three BED fields (required) are:

1. chrom - The name of the chromosome (e.g. chr1, chrY_random).
2. chromStart - The starting position in the chromosome. (The first base in a chromosome is numbered 0.)
3. chromEnd - The ending position in the chromosome, plus 1 (i.e., a half-open interval).

The additional BED fields (optional) are:

 4. name - The name of the BED line.
 5. score - A score between 0 and 1000.
 6. strand - Defines the strand - either '+' or '-'.
 7. thickStart - The starting position where the feature is drawn thickly at the Genome Browser.
 8. thickEnd - The ending position where the feature is drawn thickly at the Genome Browser.
 9. reserved - This should always be set to zero.
10. blockCount - The number of blocks (exons) in the BED line.
11. blockSizes - A comma-separated list of the block sizes. The number of items in this list should correspond to blockCount.
12. blockStarts - A comma-separated list of block starts. All of the blockStart positions should be calculated relative to chromStart. The number of items in this list should correspond to blockCount.
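To make the relationship between chromStart, blockCount, blockSizes and blockStarts concrete, here is a small Python sketch. The helper name `bed_blocks` is hypothetical. It checks that both lists contain blockCount items and returns the absolute position of each block, still in BED's 0-based, half-open convention.

```python
def bed_blocks(chrom_start, block_count, block_sizes, block_starts):
    """Return absolute half-open (start, end) intervals for each BED block."""
    # Ignore the empty item produced by a trailing comma.
    sizes = [int(x) for x in block_sizes.split(",") if x]
    starts = [int(x) for x in block_starts.split(",") if x]
    if len(sizes) != block_count or len(starts) != block_count:
        raise ValueError("blockSizes and blockStarts must each have blockCount items")
    # blockStarts are offsets from chromStart; the end is start plus size.
    return [(chrom_start + s, chrom_start + s + n) for s, n in zip(starts, sizes)]


blocks = bed_blocks(11873, 3, "354,109,1189,", "0,739,1347,")
print(blocks)  # [(11873, 12227), (12612, 12721), (13220, 14409)]
```

Because the intervals are half-open, the last block ends exactly at chromEnd (14409) for a well-formed line.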

GFF format: The General Feature Format (GFF) describes genes and other features associated with DNA, RNA and protein sequences. GFF lines have nine tab-separated fields:

1. seqid - The name of the chromosome, scaffold or contig on which the feature is located.
2. source - The program that generated this feature.
3. type - The name of this type of feature. Some examples of standard feature types are "gene", "CDS", "protein", "mRNA", and "exon".
4. start - The starting position of the feature in the sequence. The first base is numbered 1.
5. stop - The ending position of the feature (inclusive).
6. score - A score between 0 and 1000. If there is no score value, enter ".".
7. strand - Valid entries include '+', '-', or '.' (for don't know/care).
8. phase - If the feature is a coding exon, phase should be a number between 0 and 2 that represents the reading frame of the first base. If the feature is not a coding exon, the value should be '.'.
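9. attributes - A semicolon-separated list of tag=value pairs (for example ID, Name and Parent). Lines are linked together into a single item through these tags: a feature's Parent value refers to the ID of the feature it belongs to.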
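In the GFF3 output shown in the example, the ninth column holds semicolon-separated tag=value pairs, and exons point to their transcript through the Parent tag. The following Python sketch shows one way to read that column; the function name `parse_attributes` is hypothetical, and the parser is deliberately simple (it does not handle URL-escaped characters or multi-valued tags).

```python
def parse_attributes(column9):
    """Split a GFF3 attributes column into a dict of tag -> value."""
    attrs = {}
    for pair in column9.strip().split(";"):
        if not pair:
            continue  # tolerate a trailing semicolon
        key, _, value = pair.partition("=")
        attrs[key] = value
    return attrs


transcript = parse_attributes("ID=uc001aaa.3;Name=uc001aaa.3;Parent=Gene:uc001aaa.3")
exon = parse_attributes("Parent=uc001aaa.3")

# The exon belongs to the transcript because its Parent equals the transcript's ID.
print(exon["Parent"] == transcript["ID"])  # True
```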

Copyright

BED-to-GFF Wrapper Version 0.6 (Apr 2015)

2009-2015 Max Planck Society, University of Tübingen & Memorial Sloan Kettering Cancer Center