What it does
This tool converts data from the 12-column UCSC wiggle BED format to GFF3 (scroll down for a description of both formats).
Example
The following data in UCSC Wiggle BED format:
chr1 11873 14409 uc001aaa.3 0 + 11873 11873 0 3 354,109,1189, 0,739,1347,
Will be converted to GFF3:
chr1 bed2gff gene 11874 14409 0 + . ID=Gene:uc001aaa.3;Name=Gene:uc001aaa.3
chr1 bed2gff transcript 11874 14409 0 + . ID=uc001aaa.3;Name=uc001aaa.3;Parent=Gene:uc001aaa.3
chr1 bed2gff exon 11874 12227 0 + . Parent=uc001aaa.3
chr1 bed2gff exon 12613 12721 0 + . Parent=uc001aaa.3
chr1 bed2gff exon 13221 14409 0 + . Parent=uc001aaa.3
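The conversion in this example can be sketched in a few lines of Python. This is a minimal illustration that reproduces the records shown above, not the tool's actual implementation; the function name bed12_to_gff3 and its source parameter are hypothetical, and the sketch assumes one transcript per BED line with every block reported as an exon.

```python
def bed12_to_gff3(line, source="bed2gff"):
    """Convert one 12-column BED line into a list of GFF3 lines (illustrative only)."""
    f = line.split()
    chrom, start, end = f[0], int(f[1]), int(f[2])
    name, score, strand = f[3], f[4], f[5]
    # Block lists may carry a trailing comma, so empty items are dropped.
    sizes = [int(x) for x in f[10].split(",") if x]
    rel_starts = [int(x) for x in f[11].split(",") if x]

    def row(ftype, bed_start, bed_end, attrs):
        # BED starts are 0-based and GFF3 starts are 1-based, so add 1.
        # The BED end is exclusive and the GFF3 end is inclusive, so it stays as is.
        return "\t".join([chrom, source, ftype, str(bed_start + 1), str(bed_end),
                          score, strand, ".", attrs])

    gene_id = "Gene:" + name
    out = [
        row("gene", start, end, "ID=%s;Name=%s" % (gene_id, gene_id)),
        row("transcript", start, end,
            "ID=%s;Name=%s;Parent=%s" % (name, name, gene_id)),
    ]
    for size, rel in zip(sizes, rel_starts):
        exon_start = start + rel  # blockStarts are relative to chromStart
        out.append(row("exon", exon_start, exon_start + size, "Parent=" + name))
    return out


bed = "chr1 11873 14409 uc001aaa.3 0 + 11873 11873 0 3 354,109,1189, 0,739,1347,"
for gff_line in bed12_to_gff3(bed):
    print(gff_line)
```

The sketch emits tab-separated fields, as GFF3 requires; the example records above are shown with spaces for readability.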
Reference
BED-to-GFF is part of the Oqtans package and is cited as [1].
[1] Sreedharan VT, Schultheiss SJ, Jean G, et al. Oqtans: the RNA-seq workbench in the cloud for complete and reproducible quantitative transcriptome analysis. Bioinformatics (2014). doi:10.1093/bioinformatics/btt731
About file formats
BED format: The Browser Extensible Data (BED) format was designed at UCSC for displaying data tracks in the Genome Browser. It has three required fields and several additional optional ones.
The first three BED fields (required) are:
1. chrom - The name of the chromosome (e.g. chr1, chrY_random).
2. chromStart - The starting position in the chromosome. (The first base in a chromosome is numbered 0.)
3. chromEnd - The ending position in the chromosome, plus 1 (i.e., a half-open interval).
The additional BED fields (optional) are:
4. name - The name of the BED line.
5. score - A score between 0 and 1000.
6. strand - Defines the strand, either '+' or '-'.
7. thickStart - The starting position where the feature is drawn thickly in the Genome Browser.
8. thickEnd - The ending position where the feature is drawn thickly in the Genome Browser.
9. reserved - This should always be set to zero.
10. blockCount - The number of blocks (exons) in the BED line.
11. blockSizes - A comma-separated list of the block sizes. The number of items in this list should correspond to blockCount.
12. blockStarts - A comma-separated list of block starts. All of the blockStart positions should be calculated relative to chromStart. The number of items in this list should correspond to blockCount.
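The twelve fields and the blockCount consistency rule described above can be illustrated with a small parser. This is a sketch under stated assumptions rather than part of the tool: the names Bed12, parse_bed12 and block_intervals are hypothetical, and fields are assumed to be separated by whitespace.

```python
from collections import namedtuple

# Field names follow the BED description above.
Bed12 = namedtuple(
    "Bed12",
    "chrom chromStart chromEnd name score strand "
    "thickStart thickEnd reserved blockCount blockSizes blockStarts",
)


def parse_bed12(line):
    """Parse one 12-column BED line and check that the block lists are consistent."""
    fields = line.split()
    if len(fields) != 12:
        raise ValueError("expected 12 fields, got %d" % len(fields))

    def int_list(text):
        # Lists may end with a trailing comma, so empty items are skipped.
        return [int(x) for x in text.split(",") if x]

    rec = Bed12(
        fields[0], int(fields[1]), int(fields[2]), fields[3], int(fields[4]),
        fields[5], int(fields[6]), int(fields[7]), fields[8], int(fields[9]),
        int_list(fields[10]), int_list(fields[11]),
    )
    if not (len(rec.blockSizes) == len(rec.blockStarts) == rec.blockCount):
        raise ValueError("blockSizes/blockStarts do not match blockCount")
    return rec


def block_intervals(rec):
    """Return absolute 0-based, half-open (start, end) pairs for every block."""
    return [(rec.chromStart + rel, rec.chromStart + rel + size)
            for rel, size in zip(rec.blockStarts, rec.blockSizes)]


rec = parse_bed12("chr1 11873 14409 uc001aaa.3 0 + 11873 11873 0 3 354,109,1189, 0,739,1347,")
print(block_intervals(rec))
```

Because blockStarts are relative to chromStart, each absolute block start is chromStart plus the listed offset, and each block end is that start plus the matching block size.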
GFF format: The General Feature Format (GFF) is a format for describing genes and other features associated with DNA, RNA and protein sequences. GFF lines have nine tab-separated fields:
1. seqid - The name of the chromosome, scaffold or contig.
2. source - The program that generated this feature.
3. type - The name of this type of feature. Some examples of standard feature types are "gene", "CDS", "protein", "mRNA", and "exon".
4. start - The starting position of the feature in the sequence. The first base is numbered 1.
5. end - The ending position of the feature (inclusive).
6. score - A score between 0 and 1000. If there is no score value, enter ".".
7. strand - Valid entries include '+', '-', or '.' (for don't know/care).
8. phase - If the feature is a coding exon, phase should be a number between 0 and 2 that represents the reading frame of the first base. If the feature is not a coding exon, the value should be '.'.
9. attributes - A semicolon-separated list of tag=value pairs such as ID, Name and Parent. Lines are linked together into a single item through their ID and Parent values.
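In GFF3 output such as the example near the top of this page, the ninth column holds semicolon-separated tag=value pairs, and child features point to their parent through the Parent tag. The sketch below shows one way to read that column and group features by parent. The helper names parse_attributes and children_by_parent are hypothetical, and the sketch assumes each feature has at most one Parent value.

```python
def parse_attributes(column9):
    """Turn 'ID=x;Name=y;Parent=z' into a dictionary of tag -> value."""
    attrs = {}
    for pair in column9.strip().split(";"):
        if pair:
            tag, _, value = pair.partition("=")
            attrs[tag] = value
    return attrs


def children_by_parent(gff_lines):
    """Group (type, start, end) tuples under the ID named in their Parent tag."""
    children = {}
    for line in gff_lines:
        # Skip blank lines and '#' comment or directive lines.
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        attrs = parse_attributes(cols[8])
        if "Parent" in attrs:
            feature = (cols[2], int(cols[3]), int(cols[4]))
            children.setdefault(attrs["Parent"], []).append(feature)
    return children


gff = [
    "\t".join(["chr1", "bed2gff", "transcript", "11874", "14409", "0", "+", ".",
               "ID=uc001aaa.3;Name=uc001aaa.3;Parent=Gene:uc001aaa.3"]),
    "\t".join(["chr1", "bed2gff", "exon", "11874", "12227", "0", "+", ".",
               "Parent=uc001aaa.3"]),
    "\t".join(["chr1", "bed2gff", "exon", "12613", "12721", "0", "+", ".",
               "Parent=uc001aaa.3"]),
]
print(children_by_parent(gff))
```

Here the two exons are collected under the transcript uc001aaa.3, and the transcript itself is collected under Gene:uc001aaa.3.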
Copyright
BED-to-GFF Wrapper Version 0.6 (Apr 2015)
2009-2015 Max Planck Society, University of Tübingen & Memorial Sloan Kettering Cancer Center