What it does
This tool converts data from BED format to GFF format (both formats are described under "About formats" below).
Example
The following data in BED format:
chr28 346187 388197 BC114771 0 + 346187 388197 0 9 144,81,115,63,155,96,134,105,112, 0,24095,26190,31006,32131,33534,36994,41793,41898,
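It will be converted to GFF as follows. Note that each start coordinate is incremented by 1, because BED starts are 0-based while GFF starts are 1-based; end coordinates are unchanged: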
chr28 bed2gff mRNA 346188 388197 0 + . mRNA BC114771;
chr28 bed2gff exon 346188 346331 0 + . exon BC114771;
chr28 bed2gff exon 370283 370363 0 + . exon BC114771;
chr28 bed2gff exon 372378 372492 0 + . exon BC114771;
chr28 bed2gff exon 377194 377256 0 + . exon BC114771;
chr28 bed2gff exon 378319 378473 0 + . exon BC114771;
chr28 bed2gff exon 379722 379817 0 + . exon BC114771;
chr28 bed2gff exon 383182 383315 0 + . exon BC114771;
chr28 bed2gff exon 387981 388085 0 + . exon BC114771;
chr28 bed2gff exon 388086 388197 0 + . exon BC114771;
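The conversion in this example can be reproduced with a short Python sketch. This is not the tool's actual source code; the function name `bed12_to_gff` and its `source` parameter are hypothetical, and the sketch assumes a 12-column BED line and writes GFF fields separated by tabs.

```python
def bed12_to_gff(bed_line, source="bed2gff"):
    """Convert one 12-column BED line into GFF lines: one mRNA plus one exon per block.

    Illustrative sketch only; names here are assumptions, not the tool's API.
    """
    fields = bed_line.split()
    chrom, name, score, strand = fields[0], fields[3], fields[4], fields[5]
    chrom_start, chrom_end = int(fields[1]), int(fields[2])
    # Block lists may end with a trailing comma, so skip empty items.
    sizes = [int(x) for x in fields[10].split(",") if x]
    starts = [int(x) for x in fields[11].split(",") if x]

    def gff(feature, start0, end):
        # BED starts are 0-based, GFF starts are 1-based: add 1 to the start.
        # The BED end is exclusive and the GFF end is inclusive, so it is unchanged.
        cols = [chrom, source, feature, str(start0 + 1), str(end),
                score, strand, ".", f"{feature} {name};"]
        return "\t".join(cols)

    lines = [gff("mRNA", chrom_start, chrom_end)]
    for size, rel_start in zip(sizes, starts):
        block_start = chrom_start + rel_start  # block starts are relative to chromStart
        lines.append(gff("exon", block_start, block_start + size))
    return lines


bed = ("chr28 346187 388197 BC114771 0 + 346187 388197 0 9 "
       "144,81,115,63,155,96,134,105,112, "
       "0,24095,26190,31006,32131,33534,36994,41793,41898,")
for gff_line in bed12_to_gff(bed):
    print(gff_line)
```

Run on the BED line above, the sketch yields one mRNA line followed by nine exon lines with the same coordinates as the GFF output shown in this example.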
About formats
BED format
Browser Extensible Data format was designed at UCSC for displaying data tracks in the Genome Browser. It has three required fields and several additional optional ones.
The first three BED fields (required) are:
1. chrom - The name of the chromosome (e.g. chr1, chrY_random).
2. chromStart - The starting position in the chromosome. (The first base in a chromosome is numbered 0.)
3. chromEnd - The ending position in the chromosome, plus 1 (i.e., a half-open interval).
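Because chromStart is 0-based and chromEnd is exclusive, the length of a feature is simply chromEnd minus chromStart, and converting to 1-based inclusive coordinates changes only the start. A minimal sketch follows; both helper names are hypothetical and used only for illustration.

```python
def bed_interval_length(chrom_start, chrom_end):
    """Length in bases of a 0-based, half-open BED interval."""
    return chrom_end - chrom_start


def bed_to_one_based(chrom_start, chrom_end):
    """Return the same interval as a 1-based, inclusive (start, end) pair."""
    # Only the start shifts; the exclusive 0-based end equals the inclusive 1-based end.
    return chrom_start + 1, chrom_end


# The first 100 bases of a chromosome are chromStart=0, chromEnd=100 in BED.
print(bed_interval_length(0, 100))
print(bed_to_one_based(0, 100))
```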
The additional BED fields (optional) are:
4. name - The name of the BED line.
5. score - A score between 0 and 1000.
6. strand - Defines the strand - either '+' or '-'.
7. thickStart - The starting position where the feature is drawn thickly in the Genome Browser.
8. thickEnd - The ending position where the feature is drawn thickly in the Genome Browser.
9. reserved - This should always be set to zero.
10. blockCount - The number of blocks (exons) in the BED line.
11. blockSizes - A comma-separated list of the block sizes. The number of items in this list should correspond to blockCount.
12. blockStarts - A comma-separated list of block starts. All of the blockStart positions should be calculated relative to chromStart. The number of items in this list should correspond to blockCount.
13. expCount - The number of experiments.
14. expIds - A comma-separated list of experiment ids. The number of items in this list should correspond to expCount.
15. expScores - A comma-separated list of experiment scores. All of the expScores should be relative to expIds. The number of items in this list should correspond to expCount.
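The relationship between blockCount, blockSizes, and blockStarts can be checked with a few lines of Python. The sketch below is an assumption-laden illustration (the function name `parse_blocks` is hypothetical): it tolerates a trailing comma in the lists, verifies that both lists match blockCount, and returns absolute 0-based, half-open block coordinates.

```python
def parse_blocks(chrom_start, block_count, block_sizes, block_starts):
    """Return absolute (start, end) pairs for each block, 0-based and half-open."""
    # Lists such as "144,81," may end with a comma, so skip empty items.
    sizes = [int(x) for x in block_sizes.split(",") if x]
    starts = [int(x) for x in block_starts.split(",") if x]
    if len(sizes) != block_count or len(starts) != block_count:
        raise ValueError(
            f"blockCount is {block_count}, but found {len(sizes)} sizes "
            f"and {len(starts)} starts"
        )
    # blockStarts are relative to chromStart, so add it back to get absolute positions.
    return [(chrom_start + s, chrom_start + s + n) for s, n in zip(starts, sizes)]


blocks = parse_blocks(
    346187, 9,
    "144,81,115,63,155,96,134,105,112,",
    "0,24095,26190,31006,32131,33534,36994,41793,41898,",
)
print(blocks[0], blocks[-1])
```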
GFF format
General Feature Format is a format for describing genes and other features associated with DNA, RNA and Protein sequences. GFF lines have nine tab-separated fields:
1. seqname - Must be a chromosome or scaffold.
2. source - The program that generated this feature.
3. feature - The name of this type of feature. Some examples of standard feature types are "CDS", "start_codon", "stop_codon", and "exon".
4. start - The starting position of the feature in the sequence. The first base is numbered 1.
5. end - The ending position of the feature (inclusive).
6. score - A score between 0 and 1000. If there is no score value, enter ".".
7. strand - Valid entries include '+', '-', or '.' (for don't know/care).
8. frame - If the feature is a coding exon, frame should be a number between 0-2 that represents the reading frame of the first base. If the feature is not a coding exon, the value should be '.'.
9. group - All lines with the same group are linked together into a single item.
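As an illustration of the nine-field layout, a GFF line can be split on tabs into a named record. This is a minimal sketch, not part of the tool; `GffRecord` and `parse_gff_line` are hypothetical names. Since start is 1-based and end is inclusive, the feature length is end - start + 1.

```python
from collections import namedtuple

# Field names follow the nine GFF columns described above.
GffRecord = namedtuple(
    "GffRecord",
    "seqname source feature start end score strand frame group",
)


def parse_gff_line(line):
    """Split one tab-separated GFF line into a GffRecord with integer coordinates."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 tab-separated fields, found {len(cols)}")
    rec = GffRecord(*cols)
    return rec._replace(start=int(rec.start), end=int(rec.end))


rec = parse_gff_line("chr28\tbed2gff\texon\t346188\t346331\t0\t+\t.\texon BC114771;")
# start is 1-based and end is inclusive, so length is end - start + 1.
print(rec.feature, rec.end - rec.start + 1)
```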