What it does
This tool converts data from GFF format to BED format (scroll down for format description).
Example
The following data in GFF format:
chr22 GeneA enhancer 10000000 10001000 500 + . TGA chr22 GeneA promoter 10010000 10010100 900 + . TGA
Will be converted to BED (note that 1 is subtracted from the start coordinate):
chr22 9999999 10001000 enhancer 0 + chr22 10009999 10010100 promoter 0 +
About formats
BED format Browser Extensible Data format was designed at UCSC for displaying data tracks in the Genome Browser. It has three required fields and several additional optional ones:
The first three BED fields (required) are:
1. chrom - The name of the chromosome (e.g. chr1, chrY_random). 2. chromStart - The starting position in the chromosome. (The first base in a chromosome is numbered 0.) 3. chromEnd - The ending position in the chromosome, plus 1 (i.e., a half-open interval).
The additional BED fields (optional) are:
4. name - The name of the BED line. 5. score - A score between 0 and 1000. 6. strand - Defines the strand - either '+' or '-'. 7. thickStart - The starting position where the feature is drawn thickly at the Genome Browser. 8. thickEnd - The ending position where the feature is drawn thickly at the Genome Browser. 9. reserved - This should always be set to zero. 10. blockCount - The number of blocks (exons) in the BED line. 11. blockSizes - A comma-separated list of the block sizes. The number of items in this list should correspond to blockCount. 12. blockStarts - A comma-separated list of block starts. All of the blockStart positions should be calculated relative to chromStart. The number of items in this list should correspond to blockCount. 13. expCount - The number of experiments. 14. expIds - A comma-separated list of experiment ids. The number of items in this list should correspond to expCount. 15. expScores - A comma-separated list of experiment scores. All of the expScores should be relative to expIds. The number of items in this list should correspond to expCount.
GFF format General Feature Format is a format for describing genes and other features associated with DNA, RNA and Protein sequences. GFF lines have nine tab-separated fields:
1. seqname - Must be a chromosome or scaffold. 2. source - The program that generated this feature. 3. feature - The name of this type of feature. Some examples of standard feature types are "CDS", "start_codon", "stop_codon", and "exon". 4. start - The starting position of the feature in the sequence. The first base is numbered 1. 5. end - The ending position of the feature (inclusive). 6. score - A score between 0 and 1000. If there is no score value, enter ".". 7. strand - Valid entries include '+', '-', or '.' (for don't know/care). 8. frame - If the feature is a coding exon, frame should be a number between 0-2 that represents the reading frame of the first base. If the feature is not a coding exon, the value should be '.'. 9. group - All lines with the same group are linked together into a single item.