comparison gbk_to_gff.xml @ 5:6e589f267c14

Uploaded
author devteam
date Tue, 04 Nov 2014 12:15:19 -0500
4:619e0fcd9126 5:6e589f267c14
<tool id="fml_gbk2gff" name="GBK-to-GFF" version="2.0.0">
  <description>converter</description>
  <command interpreter="python">gbk_to_gff.py $inf_gbk &gt; $gff_format
  </command>
  <inputs>
    <param format="gb,gbk,genbank,txt" name="inf_gbk" type="data" label="Convert this query" help="GenBank flat file format consists of an annotation section and a sequence section."/>
  </inputs>
  <outputs>
    <data format="gff3" name="gff_format" label="${tool.name} on ${on_string}: Converted"/>
  </outputs>
  <tests>
    <test>
      <param name="inf_gbk" value="s_cerevisiae_SCU49845.gbk" />
      <output name="gff_format" file="s_cerevisiae_SCU49845.gff3" />
    </test>
  </tests>
  <help>

**What it does**

This tool converts a GenBank_ flat file into GFF3. The format descriptions are at the end of this help.

.. _GenBank: http://www.ncbi.nlm.nih.gov/genbank/

------

**Example**

- The following data in GenBank format::

    LOCUS       NM_001202705            2406 bp    mRNA    linear   PLN 28-MAY-2011
    DEFINITION  Arabidopsis thaliana thiamine biosynthesis protein ThiC (THIC)
                mRNA, complete cds.
    ACCESSION   NM_001202705
    VERSION     NM_001202705.1  GI:334184566.........
    FEATURES             Location/Qualifiers
         source          1..2406
                         /organism="Arabidopsis thaliana"
                         /mol_type="mRNA"
                         /db_xref="taxon:3702"........
         gene            1..2406
                         /gene="THIC"
                         /locus_tag="AT2G29630"
                         /gene_synonym="PY; PYRIMIDINE REQUIRING; T27A16.27;........
    ORIGIN
            1 aagcctttcg ctttaggctg cattgggccg tgacaatatt cagacgattc aggaggttcg
           61 ttcctttttt aaaggaccct aatcactctg agtaccactg actcactcag tgtgcgcgat
          121 tcatttcaaa aacgagccag cctcttcttc cttcgtctac tagatcagat ccaaagcttc
          181 ctcttccagc tatggctgct tcagtacact gtaccttgat gtccgtcgta tgcaacaaca
    //


- Will be converted to GFF3::

    ##gff-version 3
    NM_001202705 gbk_to_gff chromosome 1 2406 . + 1 ID=NM_001202705;Alias=2;Dbxref=taxon:3702;Name=NM_001202705
    NM_001202705 gbk_to_gff gene 1 2406 . + 1 ID=AT2G29630;Dbxref=GeneID:817513,TAIR:AT2G29630;Name=THIC
    NM_001202705 gbk_to_gff mRNA 192 2126 . + 1 ID=AT2G29630.t01;Parent=AT2G29630
    NM_001202705 gbk_to_gff CDS 192 2126 . + 1 ID=AT2G29630.p01;Parent=AT2G29630.t01
    NM_001202705 gbk_to_gff exon 192 2126 . + 1 Parent=AT2G29630.t01
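
The coordinate mapping behind this example can be sketched in Python. This is a hypothetical illustration and not the code in gbk_to_gff.py. The function names location_to_gff3 and gff3_row are made up. The sketch handles only simple "a..b" and "complement(a..b)" locations, and it leaves score and phase as ".". GenBank locations and GFF3 columns are both 1-based and inclusive, so the numbers carry over unchanged.

```python
# Hypothetical sketch, not the logic inside gbk_to_gff.py: map a simple
# GenBank feature location onto the GFF3 start, end and strand columns.
# Only "a..b" and "complement(a..b)" are handled; joins, fuzzy ends and
# single-base locations are rejected.


def location_to_gff3(location):
    """Return (start, end, strand) for a simple GenBank location."""
    strand = "+"
    if location.startswith("complement(") and location.endswith(")"):
        strand = "-"
        location = location[len("complement("):-1]
    start, sep, end = location.partition("..")
    if not (sep and start.isdigit() and end.isdigit()):
        raise ValueError("unsupported location: " + location)
    # GenBank locations and GFF3 columns are both 1-based and inclusive,
    # so the numbers carry over unchanged.
    return int(start), int(end), strand


def gff3_row(seqid, feature_type, location, attributes):
    """Build one tab-separated GFF3 line; score and phase are left as '.'."""
    start, end, strand = location_to_gff3(location)
    cols = [seqid, "gbk_to_gff", feature_type, str(start), str(end),
            ".", strand, ".", attributes]
    return "\t".join(cols)


# The mRNA row from the GFF3 example above (192 to 2126 on the + strand).
print(gff3_row("NM_001202705", "mRNA", "192..2126",
               "ID=AT2G29630.t01;Parent=AT2G29630"))
```

A complex location such as a join of several exons would need one GFF3 row per segment. That case is deliberately left out here.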

------

**About formats**

**GenBank format** An example of a GenBank record can be viewed here_.

.. _here: http://www.ncbi.nlm.nih.gov/Sitemap/samplerecord.html

**GFF3** Generic Feature Format version 3 describes genes and other features associated with DNA, RNA and protein sequences. Each feature line has nine tab-separated fields::

    1. seqid - The ID of the landmark (for example a chromosome, scaffold or contig) that the feature's coordinates refer to.
    2. source - The program that generated this feature.
    3. type - The name of this type of feature. Some examples of standard feature types are "gene", "CDS", "protein", "mRNA", and "exon".
    4. start - The starting position of the feature in the sequence. The first base is numbered 1.
    5. end - The ending position of the feature (inclusive).
    6. score - A floating-point score for the feature. If there is no score value, enter ".".
    7. strand - '+' or '-' for stranded features, '.' for features that are not stranded, or '?' if the strand is relevant but unknown.
    8. phase - Required for CDS features: 0, 1 or 2, the number of bases to remove from the start of the feature to reach the first base of the next codon. For all other features, use '.'.
    9. attributes - A semicolon-separated list of tag=value pairs such as ID, Name, Alias, Parent and Dbxref. Features are linked into a hierarchy (for example gene, mRNA, CDS) through their ID and Parent tags.
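
The nine-column layout can be made concrete with a minimal Python sketch. The helper name parse_gff3_line is hypothetical and not part of this tool. The sketch assumes it receives a single feature line, not a "##" directive or a comment. It decodes percent-escaped attribute values with urllib.parse.unquote.

```python
# Hypothetical helper, not part of gbk_to_gff.py: split one GFF3 feature
# line into its nine columns and decode the tag=value attribute pairs.
from urllib.parse import unquote

FIELDS = ("seqid", "source", "type", "start", "end",
          "score", "strand", "phase", "attributes")


def parse_gff3_line(line):
    """Return a dict for one tab-separated GFF3 feature line."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError("expected 9 tab-separated columns, got %d" % len(cols))
    record = dict(zip(FIELDS, cols))
    # Coordinates are 1-based and inclusive.
    record["start"] = int(record["start"])
    record["end"] = int(record["end"])
    attrs = {}
    for pair in record["attributes"].split(";"):
        if not pair:
            continue  # tolerate a trailing semicolon
        tag, _, value = pair.partition("=")
        # A tag may carry several comma-separated values (e.g. Dbxref).
        attrs[tag] = [unquote(v) for v in value.split(",")]
    record["attributes"] = attrs
    return record


# The gene line from the example above, with real tabs between columns.
line = "\t".join(["NM_001202705", "gbk_to_gff", "gene", "1", "2406", ".", "+", "1",
                  "ID=AT2G29630;Dbxref=GeneID:817513,TAIR:AT2G29630;Name=THIC"])
rec = parse_gff3_line(line)
print(rec["type"], rec["end"] - rec["start"] + 1, rec["attributes"]["Dbxref"])
# prints: gene 2406 ['GeneID:817513', 'TAIR:AT2G29630']
```

The sketch splits on ";" first, then on "=" and ",". Percent-decoding comes last, so an escaped separator inside a value is not mistaken for a real one.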

--------

**Copyright**

2009-2014 Max Planck Society, University of Tübingen &amp; Memorial Sloan Kettering Cancer Center

Sreedharan VT, Schultheiss SJ, Jean G, Kahles A, Bohnert R, Drewe P, Mudrakarta P, Görnitz N, Zeller G, Rätsch G. Oqtans: the RNA-seq workbench in the cloud for complete and reproducible quantitative transcriptome analysis. Bioinformatics (2014). doi:10.1093/bioinformatics/btt731

  </help>
</tool>