comparison gbk_to_gff.xml @ 10:c42c69aa81f8

Manually fixed the upload of version 2.1.0: deleted files that had been accidentally added to the repo
author vipints <vipin@cbio.mskcc.org>
date Thu, 23 Apr 2015 18:01:45 -0400
children 5c6f33e20fcc
9:7d67331368f3 10:c42c69aa81f8
<tool id="fml_gbk2gff" name="GBK-to-GFF" version="2.1.0">
  <description>converter</description>
  <command interpreter="python">gbk_to_gff.py $inf_gbk &gt; $gff_format
  </command>
  <inputs>
    <param format="gb,gbk,genbank" name="inf_gbk" type="data" label="Convert this query" help="GenBank flat file format consists of an annotation section and a sequence section."/>
  </inputs>
  <outputs>
    <data format="gff" name="gff_format" label="${tool.name} on ${on_string}: Converted"/>
  </outputs>
  <tests>
    <test>
      <param name="inf_gbk" value="s_cerevisiae_SCU49845.gbk" />
      <output name="gff_format" file="s_cerevisiae_SCU49845.gff" />
    </test>
  </tests>
  <help>
**What it does**

This tool converts a GenBank_ flat file to GFF3 (see the format descriptions below).

.. _GenBank: http://www.ncbi.nlm.nih.gov/genbank/

------

**Example**

- The following data in GenBank format::

    LOCUS       NM_001202705            2406 bp    mRNA    linear   PLN 28-MAY-2011
    DEFINITION  Arabidopsis thaliana thiamine biosynthesis protein ThiC (THIC)
                mRNA, complete cds.
    ACCESSION   NM_001202705
    VERSION     NM_001202705.1  GI:334184566.........
    FEATURES             Location/Qualifiers
         source          1..2406
                         /organism="Arabidopsis thaliana"
                         /mol_type="mRNA"
                         /db_xref="taxon:3702"........
         gene            1..2406
                         /gene="THIC"
                         /locus_tag="AT2G29630"
                         /gene_synonym="PY; PYRIMIDINE REQUIRING; T27A16.27;........
    ORIGIN
            1 aagcctttcg ctttaggctg cattgggccg tgacaatatt cagacgattc aggaggttcg
           61 ttcctttttt aaaggaccct aatcactctg agtaccactg actcactcag tgtgcgcgat
          121 tcatttcaaa aacgagccag cctcttcttc cttcgtctac tagatcagat ccaaagcttc
          181 ctcttccagc tatggctgct tcagtacact gtaccttgat gtccgtcgta tgcaacaaca
    //

- Will be converted to GFF3::

    NM_001202705 gbk2gff chromosome 1 2406 . + 1 ID=NM_001202705;Alias=2;Dbxref=taxon:3702;Name=NM_001202705
    NM_001202705 gbk2gff gene 1 2406 . + 1 ID=AT2G29630;Dbxref=GeneID:817513,TAIR:AT2G29630;Name=THIC
    NM_001202705 gbk2gff mRNA 192 2126 . + 1 ID=AT2G29630.t01;Parent=AT2G29630
    NM_001202705 gbk2gff CDS 192 2126 . + 1 ID=AT2G29630.p01;Parent=AT2G29630.t01
    NM_001202705 gbk2gff exon 192 2126 . + 1 Parent=AT2G29630.t01
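
In the GFF3 output above, the ``Parent`` attribute links the CDS and the exon to their mRNA (``AT2G29630.t01``) and links the mRNA to its gene (``AT2G29630``). The Python sketch below shows one way a downstream script could rebuild that hierarchy. ``parse_attributes`` and ``build_hierarchy`` are hypothetical helpers, not part of GBK-to-GFF. The sketch assumes tab-separated columns and attribute values without escaped characters::

    from collections import defaultdict


    def parse_attributes(column9):
        """Split a GFF3 attribute column ("ID=x;Parent=y") into a dict."""
        attrs = {}
        for pair in column9.strip().split(";"):
            if "=" in pair:
                key, value = pair.split("=", 1)
                attrs[key] = value
        return attrs


    def build_hierarchy(gff_lines):
        """Map each parent ID to the (type, ID) pairs of its direct children.

        Hypothetical helper: assumes tab-separated lines whose attribute
        values contain no escaped characters.
        """
        children = defaultdict(list)
        for line in gff_lines:
            if not line.strip() or line.startswith("#"):
                continue  # skip blank lines, comments, and directives
            cols = line.rstrip("\n").split("\t")
            attrs = parse_attributes(cols[8])
            # GFF3 allows several comma-separated parents per feature.
            for parent in attrs.get("Parent", "").split(","):
                if parent:
                    children[parent].append((cols[2], attrs.get("ID", "")))
        return dict(children)


    # The gene, mRNA, CDS and exon lines from the example output above.
    example = [
        "NM_001202705\tgbk2gff\tgene\t1\t2406\t.\t+\t1\t"
        "ID=AT2G29630;Dbxref=GeneID:817513,TAIR:AT2G29630;Name=THIC",
        "NM_001202705\tgbk2gff\tmRNA\t192\t2126\t.\t+\t1\t"
        "ID=AT2G29630.t01;Parent=AT2G29630",
        "NM_001202705\tgbk2gff\tCDS\t192\t2126\t.\t+\t1\t"
        "ID=AT2G29630.p01;Parent=AT2G29630.t01",
        "NM_001202705\tgbk2gff\texon\t192\t2126\t.\t+\t1\t"
        "Parent=AT2G29630.t01",
    ]

    tree = build_hierarchy(example)
    for parent_id, kids in tree.items():
        print(parent_id, kids)

With these lines, ``tree`` maps ``AT2G29630`` to its mRNA and maps ``AT2G29630.t01`` to its CDS and exon. The exon has no ``ID`` in the example, so its entry carries an empty string.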

------

**Reference**

**GBK-to-GFF** is part of the Oqtans package; please cite it as [1]_.

.. [1] Sreedharan VT, Schultheiss SJ, Jean G, et al. Oqtans: the RNA-seq workbench in the cloud for complete and reproducible quantitative transcriptome analysis. Bioinformatics (2014). `10.1093/bioinformatics/btt731`_

.. _10.1093/bioinformatics/btt731: http://goo.gl/I75poH

------

**About file formats**

**GenBank format** An example of a GenBank record can be viewed here_.

.. _here: http://www.ncbi.nlm.nih.gov/Sitemap/samplerecord.html
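
A GenBank record consists of an annotation section and a sequence section. The sequence section starts at ``ORIGIN`` and ends at ``//``. Each line in it begins with the 1-based position of its first base, followed by the bases in blocks of ten. The Python sketch below turns that layout back into a plain sequence string. ``read_origin`` is a hypothetical helper, not part of GBK-to-GFF, and the sketch assumes a single well-formed record::

    def read_origin(record_lines):
        """Return the sequence stored in the ORIGIN section of one record.

        Hypothetical helper: assumes a single, well-formed GenBank record.
        """
        in_sequence = False
        chunks = []
        length = 0
        for line in record_lines:
            if line.startswith("ORIGIN"):
                in_sequence = True
                continue
            if line.startswith("//"):
                break  # "//" terminates the record
            fields = line.split()
            if not in_sequence or not fields:
                continue  # annotation lines and blank lines are skipped
            # The first field is the 1-based position of the line's first base.
            if int(fields[0]) != length + 1:
                raise ValueError("unexpected position " + fields[0])
            for block in fields[1:]:  # blocks of (up to) ten bases
                chunks.append(block)
                length += len(block)
        return "".join(chunks)


    record = [
        "LOCUS       NM_001202705            2406 bp    mRNA    linear   PLN 28-MAY-2011",
        "ORIGIN",
        "        1 aagcctttcg ctttaggctg cattgggccg tgacaatatt cagacgattc aggaggttcg",
        "       61 ttcctttttt aaaggaccct aatcactctg agtaccactg actcactcag tgtgcgcgat",
        "//",
    ]
    sequence = read_origin(record)
    print(len(sequence), sequence[:10])

For the two sequence lines above, it prints ``120 aagcctttcg``. The position check catches lines that are missing or out of order.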

**GFF** Generic Feature Format (GFF) describes genes and other features of DNA, RNA, and protein sequences. Each GFF line has nine tab-separated fields::

    1. seqid - The ID of the sequence (landmark) that sets the coordinate system for the feature, e.g. a chromosome, scaffold, or contig.
    2. source - The program that generated this feature.
    3. type - The name of this type of feature. Some examples of standard feature types are "gene", "CDS", "protein", "mRNA", and "exon".
    4. start - The starting position of the feature in the sequence. The first base is numbered 1.
    5. end - The ending position of the feature (inclusive).
    6. score - A floating-point score for the feature. If there is no score value, enter ".".
    7. strand - Valid entries are '+', '-', '.' (not stranded), or '?' (strand relevant but unknown).
    8. phase - For "CDS" features, one of 0, 1, or 2: the number of bases to remove from the start of the feature to reach the first base of the next codon. For all other features, enter ".".
    9. attributes - A semicolon-separated list of tag=value pairs. The ID and Parent tags link features together, e.g. an exon to its mRNA and an mRNA to its gene.
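
The following Python sketch splits one tab-separated GFF3 line into the nine named fields described above. ``GffRecord`` and ``parse_gff_line`` are hypothetical names, not part of GBK-to-GFF. The sketch:

- assumes attribute values contain no escaped characters;
- maps ``.`` in the score and phase columns to ``None``;
- rejects lines whose start exceeds their end or whose strand is not one of the four valid values.

::

    from typing import NamedTuple, Optional


    class GffRecord(NamedTuple):
        seqid: str
        source: str
        type: str
        start: int
        end: int
        score: Optional[float]
        strand: str
        phase: Optional[int]
        attributes: dict


    def parse_gff_line(line):
        """Parse one GFF3 feature line into its nine named fields.

        Hypothetical helper: assumes tab-separated columns and attribute
        values without escaped characters.
        """
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            raise ValueError("expected 9 columns, got " + str(len(cols)))
        start, end = int(cols[3]), int(cols[4])
        if start > end:
            raise ValueError("start must not be greater than end")
        if cols[6] not in ("+", "-", ".", "?"):
            raise ValueError("invalid strand " + repr(cols[6]))
        # Attributes are semicolon-separated tag=value pairs.
        attributes = dict(
            pair.split("=", 1) for pair in cols[8].split(";") if "=" in pair
        )
        return GffRecord(
            seqid=cols[0],
            source=cols[1],
            type=cols[2],
            start=start,
            end=end,
            score=None if cols[5] == "." else float(cols[5]),  # "." = no score
            strand=cols[6],
            phase=None if cols[7] == "." else int(cols[7]),  # "." = no phase
            attributes=attributes,
        )


    rec = parse_gff_line(
        "NM_001202705\tgbk2gff\tCDS\t192\t2126\t.\t+\t1\t"
        "ID=AT2G29630.p01;Parent=AT2G29630.t01"
    )
    # Coordinates are 1-based and inclusive, so length = end - start + 1.
    print(rec.type, rec.end - rec.start + 1, rec.phase, rec.attributes["Parent"])

For the CDS line from the example output, it prints ``CDS 1935 1 AT2G29630.t01``. The length of 1935 follows from the inclusive coordinates (2126 - 192 + 1).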

--------

**Copyright**

GBK-to-GFF Wrapper Version 0.6 (Apr 2015)

2009-2015 Max Planck Society, University of Tübingen &amp; Memorial Sloan Kettering Cancer Center

  </help>
</tool>