comparison gff_to_gbk.xml @ 5:6e589f267c14

Uploaded
author devteam
date Tue, 04 Nov 2014 12:15:19 -0500
4:619e0fcd9126 5:6e589f267c14
1 <tool id="fml_gff2gbk" name="GFF-to-GBK" version="2.0.0">
2 <description>converter</description>
3 <command interpreter="python">gff_to_gbk.py $inf_gff $inf_fas $gbk_format
4 </command>
5 <inputs>
6 <param format="gff,gff3" name="inf_gff" type="data" label="Convert this query" help="Genome annotation in GFF file format."/>
7 <param format="fa,fasta" name="inf_fas" type="data" label="Genome Sequence" help="Genome sequence in FASTA format."/>
8 </inputs>
9 <outputs>
10 <data format="genbank" name="gbk_format" label="${tool.name} on ${on_string}: Converted"/>
11 </outputs>
12 <tests>
13 <test>
14 <param name="inf_gff" value="s_cerevisiae_SCU49845.gff3" />
15 <param name="inf_fas" value="s_cerevisiae_SCU49845.fasta" />
16 <output name="gbk_format" file="s_cerevisiae_SCU49845.gbk" />
17 </test>
18 </tests>
19 <help>
20
21 **What it does**
22
23 This tool converts genome annotations in GFF format, together with the matching genome sequence in FASTA format, to GenBank_ format (scroll down for a description of the formats).
24
25 .. _GenBank: http://www.ncbi.nlm.nih.gov/genbank/
26
27 ------
28
29 **Example**
30
31 - The following data in GFF::
32
33 ##gff-version 3
34 ##sequence-region NM_001202705 1 2406
35 NM_001202705 GenBank chromosome 1 2406 . + 1 ID=NM_001202705;Alias=2;Dbxref=taxon:3702;Name=NM_001202705;Note=Arabidopsis thaliana thiamine biosynthesis protein ThiC (THIC) mRNA%2C complete cds.,REVIEWED REFSEQ;
36 NM_001202705 GenBank gene 1 2406 . + 1 ID=AT2G29630;Dbxref=GeneID:817513,TAIR:AT2G29630;Name=THIC;locus_tag=AT2G29630
37 NM_001202705 GenBank mRNA 192 2126 . + 1 ID=AT2G29630.t01;Parent=AT2G29630
38 NM_001202705 GenBank CDS 192 2126 . + 1 ID=AT2G29630.p01;Parent=AT2G29630.t01;Dbxref=GI:334184567,GeneID:817513,TAIR:AT2G29630;Name=THIC;Note=thiaminC (THIC)%3B CONTAINS InterPro DOMAIN;protein_id=NP_001189634.1;
39 NM_001202705 GenBank exon 192 2126 . + 1 Parent=AT2G29630.t01
40 ##FASTA
41 >NM_001202705
42 AAGCCTTTCGCTTTAGGCTGCATTGGGCCGTGACAATATTCAGACGATTCAGGAGGTTCG
43 TTCCTTTTTTAAAGGACCCTAATCACTCTGAGTACCACTGACTCACTCAGTGTGCGCGAT
44
45 - Will be converted to GenBank format::
46
47 LOCUS NM_001202705 2406 bp mRNA linear PLN 28-MAY-2011
48 DEFINITION Arabidopsis thaliana thiamine biosynthesis protein ThiC (THIC)
49 mRNA, complete cds.
50 ACCESSION NM_001202705
51 VERSION NM_001202705.1 GI:334184566.........
52 FEATURES Location/Qualifiers
53 source 1..2406
54 /organism="Arabidopsis thaliana"
55 /mol_type="mRNA"
56 /db_xref="taxon:3702"........
57 gene 1..2406
58 /gene="THIC"
59 /locus_tag="AT2G29630"
60 /gene_synonym="PY; PYRIMIDINE REQUIRING; T27A16.27;........
61 ORIGIN
62 1 aagcctttcg ctttaggctg cattgggccg tgacaatatt cagacgattc aggaggttcg
63 61 ttcctttttt aaaggaccct aatcactctg agtaccactg actcactcag tgtgcgcgat
64 121 tcatttcaaa aacgagccag cctcttcttc cttcgtctac tagatcagat ccaaagcttc
65 181 ctcttccagc tatggctgct tcagtacact gtaccttgat gtccgtcgta tgcaacaaca
66 //
67
68 ------
69
70 **About formats**
71
72 **GFF** (Generic Feature Format) describes genes and other features associated with DNA, RNA and protein sequences. GFF lines have nine tab-separated fields::
73
74 1. seqid - The name of the sequence (chromosome, scaffold or contig) on which the feature is located.
75 2. source - The program that generated this feature.
76 3. type - The name of this type of feature. Some examples of standard feature types are "gene", "CDS", "protein", "mRNA", and "exon".
77 4. start - The starting position of the feature in the sequence. The first base is numbered 1.
78 5. stop - The ending position of the feature (inclusive).
79 6. score - A numeric score for the feature (0 to 1000 in UCSC-style GFF; a floating-point number in GFF3). If there is no score value, enter ".".
80 7. strand - Valid entries include '+', '-', or '.' (for don't know/care).
81 8. phase - If the feature is a coding exon, phase should be a number between 0 and 2 that represents the reading frame of the first base. If the feature is not a coding exon, the value should be '.'.
82 9. attributes - A semicolon-separated list of tag=value pairs such as ID, Name and Parent. Features are linked through these tags, for example a CDS whose Parent is the ID of an mRNA.
83
84 **GenBank format** Consists of an annotation section and a sequence section. See the sample record_.
85
86 .. _record: http://www.ncbi.nlm.nih.gov/Sitemap/samplerecord.html
87
88
89 --------
90
91 **Copyright**
92
93 2010-2014 Max Planck Society, University of Tübingen &amp; Memorial Sloan Kettering Cancer Center
94
95 Sreedharan VT, Schultheiss SJ, Jean G, Kahles A, Bohnert R, Drewe P, Mudrakarta P, Görnitz N, Zeller G, Rätsch G. Oqtans: the RNA-seq workbench in the cloud for complete and reproducible quantitative transcriptome analysis. Bioinformatics 10.1093/bioinformatics/btt731 (2014)
96
97 </help>
98 </tool>
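
The nine-field GFF layout described in the help text can be illustrated with a minimal Python sketch. This is not the code in gff_to_gbk.py, whose contents are not shown in this changeset. The names `GFF_FIELDS`, `parse_gff3_line` and `genbank_location` are hypothetical, and the sketch assumes GFF3-style attributes (semicolon-separated tag=value pairs with URL-escaped values). The sample line is a shortened version of the CDS line from the example above, with the tabs written out explicitly.

```python
from urllib.parse import unquote

# Hypothetical field names, following the nine columns listed in the help text.
GFF_FIELDS = ("seqid", "source", "type", "start", "end",
              "score", "strand", "phase", "attributes")


def parse_gff3_line(line):
    """Split one GFF3 feature line into a dict keyed by the nine field names."""
    columns = line.rstrip("\n").split("\t")
    if len(columns) != 9:
        raise ValueError("expected 9 tab-separated fields, got %d" % len(columns))
    record = dict(zip(GFF_FIELDS, columns))
    # Coordinates are 1-based and inclusive.
    record["start"] = int(record["start"])
    record["end"] = int(record["end"])
    # Assumption: attributes are tag=value pairs separated by semicolons,
    # with reserved characters URL-escaped (for example %2C for a comma).
    attributes = {}
    for pair in record["attributes"].split(";"):
        if "=" in pair:
            tag, value = pair.split("=", 1)
            attributes[tag.strip()] = unquote(value)
    record["attributes"] = attributes
    return record


def genbank_location(record):
    """Render start, end and strand as a simple GenBank-style location string."""
    span = "%d..%d" % (record["start"], record["end"])
    return "complement(%s)" % span if record["strand"] == "-" else span


# Shortened CDS line from the example, joined with real tab characters.
example = "\t".join([
    "NM_001202705", "GenBank", "CDS", "192", "2126", ".", "+", "1",
    "ID=AT2G29630.p01;Parent=AT2G29630.t01;Name=THIC",
])
cds = parse_gff3_line(example)
print(cds["type"], genbank_location(cds), cds["attributes"]["Parent"])
```

The sketch handles only single-interval features. Multi-exon features would need their intervals combined into a GenBank join(...) location, which is left out here.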