.. Annotated view of readme.rst at revision 6:9d5515db5920 (parent ad01b12e0a0c),
   uploaded by bgruening on Fri, 23 Aug 2013 02:54:15 -0400.

This package is a Galaxy workflow for gene prediction using Glimmer3.

It uses the Glimmer3 tool (Delcher et al. 2007) trained on a known set of
genes to generate gene predictions on a new genome, and then calls EMBOSS
(Rice et al. 2000) to translate the predictions into a FASTA file of
predicted protein sequences. The workflow requires two input files:

* Nucleotide FASTA file of known gene sequences (training set)
* Nucleotide FASTA file of genome sequence or assembled contigs

First, an interpolated context model (ICM) is built from the set of known
genes, preferably from the closest relative organism(s) available. Next, this
ICM is used to predict genes on the genomic FASTA file. This produces
a FASTA file of the predicted gene nucleotide sequences, which is translated
into protein sequences using the EMBOSS tool transeq.
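
As a rough illustration of the final translation step, the following Python
sketch translates a nucleotide sequence into protein using the standard
genetic code. It is a simplified stand-in for what transeq does in the
workflow, not the EMBOSS implementation: the function name ``translate`` and
the toy sequence are hypothetical, only reading frame 1 is handled, and any
codon containing an ambiguous base is written as ``X``.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(nucleotides):
    """Translate a DNA (or RNA) sequence in frame 1; stop codons become '*'."""
    seq = nucleotides.upper().replace("U", "T")
    protein = []
    # Walk the sequence codon by codon, ignoring any trailing partial codon.
    for i in range(0, len(seq) - 2, 3):
        # Codons not in the table (e.g. containing N) are reported as 'X'.
        protein.append(CODON_TABLE.get(seq[i:i + 3], "X"))
    return "".join(protein)


# Hypothetical toy gene: start codon, two sense codons, stop codon.
print(translate("ATGGCTAAATAA"))
```

In the workflow itself this step is carried out by transeq on the whole FASTA
file of predicted genes, so the sketch is only meant to show what
"translation" produces for a single sequence.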

Glimmer is intended for finding genes in microbial DNA, especially that of
bacteria, archaea, and viruses.

See http://www.galaxyproject.org for information about the Galaxy Project.


Sample Data
===========

As an example, we will use the first public genome assembly from the 2011
outbreak of Shiga-toxin-producing *Escherichia coli* O104:H4 in Germany. This
was part of the open-source crowd-sourcing analysis described in Rohde et al.
(2011) and here:
https://github.com/ehec-outbreak-crowdsourced/BGI-data-analysis/wiki

You can upload this assembly directly into Galaxy using the "Upload File" tool
with either of these URLs. Galaxy should recognise it as a FASTA file with
3,057 sequences:

* http://static.xbase.ac.uk/files/results/nick/TY2482/TY2482.fasta.txt
* https://github.com/ehec-outbreak-crowdsourced/BGI-data-analysis/blob/master/strains/TY2482/seqProject/BGI/assemblies/NickLoman/TY2482.fasta.txt

This FASTA file ``TY2482.fasta.txt`` is the initial assembly of the TY-2482
strain, produced by Nick Loman from 5 runs of Ion Torrent data released by the
BGI, using the MIRA 3.2 assembler. It was initially released via his blog:
http://pathogenomics.bham.ac.uk/blog/2011/06/ehec-genome-assembly/
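
If you keep a local copy of the assembly, one quick sanity check outside
Galaxy is to count its FASTA records and compare the result with the 3,057
sequences quoted above. A minimal Python sketch, assuming an uncompressed
FASTA file in which every record begins with a ``>`` header line; the function
name ``count_fasta_records`` and the in-memory example are hypothetical:

```python
import io


def count_fasta_records(handle):
    """Count FASTA records by counting header lines that start with '>'."""
    return sum(1 for line in handle if line.startswith(">"))


# Tiny in-memory example with two records (the second spans two lines).
example = io.StringIO(">contig1\nACGT\n>contig2\nGGCC\nTTAA\n")
print(count_fasta_records(example))
```

For a real file, pass an open file handle instead of the ``io.StringIO``
object. The same check can be applied to the other FASTA files used in this
section.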

We will also need a training set of known *E. coli* genes, for example from
the model strain *Escherichia coli* str. K-12 substr. MG1655, which is well
annotated. You can upload the NCBI FASTA file ``NC_000913.ffn`` of the
gene nucleotide sequences directly into Galaxy via this URL, which Galaxy
should recognise as a FASTA file with 4,321 sequences:

* ftp://ftp.ncbi.nlm.nih.gov/genomes/Bacteria/Escherichia_coli_K_12_substr__MG1655_uid57779/NC_000913.ffn

Then run the workflow, which should produce 2,333 predicted genes for the
TY2482 assembly (two FASTA files, nucleotide and protein sequences).


Citation
========

If you use this workflow directly, or a derivative of it, or the associated
Glimmer wrappers for Galaxy, in work leading to a scientific publication,
please cite:

Cock, P.J.A., Grüning, B., Paszkiewicz, K. and Pritchard, L. (2013)
Galaxy tools and workflows for sequence analysis with applications in
molecular plant pathology. (Submitted).

For Glimmer3 please cite:

Delcher, A.L., Bratke, K.A., Powers, E.C., and Salzberg, S.L. (2007)
Identifying bacterial genes and endosymbiont DNA with Glimmer.
Bioinformatics 23(6), 673-679.
http://dx.doi.org/10.1093/bioinformatics/btm009

For EMBOSS please cite:

Rice, P., Longden, I. and Bleasby, A. (2000)
EMBOSS: The European Molecular Biology Open Software Suite.
Trends in Genetics 16(6), 276-277.
http://dx.doi.org/10.1016/S0168-9525(00)02024-2


Additional References
=====================

Rohde, H., Qin, J., Cui, Y., Li, D., Loman, N.J., et al. (2011)
Open-source genomic analysis of shiga-toxin-producing E. coli O104:H4.
New England Journal of Medicine 365, 718-724.
http://dx.doi.org/10.1056/NEJMoa1107643


Availability
============

This workflow is available on the main Galaxy Tool Shed:

http://toolshed.g2.bx.psu.edu/view/bgruening/glimmer_gene_calling_workflow

Development is being done on GitHub:

https://github.com/bgruening/galaxytools/workflows/glimmer3/


Dependencies
============

These dependencies should be resolved automatically via the Galaxy Tool Shed:

* http://toolshed.g2.bx.psu.edu/view/bgruening/glimmer3
* http://toolshed.g2.bx.psu.edu/view/devteam/emboss_5