comparison readme.md @ 0:db68266f7364 draft

Uploaded
author bgruening
date Tue, 24 Feb 2015 04:48:44 -0500
-1:000000000000 0:db68266f7364
1 Galaxy workflow for the identification of candidate gene clusters
2 ------------------------------------------------------------------
3
4 This approach screens two proteins against all nucleotide sequences in the
5 NCBI nt database within hours on our cluster, identifying all organisms with an
6 interesting gene structure for further investigation. As usual in Galaxy workflows, every
7 parameter, including the proximity distance, can be changed, and additional steps
8 can easily be added, for example additional filtering to refine the initial BLAST
9 hits, or the inclusion of a third query sequence.
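
The core idea, keeping only subject sequences on which hits from both queries lie within the proximity distance, can be sketched in plain Python. This is an illustrative sketch and not the Galaxy workflow itself: the `Hit` tuple, the function names `gap` and `find_nearby_hits`, the gap-based definition of distance, and the subject names and coordinates are all assumptions made for the example. Only the two query accessions come from the sample data in this readme.

```python
from collections import defaultdict
from typing import NamedTuple


class Hit(NamedTuple):
    """One BLAST hit, reduced to the fields needed here (hypothetical layout)."""
    query: str    # query protein identifier
    subject: str  # subject (nucleotide) sequence identifier
    start: int    # hit start on the subject, 1-based
    end: int      # hit end on the subject, inclusive


def gap(a, b):
    """Number of bases between two hits on the same subject (0 if they overlap)."""
    a_lo, a_hi = sorted((a.start, a.end))  # minus-strand hits have start > end
    b_lo, b_hi = sorted((b.start, b.end))
    return max(0, max(a_lo, b_lo) - min(a_hi, b_hi) - 1)


def find_nearby_hits(hits, query_a, query_b, max_distance):
    """Return (subject, hit_a, hit_b) triples where both queries hit close together."""
    # Group the hits of the two queries by subject sequence.
    by_subject = defaultdict(lambda: {query_a: [], query_b: []})
    for hit in hits:
        if hit.query in (query_a, query_b):
            by_subject[hit.subject][hit.query].append(hit)

    # Keep every pair of hits (one per query) that lies within max_distance.
    pairs = []
    for subject, grouped in sorted(by_subject.items()):
        for hit_a in grouped[query_a]:
            for hit_b in grouped[query_b]:
                if gap(hit_a, hit_b) <= max_distance:
                    pairs.append((subject, hit_a, hit_b))
    return pairs


# Made-up hits: only subject_1 carries both genes close to each other.
hits = [
    Hit("WP_037658557.1", "subject_1", 1000, 2200),
    Hit("WP_037658548.1", "subject_1", 4000, 5200),
    Hit("WP_037658557.1", "subject_2", 500, 1700),
    Hit("WP_037658548.1", "subject_2", 90000, 91200),
    Hit("WP_037658557.1", "subject_3", 100, 1300),
]
nearby = find_nearby_hits(hits, "WP_037658557.1", "WP_037658548.1", max_distance=10000)
for subject, hit_a, hit_b in nearby:
    print(subject, gap(hit_a, hit_b))
```

With these made-up coordinates only `subject_1` is reported: on `subject_2` the two hits are too far apart, and `subject_3` is hit by only one of the queries. Changing `max_distance` here plays the role of the proximity distance parameter mentioned above.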
10
11 ![Workflow Image](https://raw.githubusercontent.com/bgruening/galaxytools/master/workflows/ncbi_blast_plus/find_genes_located_nearby/find_genes_located_nearby.png)
12
13
14 Sample Data
15 ===========
16
17 As an example, we will use two protein sequences from *Streptomyces aurantiacus*
18 that are part of a gene cluster responsible for metabolite production.
19
20 You can upload both sequences directly into Galaxy using the "Upload File" tool
21 with the following URLs. Galaxy should recognise them as FASTA files.
22
23 * https://raw.githubusercontent.com/bgruening/galaxytools/master/workflows/ncbi_blast_plus/find_genes_located_nearby/WP_037658548.fasta
24 * https://raw.githubusercontent.com/bgruening/galaxytools/master/workflows/ncbi_blast_plus/find_genes_located_nearby/WP_037658557.fasta
25
26 In addition, you can find both sequences on the NCBI server:
27 * http://www.ncbi.nlm.nih.gov/protein/739806622 (cytochrome P450)
28
29 ```text
30 >gi|739806622|ref|WP_037658557.1| cytochrome P450 [Streptomyces aurantiacus]
31 MQRTCPFSVPPVYTKFREESPITQVVLPDGGKAWLVTKYDDVRAVMANPKLSSDRRAPDFPVVVPGQNAA
32 LAKHAPFMIILDGAEHAAARRPVISEFSVRRVAAMKPRIQEIVDGFIDDMLKMPKPVDLNQVFSLPVPSL
33 VVSEILGMPYEGHEYFMELAEILLRRTTDEQGRIAVSVELRKYMDKLVEEKIENPGDDLLSRQIELQRQQ
34 GGIDRPQLASLCLLVLLAGHETTANMINLGVFSMLTKPELLAEIKADPSKTPKAVDELLRFYTIPDFGAH
35 RLALDDVEIGGVLIRKGEAVIASTFAANRDPAVFDDPEELDFGRDARHHVAFGYGPHQCLGQNLGRLELQ
36 VVFDTLFRRLPELRLAVPEEELSFKSDALVYGLYELPVTW
37 ```
38
39 * http://www.ncbi.nlm.nih.gov/protein/739806613 (beta-ACP synthase)
40
41 ```text
42 >gi|739806613|ref|WP_037658548.1| beta-ACP synthase [Streptomyces aurantiacus]
43 MSGRRVVVTGMEVLAPGGVGTDNFWSLLSEGRTATRGITFFDPAQFRSRVAAEIDFDPYAHGLTPQEVRR
44 MDRAAQFAVVAARGAVADSGLDTDTLDPYRIGVTIGSAVGATMSLDEDYRVVSDAGRLDLVDHTYADPFF
45 YNYFVPSSFATEVARLVGAQGPSSVVSAGCTSGLDSVGYAVELIREGTADVMVAGATDAPISPITMACFD
46 AIKATTPRHDDPEHASRPFDDTRNGFVLGEGTAVFVLEELESARRRGARIYAEIAGYATRSNAYHMTGLR
47 PDGAEMAEAITVALDEARMNPTAIDYINAHGSGTKQNDRHETAAFKRSLGEHAYRTPVSSIKSMVGHSLG
48 AIGSIEIAASILAIQHDVVPPTANLHTPDPQCDLDYVPLNAREQIVDAVLTVGSGFGGFQSAMVLAQPER
49 NAA
50 ```
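
Before uploading, records like the ones above can be sanity-checked locally. The following is a minimal sketch using only the Python standard library; the function name `read_fasta` is hypothetical, and the embedded record is deliberately truncated to the first two sequence lines of the beta-ACP synthase entry, so the printed length is not that of the full protein.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence lines are joined without separators
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Truncated copy of the beta-ACP synthase record (first two sequence lines only).
example = """\
>gi|739806613|ref|WP_037658548.1| beta-ACP synthase [Streptomyces aurantiacus]
MSGRRVVVTGMEVLAPGGVGTDNFWSLLSEGRTATRGITFFDPAQFRSRVAAEIDFDPYAHGLTPQEVRR
MDRAAQFAVVAARGAVADSGLDTDTLDPYRIGVTIGSAVGATMSLDEDYRVVSDAGRLDLVDHTYADPFF
"""

for header, sequence in read_fasta(example):
    # In this header style the accession is the fourth "|"-separated field.
    accession = header.split("|")[3]
    print(accession, len(sequence))
```

The accession extracted this way matches the file names used in the upload URLs, which makes it easy to check that the right record ended up in the right file.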
51
52
53 Citation
54 ========
55
56 If you use this workflow directly, or a derivative of it, or the associated
57 NCBI BLAST wrappers for Galaxy, in work leading to a scientific publication,
58 please cite:
59
60 Peter J. A. Cock, John M. Chilton, Björn Grüning, James E. Johnson, Nicola Soranzo
61 NCBI BLAST+ integrated into Galaxy
62
63 http://biorxiv.org/content/early/2015/01/21/014043
64 http://dx.doi.org/10.1101/014043
65
66
67 Availability
68 ============
69
70 This workflow is available on the main Galaxy Tool Shed:
71
72 http://toolshed.g2.bx.psu.edu/view/bgruening/find_genes_located_nearby_workflow
73
74 Development takes place on GitHub:
75
76 https://github.com/bgruening/galaxytools/tree/master/workflows/ncbi_blast_plus/
77
78
79 Dependencies
80 ============
81
82 These dependencies should be resolved automatically via the Galaxy Tool Shed:
83
84 * http://toolshed.g2.bx.psu.edu/view/devteam/ncbi_blast_plus