annotate glimmerHMM/glimmerhmm_gff_to_sequence.py @ 1:4da91bb244dc draft
planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/glimmer_hmm commit 2effed877a778e455c63a76e994a0f2bb8f4dba0
author | rmarenco |
---|---|
date | Thu, 14 Jul 2016 15:11:33 -0400 |
parents | c9699375fcf6 |
children |
Every source line below was last changed in rev 0 (changeset c9699375fcf6, bgruening): planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/glimmer_hmm commit 0dc67759bcbdf5a8a285ded9ba751340d741fe63
#!/usr/bin/env python
"""Convert GlimmerHMM GFF3 gene predictions into nucleotide or protein sequences.

This works with the GlimmerHMM GFF3 output format:

##gff-version 3
##sequence-region Contig5.15 1 47390
Contig5.15 GlimmerHMM mRNA 323 325 . + . ID=Contig5.15.path1.gene1;Name=Contig5.15.path1.gene1
Contig5.15 GlimmerHMM CDS 323 325 . + 0 ID=Contig5.15.cds1.1;Parent=Contig5.15.path1.gene1;Name=Contig5.15.path1.gene1;Note=final-exon

http://www.cbcb.umd.edu/software/GlimmerHMM/

Modified version of the converter from Brad Chapman: https://github.com/chapmanb/bcbb/blob/master/biopython/glimmergff_to_proteins.py

Usage:
    glimmerhmm_gff_to_sequence.py <glimmer output> <ref fasta> <output file> <convert to protein ... False|True>
"""
import sys
import os
import operator
from functools import reduce  # builtin on Python 2, must be imported on Python 3

from Bio import SeqIO
from Bio.SeqRecord import SeqRecord

from BCBio import GFF

def main(glimmer_file, ref_file, out_file, to_protein):
    # Index the reference FASTA records by sequence ID so the GFF parser
    # can attach each prediction to its contig.
    with open(ref_file) as in_handle:
        ref_recs = SeqIO.to_dict(SeqIO.parse(in_handle, "fasta"))

    base, ext = os.path.splitext(glimmer_file)

    with open(out_file, "w") as out_handle:
        SeqIO.write(protein_recs(glimmer_file, ref_recs, to_protein), out_handle, "fasta")

def protein_recs(glimmer_file, ref_recs, to_protein):
    """Generate protein or nucleotide records from GlimmerHMM gene predictions.
    """
    with open(glimmer_file) as in_handle:
        for rec in glimmer_predictions(in_handle, ref_recs):
            for feature in rec.features:
                # Collect the CDS segments in the order the parser lists them.
                seq_exons = []
                for cds in feature.sub_features:
                    seq_exons.append(rec.seq[
                        cds.location.nofuzzy_start:
                        cds.location.nofuzzy_end])
                # Join the segments into one coding sequence.
                gene_seq = reduce(operator.add, seq_exons)
                if feature.strand == -1:
                    gene_seq = gene_seq.reverse_complement()

                if to_protein:
                    yield SeqRecord(gene_seq.translate(), feature.qualifiers["ID"][0], "", "")
                else:
                    yield SeqRecord(gene_seq, feature.qualifiers["ID"][0], "", "")

def glimmer_predictions(in_handle, ref_recs):
    """Parse Glimmer output, generating SeqRecord and SeqFeatures for predictions
    """
    for rec in GFF.parse(in_handle, target_lines=1000, base_dict=ref_recs):
        yield rec

if __name__ == "__main__":
    # Expect the script name plus the four arguments listed under Usage.
    if len(sys.argv) != 5:
        print(__doc__)
        sys.exit()
    glimmer_file, ref_file, out_file, to_protein = sys.argv[1:]
    # The flag arrives as text, and the string "False" would otherwise be truthy.
    main(glimmer_file, ref_file, out_file, to_protein.lower() == "true")
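
The core of `protein_recs` (slice each CDS segment out of the contig, join the pieces, reverse-complement minus-strand genes, then optionally translate) can be illustrated without Biopython. The following is a minimal Python 3 sketch under stated assumptions, not the script's actual code path: the helper names `spliced_sequence`, `reverse_complement` and `translate` are hypothetical, exon coordinates are taken as 1-based inclusive GFF3 pairs, segments are sorted by start position before joining, and translation uses the standard genetic code only.

```python
from itertools import product

# Standard genetic code, built in TCAG order (hypothetical helper table).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def spliced_sequence(contig, exons, strand):
    """Join exon slices of `contig` and orient them by strand.

    `exons` holds (start, end) pairs in 1-based inclusive GFF3 coordinates.
    Assumption: segments are joined in ascending genomic order.
    """
    parts = [contig[start - 1:end] for start, end in sorted(exons)]
    seq = "".join(parts)
    return reverse_complement(seq) if strand == -1 else seq


def translate(seq):
    """Translate complete codons; unknown codons become 'X'."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, usable, 3)
    )


if __name__ == "__main__":
    # Two exons (ATGGCC and TTTTGA) separated by a six-base intron.
    contig = "CC" + "ATGGCC" + "GTAAGT" + "TTTTGA" + "GG"
    coding = spliced_sequence(contig, [(3, 8), (15, 20)], 1)
    print(coding)             # ATGGCCTTTTGA
    print(translate(coding))  # MAF*
```

Sorting the segments is a choice made for this sketch; the script itself joins the CDS sub-features in whatever order the GFF parser returns them.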