annotate glimmerHMM/glimmerhmm_gff_to_sequence.py @ 1:4da91bb244dc draft

planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/glimmer_hmm commit 2effed877a778e455c63a76e994a0f2bb8f4dba0
author rmarenco
date Thu, 14 Jul 2016 15:11:33 -0400
parents c9699375fcf6
children
rev   line source
0
c9699375fcf6 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/glimmer_hmm commit 0dc67759bcbdf5a8a285ded9ba751340d741fe63
bgruening
parents:
diff changeset
#!/usr/bin/env python
"""Convert GlimmerHMM GFF3 gene predictions into nucleotide or protein sequences.

This works with the GlimmerHMM GFF3 output format:

##gff-version 3
##sequence-region Contig5.15 1 47390
Contig5.15 GlimmerHMM mRNA 323 325 . + . ID=Contig5.15.path1.gene1;Name=Contig5.15.path1.gene1
Contig5.15 GlimmerHMM CDS 323 325 . + 0 ID=Contig5.15.cds1.1;Parent=Contig5.15.path1.gene1;Name=Contig5.15.path1.gene1;Note=final-exon

http://www.cbcb.umd.edu/software/GlimmerHMM/

Modified version of the converter from Brad Chapman: https://github.com/chapmanb/bcbb/blob/master/biopython/glimmergff_to_proteins.py

Usage:
    glimmerhmm_gff_to_sequence.py <glimmer output> <ref fasta> <output file> <convert to protein ... False|True>
"""
import sys
import os
import operator
# reduce is a builtin only on Python 2; importing it from functools works on Python 2.6+ and 3.
from functools import reduce

from Bio import SeqIO
from Bio.SeqRecord import SeqRecord

from BCBio import GFF

def main(glimmer_file, ref_file, out_file, to_protein):
    # Load all reference sequences into memory, keyed by record ID.
    with open(ref_file) as in_handle:
        ref_recs = SeqIO.to_dict(SeqIO.parse(in_handle, "fasta"))

    base, ext = os.path.splitext(glimmer_file)

    # protein_recs is a generator, so records are written as they are produced.
    with open(out_file, "w") as out_handle:
        SeqIO.write(protein_recs(glimmer_file, ref_recs, to_protein), out_handle, "fasta")

def protein_recs(glimmer_file, ref_recs, to_protein):
    """Generate sequence records from GlimmerHMM gene predictions.

    Yields protein records when to_protein is true, nucleotide records otherwise.
    """
    with open(glimmer_file) as in_handle:
        for rec in glimmer_predictions(in_handle, ref_recs):
            for feature in rec.features:
                # Collect the sequence of each CDS under this mRNA feature.
                seq_exons = []
                for cds in feature.sub_features:
                    seq_exons.append(rec.seq[
                        cds.location.nofuzzy_start:
                        cds.location.nofuzzy_end])
                # Join the exons, then orient the result for minus-strand genes.
                gene_seq = reduce(operator.add, seq_exons)
                if feature.strand == -1:
                    gene_seq = gene_seq.reverse_complement()

                if to_protein:
                    yield SeqRecord(gene_seq.translate(), feature.qualifiers["ID"][0], "", "")
                else:
                    yield SeqRecord(gene_seq, feature.qualifiers["ID"][0], "", "")

def glimmer_predictions(in_handle, ref_recs):
    """Parse Glimmer output, generating SeqRecord and SeqFeatures for predictions.
    """
    for rec in GFF.parse(in_handle, target_lines=1000, base_dict=ref_recs):
        yield rec

if __name__ == "__main__":
    # Script name plus four arguments: GFF3 file, reference FASTA, output file, protein flag.
    if len(sys.argv) != 5:
        print(__doc__)
        sys.exit()
    glimmer_file, ref_file, out_file, to_protein = sys.argv[1:]
    # Arguments arrive as strings and the string "False" is truthy, so compare explicitly.
    main(glimmer_file, ref_file, out_file, to_protein.lower() == "true")
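
The exon handling in protein_recs can be illustrated without Biopython or a GFF parser. The following is a minimal sketch, not the script's own code: the names COMPLEMENT, reverse_complement and spliced_sequence are hypothetical, the contig is made-up example data, and the CDS intervals are assumed to be given in ascending coordinate order (the script itself takes feature.sub_features in whatever order the parser returns them). It shows the two steps the script performs: slicing each CDS out of the reference using GFF3's 1-based inclusive coordinates, then reverse-complementing the joined sequence for minus-strand genes.

```python
# Hypothetical helpers for illustration; not part of glimmerhmm_gff_to_sequence.py.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def spliced_sequence(contig, cds_intervals, strand):
    """Join CDS intervals in the order given, then orient by strand.

    cds_intervals holds GFF3-style (start, end) pairs: 1-based, inclusive.
    strand is "+" or "-", as in GFF3 column 7.
    """
    exons = []
    for start, end in cds_intervals:
        # GFF3 1-based inclusive -> Python 0-based half-open slice.
        exons.append(contig[start - 1:end])
    gene_seq = "".join(exons)
    if strand == "-":
        gene_seq = reverse_complement(gene_seq)
    return gene_seq


if __name__ == "__main__":
    contig = "CCATGAAATTTGGGTAACC"  # made-up example sequence
    cds = [(3, 5), (12, 17)]        # two exons: "ATG" and "GGGTAA"
    print(spliced_sequence(contig, cds, "+"))  # ATGGGGTAA
    print(spliced_sequence(contig, cds, "-"))  # TTACCCCAT
```

Reverse-complementing after joining, rather than per exon, only gives the correct transcript when the exons were joined in ascending genomic order, which is why that ordering is stated as an assumption here.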