# HG changeset patch
# User peterjc
# Date 1485950127 18000
# Node ID de803005027ffc8506a52277663a7c5b4111233e
# Parent 9fbf29a8c12b128ac2af92ed26e8aec645e76b65
v0.0.7 internal Python style fixes

diff -r 9fbf29a8c12b -r de803005027f tools/align_back_trans/README.rst
--- a/tools/align_back_trans/README.rst Wed Aug 05 10:52:56 2015 -0400
+++ b/tools/align_back_trans/README.rst Wed Feb 01 06:55:27 2017 -0500
@@ -69,6 +69,7 @@
 v0.0.6  - Reorder XML elements (internal change only).
         - Use ``format_source=...`` tag.
         - Planemo for Tool Shed upload (``.shed.yml``, internal change only).
+v0.0.7  - Minor Python code style improvements (internal change only).
 ======= ======================================================================
 
 
@@ -85,12 +86,12 @@
 Planemo commands (which requires you have set your Tool Shed access details in
 ``~/.planemo.yml`` and that you have access rights on the Tool Shed)::
 
-    $ planemo shed_update --shed_target testtoolshed --check_diff ~/repositories/pico_galaxy/tools/align_back_trans/
+    $ planemo shed_update -t testtoolshed --check_diff ~/repositories/pico_galaxy/tools/align_back_trans/
     ...
 
 or::
 
-    $ planemo shed_update --shed_target toolshed --check_diff ~/repositories/pico_galaxy/tools/align_back_trans/
+    $ planemo shed_update -t toolshed --check_diff ~/repositories/pico_galaxy/tools/align_back_trans/
     ...
 
 To just build and check the tar ball, use::
diff -r 9fbf29a8c12b -r de803005027f tools/align_back_trans/align_back_trans.py
--- a/tools/align_back_trans/align_back_trans.py Wed Aug 05 10:52:56 2015 -0400
+++ b/tools/align_back_trans/align_back_trans.py Wed Feb 01 06:55:27 2017 -0500
@@ -8,7 +8,7 @@
 
 The development repository for this tool is here:
 
-* https://github.com/peterjc/pico_galaxy/tree/master/tools/align_back_trans
+* https://github.com/peterjc/pico_galaxy/tree/master/tools/align_back_trans
 
 This tool is available with a Galaxy wrapper from the Galaxy Tool Shed at:
 
@@ -19,34 +19,30 @@
 
 import sys
 from Bio.Seq import Seq
-from Bio.Alphabet import generic_dna, generic_protein
+from Bio.Alphabet import generic_protein
 from Bio.Align import MultipleSeqAlignment
 from Bio import SeqIO
 from Bio import AlignIO
 from Bio.Data.CodonTable import ambiguous_generic_by_id
 
 if "-v" in sys.argv or "--version" in sys.argv:
-    print "v0.0.5"
+    print "v0.0.7"
     sys.exit(0)
 
-def sys_exit(msg, error_level=1):
-    """Print error message to stdout and quit with given error level."""
-    sys.stderr.write("%s\n" % msg)
-    sys.exit(error_level)
 
 def check_trans(identifier, nuc, prot, table):
     """Returns nucleotide sequence if works (can remove trailing stop)"""
     if len(nuc) % 3:
-        sys_exit("Nucleotide sequence for %s is length %i (not a multiple of three)"
+        sys.exit("Nucleotide sequence for %s is length %i (not a multiple of three)"
                  % (identifier, len(nuc)))
 
     p = str(prot).upper().replace("*", "X")
     t = str(nuc.translate(table)).upper().replace("*", "X")
     if len(t) == len(p) + 1:
         if str(nuc)[-3:].upper() in ambiguous_generic_by_id[table].stop_codons:
-            #Allow this...
+            # Allow this...
             t = t[:-1]
-            nuc = nuc[:-3] #edit return value
+            nuc = nuc[:-3]  # edit return value
     if len(t) != len(p):
         err = ("Inconsistent lengths for %s, ungapped protein %i, "
                "tripled %i vs ungapped nucleotide %i." %
@@ -56,39 +52,39 @@
         elif t.startswith(p):
             err += "\nThere are %i extra nucleotides at the end." % (len(t) - len(p))
         elif p in t:
-            #TODO - Calculate and report the number to trim at start and end?
+            # TODO - Calculate and report the number to trim at start and end?
             err += "\nHowever, protein sequence found within translated nucleotides."
         elif p[1:] in t:
             err += "\nHowever, ignoring first amino acid, protein sequence found within translated nucleotides."
-        sys_exit(err)
-
+        sys.exit(err)
 
     if t == p:
         return nuc
     elif p.startswith("M") and "M" + t[1:] == p:
-        #Close, was there a start codon?
+        # Close, was there a start codon?
         if str(nuc[0:3]).upper() in ambiguous_generic_by_id[table].start_codons:
             return nuc
         else:
-            sys_exit("Translation check failed for %s\n"
+            sys.exit("Translation check failed for %s\n"
                      "Would match if %s was a start codon (check correct table used)\n"
                      % (identifier, nuc[0:3].upper()))
     else:
-        #Allow * vs X here? e.g. internal stop codons
-        m = "".join("." if x==y else "!" for (x,y) in zip(p,t))
+        # Allow * vs X here? e.g. internal stop codons
+        m = "".join("." if x == y else "!" for (x, y) in zip(p, t))
         if len(prot) < 70:
             sys.stderr.write("Protein: %s\n" % p)
             sys.stderr.write(" %s\n" % m)
             sys.stderr.write("Translation: %s\n" % t)
         else:
             for offset in range(0, len(p), 60):
-                sys.stderr.write("Protein: %s\n" % p[offset:offset+60])
-                sys.stderr.write(" %s\n" % m[offset:offset+60])
-                sys.stderr.write("Translation: %s\n\n" % t[offset:offset+60])
-        sys_exit("Translation check failed for %s\n" % identifier)
+                sys.stderr.write("Protein: %s\n" % p[offset:offset + 60])
+                sys.stderr.write(" %s\n" % m[offset:offset + 60])
+                sys.stderr.write("Translation: %s\n\n" % t[offset:offset + 60])
+        sys.exit("Translation check failed for %s\n" % identifier)
+
 
 def sequence_back_translate(aligned_protein_record, unaligned_nucleotide_record, gap, table=0):
-    #TODO - Separate arguments for protein gap and nucleotide gap?
+    # TODO - Separate arguments for protein gap and nucleotide gap?
     if not gap or len(gap) != 1:
         raise ValueError("Please supply a single gap character")
 
@@ -99,14 +95,14 @@
     else:
         from Bio.Alphabet import Gapped
         alpha = Gapped(alpha, gap)
-    gap_codon = gap*3
+    gap_codon = gap * 3
 
     ungapped_protein = aligned_protein_record.seq.ungap(gap)
     ungapped_nucleotide = unaligned_nucleotide_record.seq
     if table:
         ungapped_nucleotide = check_trans(aligned_protein_record.id, ungapped_nucleotide, ungapped_protein, table)
     elif len(ungapped_protein) * 3 != len(ungapped_nucleotide):
-        sys_exit("Inconsistent lengths for %s, ungapped protein %i, "
+        sys.exit("Inconsistent lengths for %s, ungapped protein %i, "
                  "tripled %i vs ungapped nucleotide %i" %
                  (aligned_protein_record.id,
                   len(ungapped_protein),
@@ -124,15 +120,16 @@
     assert not nuc, "Nucleotide sequence for %r longer than protein %r" \
         % (unaligned_nucleotide_record.id, aligned_protein_record.id)
 
-    aligned_nuc = unaligned_nucleotide_record[:] #copy for most annotation
-    aligned_nuc.letter_annotation = {} #clear this
-    aligned_nuc.seq = Seq("".join(seq), alpha) #replace this
+    aligned_nuc = unaligned_nucleotide_record[:]  # copy for most annotation
+    aligned_nuc.letter_annotation = {}  # clear this
+    aligned_nuc.seq = Seq("".join(seq), alpha)  # replace this
     assert len(aligned_protein_record.seq) * 3 == len(aligned_nuc)
     return aligned_nuc
 
+
 def alignment_back_translate(protein_alignment, nucleotide_records, key_function=None, gap=None, table=0):
     """Thread nucleotide sequences onto a protein alignment."""
-    #TODO - Separate arguments for protein and nucleotide gap characters?
+    # TODO - Separate arguments for protein and nucleotide gap characters?
     if key_function is None:
         key_function = lambda x: x
     if gap is None:
@@ -143,7 +140,7 @@
         try:
             nucleotide = nucleotide_records[key_function(protein.id)]
         except KeyError:
-            raise ValueError("Could not find nucleotide sequence for protein %r" \
+            raise ValueError("Could not find nucleotide sequence for protein %r"
                              % protein.id)
         aligned.append(sequence_back_translate(protein, nucleotide, gap, table))
     return MultipleSeqAlignment(aligned)
@@ -159,7 +156,7 @@
 elif len(sys.argv) == 6:
     align_format, prot_align_file, nuc_fasta_file, nuc_align_file, table = sys.argv[1:]
 else:
-    sys_exit("""This is a Python script for 'back-translating' a protein alignment,
+    sys.exit("""This is a Python script for 'back-translating' a protein alignment,
 
 It requires three, four or five arguments:
 - alignment format (e.g. fasta, clustal),
@@ -183,8 +180,8 @@
 
 try:
     table = int(table)
-except:
-    sys_exit("Bad table argument %r" % table)
+except ValueError:
+    sys.exit("Bad table argument %r" % table)
 
 prot_align = AlignIO.read(prot_align_file, align_format, alphabet=generic_protein)
 nuc_dict = SeqIO.index(nuc_fasta_file, "fasta")
diff -r 9fbf29a8c12b -r de803005027f tools/align_back_trans/align_back_trans.xml
--- a/tools/align_back_trans/align_back_trans.xml Wed Aug 05 10:52:56 2015 -0400
+++ b/tools/align_back_trans/align_back_trans.xml Wed Feb 01 06:55:27 2017 -0500
@@ -1,4 +1,4 @@
-
+
 Gives a codon aware alignment
 biopython
diff -r 9fbf29a8c12b -r de803005027f tools/align_back_trans/tool_dependencies.xml
--- a/tools/align_back_trans/tool_dependencies.xml Wed Aug 05 10:52:56 2015 -0400
+++ b/tools/align_back_trans/tool_dependencies.xml Wed Feb 01 06:55:27 2017 -0500
@@ -1,6 +1,6 @@
-
+
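
Review note on the `sys_exit` removal: the deleted helper wrote the message to stderr and exited with status 1, and the patch relies on the built-in `sys.exit(msg)` doing the same when given a string. The sketch below is not part of the changeset; it is a minimal check, assuming Python 3.5 or later is available to run it, and the names `run_child`, `OLD_STYLE` and `NEW_STYLE` are hypothetical and used only for illustration. It runs both variants in a child interpreter and compares exit status and stderr.

```python
import subprocess
import sys


def run_child(source):
    """Run a snippet in a fresh interpreter; return (exit status, stderr text)."""
    proc = subprocess.run(
        [sys.executable, "-c", source],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    return proc.returncode, proc.stderr


# The helper removed by this changeset, reproduced here only for comparison.
OLD_STYLE = (
    "import sys\n"
    "def sys_exit(msg, error_level=1):\n"
    "    sys.stderr.write('%s\\n' % msg)\n"
    "    sys.exit(error_level)\n"
    "sys_exit('Bad table argument %r' % 'abc')\n"
)

# The replacement used throughout the patch: pass the message to sys.exit.
NEW_STYLE = (
    "import sys\n"
    "sys.exit('Bad table argument %r' % 'abc')\n"
)

old_result = run_child(OLD_STYLE)
new_result = run_child(NEW_STYLE)

if __name__ == "__main__":
    print("old helper :", old_result)
    print("sys.exit() :", new_result)
```

One difference worth keeping in mind: the old helper accepted an `error_level` argument, whereas `sys.exit` with a string always exits with status 1. None of the call sites in this patch passed a custom level, so the behaviour shown in the diff is unchanged.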