annotate scripts/S01_find_orf_on_multiple_alignment.py @ 1:c79bdda8abfb draft default tip

planemo upload for repository https://github.com/abims-sbr/adaptsearch commit 3a118aa934e6406cc8b0b24d006af6365c277519
author abims-sbr
date Thu, 09 Jun 2022 12:40:00 +0000
parents eb95bf7f90ae

#!/usr/bin/env python
# coding: utf8
# Author: Eric Fontanillas
# Modification: 03/09/14 by Julie BAFFARD
# Last modification: 10/09/21 by Charlotte Berthelier

"""
Description: Predict potential ORFs on the basis of 2 criteria + 1 optional criterion

- CRITERION 1

Get the longest part of the sequence alignment without a stop codon "*",
test it in the 3 potential ORFs, and check with BLAST for the best coding sequence

- CRITERION 2

This longest part should be > 150 nt or 50 aa

- CRITERION 3 [OPTIONAL]

A start codon "M" should be present in this longest part, before the last 50 aa

OUTPUTS:
"05_CDS_aa" & "05_CDS_nuc" => DO NOT INCLUDE THIS CRITERION
"06_CDS_with_M_aa" & "06_CDS_with_M_nuc" => INCLUDE THIS CRITERION
"""
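Criteria 1 and 2 can be illustrated with a short, self-contained sketch. It assumes that the aligned protein sequences of one reading frame arrive as equal-length strings. It scans the alignment column by column and keeps the longest run of columns where no sequence carries a stop "*". The helper names `longest_stop_free_segment` and `passes_criterion_2` are hypothetical and are not functions of this script. The strict `>` comparison follows the wording of criterion 2, not any code shown here.

```python
from typing import List, Tuple


def longest_stop_free_segment(aligned_aa: List[str]) -> Tuple[int, int]:
    """Return (start, end) of the longest run of alignment columns in which no
    sequence has a stop "*"; `end` is exclusive, (0, 0) if there is none."""
    if not aligned_aa:
        return (0, 0)
    width = min(len(s) for s in aligned_aa)  # ignore columns past the shortest row
    best_start, best_len = 0, 0
    run_start = 0
    for col in range(width + 1):
        # Column `width` acts as a sentinel stop that closes the last run.
        is_stop = col == width or any(s[col] == "*" for s in aligned_aa)
        if is_stop:
            run_len = col - run_start
            if run_len > best_len:  # keep the first of equally long runs
                best_start, best_len = run_start, run_len
            run_start = col + 1
    return (best_start, best_start + best_len)


def passes_criterion_2(segment: Tuple[int, int], minimal_cds_length: int = 50) -> bool:
    """Criterion 2: the stop-free segment must be longer than minimal_cds_length aa."""
    start, end = segment
    return end - start > minimal_cds_length


if __name__ == "__main__":
    alignment = ["AAAA*AAAAAA", "AAAAAAAAAA*"]
    segment = longest_stop_free_segment(alignment)
    print(segment, passes_criterion_2(segment, minimal_cds_length=3))
```

A stop in any sequence breaks the column for the whole alignment. The segment therefore represents a region that is open in every member of the orthogroup at once.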

import os
import re
import argparse

from Bio.Blast import NCBIWWW
from Bio.Blast import NCBIXML
from dico import dico


def code_universel(file1):
    """ Creates a dictionary ("bash", i.e. hash) of the genetic code (key: codon; value: amino acid) """
    bash_code_universel = {}

    with open(file1, "r") as file:
        for line in file.readlines():
            item = str.split(line, " ")
            length1 = len(item)
            if length1 == 3:
                key = item[0]
                value = item[2][:-1]  # [:-1] drops the trailing newline
                bash_code_universel[key] = value
            else:
                key = item[0]
                value = item[2]
                bash_code_universel[key] = value

    return bash_code_universel
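The genetic-code file itself is not shown here. The sketch below assumes that each line holds a codon, a separator token and a one-letter amino acid separated by spaces (e.g. `UUU = F`). The `=` separator and the name `parse_genetic_code` are hypothetical. The sketch uses `str.split()` with no argument, which also removes the newline. Because of that, a final line without a trailing newline keeps its amino acid. In `code_universel`, `item[2][:-1]` would cut that amino acid off.

```python
from typing import Dict, Iterable


def parse_genetic_code(lines: Iterable[str]) -> Dict[str, str]:
    """Build a codon -> amino-acid dict from lines such as "UUU = F".

    Assumed format: codon, a separator token, then the amino acid, separated by
    whitespace. Lines with fewer than three fields (e.g. blank lines) are skipped.
    """
    code: Dict[str, str] = {}
    for line in lines:
        fields = line.split()  # also drops the trailing newline
        if len(fields) >= 3:
            code[fields[0]] = fields[2]
    return code


if __name__ == "__main__":
    sample = ["UUU = F\n", "AUG = M\n", "\n", "UAA = *"]
    print(parse_genetic_code(sample))
```

With a real file, an open file handle can be passed directly (`with open(path) as handle: parse_genetic_code(handle)`), since iterating a file yields its lines.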


def multiple3(seq):
    """
    Tests whether the sequence length is a multiple of 3 and, if not, removes the extra trailing bases.
    A codon may be lost when testing the ORFs (as the reading frame will be shifted).
    """

    multiple = len(seq) % 3
    if multiple != 0:
        return seq[:-multiple], multiple
    return seq, multiple
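The docstring warns that a codon may be lost. That loss comes from how `find_good_ORF_criteria_3` cuts the shifted frames from the trimmed sequence, using `[1:-2]` and `[2:-1]`. Both slices shorten the sequence by 3, so frames 2 and 3 are one codon shorter than frame 1. A minimal sketch with the hypothetical helpers `trim_to_codons` and `forward_frames`:

```python
def trim_to_codons(seq: str) -> str:
    """Drop trailing bases so the length is a multiple of 3 (what multiple3 does)."""
    extra = len(seq) % 3
    # seq[:-0] would be "", which is why multiple3 only slices when extra != 0;
    # slicing to an explicit end index avoids that special case.
    return seq[:len(seq) - extra]


def forward_frames(seq: str) -> list:
    """The three forward frames, sliced as in find_good_ORF_criteria_3."""
    s = trim_to_codons(seq)
    return [s, s[1:-2], s[2:-1]]


if __name__ == "__main__":
    for frame in forward_frames("ATGAAACCCG"):
        print(frame, len(frame))
```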


def detect_methionine(seq_aa, ortho, minimal_cds_length):
    """ Detects whether a methionine is present in the aa sequence """

    size = len(seq_aa)
    cutoff_last_50aa = size - minimal_cds_length

    # Find all indices of occurrences of "M" in a string of aa
    list_indices = [pos for pos, char in enumerate(seq_aa) if char == "M"]

    # If some "M" are present, check whether the first "M" found lies before the
    # last 50 aa (index < cutoff_last_50aa); if it does not, this is maybe not a CDS
    if list_indices != []:
        first_m = list_indices[0]
        if first_m < cutoff_last_50aa:
            ortho = 1  # means orthologs found

    return ortho
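The test behind criterion 3 reduces to one comparison: the first "M" must occur before the last `minimal_cds_length` residues. The sketch below expresses that test as a boolean, using `str.find` instead of building the full list of indices. The name `has_early_methionine` is hypothetical. It returns what `detect_methionine` checks before setting `ortho = 1`.

```python
def has_early_methionine(seq_aa: str, minimal_cds_length: int = 50) -> bool:
    """True if the first 'M' lies before the last `minimal_cds_length` residues."""
    first_m = seq_aa.find("M")  # -1 when there is no methionine
    cutoff = len(seq_aa) - minimal_cds_length
    return first_m != -1 and first_m < cutoff


if __name__ == "__main__":
    print(has_early_methionine("M" + "A" * 60))   # M well before the last 50 aa
    print(has_early_methionine("A" * 60 + "M"))   # M only inside the last 50 aa
```

The cutoff becomes negative when the sequence is shorter than `minimal_cds_length`. In that case no "M" can qualify, which matches the behaviour of the original comparison.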


def reverse_complement2(seq):
    """ Reverse complement DNA sequence """
    seq1 = 'ATCGN-TAGCN-atcgn-tagcn-'
    # Complement map for upper- and lower-case bases, N and the gap '-'
    seq_dict = {seq1[i]: seq1[i + 6]
                for i in range(24) if i < 6 or 12 <= i <= 16}

    return "".join([seq_dict[base] for base in reversed(seq)])
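The dictionary comprehension above pairs A with T, C with G, and maps N and the gap "-" to themselves, in both cases. The same mapping can be written as an explicit translation table with `str.maketrans`. The sketch below does this. `reverse_complement` is a hypothetical name, not part of the script.

```python
# Same pairing as seq_dict in reverse_complement2: A<->T, C<->G, N and gap '-', both cases
_COMPLEMENT = str.maketrans("ATCGN-atcgn", "TAGCN-tagcn")


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the result."""
    return seq.translate(_COMPLEMENT)[::-1]


if __name__ == "__main__":
    print(reverse_complement("ATG-n"))
```

One behavioural difference is worth knowing. `reverse_complement2` raises `KeyError` on any character outside its map (for example an IUPAC code such as "R", or "U"). `str.translate` instead passes such characters through unchanged.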


def simply_get_orf(seq_dna, gen_code):
    """ Generate the ORF sequence from DNA sequence """
    # Codons are converted to RNA (T -> U) before lookup; codons missing from gen_code become '?'
    seq_by_codons = [seq_dna.upper().replace('T', 'U')[i:i + 3]
                     for i in range(0, len(seq_dna), 3)]
    seq_by_aa = [gen_code[codon] if codon in gen_code.keys()
                 else '?' for codon in seq_by_codons]

    return ''.join(seq_by_aa)
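`find_good_ORF_criteria_3` chains the helpers defined so far. It trims each aligned sequence, reverse-complements it, cuts six frames, and translates each frame with `simply_get_orf`. The self-contained sketch below reproduces that chain. It assumes a hypothetical toy subset of the genetic code, `TOY_CODE`, written in the RNA alphabet because `simply_get_orf` converts T to U before lookup. It also assumes upper-case input. `translate` and `six_frame_translation` are illustrative names.

```python
from typing import Dict, List

# Hypothetical toy subset of the standard genetic code, in the RNA alphabet,
# because simply_get_orf converts T to U before looking codons up.
TOY_CODE: Dict[str, str] = {
    "AUG": "M", "AAA": "K", "UUU": "F", "UAA": "*", "GGG": "G", "CCC": "P",
}

_COMPLEMENT = str.maketrans("ACGTN-", "TGCAN-")


def translate(seq_dna: str, gen_code: Dict[str, str]) -> str:
    """Same logic as simply_get_orf: unknown or partial codons become '?'."""
    rna = seq_dna.upper().replace("T", "U")
    codons = [rna[i:i + 3] for i in range(0, len(rna), 3)]
    return "".join(gen_code.get(codon, "?") for codon in codons)


def six_frame_translation(seq_dna: str, gen_code: Dict[str, str]) -> List[str]:
    """Translate the six frames built in find_good_ORF_criteria_3."""
    seq = seq_dna.upper()
    seq = seq[:len(seq) - len(seq) % 3]    # trim as multiple3 does
    rev = seq.translate(_COMPLEMENT)[::-1]  # upper-case pairing of reverse_complement2
    frames = [seq, seq[1:-2], seq[2:-1], rev, rev[1:-2], rev[2:-1]]
    return [translate(frame, gen_code) for frame in frames]


if __name__ == "__main__":
    for i, protein in enumerate(six_frame_translation("ATGAAATTTTAA", TOY_CODE), 1):
        print("ORF%d" % i, protein)
```

With the full genetic code, most "?" characters would disappear. Any "?" that remains would mark codons containing gaps or N.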

def find_good_ORF_criteria_3(bash_aligned_nc_seq, bash_codeUniversel, minimal_cds_length, min_spec):
    # Multiple-sequence based: relies on the alignment of several sequences (orthogroup)
    # Criterion 1: get the segment of the alignment with no stop codon

    # 1 - Get the list of aligned aa seq for the 3 ORF:
    print("1 - Get the list of aligned aa seq for the 3 ORF")
    bash_of_aligned_aa_seq_3ORF = {}
    bash_of_aligned_nuc_seq_3ORF = {}
    BEST_LONGUEST_SUBSEQUENCE_LIST_POSITION = []

    for fasta_name in bash_aligned_nc_seq.keys():
        # Get the sequence, check that its length is a multiple of 3, then get the 6 ORFs
        sequence_nc = bash_aligned_nc_seq[fasta_name]
        new_sequence_nc, modulo = multiple3(sequence_nc)
        new_sequence_rev = reverse_complement2(new_sequence_nc)
        # For each seq of the multiple alignment => give the 6 ORFs (in nuc)
        bash_of_aligned_nuc_seq_3ORF[fasta_name] = [new_sequence_nc, new_sequence_nc[1:-2], new_sequence_nc[2:-1], new_sequence_rev, new_sequence_rev[1:-2], new_sequence_rev[2:-1]]

        seq_prot_ORF1 = simply_get_orf(new_sequence_nc, bash_codeUniversel)
        seq_prot_ORF2 = simply_get_orf(new_sequence_nc[1:-2], bash_codeUniversel)
        seq_prot_ORF3 = simply_get_orf(new_sequence_nc[2:-1], bash_codeUniversel)
        seq_prot_ORF4 = simply_get_orf(new_sequence_rev, bash_codeUniversel)
        seq_prot_ORF5 = simply_get_orf(new_sequence_rev[1:-2], bash_codeUniversel)
        seq_prot_ORF6 = simply_get_orf(new_sequence_rev[2:-1], bash_codeUniversel)

        # For each seq of the multiple alignment => give the 6 ORFs (in aa)
        bash_of_aligned_aa_seq_3ORF[fasta_name] = [seq_prot_ORF1, seq_prot_ORF2, seq_prot_ORF3, seq_prot_ORF4, seq_prot_ORF5, seq_prot_ORF6]

    # 2 - Test for the best ORF (get the longest segment of the alignment with no stop codon in each ORF; the longest one should give the ORF)
    print("2 - Test for the best ORF")
    BEST_MAX = 0

    for i in [0, 1, 2, 3, 4, 5]:  # Test the 6 ORFs
        ORF_Aligned_aa = []
        ORF_Aligned_nuc = []

        # 2.1 - Get the alignment of sequences for a given ORF
        # Compare the 1st ORF across all sequences => list them in ORF_Aligned_aa; then do the same for the 2nd ORF, then the 3rd
        for fasta_name in bash_of_aligned_aa_seq_3ORF.keys():
            ORFsequence = bash_of_aligned_aa_seq_3ORF[fasta_name][i]
            aa_length = len(ORFsequence)
            ORF_Aligned_aa.append(ORFsequence)  # List of all sequences in ORF number "i"

        n = i + 1

        for fasta_name in bash_of_aligned_nuc_seq_3ORF.keys():
            ORFsequence = bash_of_aligned_nuc_seq_3ORF[fasta_name][i]
            nuc_length = len(ORFsequence)
            ORF_Aligned_nuc.append(ORFsequence)  # List of all sequences in ORF number "i"

        # 2.2 - Get the list of sublists of positions without a stop codon in the alignment
        # For each ORF, we now have the list of available sequences (i.e. THE ALIGNMENT IN A GIVEN ORF)
        # Next step is to get the longest subsequence without a stop codon
        # We explore the presence of stop "*" in each column of the alignment, and get the positions of the segments between the positions with "*"
        MAX_LENGTH = 0
        LONGUEST_SEGMENT_UNSTOPPED = ""
abims-sbr
parents:
diff changeset
162 j = 0 # Start from first position in alignment
eb95bf7f90ae planemo upload for repository https://github.com/abims-sbr/adaptsearch commit 3c7982d775b6f3b472f6514d791edcb43cd258a1-dirty
abims-sbr
parents:
diff changeset
163 List_of_List_subsequences = []
eb95bf7f90ae planemo upload for repository https://github.com/abims-sbr/adaptsearch commit 3c7982d775b6f3b472f6514d791edcb43cd258a1-dirty
abims-sbr
parents:
diff changeset
164 List_positions_subsequence = []
eb95bf7f90ae planemo upload for repository https://github.com/abims-sbr/adaptsearch commit 3c7982d775b6f3b472f6514d791edcb43cd258a1-dirty
abims-sbr
parents:
diff changeset
165 while j < aa_length:
eb95bf7f90ae planemo upload for repository https://github.com/abims-sbr/adaptsearch commit 3c7982d775b6f3b472f6514d791edcb43cd258a1-dirty
abims-sbr
parents:
diff changeset
166 column = []
eb95bf7f90ae planemo upload for repository https://github.com/abims-sbr/adaptsearch commit 3c7982d775b6f3b472f6514d791edcb43cd258a1-dirty
abims-sbr
parents:
diff changeset
167 for seq in ORF_Aligned_aa:
eb95bf7f90ae planemo upload for repository https://github.com/abims-sbr/adaptsearch commit 3c7982d775b6f3b472f6514d791edcb43cd258a1-dirty
abims-sbr
parents:
diff changeset
168 column.append(seq[j])
eb95bf7f90ae planemo upload for repository https://github.com/abims-sbr/adaptsearch commit 3c7982d775b6f3b472f6514d791edcb43cd258a1-dirty
abims-sbr
parents:
diff changeset
169 j = j+1
eb95bf7f90ae planemo upload for repository https://github.com/abims-sbr/adaptsearch commit 3c7982d775b6f3b472f6514d791edcb43cd258a1-dirty
abims-sbr
parents:
diff changeset
170 if "*" in column:
eb95bf7f90ae planemo upload for repository https://github.com/abims-sbr/adaptsearch commit 3c7982d775b6f3b472f6514d791edcb43cd258a1-dirty
abims-sbr
parents:
diff changeset
171 List_of_List_subsequences.append(List_positions_subsequence) # Add previous list of positions
eb95bf7f90ae planemo upload for repository https://github.com/abims-sbr/adaptsearch commit 3c7982d775b6f3b472f6514d791edcb43cd258a1-dirty
abims-sbr
parents:
diff changeset
172 List_positions_subsequence = [] # Re-initialyse list of positions
eb95bf7f90ae planemo upload for repository https://github.com/abims-sbr/adaptsearch commit 3c7982d775b6f3b472f6514d791edcb43cd258a1-dirty
abims-sbr
parents:
diff changeset
173 else:
eb95bf7f90ae planemo upload for repository https://github.com/abims-sbr/adaptsearch commit 3c7982d775b6f3b472f6514d791edcb43cd258a1-dirty
abims-sbr
parents:
diff changeset
174 List_positions_subsequence.append(j)
eb95bf7f90ae planemo upload for repository https://github.com/abims-sbr/adaptsearch commit 3c7982d775b6f3b472f6514d791edcb43cd258a1-dirty
abims-sbr
parents:
diff changeset
175
eb95bf7f90ae planemo upload for repository https://github.com/abims-sbr/adaptsearch commit 3c7982d775b6f3b472f6514d791edcb43cd258a1-dirty
abims-sbr
parents:
diff changeset
176 # 2.3 - Among all the sublists (separated by column with codon stop "*"), get the longuest one (BETTER SEGMENT for a given ORF)
eb95bf7f90ae planemo upload for repository https://github.com/abims-sbr/adaptsearch commit 3c7982d775b6f3b472f6514d791edcb43cd258a1-dirty
abims-sbr
parents:
diff changeset
177 LONGUEST_SUBSEQUENCE_LIST_POSITION = []
eb95bf7f90ae planemo upload for repository https://github.com/abims-sbr/adaptsearch commit 3c7982d775b6f3b472f6514d791edcb43cd258a1-dirty
abims-sbr
parents:
diff changeset
178 MAX=0
eb95bf7f90ae planemo upload for repository https://github.com/abims-sbr/adaptsearch commit 3c7982d775b6f3b472f6514d791edcb43cd258a1-dirty
abims-sbr
parents:
diff changeset
179 for sublist in List_of_List_subsequences:
eb95bf7f90ae planemo upload for repository https://github.com/abims-sbr/adaptsearch commit 3c7982d775b6f3b472f6514d791edcb43cd258a1-dirty
abims-sbr
parents:
diff changeset
180 if len(sublist) > MAX and len(sublist) > minimal_cds_length:
eb95bf7f90ae planemo upload for repository https://github.com/abims-sbr/adaptsearch commit 3c7982d775b6f3b472f6514d791edcb43cd258a1-dirty
abims-sbr
parents:
diff changeset
181 MAX = len(sublist)
eb95bf7f90ae planemo upload for repository https://github.com/abims-sbr/adaptsearch commit 3c7982d775b6f3b472f6514d791edcb43cd258a1-dirty
abims-sbr
parents:
diff changeset
182 LONGUEST_SUBSEQUENCE_LIST_POSITION = sublist
eb95bf7f90ae planemo upload for repository https://github.com/abims-sbr/adaptsearch commit 3c7982d775b6f3b472f6514d791edcb43cd258a1-dirty
abims-sbr
parents:
diff changeset
183
        # 2.4 - Test whether the longest subsequence starts exactly at the beginning of the original sequence (i.e. the ORF may be truncated)
        if LONGUEST_SUBSEQUENCE_LIST_POSITION != []:
            if LONGUEST_SUBSEQUENCE_LIST_POSITION[0] == 1:  # positions are 1-based (j is incremented before being recorded), so 1 is the first column
                CDS_maybe_truncated = 1
            else:
                CDS_maybe_truncated = 0
        else:
            CDS_maybe_truncated = 0


        # 2.5 - Test whether this BETTER SEGMENT for a given ORF is better than the ones found for the other ORFs (GET THE BEST ORF)
        # Test whether it is the best ORF so far
        # Note: CDS_maybe_truncated (2.4) is not stored here, so after the loop it holds the value computed for the last ORF tested
        if MAX > BEST_MAX:
            BEST_MAX = MAX
            BEST_ORF = i+1
            BEST_LONGUEST_SUBSEQUENCE_LIST_POSITION = LONGUEST_SUBSEQUENCE_LIST_POSITION


    # 3 - ONCE we have this better segment (BEST CODING SEGMENT)
    # ==> GET THE STARTING and ENDING POSITIONS (in aa position and in nuc position)
    # And get the INDEX of the best ORF [0, 1, or 2]
    print("3 - ONCE we have this better segment")
    if BEST_LONGUEST_SUBSEQUENCE_LIST_POSITION != []:
        pos_MIN_aa = BEST_LONGUEST_SUBSEQUENCE_LIST_POSITION[0]
        pos_MIN_aa = pos_MIN_aa - 1  # 1-based first position -> 0-based slice start
        pos_MAX_aa = BEST_LONGUEST_SUBSEQUENCE_LIST_POSITION[-1]  # 1-based last position == 0-based exclusive slice end


        BESTORF_bash_of_aligned_aa_seq = {}
        BESTORF_bash_of_aligned_aa_seq_CODING = {}
        for fasta_name in bash_of_aligned_aa_seq_3ORF.keys():
            index_BEST_ORF = BEST_ORF-1  # because LIST_3_ORF is indexed from 0 to 2, while ORF numbers run from 1 to 3
            seq = bash_of_aligned_aa_seq_3ORF[fasta_name][index_BEST_ORF]
            seq_coding = seq[pos_MIN_aa:pos_MAX_aa]
            BESTORF_bash_of_aligned_aa_seq[fasta_name] = seq
            BESTORF_bash_of_aligned_aa_seq_CODING[fasta_name] = seq_coding

        # 4 - Get the corresponding positions (START/END of BEST CODING SEGMENT) in the nucleotide alignment
        print("4 - Get the corresponding position")
        pos_MIN_nuc = pos_MIN_aa * 3
        pos_MAX_nuc = pos_MAX_aa * 3

        BESTORF_bash_aligned_nc_seq = {}
        BESTORF_bash_aligned_nc_seq_CODING = {}
        for fasta_name in bash_aligned_nc_seq.keys():
            seq = bash_of_aligned_nuc_seq_3ORF[fasta_name][index_BEST_ORF]
            seq_coding = seq[pos_MIN_nuc:pos_MAX_nuc]
            BESTORF_bash_aligned_nc_seq[fasta_name] = seq
            BESTORF_bash_aligned_nc_seq_CODING[fasta_name] = seq_coding
            seq_cutted = re.sub(r'^[^a-zA-Z]*', '', seq)  # strip leading gap characters before the first base
            sequence_for_blast = (fasta_name + '\n' + seq_cutted + '\n')
            good_ORF_found = False
            try:
                #result_handle = ""
                #blast_records = ""
                # logger.debug("sequence_for_blast = %s ", sequence_for_blast)
                print('sequence_for_blast = %s' % sequence_for_blast, end='', flush=True)
                result_handle = NCBIWWW.qblast("blastn", "/db/nt/current/fasta/nt.fsa", sequence_for_blast, expect=0.001, hitlist_size=1)
                blast_records = NCBIXML.parse(result_handle)
            except Exception:
                good_ORF_found = False
            else:
                for blast_record in blast_records:
                    for alignment in blast_record.alignments:
                        for hsp in alignment.hsps:
                            if hsp.expect < 0.001:
                                good_ORF_found = True
                                print("good_ORF_found = %s" % good_ORF_found)

    else:  # no CDS found
        BESTORF_bash_aligned_nc_seq = {}
        BESTORF_bash_aligned_nc_seq_CODING = {}
        BESTORF_bash_of_aligned_aa_seq = {}
        BESTORF_bash_of_aligned_aa_seq_CODING = {}

    # Check whether there is an "M" or not and, if at least one "M" is present, that it is not in the last 50 aa

    BESTORF_bash_of_aligned_aa_seq_CDS_with_M = {}
    BESTORF_bash_of_aligned_nuc_seq_CDS_with_M = {}

    Ortho = 0
    for fasta_name in BESTORF_bash_of_aligned_aa_seq_CODING.keys():
        seq_aa = BESTORF_bash_of_aligned_aa_seq_CODING[fasta_name]
        Ortho = detect_methionine(seq_aa, Ortho, minimal_cds_length)  ### DEF6 ###

    # CASE 1: an "M" is present and correctly located (not in the last 50 aa)
    if Ortho == 1:
        BESTORF_bash_of_aligned_aa_seq_CDS_with_M = BESTORF_bash_of_aligned_aa_seq_CODING
        BESTORF_bash_of_aligned_nuc_seq_CDS_with_M = BESTORF_bash_aligned_nc_seq_CODING

    # CASE 2: the CDS may be truncated, so the "M" may be missing:
    if Ortho == 0 and CDS_maybe_truncated == 1:
        BESTORF_bash_of_aligned_aa_seq_CDS_with_M = BESTORF_bash_of_aligned_aa_seq_CODING
        BESTORF_bash_of_aligned_nuc_seq_CDS_with_M = BESTORF_bash_aligned_nc_seq_CODING

    # CASE 3: CDS not truncated AND no "M" found in a good position (i.e. before the last 50 aa):
    # => the two "CDS_with_M" dicts are left empty ("{}")

    return(BESTORF_bash_aligned_nc_seq, BESTORF_bash_aligned_nc_seq_CODING, BESTORF_bash_of_aligned_nuc_seq_CDS_with_M, BESTORF_bash_of_aligned_aa_seq, BESTORF_bash_of_aligned_aa_seq_CODING, BESTORF_bash_of_aligned_aa_seq_CDS_with_M)
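Steps 2.2 to 4 of the function above can be tried in isolation. The sketch below is hypothetical and not part of the script. The helpers `stop_free_segments`, `longest_segment` and `segment_slices` are illustrative names, and the sketch assumes that all aligned translations of one reading frame have the same length, with stops written as `*`. Like the listing, it records 1-based column positions. The amino-acid slice therefore runs from `first - 1` to `last`, and the nucleotide slice is the same interval multiplied by 3. The optional `keep_trailing` flag also keeps a stop-free run that reaches the end of the alignment. It is off by default, to match the loop above.

```python
from typing import List, Optional, Tuple

STOP = "*"


def stop_free_segments(aligned_aa: List[str], keep_trailing: bool = False) -> List[List[int]]:
    """Split alignment columns into runs of 1-based positions with no stop in any sequence."""
    segments: List[List[int]] = []
    current: List[int] = []
    for j in range(len(aligned_aa[0])):
        column = [seq[j] for seq in aligned_aa]
        if STOP in column:
            segments.append(current)  # close the run that precedes this stop column
            current = []
        else:
            current.append(j + 1)  # record a 1-based position, as the listing does
    if keep_trailing and current:
        segments.append(current)  # optionally keep a run that reaches the end
    return segments


def longest_segment(segments: List[List[int]], min_len: int) -> List[int]:
    """Return the longest run strictly longer than min_len, or [] if none qualifies."""
    best: List[int] = []
    for seg in segments:
        if len(seg) > len(best) and len(seg) > min_len:
            best = seg
    return best


def segment_slices(best: List[int]) -> Optional[Tuple[slice, slice]]:
    """Convert a run of 1-based aa positions into (aa_slice, nuc_slice)."""
    if not best:
        return None
    aa_start = best[0] - 1  # first 1-based position -> 0-based start
    aa_end = best[-1]       # last 1-based position == 0-based exclusive end
    return slice(aa_start, aa_end), slice(aa_start * 3, aa_end * 3)


if __name__ == "__main__":
    aa = ["MKLVAG*T", "MRLIAG*S"]     # toy aligned translations of one reading frame
    nuc = "ATGAAACTGGTGGCGGGCTGAACC"  # toy nucleotide sequence behind aa[0]
    best = longest_segment(stop_free_segments(aa), min_len=3)
    aa_slice, nuc_slice = segment_slices(best)
    print(best, aa[0][aa_slice], nuc[nuc_slice], best[0] == 1)
    # prints: [1, 2, 3, 4, 5, 6] MKLVAG ATGAAACTGGTGGCGGGC True
```

In this toy alignment, the longest stop-free run starts at column 1. That is the situation the 2.4 comment describes as a possibly truncated ORF.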

def write_output_file(results_dict, name_elems, path_out):
    if results_dict != {}:
        name_elems[3] = str(len(results_dict.keys()))
        new_name = "_".join(name_elems)

        with open("%s/%s" % (path_out, new_name), "w") as out1:
            for fasta_name in results_dict.keys():
                seq = results_dict[fasta_name]
                out1.write("%s\n" % fasta_name)
                out1.write("%s\n" % seq)
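`write_output_file` names its output by overwriting the fourth `_`-separated field of `name_elems` with the number of sequences kept. The minimal sketch below shows the same idea with a `with` block and `os.path.join`. It assumes a hypothetical input name such as `orthogroup_1_with_5_species.fasta`, because the real naming scheme is set elsewhere in the script. The function name `write_fasta_dict` is also hypothetical.

```python
import os
import tempfile
from typing import Dict, List, Optional


def write_fasta_dict(results: Dict[str, str], name_elems: List[str], path_out: str) -> Optional[str]:
    """Write {header: sequence} to path_out; field 3 of the name becomes the sequence count."""
    if not results:
        return None  # like write_output_file, write nothing for an empty dict
    elems = list(name_elems)  # copy so the caller's list is not mutated
    elems[3] = str(len(results))
    out_path = os.path.join(path_out, "_".join(elems))
    with open(out_path, "w") as handle:  # file is closed even if a write fails
        for header, seq in results.items():
            handle.write("%s\n%s\n" % (header, seq))
    return out_path


if __name__ == "__main__":
    demo = {">sp1": "ATGAAA", ">sp2": "ATGCGT"}
    name = "orthogroup_1_with_5_species.fasta".split("_")  # hypothetical input name
    with tempfile.TemporaryDirectory() as tmp:
        path = write_fasta_dict(demo, name, tmp)
        print(os.path.basename(path))  # orthogroup_1_with_2_species.fasta
```

Copying `name_elems` avoids the in-place mutation that `write_output_file` performs on the caller's list. Whether that mutation matters depends on how the caller reuses the list.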
eb95bf7f90ae planemo upload for repository https://github.com/abims-sbr/adaptsearch commit 3c7982d775b6f3b472f6514d791edcb43cd258a1-dirty
abims-sbr
parents:
diff changeset
295
eb95bf7f90ae planemo upload for repository https://github.com/abims-sbr/adaptsearch commit 3c7982d775b6f3b472f6514d791edcb43cd258a1-dirty
abims-sbr
parents:
diff changeset
296 def main():
eb95bf7f90ae planemo upload for repository https://github.com/abims-sbr/adaptsearch commit 3c7982d775b6f3b472f6514d791edcb43cd258a1-dirty
abims-sbr
parents:
diff changeset
297 parser = argparse.ArgumentParser()
eb95bf7f90ae planemo upload for repository https://github.com/abims-sbr/adaptsearch commit 3c7982d775b6f3b472f6514d791edcb43cd258a1-dirty
abims-sbr
parents:
diff changeset
298 parser.add_argument("codeUniversel", help="File describing the genetic code (code_universel_modified.txt")
1
c79bdda8abfb planemo upload for repository https://github.com/abims-sbr/adaptsearch commit 3a118aa934e6406cc8b0b24d006af6365c277519
abims-sbr
parents: 0
diff changeset
299 parser.add_argument("min_cds_len", help="Minmal length of a CDS (in amino-acids)", type=int)
0
eb95bf7f90ae planemo upload for repository https://github.com/abims-sbr/adaptsearch commit 3c7982d775b6f3b472f6514d791edcb43cd258a1-dirty
abims-sbr
parents:
diff changeset
300 parser.add_argument("min_spec", help="Minimal number of species per alignment")
301     parser.add_argument("list_files", help="File listing the names of all input files")
302     args = parser.parse_args()
303
304     minimal_cds_length = int(args.min_cds_len) # in amino acids
305     bash_codeUniversel = code_universel(args.codeUniversel)
306     minimum_species = int(args.min_spec)
1
307
0
308     # Input alignment file names, one per line in the list file
309     list_files = []
310     with open(args.list_files, 'r') as f:
311         for line in f.readlines():
312             list_files.append(line.strip('\n'))
313
314     # Directories for results
315     dirs = ["04_BEST_ORF_nuc", "04_BEST_ORF_aa", "05_CDS_nuc", "05_CDS_aa", "06_CDS_with_M_nuc", "06_CDS_with_M_aa"]
316     for directory in dirs:
317         os.mkdir(directory)
318
319     count_file_processed, count_file_with_CDS, count_file_without_CDS, count_file_with_CDS_plus_M = 0, 0, 0, 0
320     count_file_with_cds_and_enought_species, count_file_with_cds_M_and_enought_species = 0, 0
321
322     # Note: input files are currently named "Orthogroup_x_y_sequences.fasta", where x is the orthogroup number (not meaningful in itself, it only keeps each name distinct),
323     # and y is the number of sequences/species in the group. These files are BlastAlign outputs; since BlastAlign can remove species, y may change.
324     name_elems = ["orthogroup", "0", "with", "0", "species.fasta"]
325
326     # Taking the group number from the input file name instead of a running counter leaves "holes" (missing numbers) in the output directories, but keeps group numbers consistent across directories
327     #n0 = 0
328
329     for file in list_files:
330         #n0 += 1
331
332         count_file_processed = count_file_processed + 1
333         nb_gp = file.split('_')[1] # Keep track of the orthogroup number
1
334         fasta_file_path = "./%s" %file
0
335         bash_fasta = dico(fasta_file_path)
336         BESTORF_nuc, BESTORF_nuc_CODING, BESTORF_nuc_CDS_with_M, BESTORF_aa, BESTORF_aa_CODING, BESTORF_aa_CDS_with_M = find_good_ORF_criteria_3(bash_fasta, bash_codeUniversel, minimal_cds_length, minimum_species)
1
337
0
338         name_elems[1] = nb_gp
339
340         # Update counts and write the group to the corresponding output directories
1
341         if BESTORF_nuc != {}:
0
342             count_file_with_CDS += 1
343             if len(BESTORF_nuc.keys()) >= minimum_species:
344                 count_file_with_cds_and_enought_species += 1
345                 write_output_file(BESTORF_nuc, name_elems, dirs[0]) # OUTPUT BESTORF_nuc
346                 write_output_file(BESTORF_aa, name_elems, dirs[1]) # OUTPUT BESTORF_aa (the most interesting output)
347         else:
348             count_file_without_CDS += 1
349
350         if BESTORF_nuc_CODING != {} and len(BESTORF_nuc_CODING.keys()) >= minimum_species:
351             write_output_file(BESTORF_nuc_CODING, name_elems, dirs[2])
352             write_output_file(BESTORF_aa_CODING, name_elems, dirs[3])
353
354         if BESTORF_nuc_CDS_with_M != {}:
355             count_file_with_CDS_plus_M += 1
356             if len(BESTORF_nuc_CDS_with_M.keys()) >= minimum_species:
357                 count_file_with_cds_M_and_enought_species += 1
358                 write_output_file(BESTORF_nuc_CDS_with_M, name_elems, dirs[4])
359                 write_output_file(BESTORF_aa_CDS_with_M, name_elems, dirs[5])
360
1
361     print("*************** CDS detection ***************")
362     print("\nFiles processed: %d" %count_file_processed)
363     print("\tFiles with CDS: %d" %count_file_with_CDS)
364     print("\tFiles with CDS and at least %s species: %d" %(minimum_species, count_file_with_cds_and_enought_species))
365     print("\t\tFiles with CDS plus M (start codon): %d" %count_file_with_CDS_plus_M)
366     print("\t\tFiles with CDS plus M (start codon) and at least %s species: %d" %(minimum_species, count_file_with_cds_M_and_enought_species))
367     print("\tFiles without CDS: %d\n" %count_file_without_CDS)
368     print("")
0
369
370 if __name__ == '__main__':
371     main()
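The comments in `main()` describe two pieces of bookkeeping: the orthogroup number is taken from the input file name, and a group is written out only when its ORF dictionary covers at least `minimum_species` species. The short, self-contained sketch below illustrates both. It is a hypothetical reconstruction, not code from this script. `orthogroup_number`, `output_name` and `passes_threshold` are illustrative helpers. The sketch also assumes that `write_output_file()` stores the species count in `name_elems[3]` and joins the elements with `"_"`, because that function's body is not shown here.

```python
# Hypothetical helpers illustrating the naming and species-threshold logic
# described in main(). The output-name format (species count in
# name_elems[3], elements joined with "_") is an ASSUMPTION.


def orthogroup_number(file_name):
    """Return x from an input name such as 'Orthogroup_x_y_sequences.fasta'."""
    return file_name.split('_')[1]


def output_name(group_number, n_species):
    """Build an output name from the same template as name_elems in main()."""
    name_elems = ["orthogroup", "0", "with", "0", "species.fasta"]
    name_elems[1] = group_number
    name_elems[3] = str(n_species)  # assumed to be set by write_output_file()
    return "_".join(name_elems)


def passes_threshold(orf_dict, minimum_species):
    """True when a non-empty ORF dict (species -> sequence) has enough species."""
    return orf_dict != {} and len(orf_dict) >= minimum_species


if __name__ == '__main__':
    group = orthogroup_number("Orthogroup_12_5_sequences.fasta")
    best_orf = {"sp1": "ATGAAA", "sp2": "ATGAAG", "sp3": "ATGAAC"}
    if passes_threshold(best_orf, 3):
        print(output_name(group, len(best_orf)))  # orthogroup_12_with_3_species.fasta
```

Because the group number comes from the input name rather than from a counter, a group that fails the threshold for one output directory simply leaves a gap there. The same number still identifies it in the other directories.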