pcr_markers: design_primers.py annotate

author	john-mccallum
date	Thu, 14 Jun 2012 19:29:26 -0400
parents
children	b321e0517be3

rev	line source
0 21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	1 #!/usr/local/bin/python2.6
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	2 ##design primers to features in multiple sequences
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	3 ##usage: python design_primers.py <fasta-file> <gff file> <file of target IDs> <prod_min_size> <prod_max_size>
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	4
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	5
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	6 #Copyright 2012 John McCallum & Leshi Chen
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	7 #New Zealand Institute for Plant and Food Research
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	8 #This program is free software: you can redistribute it and/or modify
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	9 # it under the terms of the GNU General Public License as published by
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	10 # the Free Software Foundation, either version 3 of the License, or
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	11 # (at your option) any later version.
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	12 #
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	13 # This program is distributed in the hope that it will be useful,
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	14 # but WITHOUT ANY WARRANTY; without even the implied warranty of
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	15 # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	16 # GNU General Public License for more details.
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	17 #
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	18 # You should have received a copy of the GNU General Public License
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	19 # along with this program. If not, see <http://www.gnu.org/licenses/>.
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	20
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	21
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	22 import os
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	23 import StringIO
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	24 import re
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	25 import tempfile
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	26 import subprocess
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	27 import copy
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	28 import sys
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	29 from BCBio import GFF
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	30 from BCBio.GFF import GFFExaminer
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	31 from Bio import SeqIO
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	32 from Bio.Emboss.Applications import Primer3Commandline
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	33 from Bio.Emboss import Primer3
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	34
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	35
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	36 in_file = sys.argv[1]
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	37 gff_file = sys.argv[2]
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	38 target_file = sys.argv[3]
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	39 prod_min_size = int(sys.argv[4])
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	40 prod_max_size = int(sys.argv[5])
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	41
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	42 max_tm_diff=1 ##
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	43 opt_GC_percent=50 ##
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	44 maxpolyx=4 ##
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	45 optimum_length=20
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	46 ##target is specified in start, end format
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	47 productsizerange = str(prod_min_size) + "," + str(prod_max_size)
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	48 #open input files
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	49 in_seq_handle = open(in_file)
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	50 in_gff_handle = open(gff_file)
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	51 in_target_handle=open(target_file)
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	52 #read target feature IDs into list
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	53 targets=in_target_handle.readlines()
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	54 in_target_handle.close()
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	55 ##and create a hit list of sequences from this
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	56 target_seq_id_list = list(set([line.split(":")[0] for line in targets]))
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	57 ##create iterator returning sequence records
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	58 for myrec in SeqIO.parse(in_seq_handle, "fasta"):
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	59 #check if this sequence is included in the target list
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	60 if myrec.id in target_seq_id_list:
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	61 ##create sequence dictionary so we can add in gff annotations
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	62 seq_dict = {myrec.id : myrec}
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	63 ##just limit to gff annotations for this sequence
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	64 limit_info = dict(gff_id = [ myrec.id ])
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	65 ##rewind gff filehandle
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	66 in_gff_handle.seek(0)
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	67 ##read annotations into sequence dictionary for this sequence record only
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	68 annotations = [r for r in GFF.parse(in_gff_handle, base_dict=seq_dict,limit_info=limit_info)]
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	69 ##if there are any annotations, then proceed.
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	70 if annotations:
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	71 rec=annotations[0]
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	72 ##iterate over list of target IDs
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	73 for t in targets:
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	74 target_ID = t.strip('\n')
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	75 target_annotations = [f for f in rec.features if f.id == target_ID ]
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	76 if target_annotations:
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	77 mytarget = target_annotations[0]
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	78 #create temporary files
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	79 tempfastaFile = tempfile.mktemp()
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	80 tempproutfile = tempfile.mktemp()
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	81 #just consider slice of sequence in a window of +/- prod_max_size bp
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	82 ##from feature UNLESS feature is close to end
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	83 ##Note that slice is zero-based
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	84 featLocation = mytarget.location.start.position
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	85 if featLocation > prod_max_size:
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	86 slice_start = featLocation - prod_max_size
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	87 featPosition = prod_max_size
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	88 else:
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	89 slice_start = 0
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	90 featPosition = featLocation
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	91 if (len(rec) - featLocation) < prod_max_size:
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	92 slice_end = len(rec)
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	93 else:
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	94 slice_end = featLocation + prod_max_size
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	95 ###grab slice of sequence fom this window.
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	96 targetRec = rec[slice_start:slice_end]
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	97 matching_feature = [f for f in targetRec.features if f.id == mytarget.id]
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	98 if matching_feature:
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	99 target_feat = matching_feature[0]
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	100 if target_feat.location.start.position == 0:
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	101 target_feat.location.start.position = 1
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	102 #we get the mask features by removing the target...all features are masked as just using snp and indels
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	103 ##a smarter filter could be added
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	104 ##note use of list copy to avoid possible side-effects
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	105 exclude_feat = list(targetRec.features)
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	106 exclude_feat.remove(target_feat)
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	107 ##print'targetRec.features', targetRec.features ##for debug
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	108 mask_str=map(lambda f: str(f.location.start.position+1) + "," + str(f.location.end.position + 1) ,exclude_feat)
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	109 #mask_str=map(lambda f: str(f.location).strip('[]'),exclude_feat)
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	110 p3_exclude_str = str(mask_str).replace('\', \'',':')
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	111 p3_target = str(target_feat.location.start.position +1) + "," + str(target_feat.location.end.position +1)
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	112 #write sequence record into template file as fasta
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	113 t_output_handle = open(tempfastaFile, "w")
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	114 SeqIO.write([targetRec], t_output_handle, "fasta")
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	115 t_output_handle.close()
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	116 #create Primer3Commandline() for emboss tool
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	117 primer_cl = Primer3Commandline()
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	118 #set the emboss tool to suppress output as this will make Galaxy think it is error message although it is a message to state run success
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	119 primer_cl.set_parameter("-auto",'1')
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	120 #pass sequence file to emboss
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	121 primer_cl.set_parameter("-sequence",tempfastaFile)
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	122 #add target location
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	123 primer_cl.set_parameter("-target", p3_target)
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	124 ##mask off other features...dumb masking of everything at present, beware
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	125 if (p3_exclude_str != ""):
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	126 primer_cl.set_parameter("-excludedregion", p3_exclude_str)
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	127 #add temporary output file to get the result
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	128 primer_cl.set_parameter("-outfile", tempproutfile)
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	129 #specify maximum different of tm
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	130 primer_cl.set_parameter("-maxdifftm",max_tm_diff )
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	131 #other useful parameters
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	132 primer_cl.set_parameter("-ogcpercent", opt_GC_percent)
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	133 primer_cl.set_parameter("-opolyxmax", maxpolyx)
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	134 primer_cl.set_parameter("-osize", optimum_length)
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	135 #set product size range
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	136 primer_cl.set_parameter("-prange", productsizerange)
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	137 #using python subprocess method to run emboss command line programe with the parameters given
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	138 fnull = open(os.devnull, 'w')
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	139 result=subprocess.check_call(str(primer_cl),shell=True ,stdout = fnull, stderr = fnull)
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	140 #read temporary outputfile
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	141 handle = open(tempproutfile)
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	142 record = Primer3.read(handle)
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	143 ##just return first set, if there is one
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	144 if len(record.primers) > 0:
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	145 primer= record.primers[0]
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	146 outputstr=[mytarget.id, primer.forward_seq,primer.reverse_seq,primer.size]
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	147 else:
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	148 outputstr=[mytarget.id,"NONE","NONE","NONE"]
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	149 print('\t'.join(map(str,outputstr)))
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	150
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	151
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	152 in_gff_handle.close()
21053f7f9ed1 First upload of PCR Marker tools john-mccallum parents: diff changeset	153 in_seq_handle.close()

0

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

1 #!/usr/local/bin/python2.6

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

2 ##design primers to features in multiple sequences

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

3 ##usage: python design_primers.py <fasta-file> <gff file> <file of target IDs> <prod_min_size> <prod_max_size>

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

4

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

5

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

7 #New Zealand Institute for Plant and Food Research

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

8 #This program is free software: you can redistribute it and/or modify

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

9 # it under the terms of the GNU General Public License as published by

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

10 # the Free Software Foundation, either version 3 of the License, or

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

11 # (at your option) any later version.

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

12 #

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

13 # This program is distributed in the hope that it will be useful,

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

14 # but WITHOUT ANY WARRANTY; without even the implied warranty of

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

15 # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

16 # GNU General Public License for more details.

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

17 #

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

18 # You should have received a copy of the GNU General Public License

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

19 # along with this program. If not, see <http://www.gnu.org/licenses/>.

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

20

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

21

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

22 import os

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

23 import StringIO

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

24 import re

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

25 import tempfile

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

26 import subprocess

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

27 import copy

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

28 import sys

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

29 from BCBio import GFF

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

30 from BCBio.GFF import GFFExaminer

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

31 from Bio import SeqIO

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

32 from Bio.Emboss.Applications import Primer3Commandline

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

33 from Bio.Emboss import Primer3

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

34

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

35

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

36 in_file = sys.argv[1]

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

37 gff_file = sys.argv[2]

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

38 target_file = sys.argv[3]

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

39 prod_min_size = int(sys.argv[4])

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

40 prod_max_size = int(sys.argv[5])

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

41

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

42 max_tm_diff=1 ##

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

43 opt_GC_percent=50 ##

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

44 maxpolyx=4 ##

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

45 optimum_length=20

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

46 ##target is specified in start, end format

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

47 productsizerange = str(prod_min_size) + "," + str(prod_max_size)

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

48 #open input files

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

49 in_seq_handle = open(in_file)

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

50 in_gff_handle = open(gff_file)

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

51 in_target_handle=open(target_file)

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

52 #read target feature IDs into list

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

53 targets=in_target_handle.readlines()

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

54 in_target_handle.close()

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

55 ##and create a hit list of sequences from this

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

56 target_seq_id_list = list(set([line.split(":")[0] for line in targets]))

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

57 ##create iterator returning sequence records

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

58 for myrec in SeqIO.parse(in_seq_handle, "fasta"):

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

59 #check if this sequence is included in the target list

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

60 if myrec.id in target_seq_id_list:

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

61 ##create sequence dictionary so we can add in gff annotations

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

62 seq_dict = {myrec.id : myrec}

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

63 ##just limit to gff annotations for this sequence

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

64 limit_info = dict(gff_id = [ myrec.id ])

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

65 ##rewind gff filehandle

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

66 in_gff_handle.seek(0)

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

67 ##read annotations into sequence dictionary for this sequence record only

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

68 annotations = [r for r in GFF.parse(in_gff_handle, base_dict=seq_dict,limit_info=limit_info)]

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

69 ##if there are any annotations, then proceed.

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

70 if annotations:

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

71 rec=annotations[0]

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

72 ##iterate over list of target IDs

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

73 for t in targets:

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

74 target_ID = t.strip('\n')

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

75 target_annotations = [f for f in rec.features if f.id == target_ID ]

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

76 if target_annotations:

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

77 mytarget = target_annotations[0]

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

78 #create temporary files

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

79 tempfastaFile = tempfile.mktemp()

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

80 tempproutfile = tempfile.mktemp()

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

81 #just consider slice of sequence in a window of +/- prod_max_size bp

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

82 ##from feature UNLESS feature is close to end

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

83 ##Note that slice is zero-based

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

84 featLocation = mytarget.location.start.position

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

85 if featLocation > prod_max_size:

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

86 slice_start = featLocation - prod_max_size

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

87 featPosition = prod_max_size

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

88 else:

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

89 slice_start = 0

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

90 featPosition = featLocation

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

91 if (len(rec) - featLocation) < prod_max_size:

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

92 slice_end = len(rec)

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

93 else:

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

94 slice_end = featLocation + prod_max_size

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

95 ###grab slice of sequence fom this window.

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

96 targetRec = rec[slice_start:slice_end]

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

97 matching_feature = [f for f in targetRec.features if f.id == mytarget.id]

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

98 if matching_feature:

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

99 target_feat = matching_feature[0]

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

100 if target_feat.location.start.position == 0:

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

101 target_feat.location.start.position = 1

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

102 #we get the mask features by removing the target...all features are masked as just using snp and indels

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

103 ##a smarter filter could be added

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

104 ##note use of list copy to avoid possible side-effects

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

105 exclude_feat = list(targetRec.features)

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

106 exclude_feat.remove(target_feat)

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

107 ##print'targetRec.features', targetRec.features ##for debug

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

108 mask_str=map(lambda f: str(f.location.start.position+1) + "," + str(f.location.end.position + 1) ,exclude_feat)

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

109 #mask_str=map(lambda f: str(f.location).strip('[]'),exclude_feat)

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

110 p3_exclude_str = str(mask_str).replace('\', \'',':')

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

111 p3_target = str(target_feat.location.start.position +1) + "," + str(target_feat.location.end.position +1)

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

112 #write sequence record into template file as fasta

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

113 t_output_handle = open(tempfastaFile, "w")

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

114 SeqIO.write([targetRec], t_output_handle, "fasta")

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

115 t_output_handle.close()

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

116 #create Primer3Commandline() for emboss tool

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

117 primer_cl = Primer3Commandline()

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

118 #set the emboss tool to suppress output as this will make Galaxy think it is error message although it is a message to state run success

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

119 primer_cl.set_parameter("-auto",'1')

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

120 #pass sequence file to emboss

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

121 primer_cl.set_parameter("-sequence",tempfastaFile)

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

122 #add target location

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

123 primer_cl.set_parameter("-target", p3_target)

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

124 ##mask off other features...dumb masking of everything at present, beware

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

125 if (p3_exclude_str != ""):

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

126 primer_cl.set_parameter("-excludedregion", p3_exclude_str)

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

127 #add temporary output file to get the result

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

128 primer_cl.set_parameter("-outfile", tempproutfile)

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

129 #specify maximum different of tm

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

130 primer_cl.set_parameter("-maxdifftm",max_tm_diff )

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

131 #other useful parameters

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

132 primer_cl.set_parameter("-ogcpercent", opt_GC_percent)

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

133 primer_cl.set_parameter("-opolyxmax", maxpolyx)

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

134 primer_cl.set_parameter("-osize", optimum_length)

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

135 #set product size range

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

136 primer_cl.set_parameter("-prange", productsizerange)

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

137 #using python subprocess method to run emboss command line programe with the parameters given

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

138 fnull = open(os.devnull, 'w')

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

139 result=subprocess.check_call(str(primer_cl),shell=True ,stdout = fnull, stderr = fnull)

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

140 #read temporary outputfile

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

141 handle = open(tempproutfile)

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

142 record = Primer3.read(handle)

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

143 ##just return first set, if there is one

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

144 if len(record.primers) > 0:

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

145 primer= record.primers[0]

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

146 outputstr=[mytarget.id, primer.forward_seq,primer.reverse_seq,primer.size]

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

147 else:

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

148 outputstr=[mytarget.id,"NONE","NONE","NONE"]

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

149 print('\t'.join(map(str,outputstr)))

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

150

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

151

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

152 in_gff_handle.close()

21053f7f9ed1 First upload of PCR Marker tools

john-mccallum

parents:

diff changeset

153 in_seq_handle.close()

Mercurial > repos > john-mccallum > pcr_markers

annotate design_primers.py @ 0:21053f7f9ed1 draft