annotate pal_filter.py @ 9:52dbe2089d14 draft default tip

Version 0.02.04.8 (update fastq subsetting).
author pjbriggs
date Wed, 04 Jul 2018 06:05:52 -0400
parents 8159dab5dbdb
Annotated revisions (rev, changeset, summary, author):
3  e1a14ed7a9d6  Updated to version 0.02.04.4 (new pal_filter script)  pjbriggs
4  cb56cc1d5c39  Updates to the palfilter.py utility.  pjbriggs  (parents: 3)

#!/usr/bin/python -tt
"""
pal_filter
https://github.com/graemefox/pal_filter

Graeme Fox - 03/03/2016 - graeme.fox@manchester.ac.uk
Tested on 64-bit Ubuntu, with Python 2.7

~~~~~~~~~~~~~~~~~~~
PROGRAM DESCRIPTION

Program to pick optimum loci from the output of pal_finder_v0.02.04

This program can be used to filter output from pal_finder and choose the
'optimum' loci.

For the paper referencing this workflow, see Griffiths et al.
(unpublished as of 15/02/2016) (sarah.griffiths-5@postgrad.manchester.ac.uk)

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This program also contains a quality-check method to improve the rate of PCR
success. For this QC method, paired-end reads are assembled using
PANDAseq, so you must have PANDAseq installed.

For the paper referencing this assembly-QC method see Fox et al.
(unpublished as of 15/02/2016) (graeme.fox@manchester.ac.uk)

For best results in PCR for marker development, I suggest enabling all the
filter options AND the assembly-based QC.

~~~~~~~~~~~~
REQUIREMENTS

Must have Biopython installed (www.biopython.org).

If you wish to perform the assembly QC step, you must have:
PANDAseq (https://github.com/neufeld/pandaseq)
PANDAseq must be in your $PATH / able to run from anywhere

~~~~~~~~~~~~~~~~
REQUIRED OPTIONS

-i forward_paired_ends.fastQ
-j reverse_paired_ends.fastQ
-p pal_finder output - by default pal_finder names this "x_PAL_summary.txt"

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
BY DEFAULT THIS PROGRAM DOES NOTHING. ENABLE SOME OF THE OPTIONS BELOW.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NON-REQUIRED OPTIONS

-assembly: turn on the PANDAseq assembly QC step

-primers: filter microsatellite loci to just those which have primers designed

-occurrences: filter microsatellite loci to those with primers
              which appear only once in the dataset

-rankmotifs: filter microsatellite loci to just those with perfect motifs.
             Rank the output by size of motif (largest first)

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
For repeat analysis, the following extra non-required options may be useful:

Since PANDAseq assembly and fastq -> fasta conversion are slow, run them the
first time to generate the files, then skip either or both steps with
the following:

-a: skip assembly step
-c: skip fastq -> fasta conversion step

Just make sure to keep the assembled/converted files in the correct directory
with the correct filename(s)

~~~~~~~~~~~~~~~
EXAMPLE USAGE:

pal_filter.py -i R1.fastq -j R2.fastq
    -p pal_finder_output.tabular -primers -occurrences -rankmotifs -assembly

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
"""
import Bio, subprocess, argparse, csv, os, re, time
from Bio import SeqIO
__version__ = "1.0.0"
############################################################
#                      Function List                       #
############################################################
def ReverseComplement1(seq):
    """
    take a nucleotide sequence and reverse-complement it
    """
    # only upper-case A/C/G/T are handled; any other character
    # (e.g. 'N' or lower-case bases) raises KeyError
    seq_dict = {'A':'T','T':'A','G':'C','C':'G'}
    return "".join([seq_dict[base] for base in reversed(seq)])

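ReverseComplement1 looks every base up in a four-entry dictionary, so a read containing N or lower-case bases raises KeyError. The following is a minimal sketch of a more tolerant variant. The name reverse_complement_tolerant and the choice to map every unrecognised character to N are assumptions made for illustration; they are not part of pal_filter.

```python
def reverse_complement_tolerant(seq):
    """Reverse-complement seq, mapping anything outside A/C/G/T to 'N'."""
    complement = {'A': 'T', 'T': 'A', 'G': 'C', 'C': 'G', 'N': 'N'}
    # upper-case first so that lower-case bases are complemented too
    return "".join(complement.get(base, 'N') for base in reversed(seq.upper()))


print(reverse_complement_tolerant("ATGCN"))  # NGCAT
```
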
def fastq_to_fasta(input_file, wanted_set):
    """
    take a file in fastq format, convert to fasta format and filter on
    the set of sequences that we want to keep
    """
    file_name = os.path.splitext(os.path.basename(input_file))[0]
    with open(file_name + "_filtered.fasta", "w") as out:
        for record in SeqIO.parse(input_file, "fastq"):
            ID = str(record.id)
            SEQ = str(record.seq)
            if ID in wanted_set:
                out.write(">" + ID + "\n" + SEQ + "\n")

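The convert-and-filter step in fastq_to_fasta can be illustrated without Biopython. The sketch below is a stand-in, not pal_filter's code: the name filter_fastq_to_fasta is hypothetical, it reads from any text handle and returns a string instead of writing a file, and it assumes strict four-line FASTQ records (no wrapped sequence lines). The ID is taken as the header text up to the first whitespace, which is how Biopython's record.id behaves.

```python
import io


def filter_fastq_to_fasta(handle, wanted_set):
    """Return FASTA text for the records in handle whose ID is in wanted_set."""
    out = []
    while True:
        header = handle.readline()
        if not header:
            break  # end of input
        if not header.strip():
            continue  # tolerate blank lines between records
        seq = handle.readline().strip()
        handle.readline()  # '+' separator line, ignored
        handle.readline()  # quality line, ignored
        # ID is the header up to the first whitespace, minus the leading '@'
        seq_id = header[1:].split()[0]
        if seq_id in wanted_set:
            out.append(">" + seq_id + "\n" + seq + "\n")
    return "".join(out)


# two made-up reads; only read2 is in the wanted set
fastq = io.StringIO(
    "@read1 extra\nACGT\n+\nIIII\n"
    "@read2\nTTGA\n+\nIIII\n"
)
print(filter_fastq_to_fasta(fastq, {"read2"}))
```
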
def strip_barcodes(input_file, wanted_set):
    """
    take fasta data whose sequence IDs end in a sequencing barcode and strip
    the barcode from each ID. Filter on the set of sequences that we want
    to keep
    """
    file_name = os.path.splitext(os.path.basename(input_file))[0]
    with open(file_name + "_adapters_removed.fasta", "w") as out:
        for record in SeqIO.parse(input_file, "fasta"):
            # greedy match: everything up to and including the last ':'
            match = re.search(r'\S*:', record.id)
            if match:
                correct = match.group().rstrip(":")
            else:
                correct = str(record.id)
            SEQ = str(record.seq)
            if correct in wanted_set:
                out.write(">" + correct + "\n" + SEQ + "\n")

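The ID handling in strip_barcodes relies on the regular expression being greedy: \S*: runs to the last colon in the ID, so only the final colon-separated field is dropped. A small sketch of just that step follows. The helper name strip_last_field and the example read ID are hypothetical.

```python
import re


def strip_last_field(read_id):
    """Drop everything after the final ':' in read_id."""
    match = re.search(r'\S*:', read_id)  # greedy: extends to the last ':'
    if match:
        return match.group().rstrip(":")
    return read_id  # no colon at all: ID is returned unchanged


# made-up read ID with a barcode after the last colon
print(strip_last_field("M01:23:FLOWCELL:1:1101:15589:1332:ACGTAC"))
print(strip_last_field("read_without_colon"))
```

Note that rstrip(":") removes every trailing colon, so an ID ending in "::BARCODE" loses both colons.
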
############################################################
#                       MAIN PROGRAM                       #
############################################################
print "\n~~~~~~~~~~"
print "pal_filter"
print "~~~~~~~~~~"
print "Version: " + __version__
time.sleep(1)
print "\nFind the optimum loci in your pal_finder output and increase "\
      "the rate of successful microsatellite marker development"
print "\nSee Griffiths et al. (currently unpublished) for more details......"
time.sleep(2)

# Get values for all the required and optional arguments

parser = argparse.ArgumentParser(description='pal_filter')
parser.add_argument('-i','--input1', help='Forward paired-end fastq file',
                    required=True)

parser.add_argument('-j','--input2', help='Reverse paired-end fastq file',
                    required=True)

parser.add_argument('-p','--pal_finder', help='Output from pal_finder',
                    required=True)

parser.add_argument('-assembly', help='Perform the PandaSeq based QC',
                    action='store_true')

parser.add_argument('-a','--skip_assembly', help='If the assembly has already '
                    'been run, skip it with -a', action='store_true')

parser.add_argument('-c','--skip_conversion', help='If the fastq to fasta '
                    'conversion has already been run, skip it with -c',
                    action='store_true')

parser.add_argument('-primers', help='Filter '
                    'pal_finder output to just those loci which have primers '
                    'designed', action='store_true')

parser.add_argument('-occurrences',
                    help='Filter pal_finder output to just loci with primers '
                    'which only occur once in the dataset', action='store_true')

parser.add_argument('-rankmotifs',
                    help='Filter pal_finder output to just loci which are a '
                    'perfect repeat unit. Also, rank the loci by motif size '
                    '(largest first)', action='store_true')

parser.add_argument('-v', '--get_version', help='Print the version number of '
                    'this pal_filter script', action='store_true')

args = parser.parse_args()
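
The parser above mixes conventional short options (-i, -j, -p) with single-dash long flags such as -primers. argparse accepts these, and the attribute name is the flag with its dash removed. The sketch below reproduces that arrangement in reduced form, together with the "no filter selected" test the script goes on to make. The helper names build_parser and any_filter_enabled are hypothetical, and help texts are omitted.

```python
import argparse


def build_parser():
    """Reduced version of the pal_filter parser (help texts omitted)."""
    parser = argparse.ArgumentParser(description='pal_filter (sketch)')
    parser.add_argument('-i', '--input1', required=True)
    parser.add_argument('-j', '--input2', required=True)
    parser.add_argument('-p', '--pal_finder', required=True)
    # single-dash long flags: stored as args.assembly, args.primers, ...
    for flag in ('-assembly', '-primers', '-occurrences', '-rankmotifs'):
        parser.add_argument(flag, action='store_true')
    return parser


def any_filter_enabled(args):
    """True if at least one filter or QC flag was supplied."""
    return any((args.assembly, args.primers,
                args.occurrences, args.rankmotifs))


args = build_parser().parse_args(
    ['-i', 'R1.fastq', '-j', 'R2.fastq', '-p', 'PAL_summary.txt', '-primers'])
print(any_filter_enabled(args))
```
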

if not args.assembly and not args.primers and not args.occurrences \
        and not args.rankmotifs:
    print "\nNo optional arguments supplied."
    print "\nBy default this program does nothing."
    print "\nNo files produced and no modifications made."
    print "\nFinished.\n"
    exit()
else:
    print "\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~"
    print "Checking supplied filtering parameters:"
    print "~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~"
    time.sleep(2)
    if args.get_version:
        print "~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~"
        print "pal_filter version is " + __version__ + " (03/03/2016)"
        print "~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n"
    if args.primers:
        print "-primers flag supplied."
        print "Filtering pal_finder output on the \"Primers found (1=y,0=n)\"" \
              " column."
        print "Only rows where primers have successfully been designed will"\
              " pass the filter.\n"
        time.sleep(2)
    if args.occurrences:
        print "-occurrences flag supplied."
        print "Filtering pal_finder output on the \"Occurrences of Forward" \
              " Primer in Reads\" and \"Occurrences of Reverse Primer" \
              " in Reads\" columns."
        print "Only rows where both primers occur only a single time in the"\
              " reads pass the filter.\n"
        time.sleep(2)
    if args.rankmotifs:
        print "-rankmotifs flag supplied."
        print "Filtering pal_finder output on the \"Motifs(bases)\" column to" \
              " just those with perfect repeats."
        print "Only rows containing 'perfect' repeats will pass the filter."
        print "Also, ranking output by size of motif (largest first).\n"
        time.sleep(2)

# index the raw fastq files so that the sequences can be pulled out and
# added to the filtered output file
print "~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~"
print "Indexing FastQ files....."
print "~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~"
R1fastq_sequences_index = SeqIO.index(args.input1,'fastq')
R2fastq_sequences_index = SeqIO.index(args.input2,'fastq')
print "Indexing complete."

# create a set to hold the filtered output
wanted_lines = set()

# get lines from the pal_finder output which meet filter settings
# read the pal_finder output file into a csv reader
with open(args.pal_finder) as csvfile_infile:
    csv_f = csv.reader(csvfile_infile, delimiter='\t')
    header = next(csv_f)
    header.extend(("R1_Sequence_ID",
                   "R1_Sequence",
                   "R2_Sequence_ID",
                   "R2_Sequence" + "\n"))
    with open(
            os.path.splitext(os.path.basename(args.pal_finder))[0] +
            ".filtered", 'w') as csvfile_outfile:
        # write the header line for the output file
        csvfile_outfile.write('\t'.join(header))
        for row in csv_f:
            # get the sequence ID
            seq_ID = row[0]
            # get the raw sequence reads and convert to a format that can
            # go into a tsv file: the first newline of the fasta record
            # becomes a tab (ID <tab> sequence), the remaining ones are removed
            R1_sequence = R1fastq_sequences_index[seq_ID].format("fasta").\
                          replace("\n","\t",1).replace("\n","")
            R2_sequence = R2fastq_sequences_index[seq_ID].format("fasta").\
                          replace("\n","\t",1).replace("\n","")
            seq_info = "\t" + R1_sequence + "\t" + R2_sequence + "\n"
            # navigate through all different combinations of filter options
            # if the primer filter is switched on
            if args.primers:
                # check the "Primers found (1=y,0=n)" field
                if row[5] == "1":
                    # if filter occurrences of primers is switched on
                    if args.occurrences:
                        # check the occurrences of primers fields
                        if (row[15] == "1" and row[16] == "1"):
parents: 3
diff changeset
263 # if rank by motif is switched on
cb56cc1d5c39 Updates to the palfilter.py utility.
pjbriggs
parents: 3
diff changeset
264 if args.rankmotifs:
cb56cc1d5c39 Updates to the palfilter.py utility.
pjbriggs
parents: 3
diff changeset
265 # check for perfect motifs
3
e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script)
pjbriggs
parents:
diff changeset
266 if row[1].count('(') == 1:
4
cb56cc1d5c39 Updates to the palfilter.py utility.
pjbriggs
parents: 3
diff changeset
267 # all 3 filter switched on
cb56cc1d5c39 Updates to the palfilter.py utility.
pjbriggs
parents: 3
diff changeset
268 # write line out to output
3
e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script)
pjbriggs
parents:
diff changeset
269 csvfile_outfile.write('\t'.join(row) + \
e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script)
pjbriggs
parents:
diff changeset
270 seq_info)
e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script)
pjbriggs
parents:
diff changeset
271 else:
e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script)
pjbriggs
parents:
diff changeset
272 csvfile_outfile.write('\t'.join(row) + seq_info)
4
cb56cc1d5c39 Updates to the palfilter.py utility.
pjbriggs
parents: 3
diff changeset
273 elif args.rankmotifs:
3
e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script)
pjbriggs
parents:
diff changeset
274 if row[1].count('(') == 1:
e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script)
pjbriggs
parents:
diff changeset
275 csvfile_outfile.write('\t'.join(row) + seq_info)
e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script)
pjbriggs
parents:
diff changeset
276 else:
e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script)
pjbriggs
parents:
diff changeset
277 csvfile_outfile.write('\t'.join(row) + seq_info)
4
cb56cc1d5c39 Updates to the palfilter.py utility.
pjbriggs
parents: 3
diff changeset
278 elif args.occurrences:
3
e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script)
pjbriggs
parents:
diff changeset
279 if (row[15] == "1" and row[16] == "1"):
4
cb56cc1d5c39 Updates to the palfilter.py utility.
pjbriggs
parents: 3
diff changeset
280 if args.rankmotifs:
3
e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script)
pjbriggs
parents:
diff changeset
281 if row[1].count('(') == 1:
e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script)
pjbriggs
parents:
diff changeset
282 csvfile_outfile.write('\t'.join(row) + seq_info)
e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script)
pjbriggs
parents:
diff changeset
283 else:
e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script)
pjbriggs
parents:
diff changeset
284 csvfile_outfile.write('\t'.join(row) + seq_info)
4
cb56cc1d5c39 Updates to the palfilter.py utility.
pjbriggs
parents: 3
diff changeset
285 elif args.rankmotifs:
3
e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script)
pjbriggs
parents:
diff changeset
286 if row[1].count('(') == 1:
e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script)
pjbriggs
parents:
diff changeset
287 csvfile_outfile.write('\t'.join(row) + seq_info)
e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script)
pjbriggs
parents:
diff changeset
288 else:
e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script)
pjbriggs
parents:
diff changeset
289 csvfile_outfile.write('\t'.join(row) + seq_info)

# if filter_rank_motifs is active, order the file by the size of the motif
if args.rankmotifs:
    rank_motif = []
    ranked_list = []
    # read in the non-ordered file and add every entry to rank_motif list
    with open( \
            os.path.splitext(os.path.basename(args.pal_finder))[0] + \
            ".filtered") as csvfile_ranksize:
        csv_rank = csv.reader(csvfile_ranksize, delimiter='\t')
        header = csv_rank.next()
        for line in csv_rank:
            rank_motif.append(line)

    # open the file ready to write the ordered list
    with open( \
            os.path.splitext(os.path.basename(args.pal_finder))[0] + \
            ".filtered", 'w') as rank_outfile:
        rankwriter = csv.writer(rank_outfile, delimiter='\t', \
                                lineterminator='\n')
        rankwriter.writerow(header)
        count = 2
        while count < 10:
            for row in rank_motif:
                # count size of motif
                motif = re.search(r'[ATCG]*',row[1])
                if motif:
                    the_motif = motif.group()
                    # rank it and write into ranked_list
                    if len(the_motif) == count:
                        ranked_list.insert(0, row)
            count = count + 1
        # write out the ordered list, into the .filtered file
        for row in ranked_list:
            rankwriter.writerow(row)

print "\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~"
print "Checking assembly flags supplied:"
print "~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~"
if not args.assembly:
    print "Assembly flag not supplied. Not performing assembly QC.\n"
if args.assembly:
    print "-assembly flag supplied: Perform PandaSeq assembly quality checks."
    print "See Fox et al. (currently unpublished) for full details on the"\
          " quality-check process.\n"
    time.sleep(5)

    # Get readID, F primers, R primers and motifs from filtered pal_finder output
    seqIDs = []
    motif = []
    F_primers = []
    R_primers = []
    with open( \
            os.path.splitext(os.path.basename(args.pal_finder))[0] + \
            ".filtered") as input_csv:
        pal_finder_csv = csv.reader(input_csv, delimiter='\t')
        header = pal_finder_csv.next()
        for row in pal_finder_csv:
            seqIDs.append(row[0])
            motif.append(row[1])
            F_primers.append(row[7])
            R_primers.append(row[9])

    # Get a list of just the unique IDs we want
    wanted = set()
    for line in seqIDs:
        wanted.add(line)
    """
    Assemble the paired end reads into overlapping contigs using PandaSeq
    (can be skipped with the -a flag if assembly has already been run
    and the appropriate files are in the same directory as the script,
    and named "Assembly.fasta" and "Assembly_adapters_removed.fasta")

    The first time you run the script you MUST not enable the -a flag,
    but you are able to skip the assembly in subsequent analysis using the
    -a flag.
    """
    if not args.skip_assembly:
        pandaseq_command = 'pandaseq -A pear -f ' + args.input1 + ' -r ' + \
                           args.input2 + ' -o 25 -t 0.95 -w Assembly.fasta'
        subprocess.call(pandaseq_command, shell=True)
        strip_barcodes("Assembly.fasta", wanted)
        print "\nPaired end reads have been assembled into overlapping reads."
        print "\nFor future analysis, you can skip this assembly step using" \
              " the -a flag, provided that the Assembly.fasta file" \
              " is intact and in the same location."
    else:
        print "\n(Skipping the assembly step as you provided the -a flag)"
    """
    Fastq files need to be converted to fasta. The first time you run the script
    you MUST not enable the -c flag, but you are able to skip the conversion
    later using the -c flag. Make sure the fasta files are in the same location
    and do not change the filenames.
    """
    if not args.skip_conversion:
        fastq_to_fasta(args.input1, wanted)
        fastq_to_fasta(args.input2, wanted)
        print "\nThe input fastq files have been converted to the fasta format."
        print "\nFor any future analysis, you can skip this conversion step" \
              " using the -c flag, provided that the fasta files" \
              " are intact and in the same location."
    else:
        print "\n(Skipping the fastq -> fasta conversion as you provided the" \
              " -c flag).\n"

    # get the files and everything else needed
    # Assembled fasta file
    assembly_file = "Assembly_adapters_removed.fasta"
    # filtered R1 reads
    R1_fasta = os.path.splitext( \
        os.path.basename(args.input1))[0] + "_filtered.fasta"
    # filtered R2 reads
    R2_fasta = os.path.splitext( \
        os.path.basename(args.input2))[0] + "_filtered.fasta"
    outputfilename = os.path.splitext(os.path.basename(args.input1))[0]
    # parse the files with SeqIO
    assembly_sequences = SeqIO.parse(assembly_file,'fasta')
    R1fasta_sequences = SeqIO.parse(R1_fasta,'fasta')

    # create some empty lists to hold the ID tags we are interested in
    assembly_IDs = []
    fasta_IDs = []

    # populate the above lists with sequence IDs
    for sequence in assembly_sequences:
        assembly_IDs.append(sequence.id)
    for sequence in R1fasta_sequences:
        fasta_IDs.append(sequence.id)

    # Index the assembly fasta file
    assembly_sequences_index = SeqIO.index(assembly_file,'fasta')
    R1fasta_sequences_index = SeqIO.index(R1_fasta,'fasta')
    R2fasta_sequences_index = SeqIO.index(R2_fasta,'fasta')

    # prepare the output file
    with open ( \
            outputfilename + "_pal_filter_assembly_output.txt", 'w') \
            as outputfile:
        # write the headers for the output file
        output_header = ("readPairID", \
                         "Forward Primer",\
                         "F Primer Position in Assembled Read", \
                         "Reverse Primer", \
                         "R Primer Position in Assembled Read", \
                         "Motifs(bases)", \
                         "Assembled Read ID", \
                         "Assembled Read Sequence", \
                         "Raw Forward Read ID", \
                         "Raw Forward Read Sequence", \
                         "Raw Reverse Read ID", \
                         "Raw Reverse Read Sequence\n")
        outputfile.write("\t".join(output_header))

        # cycle through parameters from the pal_finder output
        for x, y, z, a in zip(seqIDs, F_primers, R_primers, motif):
            if str(x) in assembly_IDs:
                # get the raw sequences ready to go into the output file
                assembly_seq = (assembly_sequences_index.get_raw(x).decode())
                # fasta entries need to be converted to a single line so they
                # sit nicely in the output
                assembly_output = assembly_seq.replace("\n","\t").strip('\t')
                R1_fasta_seq = (R1fasta_sequences_index.get_raw(x).decode())
                R1_output = R1_fasta_seq.replace("\n","\t",1).replace("\n","")
                R2_fasta_seq = (R2fasta_sequences_index.get_raw(x).decode())
                R2_output = R2_fasta_seq.replace("\n","\t",1).replace("\n","")
                assembly_no_id = '\n'.join(assembly_seq.split('\n')[1:])

                # check that both primer sequences can be seen in the
                # assembled contig
                if ((y in assembly_no_id) or \
                    (ReverseComplement1(y) in assembly_no_id)) and \
                   ((z in assembly_no_id) or \
                    (ReverseComplement1(z) in assembly_no_id)):
                    if y in assembly_no_id:
                        # get the positions of the primers in the assembly
                        # (can be used to predict fragment length)
                        F_position = assembly_no_id.index(y)+len(y)+1
                    if ReverseComplement1(y) in assembly_no_id:
                        F_position = assembly_no_id.index( \
                            ReverseComplement1(y))+len(ReverseComplement1(y))+1
                    if z in assembly_no_id:
                        R_position = assembly_no_id.index(z)+1
                    if ReverseComplement1(z) in assembly_no_id:
                        R_position = assembly_no_id.index( \
                            ReverseComplement1(z))+1
                    output = (str(x),
                              str(y),
                              str(F_position),
                              str(z),
                              str(R_position),
                              str(a),
                              str(assembly_output),
                              str(R1_output),
                              str(R2_output + "\n"))
                    outputfile.write("\t".join(output))
    print "\nPANDAseq quality check complete."
    print "Results from PANDAseq quality check (and filtering, if any" \
          " filters enabled) written to output file" \
          " ending \"_pal_filter_assembly_output.txt\".\n"
    print "Filtering of pal_finder results complete."
    print "Filtered results written to output file ending \".filtered\"."
    print "\nFinished\n"
else:
    if args.skip_assembly or args.skip_conversion:
        print "\nERROR: You cannot supply the -a flag or the -c flag without" \
              " also supplying the -assembly flag.\n"
e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script)
pjbriggs
parents:
diff changeset
496 print "\nProgram Finished\n"
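
The nested filter branches in the row loop of this script appear to reduce to one rule: a row is written only if it passes every filter that is switched on. The sketch below restates that rule as a single predicate. It is an illustration, not part of pal_filter; the name `passes_filters` and the example row are hypothetical, while the column indexes (1, 5, 15, 16) are the ones the script tests.

```python
def passes_filters(row, primers=False, occurrences=False, rankmotifs=False):
    """Return True if `row` passes every enabled filter (hypothetical helper)."""
    # Column 5 must be "1" when the primer filter is on.
    if primers and row[5] != "1":
        return False
    # Columns 15 and 16 must both be "1" when the occurrences filter is on.
    if occurrences and not (row[15] == "1" and row[16] == "1"):
        return False
    # Column 1 must contain exactly one "(" when the motif filter is on.
    if rankmotifs and row[1].count("(") != 1:
        return False
    return True


# Made-up 17-column row; only columns 1, 5, 15 and 16 are inspected.
example = ["read1", "AT(14) "] + [""] * 15
example[5] = "1"
example[15] = "1"
example[16] = "2"
```

With a helper like this, the body of the row loop could shrink to one `if passes_filters(...)` followed by a single write call, which keeps the eight flag combinations from drifting apart.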