annotate tools/primers/seq_primer_clip.py.orig @ 1:8c02a91a8680 draft

Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
author peterjc
date Tue, 30 Apr 2013 11:04:43 -0400
parents
children
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
rev   line source
1
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
1 #!/usr/bin/env python
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
2 """Looks for the given primer sequences and clips matching SFF reads.
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
3
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
4 Takes eight command line options, input read filename, input read format,
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
5 input primer FASTA filename, type of primers (forward, reverse or reverse-
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
6 complement), number of mismatches (currently only 0, 1 and 2 are supported),
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
7 minimum length to keep a read (after primer trimming), should primer-less
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
8 reads be kept (boolean), and finally the output sequence filename.
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
9
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
10 Both the primer and read sequences can contain IUPAC ambiguity codes like N.
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
11
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
12 This supports FASTA, FASTQ and SFF sequence files. Colorspace reads are not
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
13 supported.
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
14
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
15 The mismatch parameter does not consider gapped alignemnts, however the
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
16 special case of missing bases at the very start or end of the read is handled.
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
17 e.g. a primer sequence CCGACTCGAG will match a read starting CGACTCGAG...
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
18 if one or more mismatches are allowed.
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
19
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
20 This can also be used for stripping off (and optionally filtering on) barcodes.
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
21
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
22 Note that only the trim/clip values in the SFF file are changed, not the flow
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
23 information of the full read sequence.
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
24
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
25 This script is copyright 2011-2013 by Peter Cock, The James Hutton Institute
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
26 (formerly the Scottish Crop Research Institute, SCRI), UK. All rights reserved.
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
27 See accompanying text file for licence details (MIT/BSD style).
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
28
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
29 This is version 0.0.8 of the script. Currently it uses Python's regular
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
30 expression engine for finding the primers, which for my needs is fast enough.
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
31 """
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
32 import sys
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
33 import re
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
34 from galaxy_utils.sequence.fasta import fastaReader, fastaWriter
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
35 from galaxy_utils.sequence.fastq import fastqReader, fastqWriter
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
36
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
37 if "-v" in sys.argv or "--version" in sys.argv:
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
38 print "v0.0.5"
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
39 sys.exit(0)
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
40
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
41 def stop_err(msg, err=1):
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
42 sys.stderr.write(msg)
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
43 sys.exit(err)
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
44
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
45 try:
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
46 from Bio.Seq import reverse_complement
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
47 from Bio.SeqIO.SffIO import SffIterator, SffWriter
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
48 except ImportError:
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
49 stop_err("Requires Biopython 1.54 or later")
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
50 try:
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
51 from Bio.SeqIO.SffIO import ReadRocheXmlManifest
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
52 except ImportError:
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
53 #Prior to Biopython 1.56 this was a private function
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
54 from Bio.SeqIO.SffIO import _sff_read_roche_index_xml as ReadRocheXmlManifest
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
55
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
56 #Parse Command Line
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
57 try:
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
58 in_file, seq_format, primer_fasta, primer_type, mm, min_len, keep_negatives, out_file = sys.argv[1:]
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
59 except ValueError:
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
60 stop_err("Expected 8 arguments, got %i:\n%s" % (len(sys.argv)-1, " ".join(sys.argv)))
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
61
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
62 if in_file == primer_fasta:
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
63 stop_err("Same file given as both primer sequences and sequences to clip!")
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
64 if in_file == out_file:
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
65 stop_err("Same file given as both sequences to clip and output!")
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
66 if primer_fasta == out_file:
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
67 stop_err("Same file given as both primer sequences and output!")
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
68
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
69 try:
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
70 mm = int(mm)
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
71 except ValueError:
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
72 stop_err("Expected non-negative integer number of mismatches (e.g. 0 or 1), not %r" % mm)
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
73 if mm < 0:
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
74 stop_err("Expected non-negtive integer number of mismatches (e.g. 0 or 1), not %r" % mm)
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
75 if mm not in [0,1,2]:
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
76 raise NotImplementedError
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
77
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
78 try:
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
79 min_len = int(min_len)
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
80 except ValueError:
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
81 stop_err("Expected non-negative integer min_len (e.g. 0 or 1), not %r" % min_len)
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
82 if min_len < 0:
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
83 stop_err("Expected non-negtive integer min_len (e.g. 0 or 1), not %r" % min_len)
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
84
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
85
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
86 if keep_negatives.lower() in ["true", "yes", "on"]:
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
87 keep_negatives = True
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
88 elif keep_negatives.lower() in ["false", "no", "off"]:
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
89 keep_negatives = False
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
90 else:
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
91 stop_err("Expected boolean for keep_negatives (e.g. true or false), not %r" % keep_negatives)
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
92
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
93
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
94 if primer_type.lower() == "forward":
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
95 forward = True
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
96 rc = False
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
97 elif primer_type.lower() == "reverse":
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
98 forward = False
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
99 rc = False
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
100 elif primer_type.lower() == "reverse-complement":
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
101 forward = False
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
102 rc = True
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
103 else:
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
104 stop_err("Expected foward, reverse or reverse-complement not %r" % primer_type)
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
105
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
106
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
107 ambiguous_dna_values = {
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
108 "A": "A",
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
109 "C": "C",
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
110 "G": "G",
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
111 "T": "T",
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
112 "M": "ACM",
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
113 "R": "AGR",
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
114 "W": "ATW",
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
115 "S": "CGS",
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
116 "Y": "CTY",
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
117 "K": "GTK",
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
118 "V": "ACGMRSV",
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
119 "H": "ACTMWYH",
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
120 "D": "AGTRWKD",
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
121 "B": "CGTSYKB",
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
122 "X": ".", #faster than [GATCMRWSYKVVHDBXN] or even [GATC]
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
123 "N": ".",
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
124 }
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
125
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
126 ambiguous_dna_re = {}
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
127 for letter, values in ambiguous_dna_values.iteritems():
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
128 if len(values) == 1:
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
129 ambiguous_dna_re[letter] = values
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
130 else:
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
131 ambiguous_dna_re[letter] = "[%s]" % values
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
132
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
133
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
134 def make_reg_ex(seq):
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
135 return "".join(ambiguous_dna_re[letter] for letter in seq)
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
136
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
137 def make_reg_ex_mm(seq, mm):
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
138 if mm > 2:
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
139 raise NotImplementedError("At most 2 mismatches allowed!")
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
140 seq = seq.upper()
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
141 yield make_reg_ex(seq)
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
142 for i in range(1,mm+1):
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
143 #Missing first/last i bases at very start/end of sequence
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
144 for reg in make_reg_ex_mm(seq[i:], mm-i):
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
145 yield "^" + reg
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
146 for reg in make_reg_ex_mm(seq[:-i], mm-i):
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
147 yield "$" + reg
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
148 if mm >= 1:
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
149 for i,letter in enumerate(seq):
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
150 #We'll use a set to remove any duplicate patterns
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
151 #if letter not in "NX":
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
152 pattern = seq[:i] + "N" + seq[i+1:]
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
153 assert len(pattern) == len(seq), "Len %s is %i, len %s is %i" \
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
154 % (pattern, len(pattern), seq, len(seq))
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
155 yield make_reg_ex(pattern)
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
156 if mm >=2:
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
157 for i,letter in enumerate(seq):
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
158 #We'll use a set to remove any duplicate patterns
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
159 #if letter not in "NX":
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
160 for k,letter in enumerate(seq[i+1:]):
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
161 #We'll use a set to remove any duplicate patterns
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
162 #if letter not in "NX":
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
163 pattern = seq[:i] + "N" + seq[i+1:i+1+k] + "N" + seq[i+k+2:]
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
164 assert len(pattern) == len(seq), "Len %s is %i, len %s is %i" \
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
165 % (pattern, len(pattern), seq, len(seq))
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
166 yield make_reg_ex(pattern)
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
167
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
168 def load_primers_as_re(primer_fasta, mm, rc=False):
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
169 #Read primer file and record all specified sequences
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
170 primers = set()
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
171 in_handle = open(primer_fasta, "rU")
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
172 reader = fastaReader(in_handle)
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
173 count = 0
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
174 for record in reader:
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
175 if rc:
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
176 seq = reverse_complement(record.sequence)
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
177 else:
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
178 seq = record.sequence
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
179 #primers.add(re.compile(make_reg_ex(seq)))
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
180 count += 1
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
181 for pattern in make_reg_ex_mm(seq, mm):
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
182 primers.add(pattern)
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
183 in_handle.close()
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
184 #Use set to avoid duplicates, sort to have longest first
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
185 #(so more specific primers found before less specific ones)
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
186 primers = sorted(set(primers), key=lambda p: -len(p))
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
187 return count, re.compile("|".join(primers)) #make one monster re!
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
188
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
189
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
190
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
191 #Read primer file and record all specified sequences
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
192 count, primer = load_primers_as_re(primer_fasta, mm, rc)
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
193 print "%i primer sequences" % count
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
194
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
195 short_neg = 0
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
196 short_clipped = 0
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
197 clipped = 0
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
198 negs = 0
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
199
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
200 if seq_format.lower()=="sff":
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
201 #SFF is different because we just change the trim points
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
202 if forward:
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
203 def process(records):
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
204 global short_clipped, short_neg, clipped, negs
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
205 for record in records:
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
206 left_clip = record.annotations["clip_qual_left"]
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
207 right_clip = record.annotations["clip_qual_right"]
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
208 seq = str(record.seq)[left_clip:right_clip].upper()
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
209 result = primer.search(seq)
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
210 if result:
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
211 #Forward primer, take everything after it
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
212 #so move the left clip along
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
213 if len(seq) - result.end() >= min_len:
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
214 record.annotations["clip_qual_left"] = left_clip + result.end()
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
215 clipped += 1
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
216 yield record
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
217 else:
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
218 short_clipped += 1
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
219 elif keep_negatives:
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
220 if len(seq) >= min_len:
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
221 negs += 1
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
222 yield record
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
223 else:
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
224 short_neg += 1
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
225 else:
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
226 def process(records):
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
227 global short_clipped, short_neg, clipped, negs
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
228 for record in records:
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
229 left_clip = record.annotations["clip_qual_left"]
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
230 right_clip = record.annotations["clip_qual_right"]
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
231 seq = str(record.seq)[left_clip:right_clip].upper()
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
232 result = primer.search(seq)
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
233 if result:
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
234 #Reverse primer, take everything before it
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
235 #so move the right clip back
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
236 new_len = result.start()
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
237 if new_len >= min_len:
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
238 record.annotations["clip_qual_right"] = left_clip + new_len
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
239 clipped += 1
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
240 yield record
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
241 else:
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
242 short_clipped += 1
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
243 elif keep_negatives:
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
244 if len(seq) >= min_len:
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
245 negs += 1
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
246 yield record
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
247 else:
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
248 short_neg += 1
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
249
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
250 in_handle = open(in_file, "rb")
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
251 try:
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
252 manifest = ReadRocheXmlManifest(in_handle)
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
253 except ValueError:
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
254 manifest = None
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
255 in_handle.seek(0)
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
256 out_handle = open(out_file, "wb")
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
257 writer = SffWriter(out_handle, xml=manifest)
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
258 writer.write_file(process(SffIterator(in_handle)))
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
259 #End of SFF code
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
260 elif seq_format.lower().startswith("fastq"):
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
261 in_handle = open(in_file, "rU")
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
262 out_handle = open(out_file, "w")
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
263 reader = fastqReader(in_handle)
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
264 writer = fastqWriter(out_handle)
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
265 if forward:
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
266 for record in reader:
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
267 seq = record.sequence.upper()
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
268 result = primer.search(seq)
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
269 if result:
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
270 #Forward primer, take everything after it
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
271 cut = result.end()
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
272 record.sequence = seq[cut:]
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
273 if len(record.sequence) >= min_len:
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
274 record.quality = record.quality[cut:]
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
275 clipped += 1
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
276 writer.write(record)
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
277 else:
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
278 short_clipped += 1
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
279 elif keep_negatives:
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
280 if len(record) >= min_len:
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
281 negs += 1
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
282 writer.write(record)
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
283 else:
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
284 short_negs += 1
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
285 else:
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
286 for record in reader:
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
287 seq = record.sequence.upper()
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
288 result = primer.search(seq)
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
289 if result:
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
290 #Reverse primer, take everything before it
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
291 cut = result.start()
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
292 record.sequence = seq[:cut]
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
293 if len(record.sequence) >= min_len:
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
294 record.quality = record.quality[:cut]
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
295 clipped += 1
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
296 writer.write(record)
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
297 else:
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
298 short_clipped += 1
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
299 elif keep_negatives:
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
300 if len(record) >= min_len:
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
301 negs += 1
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
302 writer.write(record)
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
303 else:
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
304 short_negs += 1
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
305 elif seq_format.lower()=="fasta":
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
306 in_handle = open(in_file, "rU")
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
307 out_handle = open(out_file, "w")
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
308 reader = fastaReader(in_handle)
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
309 writer = fastaWriter(out_handle)
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
310 #Following code is identical to that for FASTQ but without editing qualities
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
311 if forward:
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
312 for record in reader:
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
313 seq = record.sequence.upper()
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
314 result = primer.search(seq)
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
315 if result:
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
316 #Forward primer, take everything after it
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
317 cut = result.end()
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
318 record.sequence = seq[cut:]
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
319 if len(record.sequence) >= min_len:
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
320 clipped += 1
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
321 writer.write(record)
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
322 else:
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
323 short_clipped += 1
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
324 elif keep_negatives:
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
325 if len(record) >= min_len:
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
326 negs += 1
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
327 writer.write(record)
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
328 else:
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
329 short_negs += 1
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
330 else:
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
331 for record in reader:
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
332 seq = record.sequence.upper()
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
333 result = primer.search(seq)
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
334 if result:
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
335 #Reverse primer, take everything before it
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
336 cut = result.start()
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
337 record.sequence = seq[:cut]
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
338 if len(record.sequence) >= min_len:
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
339 clipped += 1
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
340 writer.write(record)
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
341 else:
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
342 short_clipped += 1
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
343 elif keep_negatives:
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
344 if len(record) >= min_len:
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
345 negs += 1
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
346 writer.write(record)
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
347 else:
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
348 short_negs += 1
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
349 else:
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
350 stop_err("Unsupported file type %r" % seq_format)
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
351 in_handle.close()
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
352 out_handle.close()
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
353
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
354 print "Kept %i clipped reads," % clipped
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
355 print "discarded %i short." % short_clipped
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
356 if keep_negatives:
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
357 print "Kept %i non-matching reads," % negs
8c02a91a8680 Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
358 print "discarded %i short." % short_neg