annotate tools/seq_primer_clip/seq_primer_clip.py @ 2:ee5acea162a7 draft

Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
author peterjc
date Thu, 24 Oct 2013 09:37:25 -0400
parents
children 9b074c1db68e
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
rev   line source
2
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
1 #!/usr/bin/env python
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
2 """Looks for the given primer sequences and clips matching SFF reads.
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
3
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
4 Takes eight command line options, input read filename, input read format,
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
5 input primer FASTA filename, type of primers (forward, reverse or reverse-
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
6 complement), number of mismatches (currently only 0, 1 and 2 are supported),
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
7 minimum length to keep a read (after primer trimming), should primer-less
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
8 reads be kept (boolean), and finally the output sequence filename.
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
9
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
10 Both the primer and read sequences can contain IUPAC ambiguity codes like N.
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
11
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
12 This supports FASTA, FASTQ and SFF sequence files. Colorspace reads are not
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
13 supported.
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
14
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
15 The mismatch parameter does not consider gapped alignemnts, however the
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
16 special case of missing bases at the very start or end of the read is handled.
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
17 e.g. a primer sequence CCGACTCGAG will match a read starting CGACTCGAG...
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
18 if one or more mismatches are allowed.
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
19
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
20 This can also be used for stripping off (and optionally filtering on) barcodes.
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
21
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
22 Note that only the trim/clip values in the SFF file are changed, not the flow
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
23 information of the full read sequence.
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
24
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
25 This script is copyright 2011-2013 by Peter Cock, The James Hutton Institute
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
26 (formerly the Scottish Crop Research Institute, SCRI), UK. All rights reserved.
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
27 See accompanying text file for licence details (MIT/BSD style).
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
28
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
29 This is version 0.0.8 of the script. Currently it uses Python's regular
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
30 expression engine for finding the primers, which for my needs is fast enough.
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
31 """
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
32 import sys
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
33 import re
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
34 from galaxy_utils.sequence.fasta import fastaReader, fastaWriter
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
35 from galaxy_utils.sequence.fastq import fastqReader, fastqWriter
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
36
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
37 if "-v" in sys.argv or "--version" in sys.argv:
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
38 print "v0.0.5"
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
39 sys.exit(0)
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
40
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
41 def stop_err(msg, err=1):
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
42 sys.stderr.write(msg)
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
43 sys.exit(err)
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
44
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
45 try:
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
46 from Bio.Seq import reverse_complement
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
47 from Bio.SeqIO.SffIO import SffIterator, SffWriter
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
48 except ImportError:
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
49 stop_err("Requires Biopython 1.54 or later")
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
50 try:
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
51 from Bio.SeqIO.SffIO import ReadRocheXmlManifest
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
52 except ImportError:
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
53 #Prior to Biopython 1.56 this was a private function
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
54 from Bio.SeqIO.SffIO import _sff_read_roche_index_xml as ReadRocheXmlManifest
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
55
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
56 #Parse Command Line
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
57 try:
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
58 in_file, seq_format, primer_fasta, primer_type, mm, min_len, keep_negatives, out_file = sys.argv[1:]
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
59 except ValueError:
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
60 stop_err("Expected 8 arguments, got %i:\n%s" % (len(sys.argv)-1, " ".join(sys.argv)))
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
61
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
62 if in_file == primer_fasta:
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
63 stop_err("Same file given as both primer sequences and sequences to clip!")
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
64 if in_file == out_file:
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
65 stop_err("Same file given as both sequences to clip and output!")
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
66 if primer_fasta == out_file:
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
67 stop_err("Same file given as both primer sequences and output!")
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
68
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
69 try:
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
70 mm = int(mm)
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
71 except ValueError:
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
72 stop_err("Expected non-negative integer number of mismatches (e.g. 0 or 1), not %r" % mm)
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
73 if mm < 0:
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
74 stop_err("Expected non-negtive integer number of mismatches (e.g. 0 or 1), not %r" % mm)
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
75 if mm not in [0,1,2]:
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
76 raise NotImplementedError
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
77
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
78 try:
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
79 min_len = int(min_len)
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
80 except ValueError:
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
81 stop_err("Expected non-negative integer min_len (e.g. 0 or 1), not %r" % min_len)
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
82 if min_len < 0:
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
83 stop_err("Expected non-negtive integer min_len (e.g. 0 or 1), not %r" % min_len)
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
84
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
85
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
86 if keep_negatives.lower() in ["true", "yes", "on"]:
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
87 keep_negatives = True
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
88 elif keep_negatives.lower() in ["false", "no", "off"]:
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
89 keep_negatives = False
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
90 else:
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
91 stop_err("Expected boolean for keep_negatives (e.g. true or false), not %r" % keep_negatives)
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
92
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
93
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
94 if primer_type.lower() == "forward":
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
95 forward = True
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
96 rc = False
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
97 elif primer_type.lower() == "reverse":
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
98 forward = False
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
99 rc = False
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
100 elif primer_type.lower() == "reverse-complement":
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
101 forward = False
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
102 rc = True
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
103 else:
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
104 stop_err("Expected foward, reverse or reverse-complement not %r" % primer_type)
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
105
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
106
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
107 ambiguous_dna_values = {
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
108 "A": "A",
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
109 "C": "C",
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
110 "G": "G",
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
111 "T": "T",
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
112 "M": "ACM",
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
113 "R": "AGR",
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
114 "W": "ATW",
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
115 "S": "CGS",
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
116 "Y": "CTY",
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
117 "K": "GTK",
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
118 "V": "ACGMRSV",
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
119 "H": "ACTMWYH",
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
120 "D": "AGTRWKD",
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
121 "B": "CGTSYKB",
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
122 "X": ".", #faster than [GATCMRWSYKVVHDBXN] or even [GATC]
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
123 "N": ".",
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
124 }
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
125
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
126 ambiguous_dna_re = {}
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
127 for letter, values in ambiguous_dna_values.iteritems():
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
128 if len(values) == 1:
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
129 ambiguous_dna_re[letter] = values
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
130 else:
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
131 ambiguous_dna_re[letter] = "[%s]" % values
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
132
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
133
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
134 def make_reg_ex(seq):
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
135 return "".join(ambiguous_dna_re[letter] for letter in seq)
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
136
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
137 def make_reg_ex_mm(seq, mm):
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
138 if mm > 2:
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
139 raise NotImplementedError("At most 2 mismatches allowed!")
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
140 seq = seq.upper()
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
141 yield make_reg_ex(seq)
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
142 for i in range(1,mm+1):
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
143 #Missing first/last i bases at very start/end of sequence
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
144 for reg in make_reg_ex_mm(seq[i:], mm-i):
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
145 yield "^" + reg
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
146 for reg in make_reg_ex_mm(seq[:-i], mm-i):
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
147 yield "$" + reg
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
148 if mm >= 1:
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
149 for i,letter in enumerate(seq):
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
150 #We'll use a set to remove any duplicate patterns
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
151 #if letter not in "NX":
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
152 pattern = seq[:i] + "N" + seq[i+1:]
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
153 assert len(pattern) == len(seq), "Len %s is %i, len %s is %i" \
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
154 % (pattern, len(pattern), seq, len(seq))
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
155 yield make_reg_ex(pattern)
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
156 if mm >=2:
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
157 for i,letter in enumerate(seq):
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
158 #We'll use a set to remove any duplicate patterns
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
159 #if letter not in "NX":
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
160 for k,letter in enumerate(seq[i+1:]):
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
161 #We'll use a set to remove any duplicate patterns
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
162 #if letter not in "NX":
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
163 pattern = seq[:i] + "N" + seq[i+1:i+1+k] + "N" + seq[i+k+2:]
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
164 assert len(pattern) == len(seq), "Len %s is %i, len %s is %i" \
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
165 % (pattern, len(pattern), seq, len(seq))
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
166 yield make_reg_ex(pattern)
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
167
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
168 def load_primers_as_re(primer_fasta, mm, rc=False):
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
169 #Read primer file and record all specified sequences
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
170 primers = set()
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
171 in_handle = open(primer_fasta, "rU")
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
172 reader = fastaReader(in_handle)
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
173 count = 0
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
174 for record in reader:
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
175 if rc:
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
176 seq = reverse_complement(record.sequence)
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
177 else:
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
178 seq = record.sequence
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
179 #primers.add(re.compile(make_reg_ex(seq)))
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
180 count += 1
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
181 for pattern in make_reg_ex_mm(seq, mm):
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
182 primers.add(pattern)
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
183 in_handle.close()
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
184 #Use set to avoid duplicates, sort to have longest first
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
185 #(so more specific primers found before less specific ones)
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
186 primers = sorted(set(primers), key=lambda p: -len(p))
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
187 return count, re.compile("|".join(primers)) #make one monster re!
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
188
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
189
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
190
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
191 #Read primer file and record all specified sequences
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
192 count, primer = load_primers_as_re(primer_fasta, mm, rc)
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
193 print "%i primer sequences" % count
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
194
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
195 short_neg = 0
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
196 short_clipped = 0
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
197 clipped = 0
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
198 negs = 0
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
199
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
200 if seq_format.lower()=="sff":
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
201 #SFF is different because we just change the trim points
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
202 if forward:
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
203 def process(records):
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
204 global short_clipped, short_neg, clipped, negs
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
205 for record in records:
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
206 left_clip = record.annotations["clip_qual_left"]
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
207 right_clip = record.annotations["clip_qual_right"]
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
208 seq = str(record.seq)[left_clip:right_clip].upper()
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
209 result = primer.search(seq)
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
210 if result:
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
211 #Forward primer, take everything after it
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
212 #so move the left clip along
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
213 if len(seq) - result.end() >= min_len:
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
214 record.annotations["clip_qual_left"] = left_clip + result.end()
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
215 clipped += 1
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
216 yield record
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
217 else:
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
218 short_clipped += 1
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
219 elif keep_negatives:
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
220 if len(seq) >= min_len:
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
221 negs += 1
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
222 yield record
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
223 else:
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
224 short_neg += 1
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
225 else:
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
226 def process(records):
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
227 global short_clipped, short_neg, clipped, negs
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
228 for record in records:
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
229 left_clip = record.annotations["clip_qual_left"]
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
230 right_clip = record.annotations["clip_qual_right"]
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
231 seq = str(record.seq)[left_clip:right_clip].upper()
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
232 result = primer.search(seq)
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
233 if result:
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
234 #Reverse primer, take everything before it
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
235 #so move the right clip back
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
236 new_len = result.start()
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
237 if new_len >= min_len:
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
238 record.annotations["clip_qual_right"] = left_clip + new_len
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
239 clipped += 1
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
240 yield record
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
241 else:
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
242 short_clipped += 1
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
243 elif keep_negatives:
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
244 if len(seq) >= min_len:
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
245 negs += 1
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
246 yield record
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
247 else:
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
248 short_neg += 1
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
249
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
250 in_handle = open(in_file, "rb")
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
251 try:
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
252 manifest = ReadRocheXmlManifest(in_handle)
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
253 except ValueError:
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
254 manifest = None
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
255 in_handle.seek(0)
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
256 out_handle = open(out_file, "wb")
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
257 writer = SffWriter(out_handle, xml=manifest)
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
258 writer.write_file(process(SffIterator(in_handle)))
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
259 #End of SFF code
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
260 elif seq_format.lower().startswith("fastq"):
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
261 in_handle = open(in_file, "rU")
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
262 out_handle = open(out_file, "w")
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
263 reader = fastqReader(in_handle)
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
264 writer = fastqWriter(out_handle)
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
265 if forward:
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
266 for record in reader:
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
267 seq = record.sequence.upper()
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
268 result = primer.search(seq)
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
269 if result:
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
270 #Forward primer, take everything after it
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
271 cut = result.end()
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
272 record.sequence = seq[cut:]
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
273 if len(record.sequence) >= min_len:
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
274 record.quality = record.quality[cut:]
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
275 clipped += 1
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
276 writer.write(record)
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
277 else:
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
278 short_clipped += 1
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
279 elif keep_negatives:
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
280 if len(record) >= min_len:
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
281 negs += 1
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
282 writer.write(record)
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
283 else:
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
284 short_negs += 1
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
285 else:
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
286 for record in reader:
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
287 seq = record.sequence.upper()
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
288 result = primer.search(seq)
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
289 if result:
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
290 #Reverse primer, take everything before it
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
291 cut = result.start()
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
292 record.sequence = seq[:cut]
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
293 if len(record.sequence) >= min_len:
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
294 record.quality = record.quality[:cut]
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
295 clipped += 1
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
296 writer.write(record)
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
297 else:
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
298 short_clipped += 1
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
299 elif keep_negatives:
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
300 if len(record) >= min_len:
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
301 negs += 1
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
302 writer.write(record)
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
303 else:
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
304 short_negs += 1
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
305 elif seq_format.lower()=="fasta":
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
306 in_handle = open(in_file, "rU")
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
307 out_handle = open(out_file, "w")
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
308 reader = fastaReader(in_handle)
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
309 writer = fastaWriter(out_handle)
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
310 #Following code is identical to that for FASTQ but without editing qualities
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
311 if forward:
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
312 for record in reader:
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
313 seq = record.sequence.upper()
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
314 result = primer.search(seq)
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
315 if result:
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
316 #Forward primer, take everything after it
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
317 cut = result.end()
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
318 record.sequence = seq[cut:]
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
319 if len(record.sequence) >= min_len:
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
320 clipped += 1
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
321 writer.write(record)
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
322 else:
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
323 short_clipped += 1
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
324 elif keep_negatives:
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
325 if len(record) >= min_len:
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
326 negs += 1
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
327 writer.write(record)
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
328 else:
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
329 short_negs += 1
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
330 else:
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
331 for record in reader:
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
332 seq = record.sequence.upper()
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
333 result = primer.search(seq)
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
334 if result:
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
335 #Reverse primer, take everything before it
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
336 cut = result.start()
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
337 record.sequence = seq[:cut]
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
338 if len(record.sequence) >= min_len:
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
339 clipped += 1
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
340 writer.write(record)
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
341 else:
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
342 short_clipped += 1
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
343 elif keep_negatives:
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
344 if len(record) >= min_len:
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
345 negs += 1
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
346 writer.write(record)
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
347 else:
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
348 short_negs += 1
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
349 else:
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
350 stop_err("Unsupported file type %r" % seq_format)
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
351 in_handle.close()
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
352 out_handle.close()
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
353
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
354 print "Kept %i clipped reads," % clipped
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
355 print "discarded %i short." % short_clipped
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
356 if keep_negatives:
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
357 print "Kept %i non-matching reads," % negs
ee5acea162a7 Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff changeset
358 print "discarded %i short." % short_neg