Mercurial > repos > peterjc > seq_primer_clip
annotate tools/seq_primer_clip/seq_primer_clip.py @ 5:530c8d6fedd8 draft
v0.0.15 - internal changes
author | peterjc |
---|---|
date | Wed, 10 May 2017 13:09:52 -0400 |
parents | 9b074c1db68e |
children | b9dc7c967ee6 |
rev | line source |
---|---|
2
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
1 #!/usr/bin/env python |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
2 """Looks for the given primer sequences and clips matching SFF reads. |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
3 |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
4 Takes eight command line options, input read filename, input read format, |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
5 input primer FASTA filename, type of primers (forward, reverse or reverse- |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
6 complement), number of mismatches (currently only 0, 1 and 2 are supported), |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
7 minimum length to keep a read (after primer trimming), should primer-less |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
8 reads be kept (boolean), and finally the output sequence filename. |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
9 |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
10 Both the primer and read sequences can contain IUPAC ambiguity codes like N. |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
11 |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
12 This supports FASTA, FASTQ and SFF sequence files. Colorspace reads are not |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
13 supported. |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
14 |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
15 The mismatch parameter does not consider gapped alignemnts, however the |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
16 special case of missing bases at the very start or end of the read is handled. |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
17 e.g. a primer sequence CCGACTCGAG will match a read starting CGACTCGAG... |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
18 if one or more mismatches are allowed. |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
19 |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
20 This can also be used for stripping off (and optionally filtering on) barcodes. |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
21 |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
22 Note that only the trim/clip values in the SFF file are changed, not the flow |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
23 information of the full read sequence. |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
24 |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
25 This script is copyright 2011-2013 by Peter Cock, The James Hutton Institute |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
26 (formerly the Scottish Crop Research Institute, SCRI), UK. All rights reserved. |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
27 See accompanying text file for licence details (MIT/BSD style). |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
28 |
4 | 29 NOTE: Currently it uses Python's regular expression engine for finding the |
30 primers, which for my needs is fast enough. | |
2
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
31 """ |
5 | 32 |
33 import re | |
2
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
34 import sys |
5 | 35 |
2
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
36 from galaxy_utils.sequence.fasta import fastaReader, fastaWriter |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
37 from galaxy_utils.sequence.fastq import fastqReader, fastqWriter |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
38 |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
39 if "-v" in sys.argv or "--version" in sys.argv: |
4 | 40 print "v0.0.12" |
2
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
41 sys.exit(0) |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
42 |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
43 try: |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
44 from Bio.Seq import reverse_complement |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
45 from Bio.SeqIO.SffIO import SffIterator, SffWriter |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
46 except ImportError: |
4 | 47 sys.exit("Requires Biopython 1.54 or later") |
2
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
48 try: |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
49 from Bio.SeqIO.SffIO import ReadRocheXmlManifest |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
50 except ImportError: |
4 | 51 # Prior to Biopython 1.56 this was a private function |
2
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
52 from Bio.SeqIO.SffIO import _sff_read_roche_index_xml as ReadRocheXmlManifest |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
53 |
4 | 54 # Parse Command Line |
2
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
55 try: |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
56 in_file, seq_format, primer_fasta, primer_type, mm, min_len, keep_negatives, out_file = sys.argv[1:] |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
57 except ValueError: |
4 | 58 sys.exit("Expected 8 arguments, got %i:\n%s" % (len(sys.argv) - 1, " ".join(sys.argv))) |
2
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
59 |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
60 if in_file == primer_fasta: |
4 | 61 sys.exit("Same file given as both primer sequences and sequences to clip!") |
2
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
62 if in_file == out_file: |
4 | 63 sys.exit("Same file given as both sequences to clip and output!") |
2
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
64 if primer_fasta == out_file: |
4 | 65 sys.exit("Same file given as both primer sequences and output!") |
2
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
66 |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
67 try: |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
68 mm = int(mm) |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
69 except ValueError: |
4 | 70 sys.exit("Expected non-negative integer number of mismatches (e.g. 0 or 1), not %r" % mm) |
2
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
71 if mm < 0: |
4 | 72 sys.exit("Expected non-negtive integer number of mismatches (e.g. 0 or 1), not %r" % mm) |
73 if mm not in [0, 1, 2]: | |
2
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
74 raise NotImplementedError |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
75 |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
76 try: |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
77 min_len = int(min_len) |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
78 except ValueError: |
4 | 79 sys.exit("Expected non-negative integer min_len (e.g. 0 or 1), not %r" % min_len) |
2
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
80 if min_len < 0: |
4 | 81 sys.exit("Expected non-negtive integer min_len (e.g. 0 or 1), not %r" % min_len) |
2
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
82 |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
83 |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
84 if keep_negatives.lower() in ["true", "yes", "on"]: |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
85 keep_negatives = True |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
86 elif keep_negatives.lower() in ["false", "no", "off"]: |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
87 keep_negatives = False |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
88 else: |
4 | 89 sys.exit("Expected boolean for keep_negatives (e.g. true or false), not %r" % keep_negatives) |
2
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
90 |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
91 |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
92 if primer_type.lower() == "forward": |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
93 forward = True |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
94 rc = False |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
95 elif primer_type.lower() == "reverse": |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
96 forward = False |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
97 rc = False |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
98 elif primer_type.lower() == "reverse-complement": |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
99 forward = False |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
100 rc = True |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
101 else: |
4 | 102 sys.exit("Expected foward, reverse or reverse-complement not %r" % primer_type) |
2
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
103 |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
104 |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
105 ambiguous_dna_values = { |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
106 "A": "A", |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
107 "C": "C", |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
108 "G": "G", |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
109 "T": "T", |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
110 "M": "ACM", |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
111 "R": "AGR", |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
112 "W": "ATW", |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
113 "S": "CGS", |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
114 "Y": "CTY", |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
115 "K": "GTK", |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
116 "V": "ACGMRSV", |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
117 "H": "ACTMWYH", |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
118 "D": "AGTRWKD", |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
119 "B": "CGTSYKB", |
4 | 120 "X": ".", # faster than [GATCMRWSYKVVHDBXN] or even [GATC] |
2
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
121 "N": ".", |
4 | 122 } |
2
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
123 |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
124 ambiguous_dna_re = {} |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
125 for letter, values in ambiguous_dna_values.iteritems(): |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
126 if len(values) == 1: |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
127 ambiguous_dna_re[letter] = values |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
128 else: |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
129 ambiguous_dna_re[letter] = "[%s]" % values |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
130 |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
131 |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
132 def make_reg_ex(seq): |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
133 return "".join(ambiguous_dna_re[letter] for letter in seq) |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
134 |
4 | 135 |
2
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
136 def make_reg_ex_mm(seq, mm): |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
137 if mm > 2: |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
138 raise NotImplementedError("At most 2 mismatches allowed!") |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
139 seq = seq.upper() |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
140 yield make_reg_ex(seq) |
4 | 141 for i in range(1, mm + 1): |
142 # Missing first/last i bases at very start/end of sequence | |
143 for reg in make_reg_ex_mm(seq[i:], mm - i): | |
2
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
144 yield "^" + reg |
4 | 145 for reg in make_reg_ex_mm(seq[:-i], mm - i): |
2
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
146 yield "$" + reg |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
147 if mm >= 1: |
4 | 148 for i, letter in enumerate(seq): |
149 # We'll use a set to remove any duplicate patterns | |
150 # if letter not in "NX": | |
151 pattern = seq[:i] + "N" + seq[i + 1:] | |
5 | 152 assert len(pattern) == len(seq), ("Len %s is %i, len %s is %i" |
153 % (pattern, len(pattern), seq, len(seq))) | |
2
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
154 yield make_reg_ex(pattern) |
4 | 155 if mm >= 2: |
156 for i, letter in enumerate(seq): | |
157 # We'll use a set to remove any duplicate patterns | |
158 # if letter not in "NX": | |
159 for k, letter in enumerate(seq[i + 1:]): | |
160 # We'll use a set to remove any duplicate patterns | |
161 # if letter not in "NX": | |
162 pattern = seq[:i] + "N" + seq[i + 1:i + 1 + k] + "N" + seq[i + k + 2:] | |
5 | 163 assert len(pattern) == len(seq), ("Len %s is %i, len %s is %i" |
164 % (pattern, len(pattern), seq, len(seq))) | |
2
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
165 yield make_reg_ex(pattern) |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
166 |
4 | 167 |
2
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
168 def load_primers_as_re(primer_fasta, mm, rc=False): |
4 | 169 # Read primer file and record all specified sequences |
2
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
170 primers = set() |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
171 in_handle = open(primer_fasta, "rU") |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
172 reader = fastaReader(in_handle) |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
173 count = 0 |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
174 for record in reader: |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
175 if rc: |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
176 seq = reverse_complement(record.sequence) |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
177 else: |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
178 seq = record.sequence |
4 | 179 # primers.add(re.compile(make_reg_ex(seq))) |
2
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
180 count += 1 |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
181 for pattern in make_reg_ex_mm(seq, mm): |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
182 primers.add(pattern) |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
183 in_handle.close() |
4 | 184 # Use set to avoid duplicates, sort to have longest first |
185 # (so more specific primers found before less specific ones) | |
2
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
186 primers = sorted(set(primers), key=lambda p: -len(p)) |
4 | 187 return count, re.compile("|".join(primers)) # make one monster re! |
2
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
188 |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
189 |
4 | 190 # Read primer file and record all specified sequences |
2
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
191 count, primer = load_primers_as_re(primer_fasta, mm, rc) |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
192 print "%i primer sequences" % count |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
193 |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
194 short_neg = 0 |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
195 short_clipped = 0 |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
196 clipped = 0 |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
197 negs = 0 |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
198 |
4 | 199 if seq_format.lower() == "sff": |
200 # SFF is different because we just change the trim points | |
2
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
201 if forward: |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
202 def process(records): |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
203 global short_clipped, short_neg, clipped, negs |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
204 for record in records: |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
205 left_clip = record.annotations["clip_qual_left"] |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
206 right_clip = record.annotations["clip_qual_right"] |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
207 seq = str(record.seq)[left_clip:right_clip].upper() |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
208 result = primer.search(seq) |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
209 if result: |
4 | 210 # Forward primer, take everything after it |
211 # so move the left clip along | |
2
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
212 if len(seq) - result.end() >= min_len: |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
213 record.annotations["clip_qual_left"] = left_clip + result.end() |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
214 clipped += 1 |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
215 yield record |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
216 else: |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
217 short_clipped += 1 |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
218 elif keep_negatives: |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
219 if len(seq) >= min_len: |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
220 negs += 1 |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
221 yield record |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
222 else: |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
223 short_neg += 1 |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
224 else: |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
225 def process(records): |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
226 global short_clipped, short_neg, clipped, negs |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
227 for record in records: |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
228 left_clip = record.annotations["clip_qual_left"] |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
229 right_clip = record.annotations["clip_qual_right"] |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
230 seq = str(record.seq)[left_clip:right_clip].upper() |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
231 result = primer.search(seq) |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
232 if result: |
4 | 233 # Reverse primer, take everything before it |
234 # so move the right clip back | |
2
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
235 new_len = result.start() |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
236 if new_len >= min_len: |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
237 record.annotations["clip_qual_right"] = left_clip + new_len |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
238 clipped += 1 |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
239 yield record |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
240 else: |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
241 short_clipped += 1 |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
242 elif keep_negatives: |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
243 if len(seq) >= min_len: |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
244 negs += 1 |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
245 yield record |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
246 else: |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
247 short_neg += 1 |
4 | 248 |
2
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
249 in_handle = open(in_file, "rb") |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
250 try: |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
251 manifest = ReadRocheXmlManifest(in_handle) |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
252 except ValueError: |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
253 manifest = None |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
254 in_handle.seek(0) |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
255 out_handle = open(out_file, "wb") |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
256 writer = SffWriter(out_handle, xml=manifest) |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
257 writer.write_file(process(SffIterator(in_handle))) |
4 | 258 # End of SFF code |
2
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
259 elif seq_format.lower().startswith("fastq"): |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
260 in_handle = open(in_file, "rU") |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
261 out_handle = open(out_file, "w") |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
262 reader = fastqReader(in_handle) |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
263 writer = fastqWriter(out_handle) |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
264 if forward: |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
265 for record in reader: |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
266 seq = record.sequence.upper() |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
267 result = primer.search(seq) |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
268 if result: |
4 | 269 # Forward primer, take everything after it |
2
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
270 cut = result.end() |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
271 record.sequence = seq[cut:] |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
272 if len(record.sequence) >= min_len: |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
273 record.quality = record.quality[cut:] |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
274 clipped += 1 |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
275 writer.write(record) |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
276 else: |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
277 short_clipped += 1 |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
278 elif keep_negatives: |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
279 if len(record) >= min_len: |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
280 negs += 1 |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
281 writer.write(record) |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
282 else: |
4 | 283 short_neg += 1 |
2
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
284 else: |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
285 for record in reader: |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
286 seq = record.sequence.upper() |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
287 result = primer.search(seq) |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
288 if result: |
4 | 289 # Reverse primer, take everything before it |
2
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
290 cut = result.start() |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
291 record.sequence = seq[:cut] |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
292 if len(record.sequence) >= min_len: |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
293 record.quality = record.quality[:cut] |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
294 clipped += 1 |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
295 writer.write(record) |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
296 else: |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
297 short_clipped += 1 |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
298 elif keep_negatives: |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
299 if len(record) >= min_len: |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
300 negs += 1 |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
301 writer.write(record) |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
302 else: |
4 | 303 short_neg += 1 |
304 elif seq_format.lower() == "fasta": | |
2
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
305 in_handle = open(in_file, "rU") |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
306 out_handle = open(out_file, "w") |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
307 reader = fastaReader(in_handle) |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
308 writer = fastaWriter(out_handle) |
4 | 309 # Following code is identical to that for FASTQ but without editing qualities |
2
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
310 if forward: |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
311 for record in reader: |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
312 seq = record.sequence.upper() |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
313 result = primer.search(seq) |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
314 if result: |
4 | 315 # Forward primer, take everything after it |
2
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
316 cut = result.end() |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
317 record.sequence = seq[cut:] |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
318 if len(record.sequence) >= min_len: |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
319 clipped += 1 |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
320 writer.write(record) |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
321 else: |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
322 short_clipped += 1 |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
323 elif keep_negatives: |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
324 if len(record) >= min_len: |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
325 negs += 1 |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
326 writer.write(record) |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
327 else: |
4 | 328 short_neg += 1 |
2
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
329 else: |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
330 for record in reader: |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
331 seq = record.sequence.upper() |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
332 result = primer.search(seq) |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
333 if result: |
4 | 334 # Reverse primer, take everything before it |
2
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
335 cut = result.start() |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
336 record.sequence = seq[:cut] |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
337 if len(record.sequence) >= min_len: |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
338 clipped += 1 |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
339 writer.write(record) |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
340 else: |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
341 short_clipped += 1 |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
342 elif keep_negatives: |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
343 if len(record) >= min_len: |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
344 negs += 1 |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
345 writer.write(record) |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
346 else: |
4 | 347 short_neg += 1 |
2
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
348 else: |
4 | 349 sys.exit("Unsupported file type %r" % seq_format) |
2
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
350 in_handle.close() |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
351 out_handle.close() |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
352 |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
353 print "Kept %i clipped reads," % clipped |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
354 print "discarded %i short." % short_clipped |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
355 if keep_negatives: |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
356 print "Kept %i non-matching reads," % negs |
ee5acea162a7
Uploaded v0.0.10, README now using RST, MIT licence, automatic Biopython dependency
peterjc
parents:
diff
changeset
|
357 print "discarded %i short." % short_neg |