annotate tools/fastq/fastq_paired_unpaired.py @ 0:72e9fcaec61f

Migrated tool version 0.0.4 from old tool shed archive to new tool shed repository
author peterjc
date Tue, 07 Jun 2011 17:21:17 -0400
parents
children 7ed81e36fc1c
rev   line source
0
72e9fcaec61f Migrated tool version 0.0.4 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
#!/usr/bin/env python
"""Divides a FASTQ into paired and single (orphan) reads as separate files.

The input file should be a valid FASTQ file which has been sorted so that
any partner forward+reverse reads are consecutive. The output files all
preserve this sort order. Pairs are recognised based on standard name
suffixes. See below or run the tool with no arguments for more details.

Note that the FASTQ variant is unimportant (Sanger, Solexa, Illumina, or even
Color Space should all work equally well).

This script is copyright 2010 by Peter Cock, SCRI, UK. All rights reserved.
See accompanying text file for licence details (MIT/BSD style).

This is version 0.0.4 of the script.
"""
import os
import sys
import re
from galaxy_utils.sequence.fastq import fastqReader, fastqWriter

def stop_err(msg, err=1):
    sys.stderr.write(msg.rstrip() + "\n")
    sys.exit(err)

msg = """Expect either 4 or 5 arguments: the FASTQ variant, then FASTQ filenames.

If you want two output files, use four arguments:
 - FASTQ variant (e.g. sanger, solexa, illumina or cssanger),
 - Sorted input FASTQ filename,
 - Output paired FASTQ filename (forward then reverse interleaved),
 - Output singles FASTQ filename (orphan reads)

If you want three output files, use five arguments:
 - FASTQ variant (e.g. sanger, solexa, illumina or cssanger),
 - Sorted input FASTQ filename,
 - Output forward paired FASTQ filename,
 - Output reverse paired FASTQ filename,
 - Output singles FASTQ filename (orphan reads)

The input file should be a valid FASTQ file which has been sorted so that
any partner forward+reverse reads are consecutive. The output files all
preserve this sort order.

Any reads where the forward/reverse naming suffix used is not recognised
are treated as orphan reads. The tool supports the /1 and /2 convention
used by Illumina, the .f and .r convention, and the Sanger convention
(see http://staden.sourceforge.net/manual/pregap4_unix_50.html for details).

Note that this does support multiple forward and reverse reads per template
(which is quite common with Sanger sequencing), e.g. this, which is sorted
alphabetically:

WTSI_1055_4p17.p1kapIBF
WTSI_1055_4p17.p1kpIBF
WTSI_1055_4p17.q1kapIBR
WTSI_1055_4p17.q1kpIBR

or this, where the reads already come in pairs:

WTSI_1055_4p17.p1kapIBF
WTSI_1055_4p17.q1kapIBR
WTSI_1055_4p17.p1kpIBF
WTSI_1055_4p17.q1kpIBR

both become:

WTSI_1055_4p17.p1kapIBF paired with WTSI_1055_4p17.q1kapIBR
WTSI_1055_4p17.p1kpIBF paired with WTSI_1055_4p17.q1kpIBR
"""

if len(sys.argv) == 5:
    format, input_fastq, pairs_fastq, singles_fastq = sys.argv[1:]
elif len(sys.argv) == 6:
    pairs_fastq = None
    format, input_fastq, pairs_f_fastq, pairs_r_fastq, singles_fastq = sys.argv[1:]
else:
    stop_err(msg)

format = format.replace("fastq", "").lower()
if not format:
    format="sanger" #safe default
elif format not in ["sanger","solexa","illumina","cssanger"]:
    stop_err("Unrecognised format %s" % format)

def f_match(name):
    if name.endswith("/1") or name.endswith(".f"):
        return True
    return False

#Cope with three widely used suffix naming conventions,
#Illumina: /1 or /2
#Forward/reverse: .f or .r
#Sanger, e.g. .p1k and .q1k
#See http://staden.sourceforge.net/manual/pregap4_unix_50.html
re_f = re.compile(r"(/1|\.f|\.[sfp]\d\w*)$")
re_r = re.compile(r"(/2|\.r|\.[rq]\d\w*)$")

#Sanity checks; search (not match) is needed as the suffix is at the end
assert re_f.search("demo/1")
assert re_f.search("demo.f")
assert re_f.search("demo.s1")
assert re_f.search("demo.f1k")
assert re_f.search("demo.p1")
assert re_f.search("demo.p1k")
assert re_f.search("demo.p1lk")
assert re_r.search("demo/2")
assert re_r.search("demo.r")
assert re_r.search("demo.q1")
assert re_r.search("demo.q1lk")
assert not re_r.search("demo/1")
assert not re_r.search("demo.f")
assert not re_r.search("demo.p")
assert not re_f.search("demo/2")
assert not re_f.search("demo.r")
assert not re_f.search("demo.q")

count, forward, reverse, neither, pairs, singles = 0, 0, 0, 0, 0, 0
in_handle = open(input_fastq)
if pairs_fastq:
    #Interleaved output, forward and reverse reads share one writer
    pairs_f_writer = fastqWriter(open(pairs_fastq, "w"), format)
    pairs_r_writer = pairs_f_writer
else:
    pairs_f_writer = fastqWriter(open(pairs_f_fastq, "w"), format)
    pairs_r_writer = fastqWriter(open(pairs_r_fastq, "w"), format)
singles_writer = fastqWriter(open(singles_fastq, "w"), format)
last_template, buffered_reads = None, []

for record in fastqReader(in_handle, format):
    count += 1
    name = record.identifier.split(None,1)[0]
    assert name[0]=="@", record.identifier #Quirk of the Galaxy parser
    suffix = re_f.search(name)
    if suffix:
        #============
        #Forward read
        #============
        template = name[:suffix.start()]
        #print name, "forward", template
        forward += 1
        if last_template == template:
            buffered_reads.append(record)
        else:
            #Any old buffered reads are orphans
            for old in buffered_reads:
                singles_writer.write(old)
                singles += 1
            #Save this read in buffer
            buffered_reads = [record]
            last_template = template
    else:
        suffix = re_r.search(name)
        if suffix:
            #============
            #Reverse read
            #============
            template = name[:suffix.start()]
            #print name, "reverse", template
            reverse += 1
            if last_template == template and buffered_reads:
                #We have a pair!
                #If there are multiple buffered forward reads, want to pick
                #the first one (although we could try and do something more
                #clever looking at the suffix to match them up...)
                old = buffered_reads.pop(0)
                pairs_f_writer.write(old)
                pairs_r_writer.write(record)
                pairs += 2
            else:
                #As this is a reverse read, this and any buffered read(s) are
                #all orphans
                for old in buffered_reads:
                    singles_writer.write(old)
                    singles += 1
                buffered_reads = []
                singles_writer.write(record)
                singles += 1
                last_template = None
        else:
            #===========================
            #Neither forward nor reverse
            #===========================
            singles_writer.write(record)
            singles += 1
            neither += 1
            for old in buffered_reads:
                singles_writer.write(old)
                singles += 1
            buffered_reads = []
            last_template = None
if last_template:
    #Left over singles...
    for old in buffered_reads:
        singles_writer.write(old)
        singles += 1
in_handle.close()
singles_writer.close()
if pairs_fastq:
    pairs_f_writer.close()
    assert pairs_r_writer.file.closed
else:
    pairs_f_writer.close()
    pairs_r_writer.close()

if neither:
    print("%i reads (%i forward, %i reverse, %i neither), %i in pairs, %i as singles"
          % (count, forward, reverse, neither, pairs, singles))
else:
    print("%i reads (%i forward, %i reverse), %i in pairs, %i as singles"
          % (count, forward, reverse, pairs, singles))

#Eight values are reported, so the message needs eight placeholders
assert count == pairs + singles == forward + reverse + neither, \
       "%i vs %i+%i=%i vs %i+%i+%i=%i" \
       % (count,pairs,singles,pairs+singles,forward,reverse,neither,forward+reverse+neither)
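
The buffering logic in the main loop can be easier to follow when it is separated from the FASTQ reading and writing. The following is a minimal sketch, not part of the script itself: it applies the same two suffix patterns to a plain list of read names and returns the pairs and orphans. The helper name `split_names` and the constants `RE_F` and `RE_R` are hypothetical, introduced only for this illustration, and the sketch assumes names are already sorted so that partners share a template and are adjacent.

```python
import re

# Same suffix patterns as re_f and re_r in the script.
RE_F = re.compile(r"(/1|\.f|\.[sfp]\d\w*)$")
RE_R = re.compile(r"(/2|\.r|\.[rq]\d\w*)$")


def split_names(names):
    """Hypothetical helper: return (pairs, singles) for sorted read names."""
    pairs, singles = [], []
    last_template, buffered = None, []
    for name in names:
        fwd = RE_F.search(name)
        rev = None if fwd else RE_R.search(name)
        if fwd:
            template = name[:fwd.start()]
            if template == last_template:
                # Another forward read for the same template, keep waiting.
                buffered.append(name)
            else:
                # New template, so anything still buffered is an orphan.
                singles.extend(buffered)
                buffered = [name]
                last_template = template
        elif rev:
            template = name[:rev.start()]
            if template == last_template and buffered:
                # Pair the oldest buffered forward read with this reverse read.
                pairs.append((buffered.pop(0), name))
            else:
                # Unmatched reverse read: it and any buffered reads are orphans.
                singles.extend(buffered)
                singles.append(name)
                buffered, last_template = [], None
        else:
            # Unrecognised suffix: written before the buffered reads,
            # mirroring the order used in the script.
            singles.append(name)
            singles.extend(buffered)
            buffered, last_template = [], None
    # Forward reads still buffered at the end never found a partner.
    singles.extend(buffered)
    return pairs, singles


if __name__ == "__main__":
    sorted_names = [
        "WTSI_1055_4p17.p1kapIBF",
        "WTSI_1055_4p17.p1kpIBF",
        "WTSI_1055_4p17.q1kapIBR",
        "WTSI_1055_4p17.q1kpIBR",
    ]
    for forward_name, reverse_name in split_names(sorted_names)[0]:
        print("%s paired with %s" % (forward_name, reverse_name))
```

Run on the alphabetically sorted Sanger example from the usage message, this should report the same two pairings listed there. Because the oldest buffered forward read is popped first, the interleaved ordering gives the same result.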