annotate tools/fastq/fastq_paired_unpaired.py @ 0:72e9fcaec61f
Migrated tool version 0.0.4 from old tool shed archive to new tool shed repository
author | peterjc |
---|---|
date | Tue, 07 Jun 2011 17:21:17 -0400 |
parents | |
children | 7ed81e36fc1c |
#!/usr/bin/env python
"""Divides a FASTQ into paired and single (orphan reads) as separate files.

The input file should be a valid FASTQ file which has been sorted so that
any partner forward+reverse reads are consecutive. The output files all
preserve this sort order. Pairings are recognised based on standard name
suffixes. See below or run the tool with no arguments for more details.

Note that the FASTQ variant is unimportant (Sanger, Solexa, Illumina, or even
Color Space should all work equally well).

This script is copyright 2010 by Peter Cock, SCRI, UK. All rights reserved.
See accompanying text file for licence details (MIT/BSD style).

This is version 0.0.4 of the script.
"""
import os
import sys
import re
from galaxy_utils.sequence.fastq import fastqReader, fastqWriter

def stop_err(msg, err=1):
    sys.stderr.write(msg.rstrip() + "\n")
    sys.exit(err)

msg = """Expect either 4 or 5 arguments: the FASTQ variant followed by FASTQ filenames.

If you want two output files, use four arguments:
 - FASTQ variant (e.g. sanger, solexa, illumina or cssanger),
 - Sorted input FASTQ filename,
 - Output paired FASTQ filename (forward then reverse interleaved),
 - Output singles FASTQ filename (orphan reads)

If you want three output files, use five arguments:
 - FASTQ variant (e.g. sanger, solexa, illumina or cssanger),
 - Sorted input FASTQ filename,
 - Output forward paired FASTQ filename,
 - Output reverse paired FASTQ filename,
 - Output singles FASTQ filename (orphan reads)

The input file should be a valid FASTQ file which has been sorted so that
any partner forward+reverse reads are consecutive. The output files all
preserve this sort order.

Any reads where the forward/reverse naming suffix used is not recognised
are treated as orphan reads. The tool supports the /1 and /2 convention
used by Illumina, the .f and .r convention, and the Sanger convention
(see http://staden.sourceforge.net/manual/pregap4_unix_50.html for details).

Note that this does support multiple forward and reverse reads per template
(which is quite common with Sanger sequencing), e.g. this which is sorted
alphabetically:

WTSI_1055_4p17.p1kapIBF
WTSI_1055_4p17.p1kpIBF
WTSI_1055_4p17.q1kapIBR
WTSI_1055_4p17.q1kpIBR

or this where the reads already come in pairs:

WTSI_1055_4p17.p1kapIBF
WTSI_1055_4p17.q1kapIBR
WTSI_1055_4p17.p1kpIBF
WTSI_1055_4p17.q1kpIBR

both become:

WTSI_1055_4p17.p1kapIBF paired with WTSI_1055_4p17.q1kapIBR
WTSI_1055_4p17.p1kpIBF paired with WTSI_1055_4p17.q1kpIBR
"""

if len(sys.argv) == 5:
    format, input_fastq, pairs_fastq, singles_fastq = sys.argv[1:]
elif len(sys.argv) == 6:
    pairs_fastq = None
    format, input_fastq, pairs_f_fastq, pairs_r_fastq, singles_fastq = sys.argv[1:]
else:
    stop_err(msg)

format = format.replace("fastq", "").lower()
if not format:
    format = "sanger" #safe default
elif format not in ["sanger", "solexa", "illumina", "cssanger"]:
    stop_err("Unrecognised format %s" % format)

def f_match(name):
    if name.endswith("/1") or name.endswith(".f"):
        return True
    return False

#Cope with three widely used suffix naming conventions,
#Illumina: /1 or /2
#Forward/reverse: .f or .r
#Sanger, e.g. .p1k and .q1k
#See http://staden.sourceforge.net/manual/pregap4_unix_50.html
re_f = re.compile(r"(/1|\.f|\.[sfp]\d\w*)$")
re_r = re.compile(r"(/2|\.r|\.[rq]\d\w*)$")

#Use search (not match), as the patterns are anchored at the end of the name
assert re_f.search("demo/1")
assert re_f.search("demo.f")
assert re_f.search("demo.s1")
assert re_f.search("demo.f1k")
assert re_f.search("demo.p1")
assert re_f.search("demo.p1k")
assert re_f.search("demo.p1lk")
assert re_r.search("demo/2")
assert re_r.search("demo.r")
assert re_r.search("demo.q1")
assert re_r.search("demo.q1lk")
assert not re_r.search("demo/1")
assert not re_r.search("demo.f")
assert not re_r.search("demo.p")
assert not re_f.search("demo/2")
assert not re_f.search("demo.r")
assert not re_f.search("demo.q")

count, forward, reverse, neither, pairs, singles = 0, 0, 0, 0, 0, 0
in_handle = open(input_fastq)
if pairs_fastq:
    pairs_f_writer = fastqWriter(open(pairs_fastq, "w"), format)
    pairs_r_writer = pairs_f_writer
else:
    pairs_f_writer = fastqWriter(open(pairs_f_fastq, "w"), format)
    pairs_r_writer = fastqWriter(open(pairs_r_fastq, "w"), format)
singles_writer = fastqWriter(open(singles_fastq, "w"), format)
last_template, buffered_reads = None, []

for record in fastqReader(in_handle, format):
    count += 1
    name = record.identifier.split(None,1)[0]
    assert name[0]=="@", record.identifier #Quirk of the Galaxy parser
    suffix = re_f.search(name)
    if suffix:
        #============
        #Forward read
        #============
        template = name[:suffix.start()]
        #print name, "forward", template
        forward += 1
        if last_template == template:
            buffered_reads.append(record)
        else:
            #Any old buffered reads are orphans
            for old in buffered_reads:
                singles_writer.write(old)
                singles += 1
            #Save this read in buffer
            buffered_reads = [record]
            last_template = template
    else:
        suffix = re_r.search(name)
        if suffix:
            #============
            #Reverse read
            #============
            template = name[:suffix.start()]
            #print name, "reverse", template
            reverse += 1
            if last_template == template and buffered_reads:
                #We have a pair!
                #If there are multiple buffered forward reads, want to pick
                #the first one (although we could try and do something more
                #clever looking at the suffix to match them up...)
                old = buffered_reads.pop(0)
                pairs_f_writer.write(old)
                pairs_r_writer.write(record)
                pairs += 2
            else:
                #As this is a reverse read, this and any buffered read(s) are
                #all orphans
                for old in buffered_reads:
                    singles_writer.write(old)
                    singles += 1
                buffered_reads = []
                singles_writer.write(record)
                singles += 1
                last_template = None
        else:
            #===========================
            #Neither forward nor reverse
            #===========================
            singles_writer.write(record)
            singles += 1
            neither += 1
            for old in buffered_reads:
                singles_writer.write(old)
                singles += 1
            buffered_reads = []
            last_template = None
if last_template:
    #Left over singles...
    for old in buffered_reads:
        singles_writer.write(old)
        singles += 1
in_handle.close()
singles_writer.close()
if pairs_fastq:
    pairs_f_writer.close()
    assert pairs_r_writer.file.closed
else:
    pairs_f_writer.close()
    pairs_r_writer.close()

if neither:
    print("%i reads (%i forward, %i reverse, %i neither), %i in pairs, %i as singles"
          % (count, forward, reverse, neither, pairs, singles))
else:
    print("%i reads (%i forward, %i reverse), %i in pairs, %i as singles"
          % (count, forward, reverse, pairs, singles))

assert count == pairs + singles == forward + reverse + neither, \
       "%i vs %i+%i=%i vs %i+%i+%i=%i" \
       % (count,pairs,singles,pairs+singles,forward,reverse,neither,forward+reverse+neither)
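
The buffering approach used by the script can be tried out on read names alone, without Galaxy or any FASTQ files. The following is a minimal sketch, not the script's own code: `split_pairs` is a hypothetical helper introduced here for illustration, it works on plain name strings rather than FASTQ records, and it always flushes buffered forward reads before writing an orphan. Only the two regular expressions are taken from the script.

```python
import re

# Suffix patterns copied from the script above.
RE_F = re.compile(r"(/1|\.f|\.[sfp]\d\w*)$")
RE_R = re.compile(r"(/2|\.r|\.[rq]\d\w*)$")


def split_pairs(names):
    """Hypothetical helper: pair consecutive forward/reverse read names.

    Returns (pairs, singles): a list of (forward, reverse) tuples and a
    list of orphan names.
    """
    pairs, singles = [], []
    last_template, buffered = None, []
    for name in names:
        fwd = RE_F.search(name)
        rev = None if fwd else RE_R.search(name)
        if fwd:
            template = name[:fwd.start()]
            if template != last_template:
                # Forward reads buffered for an earlier template are orphans.
                singles.extend(buffered)
                buffered = []
                last_template = template
            buffered.append(name)
        elif rev and name[:rev.start()] == last_template and buffered:
            # Pair with the oldest buffered forward read of this template.
            pairs.append((buffered.pop(0), name))
        else:
            # Unmatched reverse read or unrecognised suffix: flush the
            # buffer, then treat this read as an orphan as well.
            singles.extend(buffered)
            singles.append(name)
            buffered, last_template = [], None
    singles.extend(buffered)  # forward reads still waiting at end of input
    return pairs, singles


if __name__ == "__main__":
    sorted_names = [
        "WTSI_1055_4p17.p1kapIBF",
        "WTSI_1055_4p17.p1kpIBF",
        "WTSI_1055_4p17.q1kapIBR",
        "WTSI_1055_4p17.q1kpIBR",
    ]
    for forward_name, reverse_name in split_pairs(sorted_names)[0]:
        print("%s paired with %s" % (forward_name, reverse_name))
```

Taking the oldest buffered forward read first (`pop(0)`) is what makes the alphabetically sorted and the already interleaved Sanger examples from the usage text produce the same pairs.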