annotate tools/seq_filter_by_id/seq_filter_by_id.py @ 3:44ab4c0f7683 draft

Uploaded v0.0.6, automatic dependency on Biopython 1.62, new README file, citation information, MIT licence
author peterjc
date Fri, 11 Oct 2013 04:37:12 -0400
parents
children 832c1fd57852
#!/usr/bin/env python
"""Filter a FASTA, FASTQ or SFF file with IDs from a tabular file.

Takes seven or more command line options: input filename, input type (e.g.
FASTA or SFF), two output filenames (for records with and without the given
IDs, same format as input sequence file), the logic for combining IDs (UNION
or INTERSECTION), then one or more pairs of tabular filename and ID column
numbers (comma separated list using one based counting).
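
As a rough illustration of the argument layout reported by this script's own
usage message (five fixed arguments, then one or more tabular file and column
pairs), here is a minimal sketch. The helper name split_arguments and the file
names are hypothetical and are not part of this script.

```python
def split_arguments(args):
    # Hypothetical helper: separate the five fixed arguments from the
    # trailing (tabular file, columns) pairs.
    if len(args) < 7 or len(args) % 2 == 0:
        raise ValueError("Expected 5 fixed arguments plus one or more pairs")
    fixed = args[:5]
    pairs = []
    for i in range(5, len(args), 2):
        # Convert one-based column numbers to zero-based indices
        columns = [int(c) - 1 for c in args[i + 1].split(",")]
        pairs.append((args[i], columns))
    return fixed, pairs


# Made-up file names, for illustration only
fixed, pairs = split_arguments(
    ["in.fasta", "fasta", "pos.fasta", "-", "UNION", "hits.tabular", "1,2"])
print(fixed)
print(pairs)
```

Here the minus sign for the fourth argument stands for "do not create the
non-matching output file", as described in the next paragraph.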

If either output filename is just a minus sign, that file is not created.
This is intended to allow output for just the matched (or just the non-matched)
records.

When filtering an SFF file, any Roche XML manifest in the input file is
preserved in both output files.

Note in the default NCBI BLAST+ tabular output, the query sequence ID is
in column one, and the ID of the match from the database is in column two.
Here sensible values for the column numbers would therefore be "1" or "2".
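
To make the column numbering concrete, the following sketch collects IDs from
tab separated lines and combines two ID sets in the way the UNION and
INTERSECTION options suggest. It is an illustration under assumptions: the
helper name collect_ids and all of the rows are made up, and the rows only
imitate the first columns of BLAST+ tabular output.

```python
def collect_ids(lines, columns):
    # Hypothetical helper: gather the values found in the given zero-based
    # columns, skipping comment lines which start with a hash.
    found = set()
    for line in lines:
        if line.startswith("#"):
            continue
        parts = line.rstrip("\n").split("\t")
        for col in columns:
            found.add(parts[col])
    return found


# Made-up rows: query ID, subject ID, then one further value
blast_rows = [
    "# hypothetical comment line",
    "query1\tsubjectA\t98.5",
    "query2\tsubjectB\t87.0",
    "query2\tsubjectC\t80.1",
]
other_rows = ["query2", "query3"]

from_blast = collect_ids(blast_rows, [0])  # column "1" in one-based counting
from_other = collect_ids(other_rows, [0])
print(sorted(from_blast | from_other))  # IDs seen in either source (UNION)
print(sorted(from_blast & from_other))  # IDs seen in both (INTERSECTION)
```

Passing [1] instead of [0] would pick up the subject IDs from column two.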

This tool is a short Python script which requires Biopython 1.54 or later
for SFF file support. If you use this tool in scientific work leading to a
publication, please cite the Biopython application note:

Cock et al 2009. Biopython: freely available Python tools for computational
molecular biology and bioinformatics. Bioinformatics 25(11) 1422-3.
http://dx.doi.org/10.1093/bioinformatics/btp163 pmid:19304878.

This script is copyright 2010-2013 by Peter Cock, The James Hutton Institute
(formerly the Scottish Crop Research Institute, SCRI), UK. All rights reserved.
See accompanying text file for licence details (MIT license).

This is version 0.1.0 of the script, use -v or --version to get the version.
"""
import os
import sys

def stop_err(msg, err=1):
    sys.stderr.write(msg.rstrip() + "\n")
    sys.exit(err)

if "-v" in sys.argv or "--version" in sys.argv:
    print "v0.1.0"
    sys.exit(0)

#Parse Command Line
if len(sys.argv) - 1 < 7 or len(sys.argv) % 2 == 1:
    stop_err("Expected 7 or more arguments, 5 required "
             "(in seq, seq format, out pos, out neg, logic) "
             "then one or more pairs (tab file, columns), "
             "got %i:\n%s" % (len(sys.argv)-1, " ".join(sys.argv)))

in_file, seq_format, out_positive_file, out_negative_file, logic = sys.argv[1:6]

if not os.path.isfile(in_file):
    stop_err("Missing input file %r" % in_file)
if out_positive_file == "-" and out_negative_file == "-":
    stop_err("Neither output file requested")
if logic not in ["UNION", "INTERSECTION"]:
    stop_err("Fifth argument should be 'UNION' or 'INTERSECTION', not %r" % logic)

identifiers = []
for i in range((len(sys.argv) - 6) // 2):
    tabular_file = sys.argv[6+2*i]
    cols_arg = sys.argv[7+2*i]
    if not os.path.isfile(tabular_file):
        stop_err("Missing tabular identifier file %r" % tabular_file)
    try:
        columns = [int(arg)-1 for arg in cols_arg.split(",")]
    except ValueError:
        stop_err("Expected list of columns (comma separated integers), got %r" % cols_arg)
    if min(columns) < 0:
        stop_err("Expect one-based column numbers (not zero-based counting), got %r" % cols_arg)
    identifiers.append((tabular_file, columns))

#Read tabular file(s) and record all specified identifiers
ids = None #Will be a set
for tabular_file, columns in identifiers:
    file_ids = set()
    handle = open(tabular_file, "rU")
    if len(columns)>1:
        #General case of many columns
        for line in handle:
            if line.startswith("#"):
                #Ignore comments
                continue
            parts = line.rstrip("\n").split("\t")
            for col in columns:
                file_ids.add(parts[col])
    else:
        #Single column, special case speed up
        col = columns[0]
        for line in handle:
            if not line.startswith("#"):
                file_ids.add(line.rstrip("\n").split("\t")[col])
    print "Using %i IDs from column %s in tabular file" % (len(file_ids), ", ".join(str(col+1) for col in columns))
    if ids is None:
        ids = file_ids
    if logic == "UNION":
        ids.update(file_ids)
    else:
        ids.intersection_update(file_ids)
    handle.close()
if len(identifiers) > 1:
    if logic == "UNION":
        print "Have %i IDs combined from %i tabular files" % (len(ids), len(identifiers))
    else:
        print "Have %i IDs in common from %i tabular files" % (len(ids), len(identifiers))


def crude_fasta_iterator(handle):
    """Yields tuples, record ID and the full record as a string."""
    while True:
        line = handle.readline()
        if line == "":
            return # Premature end of file, or just empty?
        if line[0] == ">":
            break

    no_id_warned = False
    while True:
        if line[0] != ">":
            raise ValueError(
                "Records in Fasta files should start with '>' character")
        try:
            id = line[1:].split(None, 1)[0]
        except IndexError:
            if not no_id_warned:
                sys.stderr.write("WARNING - Malformed FASTA entry with no identifier\n")
                no_id_warned = True
            id = None
        lines = [line]
        line = handle.readline()
        while True:
            if not line:
                break
            if line[0] == ">":
                break
            lines.append(line)
            line = handle.readline()
        yield id, "".join(lines)
        if not line:
            return # StopIteration


def fasta_filter(in_file, pos_file, neg_file, wanted):
    """FASTA filter producing 60 character line wrapped output."""
    pos_count = neg_count = 0
    #Galaxy now requires Python 2.5+ so can use with statements,
    with open(in_file) as in_handle:
        #Doing the if statement outside the loop for speed
        #(with the downside of three very similar loops).
        if pos_file != "-" and neg_file != "-":
            print "Generating two FASTA files"
            with open(pos_file, "w") as pos_handle:
                with open(neg_file, "w") as neg_handle:
                    for identifier, record in crude_fasta_iterator(in_handle):
                        if identifier in wanted:
                            pos_handle.write(record)
                            pos_count += 1
                        else:
                            neg_handle.write(record)
                            neg_count += 1
        elif pos_file != "-":
            print "Generating matching FASTA file"
            with open(pos_file, "w") as pos_handle:
                for identifier, record in crude_fasta_iterator(in_handle):
                    if identifier in wanted:
                        pos_handle.write(record)
                        pos_count += 1
                    else:
                        neg_count += 1
        else:
            print "Generating non-matching FASTA file"
            assert neg_file != "-"
            with open(neg_file, "w") as neg_handle:
                for identifier, record in crude_fasta_iterator(in_handle):
                    if identifier in wanted:
                        pos_count += 1
parents:
diff changeset
179 else:
44ab4c0f7683 Uploaded v0.0.6, automatic dependency on Biopython 1.62, new README file, citation information, MIT licence
peterjc
parents:
diff changeset
180 neg_handle.write(record)
44ab4c0f7683 Uploaded v0.0.6, automatic dependency on Biopython 1.62, new README file, citation information, MIT licence
peterjc
parents:
diff changeset
181 neg_count += 1
44ab4c0f7683 Uploaded v0.0.6, automatic dependency on Biopython 1.62, new README file, citation information, MIT licence
peterjc
parents:
diff changeset
182 return pos_count, neg_count
44ab4c0f7683 Uploaded v0.0.6, automatic dependency on Biopython 1.62, new README file, citation information, MIT licence
peterjc
parents:
diff changeset
183
44ab4c0f7683 Uploaded v0.0.6, automatic dependency on Biopython 1.62, new README file, citation information, MIT licence
peterjc
parents:
diff changeset
184
if seq_format.lower()=="sff":
    #Now write filtered SFF file based on IDs from tabular file
    try:
        from Bio.SeqIO.SffIO import SffIterator, SffWriter
    except ImportError:
        stop_err("SFF filtering requires Biopython 1.54 or later")

    try:
        from Bio.SeqIO.SffIO import ReadRocheXmlManifest
    except ImportError:
        #Prior to Biopython 1.56 this was a private function
        from Bio.SeqIO.SffIO import _sff_read_roche_index_xml as ReadRocheXmlManifest
    in_handle = open(in_file, "rb") #must be binary mode!
    try:
        manifest = ReadRocheXmlManifest(in_handle)
    except ValueError:
        #No Roche XML manifest in this SFF file
        manifest = None
    #This makes two passes through the SFF file, which isn't so efficient,
    #but it keeps the code simple.
    pos_count = neg_count = 0
    if out_positive_file != "-":
        out_handle = open(out_positive_file, "wb")
        writer = SffWriter(out_handle, xml=manifest)
        in_handle.seek(0) #start again after getting manifest
        pos_count = writer.write_file(rec for rec in SffIterator(in_handle) if rec.id in ids)
        out_handle.close()
    if out_negative_file != "-":
        out_handle = open(out_negative_file, "wb")
        writer = SffWriter(out_handle, xml=manifest)
        in_handle.seek(0) #start again
        neg_count = writer.write_file(rec for rec in SffIterator(in_handle) if rec.id not in ids)
        out_handle.close()
    #And we're done
    in_handle.close()
    #At the time of writing, Galaxy doesn't show SFF file read counts,
    #so it is useful to put them in stdout and thus shown in job info.
    print "%i with and %i without specified IDs" % (pos_count, neg_count)
elif seq_format.lower()=="fasta":
    #Write filtered FASTA file based on IDs from tabular file
    pos_count, neg_count = fasta_filter(in_file, out_positive_file, out_negative_file, ids)
    print "%i with and %i without specified IDs" % (pos_count, neg_count)
elif seq_format.lower().startswith("fastq"):
    #Write filtered FASTQ file based on IDs from tabular file
    from galaxy_utils.sequence.fastq import fastqReader, fastqWriter
    reader = fastqReader(open(in_file, "rU"))
    if out_positive_file != "-" and out_negative_file != "-":
        print "Generating two FASTQ files"
        positive_writer = fastqWriter(open(out_positive_file, "w"))
        negative_writer = fastqWriter(open(out_negative_file, "w"))
        for record in reader:
            #The [1:] is because the fastqReader leaves the @ on the identifier.
            if record.identifier and record.identifier.split()[0][1:] in ids:
                positive_writer.write(record)
            else:
                negative_writer.write(record)
        positive_writer.close()
        negative_writer.close()
    elif out_positive_file != "-":
        print "Generating matching FASTQ file"
        positive_writer = fastqWriter(open(out_positive_file, "w"))
        for record in reader:
            #The [1:] is because the fastqReader leaves the @ on the identifier.
            if record.identifier and record.identifier.split()[0][1:] in ids:
                positive_writer.write(record)
        positive_writer.close()
    elif out_negative_file != "-":
        print "Generating non-matching FASTQ file"
        negative_writer = fastqWriter(open(out_negative_file, "w"))
        for record in reader:
            #The [1:] is because the fastqReader leaves the @ on the identifier.
            if not record.identifier or record.identifier.split()[0][1:] not in ids:
                negative_writer.write(record)
        negative_writer.close()
    reader.close()
else:
    stop_err("Unsupported file type %r" % seq_format)
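
The FASTA branch above splits records into "with" and "without" sets using three near-identical loops, one per combination of requested outputs. As a rough illustration only (not the author's code), the same partitioning can be sketched with a single loop that checks per record whether each output is wanted. The names iter_fasta and partition_fasta are hypothetical, iter_fasta is an assumed stand-in for crude_fasta_iterator, and passing None plays the role of the "-" filename. The sketch uses only the standard library and is written for Python 3, unlike the Python 2 script itself.

```python
import io


def iter_fasta(handle):
    """Yield (identifier, record_text) pairs from a FASTA handle.

    Hypothetical stand-in for crude_fasta_iterator: the identifier is the
    first whitespace-delimited word after the ">" on the title line.
    """
    title = None
    lines = []
    for line in handle:
        if line.startswith(">"):
            if title is not None:
                yield title, "".join(lines)
            words = line[1:].split(None, 1)
            title = words[0] if words else ""
            lines = [line]
        elif title is not None:
            lines.append(line)
    if title is not None:
        yield title, "".join(lines)


def partition_fasta(in_handle, wanted, pos_handle=None, neg_handle=None):
    """Write matching records to pos_handle and the rest to neg_handle.

    Either output handle may be None, in which case those records are
    counted but not written.
    """
    pos_count = neg_count = 0
    for identifier, record in iter_fasta(in_handle):
        if identifier in wanted:
            pos_count += 1
            if pos_handle is not None:
                pos_handle.write(record)
        else:
            neg_count += 1
            if neg_handle is not None:
                neg_handle.write(record)
    return pos_count, neg_count


# Small in-memory demonstration with made-up records.
example = io.StringIO(">seq1 first\nACGT\n>seq2\nGGCC\nTTAA\n>seq3\nAAAA\n")
matched, unmatched = io.StringIO(), io.StringIO()
counts = partition_fasta(example, {"seq1", "seq3"}, matched, unmatched)
print("%i with and %i without specified IDs" % counts)
```

The single loop is shorter but tests the output handles on every record; the three-loop layout in the script avoids that per-record check at the cost of repetition.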