annotate tools/seq_filter_by_id/seq_filter_by_id.py @ 3:44ab4c0f7683 draft
Uploaded v0.0.6, automatic dependency on Biopython 1.62, new README file, citation information, MIT licence
author | peterjc |
---|---|
date | Fri, 11 Oct 2013 04:37:12 -0400 |
parents | |
children | 832c1fd57852 |
#!/usr/bin/env python
"""Filter a FASTA, FASTQ or SFF file with IDs from a tabular file.

Takes five required command line options: input filename, input type (e.g.
FASTA or SFF), two output filenames (for records with and without the given
IDs, same format as input sequence file) and the logic used to combine IDs
(UNION or INTERSECTION). These are followed by one or more pairs of tabular
filename and ID column numbers (comma separated list using one based
counting).

If either output filename is just a minus sign, that file is not created.
This is intended to allow output for just the matched (or just the non-matched)
records.

When filtering an SFF file, any Roche XML manifest in the input file is
preserved in both output files.

Note in the default NCBI BLAST+ tabular output, the query sequence ID is
in column one, and the ID of the match from the database is in column two.
Here sensible values for the column numbers would therefore be "1" or "2".

This tool is a short Python script which requires Biopython 1.54 or later
for SFF file support. If you use this tool in scientific work leading to a
publication, please cite the Biopython application note:

Cock et al 2009. Biopython: freely available Python tools for computational
molecular biology and bioinformatics. Bioinformatics 25(11) 1422-3.
http://dx.doi.org/10.1093/bioinformatics/btp163 pmid:19304878.

This script is copyright 2010-2013 by Peter Cock, The James Hutton Institute
(formerly the Scottish Crop Research Institute, SCRI), UK. All rights reserved.
See accompanying text file for licence details (MIT license).

This is version 0.1.0 of the script, use -v or --version to get the version.
"""
import os
import sys

def stop_err(msg, err=1):
    sys.stderr.write(msg.rstrip() + "\n")
    sys.exit(err)

if "-v" in sys.argv or "--version" in sys.argv:
    print "v0.1.0"
    sys.exit(0)

#Parse Command Line
if len(sys.argv) - 1 < 7 or len(sys.argv) % 2 == 1:
    stop_err("Expected 7 or more arguments, 5 required "
             "(in seq, seq format, out pos, out neg, logic) "
             "then one or more pairs (tab file, columns), "
             "got %i:\n%s" % (len(sys.argv)-1, " ".join(sys.argv)))

in_file, seq_format, out_positive_file, out_negative_file, logic = sys.argv[1:6]

if not os.path.isfile(in_file):
    stop_err("Missing input file %r" % in_file)
if out_positive_file == "-" and out_negative_file == "-":
    stop_err("Neither output file requested")
if logic not in ["UNION", "INTERSECTION"]:
    stop_err("Fifth argument should be 'UNION' or 'INTERSECTION', not %r" % logic)

identifiers = []
for i in range((len(sys.argv) - 6) // 2):
    tabular_file = sys.argv[6+2*i]
    cols_arg = sys.argv[7+2*i]
    if not os.path.isfile(tabular_file):
        stop_err("Missing tabular identifier file %r" % tabular_file)
    try:
        columns = [int(arg)-1 for arg in cols_arg.split(",")]
    except ValueError:
        stop_err("Expected list of columns (comma separated integers), got %r" % cols_arg)
    if min(columns) < 0:
        stop_err("Expect one-based column numbers (not zero-based counting), got %r" % cols_arg)
    identifiers.append((tabular_file, columns))

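The column handling in the command line parsing can be tried out in isolation. The following is only a sketch, written for Python 3 rather than the Python 2 of the script itself; the helper name `parse_columns` is hypothetical and does not appear in the script, and it raises `ValueError` where the script would call `stop_err`.

```python
def parse_columns(cols_arg):
    """Convert a one-based, comma separated column list to zero-based indices.

    Hypothetical helper mirroring the checks made in the parsing loop.
    """
    try:
        columns = [int(arg) - 1 for arg in cols_arg.split(",")]
    except ValueError:
        raise ValueError("Expected comma separated integers, got %r" % cols_arg)
    if min(columns) < 0:
        # A value of "0" (or lower) means zero-based counting was used.
        raise ValueError("Expected one-based column numbers, got %r" % cols_arg)
    return columns


print(parse_columns("1,2"))  # [0, 1]
```

For default NCBI BLAST+ tabular output, `"1"` would select the query IDs and `"2"` the IDs of the database matches.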
#Read tabular file(s) and record all specified identifiers
ids = None #Will be a set
for tabular_file, columns in identifiers:
    file_ids = set()
    handle = open(tabular_file, "rU")
    if len(columns)>1:
        #General case of many columns
        for line in handle:
            if line.startswith("#"):
                #Ignore comments
                continue
            parts = line.rstrip("\n").split("\t")
            for col in columns:
                file_ids.add(parts[col])
    else:
        #Single column, special case speed up
        col = columns[0]
        for line in handle:
            if not line.startswith("#"):
                file_ids.add(line.rstrip("\n").split("\t")[col])
    print "Using %i IDs from column %s in tabular file" % (len(file_ids), ", ".join(str(col+1) for col in columns))
    if ids is None:
        ids = file_ids
    if logic == "UNION":
        ids.update(file_ids)
    else:
        ids.intersection_update(file_ids)
    handle.close()
if len(identifiers) > 1:
    if logic == "UNION":
        print "Have %i IDs combined from %i tabular files" % (len(ids), len(identifiers))
    else:
        print "Have %i IDs in common from %i tabular files" % (len(ids), len(identifiers))


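The effect of the UNION and INTERSECTION logic on the per-file ID sets can be illustrated on its own. This is a minimal sketch in Python 3, not the script's code; the function name `combine_ids` and the example IDs are made up for illustration.

```python
def combine_ids(id_sets, logic):
    """Combine per-file ID sets using "UNION" or "INTERSECTION".

    Hypothetical helper; returns None when no sets are given.
    """
    ids = None
    for file_ids in id_sets:
        if ids is None:
            # The first file seeds the result (copied so the input is untouched).
            ids = set(file_ids)
        elif logic == "UNION":
            ids.update(file_ids)
        else:
            ids.intersection_update(file_ids)
    return ids


first = {"seq1", "seq2"}
second = {"seq2", "seq3"}
print(sorted(combine_ids([first, second], "UNION")))         # ['seq1', 'seq2', 'seq3']
print(sorted(combine_ids([first, second], "INTERSECTION")))  # ['seq2']
```

With a single tabular file both settings give the same set, which is presumably why the script only reports the combined count when more than one file is supplied.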
def crude_fasta_iterator(handle):
    """Yields tuples, record ID and the full record as a string."""
    while True:
        line = handle.readline()
        if line == "":
            return # Premature end of file, or just empty?
        if line[0] == ">":
            break

    no_id_warned = False
    while True:
        if line[0] != ">":
            raise ValueError(
                "Records in Fasta files should start with '>' character")
        try:
            id = line[1:].split(None, 1)[0]
        except IndexError:
            if not no_id_warned:
                sys.stderr.write("WARNING - Malformed FASTA entry with no identifier\n")
                no_id_warned = True
            id = None
        lines = [line]
        line = handle.readline()
        while True:
            if not line:
                break
            if line[0] == ">":
                break
            lines.append(line)
            line = handle.readline()
        yield id, "".join(lines)
        if not line:
            return # StopIteration


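To see what kind of pairs such an iterator yields, here is a simplified Python 3 sketch run on an in-memory example. It is a re-implementation for illustration, not the function above: the name `iter_fasta_records` and the sample sequences are assumptions, and the warning for records without an identifier is omitted.

```python
import io


def iter_fasta_records(handle):
    """Yield (identifier, record_text) pairs from a FASTA handle.

    Simplified sketch: any lines before the first '>' are skipped, and a
    header with no identifier gives None as the identifier.
    """
    identifier, lines, started = None, [], False
    for line in handle:
        if line.startswith(">"):
            if started:
                yield identifier, "".join(lines)
            started = True
            fields = line[1:].split(None, 1)
            identifier = fields[0] if fields else None
            lines = [line]
        elif started:
            lines.append(line)
    if started:
        yield identifier, "".join(lines)


example = io.StringIO(">seq1 first record\nACGT\nACGT\n>seq2\nTTTT\n")
records = list(iter_fasta_records(example))
print([identifier for identifier, text in records])  # ['seq1', 'seq2']
```

Because each record is kept as its original text, a filter built on it can write matching records straight to the output without re-parsing the sequence.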
def fasta_filter(in_file, pos_file, neg_file, wanted):
    """FASTA filter producing 60 character line wrapped output."""
    pos_count = neg_count = 0
    #Galaxy now requires Python 2.5+ so can use with statements,
    with open(in_file) as in_handle:
        #Doing the if statement outside the loop for speed
        #(with the downside of three very similar loops).
        if pos_file != "-" and neg_file != "-":
            print "Generating two FASTA files"
            with open(pos_file, "w") as pos_handle:
                with open(neg_file, "w") as neg_handle:
                    for identifier, record in crude_fasta_iterator(in_handle):
                        if identifier in wanted:
                            pos_handle.write(record)
                            pos_count += 1
                        else:
                            neg_handle.write(record)
                            neg_count += 1
        elif pos_file != "-":
            print "Generating matching FASTA file"
            with open(pos_file, "w") as pos_handle:
                for identifier, record in crude_fasta_iterator(in_handle):
                    if identifier in wanted:
                        pos_handle.write(record)
                        pos_count += 1
                    else:
                        neg_count += 1
        else:
            print "Generating non-matching FASTA file"
            assert neg_file != "-"
diff
changeset
|
175 with open(neg_file, "w") as neg_handle: |
44ab4c0f7683
Uploaded v0.0.6, automatic dependency on Biopython 1.62, new README file, citation information, MIT licence
peterjc
parents:
diff
changeset
|
176 for identifier, record in crude_fasta_iterator(in_handle): |
44ab4c0f7683
Uploaded v0.0.6, automatic dependency on Biopython 1.62, new README file, citation information, MIT licence
peterjc
parents:
diff
changeset
|
177 if identifier in wanted: |
44ab4c0f7683
Uploaded v0.0.6, automatic dependency on Biopython 1.62, new README file, citation information, MIT licence
peterjc
parents:
diff
changeset
|
178 pos_count += 1 |
44ab4c0f7683
Uploaded v0.0.6, automatic dependency on Biopython 1.62, new README file, citation information, MIT licence
peterjc
parents:
diff
changeset
|
179 else: |
44ab4c0f7683
Uploaded v0.0.6, automatic dependency on Biopython 1.62, new README file, citation information, MIT licence
peterjc
parents:
diff
changeset
|
180 neg_handle.write(record) |
44ab4c0f7683
Uploaded v0.0.6, automatic dependency on Biopython 1.62, new README file, citation information, MIT licence
peterjc
parents:
diff
changeset
|
181 neg_count += 1 |
44ab4c0f7683
Uploaded v0.0.6, automatic dependency on Biopython 1.62, new README file, citation information, MIT licence
peterjc
parents:
diff
changeset
|
182 return pos_count, neg_count |


if seq_format.lower()=="sff":
    #Now write filtered SFF file based on IDs from tabular file
    try:
        from Bio.SeqIO.SffIO import SffIterator, SffWriter
    except ImportError:
        stop_err("SFF filtering requires Biopython 1.54 or later")

    try:
        from Bio.SeqIO.SffIO import ReadRocheXmlManifest
    except ImportError:
        #Prior to Biopython 1.56 this was a private function
        from Bio.SeqIO.SffIO import _sff_read_roche_index_xml as ReadRocheXmlManifest
    in_handle = open(in_file, "rb") #must be binary mode!
    try:
        manifest = ReadRocheXmlManifest(in_handle)
    except ValueError:
        manifest = None
    #This makes two passes through the SFF file, which isn't so efficient,
    #but it keeps the code simple.
    pos_count = neg_count = 0
    if out_positive_file != "-":
        out_handle = open(out_positive_file, "wb")
        writer = SffWriter(out_handle, xml=manifest)
        in_handle.seek(0) #start again after getting manifest
        pos_count = writer.write_file(rec for rec in SffIterator(in_handle) if rec.id in ids)
        out_handle.close()
    if out_negative_file != "-":
        out_handle = open(out_negative_file, "wb")
        writer = SffWriter(out_handle, xml=manifest)
        in_handle.seek(0) #start again
        neg_count = writer.write_file(rec for rec in SffIterator(in_handle) if rec.id not in ids)
        out_handle.close()
    #And we're done
    in_handle.close()
    #At the time of writing, Galaxy doesn't show SFF file read counts,
    #so it is useful to put them in stdout and thus shown in job info.
    print "%i with and %i without specified IDs" % (pos_count, neg_count)
elif seq_format.lower()=="fasta":
    #Write filtered FASTA file based on IDs from tabular file
    pos_count, neg_count = fasta_filter(in_file, out_positive_file, out_negative_file, ids)
    print "%i with and %i without specified IDs" % (pos_count, neg_count)
elif seq_format.lower().startswith("fastq"):
    #Write filtered FASTQ file based on IDs from tabular file
    from galaxy_utils.sequence.fastq import fastqReader, fastqWriter
    reader = fastqReader(open(in_file, "rU"))
    if out_positive_file != "-" and out_negative_file != "-":
        print "Generating two FASTQ files"
        positive_writer = fastqWriter(open(out_positive_file, "w"))
        negative_writer = fastqWriter(open(out_negative_file, "w"))
        for record in reader:
            #The [1:] is because the fastqReader leaves the @ on the identifier.
            if record.identifier and record.identifier.split()[0][1:] in ids:
                positive_writer.write(record)
            else:
                negative_writer.write(record)
        positive_writer.close()
        negative_writer.close()
    elif out_positive_file != "-":
        print "Generating matching FASTQ file"
        positive_writer = fastqWriter(open(out_positive_file, "w"))
        for record in reader:
            #The [1:] is because the fastqReader leaves the @ on the identifier.
            if record.identifier and record.identifier.split()[0][1:] in ids:
                positive_writer.write(record)
        positive_writer.close()
    elif out_negative_file != "-":
        print "Generating non-matching FASTQ file"
        negative_writer = fastqWriter(open(out_negative_file, "w"))
        for record in reader:
            #The [1:] is because the fastqReader leaves the @ on the identifier.
            if not record.identifier or record.identifier.split()[0][1:] not in ids:
                negative_writer.write(record)
        negative_writer.close()
    reader.close()
else:
    stop_err("Unsupported file type %r" % seq_format)
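
The FASTA filter above repeats the same loop three times so that the output-file test stays outside the loop. For comparison, here is a minimal Python 3 sketch of the same split written as a single loop. The names `iter_fasta` and `split_by_id` are hypothetical and not part of this tool, and `iter_fasta` is only a simplified stand-in for `crude_fasta_iterator`, so treat this as an illustration under those assumptions rather than the script's implementation.

```python
import io


def iter_fasta(handle):
    """Yield (identifier, record_text) pairs from a FASTA handle.

    Simplified, hypothetical stand-in for crude_fasta_iterator.
    """
    title, lines = None, []
    for line in handle:
        if line.startswith(">"):
            if title is not None:
                yield title, "".join(lines)
            # The identifier is the first word after the ">" marker.
            words = line[1:].split(None, 1)
            title = words[0] if words else ""
            lines = [line]
        elif title is not None:
            lines.append(line)
    if title is not None:
        yield title, "".join(lines)


def split_by_id(in_handle, wanted, pos_handle=None, neg_handle=None):
    """Write matching records to pos_handle and the rest to neg_handle.

    Either output handle may be None, mirroring the "-" convention
    used by the script. Returns (pos_count, neg_count).
    """
    pos_count = neg_count = 0
    for identifier, record in iter_fasta(in_handle):
        if identifier in wanted:
            pos_count += 1
            if pos_handle is not None:
                pos_handle.write(record)
        else:
            neg_count += 1
            if neg_handle is not None:
                neg_handle.write(record)
    return pos_count, neg_count


# Small in-memory example: keep records "a" and "c", reject "b".
fasta = io.StringIO(">a first\nACGT\n>b\nGGCC\nTT\n>c\nAAAA\n")
pos, neg = io.StringIO(), io.StringIO()
counts = split_by_id(fasta, {"a", "c"}, pos, neg)
```

The single loop pays for two extra `is not None` checks per record, which is the cost the original avoids by duplicating the loop. Passing `wanted` as a set keeps each membership test constant time on average.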