annotate tools/filters/seq_filter_by_id.py @ 0:5844f6a450ed

Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
author peterjc
date Tue, 07 Jun 2011 17:24:30 -0400
parents
children 262f08104540
#!/usr/bin/env python
"""Filter a FASTA, FASTQ or SFF file with IDs from a tabular file.

Takes six command line options: tabular filename, ID column numbers (comma
separated list using one based counting), input filename, input type (e.g.
FASTA, FASTQ or SFF) and two output filenames (for records with and without
the given IDs, same format as input sequence file).

If either output filename is just a minus sign, that file is not created.
This is intended to allow output for just the matched (or just the non-matched)
records.

When filtering an SFF file, any Roche XML manifest in the input file is
preserved in both output files.

Note that in the default NCBI BLAST+ tabular output, the query sequence ID is
in column one, and the ID of the match from the database is in column two.
Here sensible values for the column numbers would therefore be "1" or "2".

This tool is a short Python script which requires Biopython 1.54 or later
for SFF file support. If you use this tool in scientific work leading to a
publication, please cite the Biopython application note:

Cock et al. 2009. Biopython: freely available Python tools for computational
molecular biology and bioinformatics. Bioinformatics 25(11) 1422-3.
http://dx.doi.org/10.1093/bioinformatics/btp163 pmid:19304878.

This script is copyright 2010 by Peter Cock, SCRI, UK. All rights reserved.
See accompanying text file for licence details (MIT/BSD style).

This is version 0.0.1 of the script.
"""
import sys

def stop_err(msg, err=1):
    """Write an error message to stderr and exit with the given return code."""
    sys.stderr.write(msg.rstrip() + "\n")
    sys.exit(err)

# Parse the command line
try:
    tabular_file, cols_arg, in_file, seq_format, out_positive_file, out_negative_file = sys.argv[1:]
except ValueError:
    stop_err("Expected six arguments, got %i:\n%s" % (len(sys.argv)-1, " ".join(sys.argv)))
try:
    columns = [int(arg)-1 for arg in cols_arg.split(",")]
except ValueError:
    stop_err("Expected list of columns (comma separated integers), got %s" % cols_arg)
if min(columns) < 0:
    # A "0" would otherwise silently select the last column via negative indexing
    stop_err("Column numbers use one based counting, got %s" % cols_arg)

if out_positive_file == "-" and out_negative_file == "-":
    stop_err("Neither output file requested")

# Read the tabular file and record all specified identifiers
ids = set()
handle = open(tabular_file, "rU")
if len(columns) > 1:
    # General case of many columns
    for line in handle:
        if line.startswith("#"):
            # Ignore comments
            continue
        parts = line.rstrip("\n").split("\t")
        for col in columns:
            ids.add(parts[col])
    print("Using %i IDs from %i columns of tabular file" % (len(ids), len(columns)))
else:
    # Single column, special case speed up
    col = columns[0]
    for line in handle:
        if not line.startswith("#"):
            ids.add(line.rstrip("\n").split("\t")[col])
    print("Using %i IDs from tabular file" % (len(ids)))
handle.close()

if seq_format.lower() == "sff":
    # Now write filtered SFF file based on IDs from the tabular file
    try:
        from Bio.SeqIO.SffIO import SffIterator, SffWriter
    except ImportError:
        stop_err("Requires Biopython 1.54 or later")

    try:
        from Bio.SeqIO.SffIO import ReadRocheXmlManifest
    except ImportError:
        # Prior to Biopython 1.56 this was a private function
        from Bio.SeqIO.SffIO import _sff_read_roche_index_xml as ReadRocheXmlManifest
    in_handle = open(in_file, "rb")  # must be binary mode!
    try:
        manifest = ReadRocheXmlManifest(in_handle)
    except ValueError:
        manifest = None
    # This makes two passes through the SFF file, which isn't so efficient,
    # but it keeps the code simple.
    if out_positive_file != "-":
        out_handle = open(out_positive_file, "wb")
        writer = SffWriter(out_handle, xml=manifest)
        in_handle.seek(0)  # start again after getting manifest
        pos_count = writer.write_file(rec for rec in SffIterator(in_handle) if rec.id in ids)
        out_handle.close()
    if out_negative_file != "-":
        out_handle = open(out_negative_file, "wb")
        writer = SffWriter(out_handle, xml=manifest)
        in_handle.seek(0)  # start again
        neg_count = writer.write_file(rec for rec in SffIterator(in_handle) if rec.id not in ids)
        out_handle.close()
    # And we're done
    in_handle.close()
    # At the time of writing, Galaxy doesn't show SFF file read counts,
    # so it is useful to print them to stdout, where they appear in the job info.
    if out_positive_file != "-" and out_negative_file != "-":
        print("%i with and %i without specified IDs" % (pos_count, neg_count))
    elif out_positive_file != "-":
        print("%i with specified IDs" % pos_count)
    elif out_negative_file != "-":
        print("%i without specified IDs" % neg_count)
elif seq_format.lower() == "fasta":
    # Write filtered FASTA file based on IDs from tabular file
    from galaxy_utils.sequence.fasta import fastaReader, fastaWriter
    reader = fastaReader(open(in_file, "rU"))
    if out_positive_file != "-" and out_negative_file != "-":
        print("Generating two FASTA files")
        positive_writer = fastaWriter(open(out_positive_file, "w"))
        negative_writer = fastaWriter(open(out_negative_file, "w"))
        for record in reader:
            # The [1:] is because the fastaReader leaves the > on the identifier.
            if record.identifier and record.identifier.split()[0][1:] in ids:
                positive_writer.write(record)
            else:
                negative_writer.write(record)
        positive_writer.close()
        negative_writer.close()
    elif out_positive_file != "-":
        print("Generating matching FASTA file")
        positive_writer = fastaWriter(open(out_positive_file, "w"))
        for record in reader:
            # The [1:] is because the fastaReader leaves the > on the identifier.
            if record.identifier and record.identifier.split()[0][1:] in ids:
                positive_writer.write(record)
        positive_writer.close()
    elif out_negative_file != "-":
        print("Generating non-matching FASTA file")
        negative_writer = fastaWriter(open(out_negative_file, "w"))
        for record in reader:
            # The [1:] is because the fastaReader leaves the > on the identifier.
            if not record.identifier or record.identifier.split()[0][1:] not in ids:
                negative_writer.write(record)
        negative_writer.close()
elif seq_format.lower().startswith("fastq"):
    # Write filtered FASTQ file based on IDs from tabular file
    from galaxy_utils.sequence.fastq import fastqReader, fastqWriter
    reader = fastqReader(open(in_file, "rU"))
    if out_positive_file != "-" and out_negative_file != "-":
        print("Generating two FASTQ files")
        positive_writer = fastqWriter(open(out_positive_file, "w"))
        negative_writer = fastqWriter(open(out_negative_file, "w"))
        for record in reader:
            # The [1:] is because the fastqReader leaves the @ on the identifier.
            if record.identifier and record.identifier.split()[0][1:] in ids:
                positive_writer.write(record)
            else:
                negative_writer.write(record)
        positive_writer.close()
        negative_writer.close()
    elif out_positive_file != "-":
        print("Generating matching FASTQ file")
        positive_writer = fastqWriter(open(out_positive_file, "w"))
        for record in reader:
            # The [1:] is because the fastqReader leaves the @ on the identifier.
            if record.identifier and record.identifier.split()[0][1:] in ids:
                positive_writer.write(record)
        positive_writer.close()
    elif out_negative_file != "-":
        print("Generating non-matching FASTQ file")
        negative_writer = fastqWriter(open(out_negative_file, "w"))
        for record in reader:
            # The [1:] is because the fastqReader leaves the @ on the identifier.
            if not record.identifier or record.identifier.split()[0][1:] not in ids:
                negative_writer.write(record)
        negative_writer.close()
else:
    stop_err("Unsupported file type %r" % seq_format)
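The column parsing and ID collection steps in the script can be restated as two small functions, which makes the one-based to zero-based column conversion easy to check in isolation. This is a minimal sketch, not the tool's own code: the function names `parse_columns` and `collect_ids` are hypothetical, the tabular input is assumed to be an iterable of lines, and, unlike the script, blank lines are skipped and column numbers below one are rejected. The example rows are made up to resemble BLAST+ tabular output.

```python
def parse_columns(cols_arg):
    """Convert a comma separated list of one-based column numbers to zero-based indices.

    Hypothetical helper mirroring how the script parses its second argument.
    """
    columns = [int(arg) - 1 for arg in cols_arg.split(",")]
    if min(columns) < 0:
        raise ValueError("Column numbers use one based counting, got %s" % cols_arg)
    return columns


def collect_ids(lines, columns):
    """Return the set of IDs found in the given zero-based columns.

    Lines starting with '#' are treated as comments, as in the script.
    Skipping blank lines is an assumption of this sketch; the script does not.
    """
    ids = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        parts = line.rstrip("\n").split("\t")
        for col in columns:
            ids.add(parts[col])
    return ids


# Made-up BLAST+ style rows: query ID in column 1, subject ID in column 2
blast_lines = [
    "# BLASTN comment line\n",
    "read1\tchr1\t99.0\n",
    "read2\tchr2\t98.5\n",
    "read1\tchr3\t97.0\n",
]
print(sorted(collect_ids(blast_lines, parse_columns("1"))))    # ['read1', 'read2']
print(sorted(collect_ids(blast_lines, parse_columns("1,2"))))  # ['chr1', 'chr2', 'chr3', 'read1', 'read2']
```

Collecting into a set, as the script does, means a query ID that appears on several BLAST hit lines is only counted once.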
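A single pass is enough to split a FASTA file into matching and non-matching records, as the script's FASTA branch does with the `galaxy_utils` reader and writer. The sketch below swaps in a tiny plain-Python FASTA parser in place of `galaxy_utils` (the names `iter_fasta` and `partition_fasta` are hypothetical) and models the script's "-" filename as a `None` handle, so one loop covers all three output combinations. As in the script, the ID is the first whitespace-separated word of the title line, without the leading `>`.

```python
import io


def iter_fasta(handle):
    """Yield (title, sequence_lines) tuples; the title excludes the leading '>'.

    A minimal parser written for this sketch; text before the first '>' is ignored.
    """
    title, seq_lines = None, []
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if title is not None:
                yield title, seq_lines
            title, seq_lines = line[1:], []
        elif title is not None:
            seq_lines.append(line)
    if title is not None:
        yield title, seq_lines


def partition_fasta(in_handle, ids, pos_handle=None, neg_handle=None):
    """Route each record to pos_handle if its ID is in ids, else to neg_handle.

    A None handle plays the role of the script's "-" filename: that output is skipped.
    Returns the (matching, non-matching) record counts.
    """
    pos_count = neg_count = 0
    for title, seq_lines in iter_fasta(in_handle):
        words = title.split()
        record_id = words[0] if words else ""
        if record_id and record_id in ids:
            pos_count += 1
            out = pos_handle
        else:
            neg_count += 1
            out = neg_handle
        if out is not None:
            out.write(">%s\n" % title)
            for seq_line in seq_lines:
                out.write(seq_line + "\n")
    return pos_count, neg_count


fasta_text = ">read1 some description\nACGT\nAC\n>read2\nGGGG\n>read3\nTTTT\n"
pos, neg = io.StringIO(), io.StringIO()
counts = partition_fasta(io.StringIO(fasta_text), {"read1", "read3"}, pos, neg)
print(counts)  # (2, 1)
```

Sequence lines are copied through unchanged; the sketch does not try to reproduce whatever line wrapping `fastaWriter` applies.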
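FASTQ filtering follows the same ID rule, with `@` in place of `>`. This sketch assumes the common four-lines-per-record FASTQ layout with no line wrapping; reading fixed groups of four lines means a quality string that happens to begin with `@` is not mistaken for a new record. `iter_fastq` and `filter_fastq` are hypothetical names, not part of `galaxy_utils`, and the `keep_matching` flag stands in for the script's separate matching and non-matching branches.

```python
import io


def iter_fastq(handle):
    """Yield (title, sequence, quality) tuples; the title excludes the leading '@'.

    Assumes four lines per record (no wrapping), an assumption of this sketch.
    """
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        if not header.strip():
            continue  # tolerate blank lines between records
        if not header.startswith("@"):
            raise ValueError("Expected '@' at start of record, got %r" % header)
        seq = handle.readline().rstrip("\n")
        plus = handle.readline()
        if not plus.startswith("+"):
            raise ValueError("Expected '+' line, got %r" % plus)
        qual = handle.readline().rstrip("\n")
        yield header[1:].rstrip("\n"), seq, qual


def filter_fastq(in_handle, ids, out_handle, keep_matching=True):
    """Write records whose first title word is (or, if keep_matching is False, is not) in ids."""
    count = 0
    for title, seq, qual in iter_fastq(in_handle):
        words = title.split()
        matched = bool(words) and words[0] in ids
        if matched == keep_matching:
            out_handle.write("@%s\n%s\n+\n%s\n" % (title, seq, qual))
            count += 1
    return count


fastq_text = "@r1 len=4\nACGT\n+\nIIII\n@r2\nGGCC\n+r2\n!!!!\n"
out = io.StringIO()
print(filter_fastq(io.StringIO(fastq_text), {"r2"}, out))  # 1
```

The optional repeated title on the `+` line is not preserved; the sketch always writes a bare `+`.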