annotate tools/filters/seq_filter_by_id.py @ 0:5844f6a450ed
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
author | peterjc |
---|---|
date | Tue, 07 Jun 2011 17:24:30 -0400 |
parents | |
children | 262f08104540 |

#!/usr/bin/env python
"""Filter a FASTA, FASTQ or SFF file with IDs from a tabular file.

Takes six command line arguments: tabular filename, ID column numbers (comma
separated list using one based counting), input filename, input type (e.g.
FASTA or SFF) and two output filenames (for records with and without the
given IDs, same format as input sequence file).

If either output filename is just a minus sign, that file is not created.
This is intended to allow output for just the matched (or just the non-matched)
records.

When filtering an SFF file, any Roche XML manifest in the input file is
preserved in both output files.

Note that in the default NCBI BLAST+ tabular output, the query sequence ID is
in column one, and the ID of the match from the database is in column two.
Here sensible values for the column numbers would therefore be "1" or "2".

This tool is a short Python script which requires Biopython 1.54 or later
for SFF file support. If you use this tool in scientific work leading to a
publication, please cite the Biopython application note:

Cock et al 2009. Biopython: freely available Python tools for computational
molecular biology and bioinformatics. Bioinformatics 25(11) 1422-3.
http://dx.doi.org/10.1093/bioinformatics/btp163 pmid:19304878.

This script is copyright 2010 by Peter Cock, SCRI, UK. All rights reserved.
See accompanying text file for licence details (MIT/BSD style).

This is version 0.0.1 of the script.
"""
import sys

def stop_err(msg, err=1):
    sys.stderr.write(msg.rstrip() + "\n")
    sys.exit(err)

#Parse Command Line
try:
    tabular_file, cols_arg, in_file, seq_format, out_positive_file, out_negative_file = sys.argv[1:]
except ValueError:
    stop_err("Expected six arguments, got %i:\n%s" % (len(sys.argv)-1, " ".join(sys.argv)))
try:
    columns = [int(arg)-1 for arg in cols_arg.split(",")]
except ValueError:
    stop_err("Expected list of columns (comma separated integers), got %s" % cols_arg)

if out_positive_file == "-" and out_negative_file == "-":
    stop_err("Neither output file requested")


#Read tabular file and record all specified identifiers
ids = set()
handle = open(tabular_file, "rU")
if len(columns)>1:
    #General case of many columns
    for line in handle:
        if line.startswith("#"):
            #Ignore comments
            continue
        parts = line.rstrip("\n").split("\t")
        for col in columns:
            ids.add(parts[col])
    print "Using %i IDs from %i columns of tabular file" % (len(ids), len(columns))
else:
    #Single column, special case speed up
    col = columns[0]
    for line in handle:
        if not line.startswith("#"):
            ids.add(line.rstrip("\n").split("\t")[col])
    print "Using %i IDs from tabular file" % (len(ids))
handle.close()


if seq_format.lower()=="sff":
    #Now write filtered SFF file based on IDs from BLAST file
    try:
        from Bio.SeqIO.SffIO import SffIterator, SffWriter
    except ImportError:
        stop_err("Requires Biopython 1.54 or later")

    try:
        from Bio.SeqIO.SffIO import ReadRocheXmlManifest
    except ImportError:
        #Prior to Biopython 1.56 this was a private function
        from Bio.SeqIO.SffIO import _sff_read_roche_index_xml as ReadRocheXmlManifest
    in_handle = open(in_file, "rb") #must be binary mode!
    try:
        manifest = ReadRocheXmlManifest(in_handle)
    except ValueError:
        manifest = None
    #This makes two passes through the SFF file, which isn't so efficient,
    #but this makes the code simple.
    if out_positive_file != "-":
        out_handle = open(out_positive_file, "wb")
        writer = SffWriter(out_handle, xml=manifest)
        in_handle.seek(0) #start again after getting manifest
        pos_count = writer.write_file(rec for rec in SffIterator(in_handle) if rec.id in ids)
        out_handle.close()
    if out_negative_file != "-":
        out_handle = open(out_negative_file, "wb")
        writer = SffWriter(out_handle, xml=manifest)
        in_handle.seek(0) #start again
        neg_count = writer.write_file(rec for rec in SffIterator(in_handle) if rec.id not in ids)
        out_handle.close()
    #And we're done
    in_handle.close()
    #At the time of writing, Galaxy doesn't show SFF file read counts,
    #so it is useful to print them to stdout, where they are shown in the job info.
    if out_positive_file != "-" and out_negative_file != "-":
        print "%i with and %i without specified IDs" % (pos_count, neg_count)
    elif out_positive_file != "-":
        print "%i with specified IDs" % pos_count
    elif out_negative_file != "-":
        print "%i without specified IDs" % neg_count
elif seq_format.lower()=="fasta":
    #Write filtered FASTA file based on IDs from tabular file
    from galaxy_utils.sequence.fasta import fastaReader, fastaWriter
    reader = fastaReader(open(in_file, "rU"))
    if out_positive_file != "-" and out_negative_file != "-":
        print "Generating two FASTA files"
        positive_writer = fastaWriter(open(out_positive_file, "w"))
        negative_writer = fastaWriter(open(out_negative_file, "w"))
        for record in reader:
            #The [1:] is because the fastaReader leaves the > on the identifier.
            if record.identifier and record.identifier.split()[0][1:] in ids:
                positive_writer.write(record)
            else:
                negative_writer.write(record)
        positive_writer.close()
        negative_writer.close()
    elif out_positive_file != "-":
        print "Generating matching FASTA file"
        positive_writer = fastaWriter(open(out_positive_file, "w"))
        for record in reader:
            #The [1:] is because the fastaReader leaves the > on the identifier.
            if record.identifier and record.identifier.split()[0][1:] in ids:
                positive_writer.write(record)
        positive_writer.close()
    elif out_negative_file != "-":
        print "Generating non-matching FASTA file"
        negative_writer = fastaWriter(open(out_negative_file, "w"))
        for record in reader:
            #The [1:] is because the fastaReader leaves the > on the identifier.
            if not record.identifier or record.identifier.split()[0][1:] not in ids:
                negative_writer.write(record)
        negative_writer.close()
elif seq_format.lower().startswith("fastq"):
    #Write filtered FASTQ file based on IDs from tabular file
    from galaxy_utils.sequence.fastq import fastqReader, fastqWriter
    reader = fastqReader(open(in_file, "rU"))
    if out_positive_file != "-" and out_negative_file != "-":
        print "Generating two FASTQ files"
        positive_writer = fastqWriter(open(out_positive_file, "w"))
        negative_writer = fastqWriter(open(out_negative_file, "w"))
        for record in reader:
            #The [1:] is because the fastqReader leaves the @ on the identifier.
            if record.identifier and record.identifier.split()[0][1:] in ids:
                positive_writer.write(record)
            else:
                negative_writer.write(record)
        positive_writer.close()
        negative_writer.close()
    elif out_positive_file != "-":
        print "Generating matching FASTQ file"
        positive_writer = fastqWriter(open(out_positive_file, "w"))
        for record in reader:
            #The [1:] is because the fastqReader leaves the @ on the identifier.
            if record.identifier and record.identifier.split()[0][1:] in ids:
                positive_writer.write(record)
        positive_writer.close()
    elif out_negative_file != "-":
        print "Generating non-matching FASTQ file"
        negative_writer = fastqWriter(open(out_negative_file, "w"))
        for record in reader:
            #The [1:] is because the fastqReader leaves the @ on the identifier.
            if not record.identifier or record.identifier.split()[0][1:] not in ids:
                negative_writer.write(record)
        negative_writer.close()
else:
    stop_err("Unsupported file type %r" % seq_format)
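
The core idea of the script (collect IDs from selected one-based columns of a tabular file while skipping `#` comment lines, then route each sequence record by the first word of its title) can be illustrated without Biopython or `galaxy_utils`. The sketch below is a minimal, hypothetical Python 3 illustration, not the tool's own code: the names `parse_columns`, `read_ids` and `split_fasta` are invented here, it covers FASTA only, and it assumes well-formed tab-separated input. Passing `None` as an output handle plays the role of the script's `"-"` filename.

```python
import io


def parse_columns(cols_arg):
    """Turn a comma separated list of one-based column numbers into zero-based indices."""
    return [int(arg) - 1 for arg in cols_arg.split(",")]


def read_ids(tabular_handle, columns):
    """Collect the IDs found in the given zero-based columns, ignoring '#' comment lines."""
    ids = set()
    for line in tabular_handle:
        if line.startswith("#"):
            continue
        parts = line.rstrip("\n").split("\t")
        for col in columns:
            ids.add(parts[col])
    return ids


def split_fasta(fasta_handle, ids, pos_handle=None, neg_handle=None):
    """Route each FASTA record by the first word of its title line.

    Records whose ID is in ids go to pos_handle, the rest to neg_handle.
    Passing None for a handle skips that output (like "-" in the script).
    Returns a (matched, unmatched) tuple of record counts.
    """
    pos_count = 0
    neg_count = 0
    out = None  # any text before the first ">" line is dropped
    for line in fasta_handle:
        if line.startswith(">"):
            words = line[1:].split()
            if words and words[0] in ids:
                out = pos_handle
                pos_count += 1
            else:
                out = neg_handle
                neg_count += 1
        if out is not None:
            out.write(line)
    return pos_count, neg_count


# Example: keep the reads listed in column 1 of a BLAST-style table
tabular = io.StringIO("#query\tsubject\nread1\thitA\nread3\thitB\n")
ids = read_ids(tabular, parse_columns("1"))
fasta = io.StringIO(">read1 desc\nACGT\n>read2\nGGCC\n>read3\nTTAA\n")
matched, unmatched = io.StringIO(), io.StringIO()
counts = split_fasta(fasta, ids, matched, unmatched)
print(counts)  # (2, 1)
```

Unlike the script, this sketch works line by line rather than parsing whole records, so both outputs are produced in a single pass. The SFF branch (binary records plus the Roche XML manifest) is not covered.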