annotate tools/seq_filter_by_id/seq_filter_by_id.py @ 5:832c1fd57852 draft

v0.2.2; New options for IDs via text parameter, ignore paired read suffix; misc changes
author peterjc
date Wed, 13 May 2015 11:03:57 -0400
parents 44ab4c0f7683
children 03e134cae41a
rev   line source
3
44ab4c0f7683 Uploaded v0.0.6, automatic dependency on Biopython 1.62, new README file, citation information, MIT licence
peterjc
parents:
diff changeset
1 #!/usr/bin/env python
2 """Filter a FASTA, FASTQ or SFF file with IDs from a tabular file.
3
4 Takes six command line options, tabular filename, ID column numbers (comma
5 separated list using one based counting), input filename, input type (e.g.
5
832c1fd57852 v0.2.2; New options for IDs via text parameter, ignore paired read suffix; misc changes
peterjc
parents: 3
diff changeset
6 FASTA or SFF) and up to two output filenames (for records with and without
7 the given IDs, same format as input sequence file).
3
8
9 When filtering an SFF file, any Roche XML manifest in the input file is
10 preserved in both output files.
11
12 Note in the default NCBI BLAST+ tabular output, the query sequence ID is
13 in column one, and the ID of the match from the database is in column two.
14 Here sensible values for the column numbers would therefore be "1" or "2".
15
5
16 This tool is a short Python script which requires Biopython 1.54 or later.
17 If you use this tool in scientific work leading to a publication, please
18 cite the Biopython application note:
3
19
20 Cock et al 2009. Biopython: freely available Python tools for computational
21 molecular biology and bioinformatics. Bioinformatics 25(11) 1422-3.
22 http://dx.doi.org/10.1093/bioinformatics/btp163 pmid:19304878.
23
24 This script is copyright 2010-2013 by Peter Cock, The James Hutton Institute
25 (formerly the Scottish Crop Research Institute, SCRI), UK. All rights reserved.
26 See accompanying text file for licence details (MIT license).
27
5
28 Use -v or --version to get the version, -h or --help for help.
3
29 """
30 import os
31 import sys
5
32 import re
33 from optparse import OptionParser
3
34
5
35 def sys_exit(msg, err=1):
3
36     sys.stderr.write(msg.rstrip() + "\n")
37     sys.exit(err)
38
5
39 #Parse Command Line
40 usage = """Use as follows:
41
42 $ python seq_filter_by_id.py [options] tab1 cols1 [, tab2 cols2, ...]
43
44 e.g. Positive matches using column one from tabular file:
45
46 $ seq_filter_by_id.py -i my_seqs.fastq -f fastq -p matches.fastq ids.tabular 1
47
48 Multiple tabular files and column numbers may be given, or replaced with
49 the -t or --text option.
50 """
51 parser = OptionParser(usage=usage)
52 parser.add_option('-i', '--input', dest='input',
53                   default=None, help='Input sequences filename',
54                   metavar="FILE")
55 parser.add_option('-f', '--format', dest='format',
56                   default=None,
57                   help='Input sequence format (e.g. fasta, fastq, sff)')
58 parser.add_option('-t', '--text', dest='id_list',
59                   default=None, help="Lists of white space separated IDs (instead of a tabular file)")
60 parser.add_option('-p', '--positive', dest='output_positive',
61                   default=None,
62                   help='Output filename for matches',
63                   metavar="FILE")
64 parser.add_option('-n', '--negative', dest='output_negative',
65                   default=None,
66                   help='Output filename for non-matches',
67                   metavar="FILE")
68 parser.add_option("-l", "--logic", dest="logic",
69                   default="UNION",
70                   help="How to combine multiple ID columns (UNION or INTERSECTION)")
71 parser.add_option("-s", "--suffix", dest="suffix",
72                   action="store_true",
73                   help="Ignore paired-read suffixes when matching names")
74 parser.add_option("-v", "--version", dest="version",
75                   default=False, action="store_true",
76                   help="Show version and quit")
77
78 options, args = parser.parse_args()
79
80 if options.version:
81     print "v0.2.1"
3
82     sys.exit(0)
83
5
84 in_file = options.input
85 seq_format = options.format
86 out_positive_file = options.output_positive
87 out_negative_file = options.output_negative
88 logic = options.logic
89 drop_suffices = bool(options.suffix)
3
90
5
91 if in_file is None or not os.path.isfile(in_file):
92     sys_exit("Missing input file: %r" % in_file)
93 if out_positive_file is None and out_negative_file is None:
94     sys_exit("Neither output file requested")
95 if seq_format is None:
96     sys_exit("Missing sequence format")
97 if logic not in ["UNION", "INTERSECTION"]:
98     sys_exit("Logic argument should be 'UNION' or 'INTERSECTION', not %r" % logic)
99 if options.id_list and args:
100     sys_exit("Cannot accept IDs via both -t and as tabular files")
101 elif not options.id_list and not args:
102     sys_exit("Expected matched pairs of tabular files and columns (or -t given)")
103 if len(args) % 2:
104     sys_exit("Expected matched pairs of tabular files and columns, not: %r" % args)
105
3
106
5
107 #Cope with three widely used suffix naming conventions,
108 #Illumina: /1 or /2
109 #Forward/reverse: .f or .r
110 #Sanger, e.g. .p1k and .q1k
111 #See http://staden.sourceforge.net/manual/pregap4_unix_50.html
112 #re_f = re.compile(r"(/1|\.f|\.[sfp]\d\w*)$")
113 #re_r = re.compile(r"(/2|\.r|\.[rq]\d\w*)$")
114 re_suffix = re.compile(r"(/1|\.f|\.[sfp]\d\w*|/2|\.r|\.[rq]\d\w*)$")
115 assert re_suffix.search("demo.f")
116 assert re_suffix.search("demo.s1")
117 assert re_suffix.search("demo.f1k")
118 assert re_suffix.search("demo.p1")
119 assert re_suffix.search("demo.p1k")
120 assert re_suffix.search("demo.p1lk")
121 assert re_suffix.search("demo/2")
122 assert re_suffix.search("demo.r")
123 assert re_suffix.search("demo.q1")
124 assert re_suffix.search("demo.q1lk")
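To see which alternative of the combined pattern fires for a given read name, the captured group can be inspected. The sketch below reuses the re_suffix pattern from line 114; the helper name split_pair_suffix is hypothetical and not part of the script.

```python
import re

# Same pattern as re_suffix on line 114 of the listing.
re_suffix = re.compile(r"(/1|\.f|\.[sfp]\d\w*|/2|\.r|\.[rq]\d\w*)$")


def split_pair_suffix(name):
    """Split a read name into (template, suffix); suffix is "" if none matched.

    Hypothetical helper for illustration only.
    """
    match = re_suffix.search(name)
    if match:
        # The pattern is anchored with $, so the match always ends the name.
        return name[:match.start()], match.group(1)
    return name, ""


for read_name in ["read7/1", "read7/2", "clone.f", "clone.r",
                  "tmpl.p1k", "tmpl.q1k", "plain"]:
    print("%s -> %r" % (read_name, split_pair_suffix(read_name)))
```

Both mates of a pair reduce to the same template name, which is what lets the -s option match either read against one ID.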
3
125
126 identifiers = []
5
127 for i in range(len(args) // 2):
128     tabular_file = args[2*i]
129     cols_arg = args[2*i + 1]
3
130     if not os.path.isfile(tabular_file):
5
131         sys_exit("Missing tabular identifier file %r" % tabular_file)
3
132     try:
133         columns = [int(arg)-1 for arg in cols_arg.split(",")]
134     except ValueError:
5
135         sys_exit("Expected list of columns (comma separated integers), got %r" % cols_arg)
3
136     if min(columns) < 0:
5
137         sys_exit("Expect one-based column numbers (not zero-based counting), got %r" % cols_arg)
3
138     identifiers.append((tabular_file, columns))
139
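The loop above walks the positional arguments as alternating (tabular file, columns) pairs and converts one-based column numbers to zero-based indices. A minimal standalone sketch of the same idea follows. The names parse_columns and pair_up are hypothetical, errors are raised as ValueError rather than routed through sys_exit, and the file-existence check is left out so the example runs without any input files.

```python
def parse_columns(cols_arg):
    """Convert a string such as "1,3" into zero-based indices [0, 2]."""
    try:
        columns = [int(arg) - 1 for arg in cols_arg.split(",")]
    except ValueError:
        raise ValueError("Expected comma separated integers, got %r" % cols_arg)
    if min(columns) < 0:
        # A "0" (or negative) entry means the caller used zero-based counting.
        raise ValueError("Expected one-based column numbers, got %r" % cols_arg)
    return columns


def pair_up(args):
    """Group [file1, cols1, file2, cols2, ...] into (file, columns) tuples."""
    if len(args) % 2:
        raise ValueError("Expected matched pairs of files and columns, not: %r" % args)
    return [(args[i], parse_columns(args[i + 1]))
            for i in range(0, len(args), 2)]


print(pair_up(["blast.tabular", "1,2", "ids.tabular", "3"]))
```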
5
140 name_warn = False
141 def check_white_space(name):
142     parts = name.split(None, 1)
143     global name_warn
144     if not name_warn and len(parts) > 1:
145         name_warn = "WARNING: Some of your identifiers had white space in them, " + \
146                     "using first word only. e.g.:\n%s\n" % name
147     return parts[0]
148
149 if drop_suffices:
150     def clean_name(name):
151         """Remove suffix."""
152         name = check_white_space(name)
153         match = re_suffix.search(name)
154         if match:
155             # Use the fact this is a suffix, and regular expression will be
156             # anchored to the end of the name:
157             return name[:match.start()]
158         else:
159             # Nothing to do
160             return name
161     assert clean_name("foo/1") == "foo"
162     assert clean_name("foo/2") == "foo"
163     assert clean_name("bar.f") == "bar"
164     assert clean_name("bar.r") == "bar"
165     assert clean_name("baz.p1") == "baz"
166     assert clean_name("baz.q2") == "baz"
167 else:
168     # Just check the white space
169     clean_name = check_white_space
170
171
832c1fd57852 v0.2.2; New options for IDs via text parameter, ignore paired read suffix; misc changes
peterjc
parents: 3
diff changeset
172 mapped_chars = { '>' :'__gt__',
832c1fd57852 v0.2.2; New options for IDs via text parameter, ignore paired read suffix; misc changes
peterjc
parents: 3
diff changeset
173 '<' :'__lt__',
832c1fd57852 v0.2.2; New options for IDs via text parameter, ignore paired read suffix; misc changes
peterjc
parents: 3
diff changeset
174 "'" :'__sq__',
832c1fd57852 v0.2.2; New options for IDs via text parameter, ignore paired read suffix; misc changes
peterjc
parents: 3
diff changeset
175 '"' :'__dq__',
832c1fd57852 v0.2.2; New options for IDs via text parameter, ignore paired read suffix; misc changes
peterjc
parents: 3
diff changeset
176 '[' :'__ob__',
832c1fd57852 v0.2.2; New options for IDs via text parameter, ignore paired read suffix; misc changes
peterjc
parents: 3
diff changeset
177 ']' :'__cb__',
832c1fd57852 v0.2.2; New options for IDs via text parameter, ignore paired read suffix; misc changes
peterjc
parents: 3
diff changeset
178 '{' :'__oc__',
832c1fd57852 v0.2.2; New options for IDs via text parameter, ignore paired read suffix; misc changes
peterjc
parents: 3
diff changeset
179 '}' :'__cc__',
832c1fd57852 v0.2.2; New options for IDs via text parameter, ignore paired read suffix; misc changes
peterjc
parents: 3
diff changeset
180 '@' : '__at__',
832c1fd57852 v0.2.2; New options for IDs via text parameter, ignore paired read suffix; misc changes
peterjc
parents: 3
diff changeset
181 '\n' : '__cn__',
832c1fd57852 v0.2.2; New options for IDs via text parameter, ignore paired read suffix; misc changes
peterjc
parents: 3
diff changeset
182 '\r' : '__cr__',
832c1fd57852 v0.2.2; New options for IDs via text parameter, ignore paired read suffix; misc changes
peterjc
parents: 3
diff changeset
183 '\t' : '__tc__',
832c1fd57852 v0.2.2; New options for IDs via text parameter, ignore paired read suffix; misc changes
peterjc
parents: 3
diff changeset
184 '#' : '__pd__'
832c1fd57852 v0.2.2; New options for IDs via text parameter, ignore paired read suffix; misc changes
peterjc
parents: 3
diff changeset
185 }
832c1fd57852 v0.2.2; New options for IDs via text parameter, ignore paired read suffix; misc changes
peterjc
parents: 3
diff changeset
186
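A table like the one above can be applied in reverse to recover the original characters from an escaped text parameter before splitting it into identifiers. The following is a minimal sketch only; `MAPPED_CHARS` is a shortened copy of the table, and `restore_galaxy_text` and `split_identifiers` are hypothetical helper names that are not part of this script.

```python
# Shortened copy of the escape table, for illustration only
MAPPED_CHARS = {
    ">": "__gt__",
    "<": "__lt__",
    "@": "__at__",
    "\n": "__cn__",
    "\r": "__cr__",
    "\t": "__tc__",
    "#": "__pd__",
}


def restore_galaxy_text(text, mapping=MAPPED_CHARS):
    """Replace each escape token (e.g. __cn__) with its original character."""
    for char, token in mapping.items():
        text = text.replace(token, char)
    return text


def split_identifiers(text):
    """Restore escaped characters first, then split on any whitespace."""
    return restore_galaxy_text(text).split()
```

Splitting must happen on the restored string; splitting the still-escaped string would leave tokens such as `__cn__` embedded inside the identifiers.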
# Read tabular file(s) and record all specified identifiers
ids = None  # Will be a set
if options.id_list:
    assert not identifiers
    ids = set()
    id_list = options.id_list
    # Galaxy turns \r into __cr__ (CR) etc, so undo that mapping
    for k in mapped_chars:
        id_list = id_list.replace(mapped_chars[k], k)
    # Split the restored string, not the raw escaped option value
    for x in id_list.split():
        ids.add(clean_name(x.strip()))
    print("Have %i unique identifiers from list" % len(ids))
for tabular_file, columns in identifiers:
    file_ids = set()
    handle = open(tabular_file, "rU")
    if len(columns) > 1:
        # General case of many columns
        for line in handle:
            if line.startswith("#"):
                # Ignore comments
                continue
            parts = line.rstrip("\n").split("\t")
            for col in columns:
                file_ids.add(clean_name(parts[col]))
    else:
        # Single column, special case speed up
        col = columns[0]
        for line in handle:
            if not line.startswith("#"):
                file_ids.add(clean_name(line.rstrip("\n").split("\t")[col]))
    print("Using %i IDs from column %s in tabular file"
          % (len(file_ids), ", ".join(str(col + 1) for col in columns)))
    if ids is None:
        ids = file_ids
    if logic == "UNION":
        ids.update(file_ids)
    else:
        ids.intersection_update(file_ids)
    handle.close()
if len(identifiers) > 1:
    if logic == "UNION":
        print("Have %i IDs combined from %i tabular files"
              % (len(ids), len(identifiers)))
    else:
        print("Have %i IDs in common from %i tabular files"
              % (len(ids), len(identifiers)))
if name_warn:
    sys.stderr.write(name_warn)

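The ID-gathering logic above reads one or more columns from each tabular file and then merges the per-file sets by union or intersection. A self-contained sketch of that idea follows, working on in-memory lines rather than files. The function names `ids_from_tabular` and `combine_id_sets` are illustrative assumptions, not names used by this script, and the sketch omits the `clean_name` step.

```python
def ids_from_tabular(lines, columns):
    """Collect values from zero-based columns, skipping '#' comment lines."""
    found = set()
    for line in lines:
        if line.startswith("#"):
            continue
        parts = line.rstrip("\n").split("\t")
        for col in columns:
            found.add(parts[col])
    return found


def combine_id_sets(id_sets, logic="UNION"):
    """Combine an iterable of sets by union, or by intersection otherwise."""
    combined = None
    for file_ids in id_sets:
        if combined is None:
            # Copy, so the caller's first set is not modified in place
            combined = set(file_ids)
        elif logic == "UNION":
            combined.update(file_ids)
        else:
            combined.intersection_update(file_ids)
    return combined if combined is not None else set()
```

For example, `combine_id_sets([{"x", "z"}, {"x", "y"}], "INTERSECTION")` keeps only the identifiers present in every input set.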
def crude_fasta_iterator(handle):
    """Yields tuples, record ID and the full record as a string."""
    while True:
        line = handle.readline()
        if line == "":
            return  # Premature end of file, or just empty?
        if line[0] == ">":
            break

    no_id_warned = False
    while True:
        if line[0] != ">":
            raise ValueError(
                "Records in Fasta files should start with '>' character")
        try:
            id = line[1:].split(None, 1)[0]
        except IndexError:
            if not no_id_warned:
                sys.stderr.write("WARNING - Malformed FASTA entry with no identifier\n")
                no_id_warned = True
            id = None
        lines = [line]
        line = handle.readline()
        while True:
            if not line:
                break
            if line[0] == ">":
                break
            lines.append(line)
            line = handle.readline()
        yield id, "".join(lines)
        if not line:
            return  # StopIteration


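To see what an iterator of this kind produces, here is a compact stand-alone sketch that yields the same kind of `(identifier, raw record text)` pairs from an in-memory handle. It is a simplified re-implementation under the name `iter_fasta_records` (a hypothetical name), not the function above, and it does not emit the malformed-entry warning.

```python
import io


def iter_fasta_records(handle):
    """Yield (identifier, raw_record_text) for each FASTA record in handle."""
    identifier = None
    lines = []
    for line in handle:
        if line.startswith(">"):
            if lines:
                # A new header closes the previous record
                yield identifier, "".join(lines)
            words = line[1:].split(None, 1)
            identifier = words[0] if words else None
            lines = [line]
        elif lines:
            # Sequence line belonging to the current record
            lines.append(line)
        # Any text before the first '>' line is skipped
    if lines:
        yield identifier, "".join(lines)


example = io.StringIO(">seq1 first\nACGT\nTTGA\n>seq2\nGGCC\n")
records = list(iter_fasta_records(example))
```

Because each record is kept as its original text, writing it back out preserves the input's line wrapping and description.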
def fasta_filter(in_file, pos_file, neg_file, wanted):
    """FASTA filter; records are written unchanged, keeping the input line wrapping."""
    pos_count = neg_count = 0
    # Galaxy now requires Python 2.5+ so we can use with statements
    with open(in_file) as in_handle:
        # Doing the if statement outside the loop for speed
        # (with the downside of three very similar loops).
        if pos_file is not None and neg_file is not None:
            print("Generating two FASTA files")
            with open(pos_file, "w") as pos_handle:
                with open(neg_file, "w") as neg_handle:
                    for identifier, record in crude_fasta_iterator(in_handle):
                        if clean_name(identifier) in wanted:
                            pos_handle.write(record)
                            pos_count += 1
                        else:
                            neg_handle.write(record)
                            neg_count += 1
        elif pos_file is not None:
            print("Generating matching FASTA file")
            with open(pos_file, "w") as pos_handle:
                for identifier, record in crude_fasta_iterator(in_handle):
                    if clean_name(identifier) in wanted:
                        pos_handle.write(record)
                        pos_count += 1
                    else:
                        neg_count += 1
        else:
            print("Generating non-matching FASTA file")
            assert neg_file is not None
            with open(neg_file, "w") as neg_handle:
                for identifier, record in crude_fasta_iterator(in_handle):
                    if clean_name(identifier) in wanted:
                        pos_count += 1
                    else:
                        neg_handle.write(record)
                        neg_count += 1
    return pos_count, neg_count


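The three loops in the FASTA filter differ only in which outputs exist. For comparison, a single-loop variant can test for a missing output inside the loop instead; this is shorter but repeats the check for every record, which is the cost the function above avoids. The sketch below assumes already-parsed `(identifier, text)` pairs and in-memory handles, and `split_records` is a hypothetical name.

```python
import io


def split_records(records, wanted, pos_handle=None, neg_handle=None):
    """Write wanted records to pos_handle and the rest to neg_handle.

    Either handle may be None, in which case those records are only counted.
    Returns (pos_count, neg_count).
    """
    pos_count = neg_count = 0
    for identifier, text in records:
        if identifier in wanted:
            if pos_handle is not None:
                pos_handle.write(text)
            pos_count += 1
        else:
            if neg_handle is not None:
                neg_handle.write(text)
            neg_count += 1
    return pos_count, neg_count


records = [("a", ">a\nAC\n"), ("b", ">b\nGT\n"), ("c", ">c\nTT\n")]
pos, neg = io.StringIO(), io.StringIO()
counts = split_records(records, {"a", "c"}, pos, neg)
```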
def fastq_filter(in_file, pos_file, neg_file, wanted):
    """FASTQ filter."""
    from Bio.SeqIO.QualityIO import FastqGeneralIterator
    handle = open(in_file, "r")
    if pos_file is not None and neg_file is not None:
        print("Generating two FASTQ files")
        positive_handle = open(pos_file, "w")
        negative_handle = open(neg_file, "w")
        for title, seq, qual in FastqGeneralIterator(handle):
            if clean_name(title.split(None, 1)[0]) in wanted:
                positive_handle.write("@%s\n%s\n+\n%s\n" % (title, seq, qual))
            else:
                negative_handle.write("@%s\n%s\n+\n%s\n" % (title, seq, qual))
        positive_handle.close()
        negative_handle.close()
    elif pos_file is not None:
        print("Generating matching FASTQ file")
        positive_handle = open(pos_file, "w")
        for title, seq, qual in FastqGeneralIterator(handle):
            if clean_name(title.split(None, 1)[0]) in wanted:
                positive_handle.write("@%s\n%s\n+\n%s\n" % (title, seq, qual))
        positive_handle.close()
    elif neg_file is not None:
        print("Generating non-matching FASTQ file")
        negative_handle = open(neg_file, "w")
        for title, seq, qual in FastqGeneralIterator(handle):
            if clean_name(title.split(None, 1)[0]) not in wanted:
                negative_handle.write("@%s\n%s\n+\n%s\n" % (title, seq, qual))
        negative_handle.close()
    handle.close()
    # This does not currently bother to keep record counts (faster)


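The FASTQ filter relies on two small conventions: the identifier is the first whitespace-delimited word of the title line, and each record is written as four lines with a bare `+` separator. A minimal sketch of both follows, without Biopython; `fastq_id` and `format_fastq` are hypothetical helper names, and the length check is an addition for illustration.

```python
def fastq_id(title):
    """Return the identifier: the first whitespace-delimited word of the title."""
    return title.split(None, 1)[0]


def format_fastq(title, seq, qual):
    """Format one record as four lines, leaving the '+' line bare."""
    if len(seq) != len(qual):
        raise ValueError("Sequence and quality strings differ in length")
    return "@%s\n%s\n+\n%s\n" % (title, seq, qual)
```

Leaving the `+` line bare, rather than repeating the title on it, keeps the output smaller and is still valid FASTQ.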
832c1fd57852 v0.2.2; New options for IDs via text parameter, ignore paired read suffix; misc changes
peterjc
parents: 3
diff changeset
343 def sff_filter(in_file, pos_file, neg_file, wanted):
832c1fd57852 v0.2.2; New options for IDs via text parameter, ignore paired read suffix; misc changes
peterjc
parents: 3
diff changeset
344 """SFF filter."""
3
44ab4c0f7683 Uploaded v0.0.6, automatic dependency on Biopython 1.62, new README file, citation information, MIT licence
peterjc
parents:
diff changeset
345 try:
44ab4c0f7683 Uploaded v0.0.6, automatic dependency on Biopython 1.62, new README file, citation information, MIT licence
peterjc
parents:
diff changeset
346 from Bio.SeqIO.SffIO import SffIterator, SffWriter
44ab4c0f7683 Uploaded v0.0.6, automatic dependency on Biopython 1.62, new README file, citation information, MIT licence
peterjc
parents:
diff changeset
347 except ImportError:
5
832c1fd57852 v0.2.2; New options for IDs via text parameter, ignore paired read suffix; misc changes
peterjc
parents: 3
diff changeset
348 sys_exit("SFF filtering requires Biopython 1.54 or later")
3
44ab4c0f7683 Uploaded v0.0.6, automatic dependency on Biopython 1.62, new README file, citation information, MIT licence
peterjc
parents:
diff changeset
349
44ab4c0f7683 Uploaded v0.0.6, automatic dependency on Biopython 1.62, new README file, citation information, MIT licence
peterjc
parents:
diff changeset
350 try:
44ab4c0f7683 Uploaded v0.0.6, automatic dependency on Biopython 1.62, new README file, citation information, MIT licence
peterjc
parents:
diff changeset
351 from Bio.SeqIO.SffIO import ReadRocheXmlManifest
44ab4c0f7683 Uploaded v0.0.6, automatic dependency on Biopython 1.62, new README file, citation information, MIT licence
peterjc
parents:
diff changeset
352 except ImportError:
44ab4c0f7683 Uploaded v0.0.6, automatic dependency on Biopython 1.62, new README file, citation information, MIT licence
peterjc
parents:
diff changeset
353 #Prior to Biopython 1.56 this was a private function
44ab4c0f7683 Uploaded v0.0.6, automatic dependency on Biopython 1.62, new README file, citation information, MIT licence
peterjc
parents:
diff changeset
354 from Bio.SeqIO.SffIO import _sff_read_roche_index_xml as ReadRocheXmlManifest
5
832c1fd57852 v0.2.2; New options for IDs via text parameter, ignore paired read suffix; misc changes
peterjc
parents: 3
diff changeset
355
3
44ab4c0f7683 Uploaded v0.0.6, automatic dependency on Biopython 1.62, new README file, citation information, MIT licence
peterjc
parents:
diff changeset
356 in_handle = open(in_file, "rb") #must be binary mode!
44ab4c0f7683 Uploaded v0.0.6, automatic dependency on Biopython 1.62, new README file, citation information, MIT licence
peterjc
parents:
diff changeset
357 try:
44ab4c0f7683 Uploaded v0.0.6, automatic dependency on Biopython 1.62, new README file, citation information, MIT licence
peterjc
parents:
diff changeset
358 manifest = ReadRocheXmlManifest(in_handle)
44ab4c0f7683 Uploaded v0.0.6, automatic dependency on Biopython 1.62, new README file, citation information, MIT licence
peterjc
parents:
diff changeset
359 except ValueError:
44ab4c0f7683 Uploaded v0.0.6, automatic dependency on Biopython 1.62, new README file, citation information, MIT licence
peterjc
parents:
diff changeset
360 manifest = None
5
832c1fd57852 v0.2.2; New options for IDs via text parameter, ignore paired read suffix; misc changes
peterjc
parents: 3
diff changeset
361
3
44ab4c0f7683 Uploaded v0.0.6, automatic dependency on Biopython 1.62, new README file, citation information, MIT licence
peterjc
parents:
diff changeset
362 #This makes two passes through the SFF file, which isn't so efficient,
44ab4c0f7683 Uploaded v0.0.6, automatic dependency on Biopython 1.62, new README file, citation information, MIT licence
peterjc
parents:
diff changeset
363 #but this makes the code simple.
44ab4c0f7683 Uploaded v0.0.6, automatic dependency on Biopython 1.62, new README file, citation information, MIT licence
peterjc
parents:
diff changeset
364 pos_count = neg_count = 0
5
832c1fd57852 v0.2.2; New options for IDs via text parameter, ignore paired read suffix; misc changes
peterjc
parents: 3
diff changeset
365 if out_positive_file is not None:
3
44ab4c0f7683 Uploaded v0.0.6, automatic dependency on Biopython 1.62, new README file, citation information, MIT licence
peterjc
parents:
diff changeset
366 out_handle = open(out_positive_file, "wb")
44ab4c0f7683 Uploaded v0.0.6, automatic dependency on Biopython 1.62, new README file, citation information, MIT licence
peterjc
parents:
diff changeset
367 writer = SffWriter(out_handle, xml=manifest)
44ab4c0f7683 Uploaded v0.0.6, automatic dependency on Biopython 1.62, new README file, citation information, MIT licence
peterjc
parents:
diff changeset
368 in_handle.seek(0) #start again after getting manifest
5
832c1fd57852 v0.2.2; New options for IDs via text parameter, ignore paired read suffix; misc changes
peterjc
parents: 3
diff changeset
369 pos_count = writer.write_file(rec for rec in SffIterator(in_handle) if clean_name(rec.id) in ids)
3
44ab4c0f7683 Uploaded v0.0.6, automatic dependency on Biopython 1.62, new README file, citation information, MIT licence
peterjc
parents:
diff changeset
370 out_handle.close()
5
832c1fd57852 v0.2.2; New options for IDs via text parameter, ignore paired read suffix; misc changes
peterjc
parents: 3
diff changeset
371 if out_negative_file is not None:
3
44ab4c0f7683 Uploaded v0.0.6, automatic dependency on Biopython 1.62, new README file, citation information, MIT licence
peterjc
parents:
diff changeset
372 out_handle = open(out_negative_file, "wb")
44ab4c0f7683 Uploaded v0.0.6, automatic dependency on Biopython 1.62, new README file, citation information, MIT licence
peterjc
parents:
diff changeset
373 writer = SffWriter(out_handle, xml=manifest)
44ab4c0f7683 Uploaded v0.0.6, automatic dependency on Biopython 1.62, new README file, citation information, MIT licence
peterjc
parents:
diff changeset
374 in_handle.seek(0) #start again
5
832c1fd57852 v0.2.2; New options for IDs via text parameter, ignore paired read suffix; misc changes
peterjc
parents: 3
diff changeset
375 neg_count = writer.write_file(rec for rec in SffIterator(in_handle) if clean_name(rec.id) not in ids)
3
44ab4c0f7683 Uploaded v0.0.6, automatic dependency on Biopython 1.62, new README file, citation information, MIT licence
peterjc
parents:
diff changeset
376 out_handle.close()
44ab4c0f7683 Uploaded v0.0.6, automatic dependency on Biopython 1.62, new README file, citation information, MIT licence
peterjc
parents:
diff changeset
377 #And we're done
44ab4c0f7683 Uploaded v0.0.6, automatic dependency on Biopython 1.62, new README file, citation information, MIT licence
peterjc
parents:
diff changeset
378 in_handle.close()
44ab4c0f7683 Uploaded v0.0.6, automatic dependency on Biopython 1.62, new README file, citation information, MIT licence
peterjc
parents:
diff changeset
379 #At the time of writing, Galaxy doesn't show SFF file read counts,
44ab4c0f7683 Uploaded v0.0.6, automatic dependency on Biopython 1.62, new README file, citation information, MIT licence
peterjc
parents:
diff changeset
380 #so it is useful to put them in stdout and thus shown in job info.
5
832c1fd57852 v0.2.2; New options for IDs via text parameter, ignore paired read suffix; misc changes
peterjc
parents: 3
diff changeset
381 return pos_count, neg_count
832c1fd57852 v0.2.2; New options for IDs via text parameter, ignore paired read suffix; misc changes
peterjc
parents: 3
diff changeset
382
832c1fd57852 v0.2.2; New options for IDs via text parameter, ignore paired read suffix; misc changes
peterjc
parents: 3
diff changeset
383
832c1fd57852 v0.2.2; New options for IDs via text parameter, ignore paired read suffix; misc changes
peterjc
parents: 3
diff changeset
384 if seq_format.lower()=="sff":
832c1fd57852 v0.2.2; New options for IDs via text parameter, ignore paired read suffix; misc changes
peterjc
parents: 3
diff changeset
385 # Now write filtered SFF file based on IDs wanted
832c1fd57852 v0.2.2; New options for IDs via text parameter, ignore paired read suffix; misc changes
peterjc
parents: 3
diff changeset
386 pos_count, neg_count = sff_filter(in_file, out_positive_file, out_negative_file, ids)
832c1fd57852 v0.2.2; New options for IDs via text parameter, ignore paired read suffix; misc changes
peterjc
parents: 3
diff changeset
387 # At the time of writing, Galaxy doesn't show SFF file read counts,
832c1fd57852 v0.2.2; New options for IDs via text parameter, ignore paired read suffix; misc changes
peterjc
parents: 3
diff changeset
388 # so it is useful to put them in stdout and thus shown in job info.
3
44ab4c0f7683 Uploaded v0.0.6, automatic dependency on Biopython 1.62, new README file, citation information, MIT licence
peterjc
parents:
diff changeset
389 elif seq_format.lower()=="fasta":
5
832c1fd57852 v0.2.2; New options for IDs via text parameter, ignore paired read suffix; misc changes
peterjc
parents: 3
diff changeset
390 # Write filtered FASTA file based on IDs from tabular file
3
44ab4c0f7683 Uploaded v0.0.6, automatic dependency on Biopython 1.62, new README file, citation information, MIT licence
peterjc
parents:
diff changeset
391 pos_count, neg_count = fasta_filter(in_file, out_positive_file, out_negative_file, ids)
44ab4c0f7683 Uploaded v0.0.6, automatic dependency on Biopython 1.62, new README file, citation information, MIT licence
peterjc
parents:
diff changeset
392 print("%i with and %i without specified IDs" % (pos_count, neg_count))
44ab4c0f7683 Uploaded v0.0.6, automatic dependency on Biopython 1.62, new README file, citation information, MIT licence
peterjc
parents:
diff changeset
393 elif seq_format.lower().startswith("fastq"):
5
832c1fd57852 v0.2.2; New options for IDs via text parameter, ignore paired read suffix; misc changes
peterjc
parents: 3
diff changeset
394 # Write filtered FASTQ file based on IDs from tabular file
832c1fd57852 v0.2.2; New options for IDs via text parameter, ignore paired read suffix; misc changes
peterjc
parents: 3
diff changeset
395 fastq_filter(in_file, out_positive_file, out_negative_file, ids)
832c1fd57852 v0.2.2; New options for IDs via text parameter, ignore paired read suffix; misc changes
peterjc
parents: 3
diff changeset
396 # This does not currently track the counts
3
44ab4c0f7683 Uploaded v0.0.6, automatic dependency on Biopython 1.62, new README file, citation information, MIT licence
peterjc
parents:
diff changeset
397 else:
5
832c1fd57852 v0.2.2; New options for IDs via text parameter, ignore paired read suffix; misc changes
peterjc
parents: 3
diff changeset
398 sys_exit("Unsupported file type %r" % seq_format)
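
The SFF branch above filters the input twice, once per output file, using a generator expression that tests `clean_name(rec.id)` against the wanted IDs. The following is a minimal sketch of that two-pass pattern on in-memory records, so it runs without Biopython or an SFF file. The `Record` type, the `partition_two_pass` helper, and this simplified `clean_name` (which only strips a trailing `/1` or `/2`) are hypothetical stand-ins for illustration, not the script's own definitions.

```python
from collections import namedtuple

# Hypothetical stand-in for a sequence record; the real script works on
# Biopython records read from the SFF file.
Record = namedtuple("Record", ["id", "seq"])


def clean_name(name):
    """Assumed normaliser: drop a trailing /1 or /2 paired-read suffix."""
    if name.endswith("/1") or name.endswith("/2"):
        return name[:-2]
    return name


def partition_two_pass(records, ids):
    """Split records into (wanted, unwanted), scanning the input once per output."""
    # First pass: records whose cleaned ID is in the wanted set.
    wanted = [rec for rec in records if clean_name(rec.id) in ids]
    # Second pass: everything else.
    unwanted = [rec for rec in records if clean_name(rec.id) not in ids]
    return wanted, unwanted


records = [
    Record("read1/1", "ACGT"),
    Record("read2/1", "GGCC"),
    Record("read3", "TTAA"),
]
ids = {"read1", "read3"}
wanted, unwanted = partition_two_pass(records, ids)
print("%i with and %i without specified IDs" % (len(wanted), len(unwanted)))
```

Scanning twice keeps each output independent, so either one can be skipped when its filename is not given. The cost is reading the input a second time, which is the trade-off the comment in the SFF code acknowledges.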