Mercurial > repos > iuc > khmer_filter_below_abundance_cutoff
annotate filter-below-abund.py @ 5:d9cc87dfec6e draft
planemo upload for repository https://github.com/galaxyproject/tools-iuc/blob/master/tools/khmer/ commit 19ea4feff5ccf3744c549b9a67259947a1cb90ba
author | iuc |
---|---|
date | Sat, 21 Jan 2017 14:43:29 -0500 |
parents | fadfb386c876 |
children | bf716cfc3225 |
rev | line source |
---|---|
0
67b597ec0baf
planemo upload for repository https://github.com/galaxyproject/tools-iuc/blob/master/tools/khmer/ commit be9a20423d1a6ec33d59341e0e61b535127bbce2
iuc
parents:
diff
changeset
|
1 #! /usr/bin/env python |
67b597ec0baf
planemo upload for repository https://github.com/galaxyproject/tools-iuc/blob/master/tools/khmer/ commit be9a20423d1a6ec33d59341e0e61b535127bbce2
iuc
parents:
diff
changeset
|
2 # This file is part of khmer, https://github.com/dib-lab/khmer/, and is |
67b597ec0baf
planemo upload for repository https://github.com/galaxyproject/tools-iuc/blob/master/tools/khmer/ commit be9a20423d1a6ec33d59341e0e61b535127bbce2
iuc
parents:
diff
changeset
|
3 # Copyright (C) 2011-2015, Michigan State University. |
67b597ec0baf
planemo upload for repository https://github.com/galaxyproject/tools-iuc/blob/master/tools/khmer/ commit be9a20423d1a6ec33d59341e0e61b535127bbce2
iuc
parents:
diff
changeset
|
4 # Copyright (C) 2015, The Regents of the University of California. |
67b597ec0baf
planemo upload for repository https://github.com/galaxyproject/tools-iuc/blob/master/tools/khmer/ commit be9a20423d1a6ec33d59341e0e61b535127bbce2
iuc
parents:
diff
changeset
|
5 # |
67b597ec0baf
planemo upload for repository https://github.com/galaxyproject/tools-iuc/blob/master/tools/khmer/ commit be9a20423d1a6ec33d59341e0e61b535127bbce2
iuc
parents:
diff
changeset
|
6 # Redistribution and use in source and binary forms, with or without |
67b597ec0baf
planemo upload for repository https://github.com/galaxyproject/tools-iuc/blob/master/tools/khmer/ commit be9a20423d1a6ec33d59341e0e61b535127bbce2
iuc
parents:
diff
changeset
|
7 # modification, are permitted provided that the following conditions are |
67b597ec0baf
planemo upload for repository https://github.com/galaxyproject/tools-iuc/blob/master/tools/khmer/ commit be9a20423d1a6ec33d59341e0e61b535127bbce2
iuc
parents:
diff
changeset
|
8 # met: |
67b597ec0baf
planemo upload for repository https://github.com/galaxyproject/tools-iuc/blob/master/tools/khmer/ commit be9a20423d1a6ec33d59341e0e61b535127bbce2
iuc
parents:
diff
changeset
|
9 # |
67b597ec0baf
planemo upload for repository https://github.com/galaxyproject/tools-iuc/blob/master/tools/khmer/ commit be9a20423d1a6ec33d59341e0e61b535127bbce2
iuc
parents:
diff
changeset
|
10 # * Redistributions of source code must retain the above copyright |
67b597ec0baf
planemo upload for repository https://github.com/galaxyproject/tools-iuc/blob/master/tools/khmer/ commit be9a20423d1a6ec33d59341e0e61b535127bbce2
iuc
parents:
diff
changeset
|
11 # notice, this list of conditions and the following disclaimer. |
67b597ec0baf
planemo upload for repository https://github.com/galaxyproject/tools-iuc/blob/master/tools/khmer/ commit be9a20423d1a6ec33d59341e0e61b535127bbce2
iuc
parents:
diff
changeset
|
12 # |
67b597ec0baf
planemo upload for repository https://github.com/galaxyproject/tools-iuc/blob/master/tools/khmer/ commit be9a20423d1a6ec33d59341e0e61b535127bbce2
iuc
parents:
diff
changeset
|
13 # * Redistributions in binary form must reproduce the above |
67b597ec0baf
planemo upload for repository https://github.com/galaxyproject/tools-iuc/blob/master/tools/khmer/ commit be9a20423d1a6ec33d59341e0e61b535127bbce2
iuc
parents:
diff
changeset
|
14 # copyright notice, this list of conditions and the following |
67b597ec0baf
planemo upload for repository https://github.com/galaxyproject/tools-iuc/blob/master/tools/khmer/ commit be9a20423d1a6ec33d59341e0e61b535127bbce2
iuc
parents:
diff
changeset
|
15 # disclaimer in the documentation and/or other materials provided |
67b597ec0baf
planemo upload for repository https://github.com/galaxyproject/tools-iuc/blob/master/tools/khmer/ commit be9a20423d1a6ec33d59341e0e61b535127bbce2
iuc
parents:
diff
changeset
|
16 # with the distribution. |
67b597ec0baf
planemo upload for repository https://github.com/galaxyproject/tools-iuc/blob/master/tools/khmer/ commit be9a20423d1a6ec33d59341e0e61b535127bbce2
iuc
parents:
diff
changeset
|
17 # |
67b597ec0baf
planemo upload for repository https://github.com/galaxyproject/tools-iuc/blob/master/tools/khmer/ commit be9a20423d1a6ec33d59341e0e61b535127bbce2
iuc
parents:
diff
changeset
|
18 # * Neither the name of the Michigan State University nor the names |
67b597ec0baf
planemo upload for repository https://github.com/galaxyproject/tools-iuc/blob/master/tools/khmer/ commit be9a20423d1a6ec33d59341e0e61b535127bbce2
iuc
parents:
diff
changeset
|
19 # of its contributors may be used to endorse or promote products |
67b597ec0baf
planemo upload for repository https://github.com/galaxyproject/tools-iuc/blob/master/tools/khmer/ commit be9a20423d1a6ec33d59341e0e61b535127bbce2
iuc
parents:
diff
changeset
|
20 # derived from this software without specific prior written |
67b597ec0baf
planemo upload for repository https://github.com/galaxyproject/tools-iuc/blob/master/tools/khmer/ commit be9a20423d1a6ec33d59341e0e61b535127bbce2
iuc
parents:
diff
changeset
|
21 # permission. |
67b597ec0baf
planemo upload for repository https://github.com/galaxyproject/tools-iuc/blob/master/tools/khmer/ commit be9a20423d1a6ec33d59341e0e61b535127bbce2
iuc
parents:
diff
changeset
|
22 # |
67b597ec0baf
planemo upload for repository https://github.com/galaxyproject/tools-iuc/blob/master/tools/khmer/ commit be9a20423d1a6ec33d59341e0e61b535127bbce2
iuc
parents:
diff
changeset
|
23 # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS |
67b597ec0baf
planemo upload for repository https://github.com/galaxyproject/tools-iuc/blob/master/tools/khmer/ commit be9a20423d1a6ec33d59341e0e61b535127bbce2
iuc
parents:
diff
changeset
|
24 # "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT |
67b597ec0baf
planemo upload for repository https://github.com/galaxyproject/tools-iuc/blob/master/tools/khmer/ commit be9a20423d1a6ec33d59341e0e61b535127bbce2
iuc
parents:
diff
changeset
|
25 # LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR |
67b597ec0baf
planemo upload for repository https://github.com/galaxyproject/tools-iuc/blob/master/tools/khmer/ commit be9a20423d1a6ec33d59341e0e61b535127bbce2
iuc
parents:
diff
changeset
|
26 # A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT |
67b597ec0baf
planemo upload for repository https://github.com/galaxyproject/tools-iuc/blob/master/tools/khmer/ commit be9a20423d1a6ec33d59341e0e61b535127bbce2
iuc
parents:
diff
changeset
|
27 # HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, |
67b597ec0baf
planemo upload for repository https://github.com/galaxyproject/tools-iuc/blob/master/tools/khmer/ commit be9a20423d1a6ec33d59341e0e61b535127bbce2
iuc
parents:
diff
changeset
|
28 # SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT |
67b597ec0baf
planemo upload for repository https://github.com/galaxyproject/tools-iuc/blob/master/tools/khmer/ commit be9a20423d1a6ec33d59341e0e61b535127bbce2
iuc
parents:
diff
changeset
|
29 # LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, |
67b597ec0baf
planemo upload for repository https://github.com/galaxyproject/tools-iuc/blob/master/tools/khmer/ commit be9a20423d1a6ec33d59341e0e61b535127bbce2
iuc
parents:
diff
changeset
|
30 # DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY |
67b597ec0baf
planemo upload for repository https://github.com/galaxyproject/tools-iuc/blob/master/tools/khmer/ commit be9a20423d1a6ec33d59341e0e61b535127bbce2
iuc
parents:
diff
changeset
|
31 # THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT |
67b597ec0baf
planemo upload for repository https://github.com/galaxyproject/tools-iuc/blob/master/tools/khmer/ commit be9a20423d1a6ec33d59341e0e61b535127bbce2
iuc
parents:
diff
changeset
|
32 # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE |
67b597ec0baf
planemo upload for repository https://github.com/galaxyproject/tools-iuc/blob/master/tools/khmer/ commit be9a20423d1a6ec33d59341e0e61b535127bbce2
iuc
parents:
diff
changeset
|
33 # OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. |
67b597ec0baf
planemo upload for repository https://github.com/galaxyproject/tools-iuc/blob/master/tools/khmer/ commit be9a20423d1a6ec33d59341e0e61b535127bbce2
iuc
parents:
diff
changeset
|
34 # |
67b597ec0baf
planemo upload for repository https://github.com/galaxyproject/tools-iuc/blob/master/tools/khmer/ commit be9a20423d1a6ec33d59341e0e61b535127bbce2
iuc
parents:
diff
changeset
|
35 # Contact: khmer-project@idyll.org |
67b597ec0baf
planemo upload for repository https://github.com/galaxyproject/tools-iuc/blob/master/tools/khmer/ commit be9a20423d1a6ec33d59341e0e61b535127bbce2
iuc
parents:
diff
changeset
|
36 from __future__ import print_function |
5
d9cc87dfec6e
planemo upload for repository https://github.com/galaxyproject/tools-iuc/blob/master/tools/khmer/ commit 19ea4feff5ccf3744c549b9a67259947a1cb90ba
iuc
parents:
4
diff
changeset
|
37 |
d9cc87dfec6e
planemo upload for repository https://github.com/galaxyproject/tools-iuc/blob/master/tools/khmer/ commit 19ea4feff5ccf3744c549b9a67259947a1cb90ba
iuc
parents:
4
diff
changeset
|
38 import os |
0
67b597ec0baf
planemo upload for repository https://github.com/galaxyproject/tools-iuc/blob/master/tools/khmer/ commit be9a20423d1a6ec33d59341e0e61b535127bbce2
iuc
parents:
diff
changeset
|
39 import sys |
5
d9cc87dfec6e
planemo upload for repository https://github.com/galaxyproject/tools-iuc/blob/master/tools/khmer/ commit 19ea4feff5ccf3744c549b9a67259947a1cb90ba
iuc
parents:
4
diff
changeset
|
40 |
0
67b597ec0baf
planemo upload for repository https://github.com/galaxyproject/tools-iuc/blob/master/tools/khmer/ commit be9a20423d1a6ec33d59341e0e61b535127bbce2
iuc
parents:
diff
changeset
|
41 import khmer |
67b597ec0baf
planemo upload for repository https://github.com/galaxyproject/tools-iuc/blob/master/tools/khmer/ commit be9a20423d1a6ec33d59341e0e61b535127bbce2
iuc
parents:
diff
changeset
|
42 from khmer.thread_utils import ThreadedSequenceProcessor, verbose_fasta_iter |
67b597ec0baf
planemo upload for repository https://github.com/galaxyproject/tools-iuc/blob/master/tools/khmer/ commit be9a20423d1a6ec33d59341e0e61b535127bbce2
iuc
parents:
diff
changeset
|
43 |
67b597ec0baf
planemo upload for repository https://github.com/galaxyproject/tools-iuc/blob/master/tools/khmer/ commit be9a20423d1a6ec33d59341e0e61b535127bbce2
iuc
parents:
diff
changeset
|
44 WORKER_THREADS = 8 |
67b597ec0baf
planemo upload for repository https://github.com/galaxyproject/tools-iuc/blob/master/tools/khmer/ commit be9a20423d1a6ec33d59341e0e61b535127bbce2
iuc
parents:
diff
changeset
|
45 GROUPSIZE = 100 |
67b597ec0baf
planemo upload for repository https://github.com/galaxyproject/tools-iuc/blob/master/tools/khmer/ commit be9a20423d1a6ec33d59341e0e61b535127bbce2
iuc
parents:
diff
changeset
|
46 CUTOFF = 50 |
67b597ec0baf
planemo upload for repository https://github.com/galaxyproject/tools-iuc/blob/master/tools/khmer/ commit be9a20423d1a6ec33d59341e0e61b535127bbce2
iuc
parents:
diff
changeset
|
47 |
67b597ec0baf
planemo upload for repository https://github.com/galaxyproject/tools-iuc/blob/master/tools/khmer/ commit be9a20423d1a6ec33d59341e0e61b535127bbce2
iuc
parents:
diff
changeset
|
48 |
67b597ec0baf
planemo upload for repository https://github.com/galaxyproject/tools-iuc/blob/master/tools/khmer/ commit be9a20423d1a6ec33d59341e0e61b535127bbce2
iuc
parents:
diff
changeset
|
49 def main(): |
67b597ec0baf
planemo upload for repository https://github.com/galaxyproject/tools-iuc/blob/master/tools/khmer/ commit be9a20423d1a6ec33d59341e0e61b535127bbce2
iuc
parents:
diff
changeset
|
50 counting_ht = sys.argv[1] |
67b597ec0baf
planemo upload for repository https://github.com/galaxyproject/tools-iuc/blob/master/tools/khmer/ commit be9a20423d1a6ec33d59341e0e61b535127bbce2
iuc
parents:
diff
changeset
|
51 infiles = sys.argv[2:] |
67b597ec0baf
planemo upload for repository https://github.com/galaxyproject/tools-iuc/blob/master/tools/khmer/ commit be9a20423d1a6ec33d59341e0e61b535127bbce2
iuc
parents:
diff
changeset
|
52 |
67b597ec0baf
planemo upload for repository https://github.com/galaxyproject/tools-iuc/blob/master/tools/khmer/ commit be9a20423d1a6ec33d59341e0e61b535127bbce2
iuc
parents:
diff
changeset
|
53 print('file with ht: %s' % counting_ht) |
67b597ec0baf
planemo upload for repository https://github.com/galaxyproject/tools-iuc/blob/master/tools/khmer/ commit be9a20423d1a6ec33d59341e0e61b535127bbce2
iuc
parents:
diff
changeset
|
54 print('-- settings:') |
67b597ec0baf
planemo upload for repository https://github.com/galaxyproject/tools-iuc/blob/master/tools/khmer/ commit be9a20423d1a6ec33d59341e0e61b535127bbce2
iuc
parents:
diff
changeset
|
55 print('N THREADS', WORKER_THREADS) |
67b597ec0baf
planemo upload for repository https://github.com/galaxyproject/tools-iuc/blob/master/tools/khmer/ commit be9a20423d1a6ec33d59341e0e61b535127bbce2
iuc
parents:
diff
changeset
|
56 print('--') |
67b597ec0baf
planemo upload for repository https://github.com/galaxyproject/tools-iuc/blob/master/tools/khmer/ commit be9a20423d1a6ec33d59341e0e61b535127bbce2
iuc
parents:
diff
changeset
|
57 |
67b597ec0baf
planemo upload for repository https://github.com/galaxyproject/tools-iuc/blob/master/tools/khmer/ commit be9a20423d1a6ec33d59341e0e61b535127bbce2
iuc
parents:
diff
changeset
|
58 print('making hashtable') |
67b597ec0baf
planemo upload for repository https://github.com/galaxyproject/tools-iuc/blob/master/tools/khmer/ commit be9a20423d1a6ec33d59341e0e61b535127bbce2
iuc
parents:
diff
changeset
|
59 ht = khmer.load_countgraph(counting_ht) |
67b597ec0baf
planemo upload for repository https://github.com/galaxyproject/tools-iuc/blob/master/tools/khmer/ commit be9a20423d1a6ec33d59341e0e61b535127bbce2
iuc
parents:
diff
changeset
|
60 K = ht.ksize() |
67b597ec0baf
planemo upload for repository https://github.com/galaxyproject/tools-iuc/blob/master/tools/khmer/ commit be9a20423d1a6ec33d59341e0e61b535127bbce2
iuc
parents:
diff
changeset
|
61 |
67b597ec0baf
planemo upload for repository https://github.com/galaxyproject/tools-iuc/blob/master/tools/khmer/ commit be9a20423d1a6ec33d59341e0e61b535127bbce2
iuc
parents:
diff
changeset
|
62 for infile in infiles: |
67b597ec0baf
planemo upload for repository https://github.com/galaxyproject/tools-iuc/blob/master/tools/khmer/ commit be9a20423d1a6ec33d59341e0e61b535127bbce2
iuc
parents:
diff
changeset
|
63 print('filtering', infile) |
67b597ec0baf
planemo upload for repository https://github.com/galaxyproject/tools-iuc/blob/master/tools/khmer/ commit be9a20423d1a6ec33d59341e0e61b535127bbce2
iuc
parents:
diff
changeset
|
64 outfile = os.path.basename(infile) + '.below' |
67b597ec0baf
planemo upload for repository https://github.com/galaxyproject/tools-iuc/blob/master/tools/khmer/ commit be9a20423d1a6ec33d59341e0e61b535127bbce2
iuc
parents:
diff
changeset
|
65 |
67b597ec0baf
planemo upload for repository https://github.com/galaxyproject/tools-iuc/blob/master/tools/khmer/ commit be9a20423d1a6ec33d59341e0e61b535127bbce2
iuc
parents:
diff
changeset
|
66 outfp = open(outfile, 'w') |
67b597ec0baf
planemo upload for repository https://github.com/galaxyproject/tools-iuc/blob/master/tools/khmer/ commit be9a20423d1a6ec33d59341e0e61b535127bbce2
iuc
parents:
diff
changeset
|
67 |
67b597ec0baf
planemo upload for repository https://github.com/galaxyproject/tools-iuc/blob/master/tools/khmer/ commit be9a20423d1a6ec33d59341e0e61b535127bbce2
iuc
parents:
diff
changeset
|
68 def process_fn(record, ht=ht): |
67b597ec0baf
planemo upload for repository https://github.com/galaxyproject/tools-iuc/blob/master/tools/khmer/ commit be9a20423d1a6ec33d59341e0e61b535127bbce2
iuc
parents:
diff
changeset
|
69 name = record['name'] |
67b597ec0baf
planemo upload for repository https://github.com/galaxyproject/tools-iuc/blob/master/tools/khmer/ commit be9a20423d1a6ec33d59341e0e61b535127bbce2
iuc
parents:
diff
changeset
|
70 seq = record['sequence'] |
67b597ec0baf
planemo upload for repository https://github.com/galaxyproject/tools-iuc/blob/master/tools/khmer/ commit be9a20423d1a6ec33d59341e0e61b535127bbce2
iuc
parents:
diff
changeset
|
71 if 'N' in seq: |
67b597ec0baf
planemo upload for repository https://github.com/galaxyproject/tools-iuc/blob/master/tools/khmer/ commit be9a20423d1a6ec33d59341e0e61b535127bbce2
iuc
parents:
diff
changeset
|
72 return None, None |
67b597ec0baf
planemo upload for repository https://github.com/galaxyproject/tools-iuc/blob/master/tools/khmer/ commit be9a20423d1a6ec33d59341e0e61b535127bbce2
iuc
parents:
diff
changeset
|
73 |
67b597ec0baf
planemo upload for repository https://github.com/galaxyproject/tools-iuc/blob/master/tools/khmer/ commit be9a20423d1a6ec33d59341e0e61b535127bbce2
iuc
parents:
diff
changeset
|
74 trim_seq, trim_at = ht.trim_below_abundance(seq, CUTOFF) |
67b597ec0baf
planemo upload for repository https://github.com/galaxyproject/tools-iuc/blob/master/tools/khmer/ commit be9a20423d1a6ec33d59341e0e61b535127bbce2
iuc
parents:
diff
changeset
|
75 |
67b597ec0baf
planemo upload for repository https://github.com/galaxyproject/tools-iuc/blob/master/tools/khmer/ commit be9a20423d1a6ec33d59341e0e61b535127bbce2
iuc
parents:
diff
changeset
|
76 if trim_at >= K: |
67b597ec0baf
planemo upload for repository https://github.com/galaxyproject/tools-iuc/blob/master/tools/khmer/ commit be9a20423d1a6ec33d59341e0e61b535127bbce2
iuc
parents:
diff
changeset
|
77 return name, trim_seq |
67b597ec0baf
planemo upload for repository https://github.com/galaxyproject/tools-iuc/blob/master/tools/khmer/ commit be9a20423d1a6ec33d59341e0e61b535127bbce2
iuc
parents:
diff
changeset
|
78 |
67b597ec0baf
planemo upload for repository https://github.com/galaxyproject/tools-iuc/blob/master/tools/khmer/ commit be9a20423d1a6ec33d59341e0e61b535127bbce2
iuc
parents:
diff
changeset
|
79 return None, None |
67b597ec0baf
planemo upload for repository https://github.com/galaxyproject/tools-iuc/blob/master/tools/khmer/ commit be9a20423d1a6ec33d59341e0e61b535127bbce2
iuc
parents:
diff
changeset
|
80 |
67b597ec0baf
planemo upload for repository https://github.com/galaxyproject/tools-iuc/blob/master/tools/khmer/ commit be9a20423d1a6ec33d59341e0e61b535127bbce2
iuc
parents:
diff
changeset
|
81 tsp = ThreadedSequenceProcessor(process_fn, WORKER_THREADS, GROUPSIZE) |
67b597ec0baf
planemo upload for repository https://github.com/galaxyproject/tools-iuc/blob/master/tools/khmer/ commit be9a20423d1a6ec33d59341e0e61b535127bbce2
iuc
parents:
diff
changeset
|
82 |
67b597ec0baf
planemo upload for repository https://github.com/galaxyproject/tools-iuc/blob/master/tools/khmer/ commit be9a20423d1a6ec33d59341e0e61b535127bbce2
iuc
parents:
diff
changeset
|
83 tsp.start(verbose_fasta_iter(infile), outfp) |
67b597ec0baf
planemo upload for repository https://github.com/galaxyproject/tools-iuc/blob/master/tools/khmer/ commit be9a20423d1a6ec33d59341e0e61b535127bbce2
iuc
parents:
diff
changeset
|
84 |
4
fadfb386c876
planemo upload for repository https://github.com/galaxyproject/tools-iuc/blob/master/tools/khmer/ commit f30bc6f7e10409dfa0dd65688e60da8b59f12464
iuc
parents:
0
diff
changeset
|
85 |
0
67b597ec0baf
planemo upload for repository https://github.com/galaxyproject/tools-iuc/blob/master/tools/khmer/ commit be9a20423d1a6ec33d59341e0e61b535127bbce2
iuc
parents:
diff
changeset
|
86 if __name__ == '__main__': |
67b597ec0baf
planemo upload for repository https://github.com/galaxyproject/tools-iuc/blob/master/tools/khmer/ commit be9a20423d1a6ec33d59341e0e61b535127bbce2
iuc
parents:
diff
changeset
|
87 main() |