comparison tools/fasta_tools/fasta_filter_by_id.py @ 1:5cd569750e85

Migrated tool version 0.0.3 from old tool shed archive to new tool shed repository
author peterjc
date Tue, 07 Jun 2011 17:22:48 -0400
parents 2e5f8ad1a096
children 5b552b3005f2
0:2e5f8ad1a096 1:5cd569750e85
1 #!/usr/bin/env python 1 #!/usr/bin/env python
2 """Filter a FASTA file with IDs from a tabular file, e.g. from BLAST. 2 """Filter a FASTA file with IDs from a tabular file, e.g. from BLAST.
3 3
4 Takes five command line options, tabular filename, ID column numbers 4 Takes five command line options, tabular filename, ID column numbers
5 (comma separated list using one based counting), input FASTA filename, and 5 (comma separated list using one based counting), input FASTA filename, and
6 two output FASTA filenames (for records with and without any BLAST hits). 6 two output FASTA filenames (for records with and without the given IDs).
7 If the either output filename is just a minus sign, that file is not created. 7
8 If either output filename is just a minus sign, that file is not created.
8 This is intended to allow output for just the matched (or just the non-matched) 9 This is intended to allow output for just the matched (or just the non-matched)
9 records. 10 records.
10 11
11 Note in the default NCBI BLAST+ tabular output, the query sequence ID is 12 Note in the default NCBI BLAST+ tabular output, the query sequence ID is
12 in column one, and the ID of the match from the database is in column two. 13 in column one, and the ID of the match from the database is in column two.
49 if not line.startswith("#"): 50 if not line.startswith("#"):
50 ids.add(line.rstrip("\n").split("\t")[col]) 51 ids.add(line.rstrip("\n").split("\t")[col])
51 print "Using %i IDs from tabular file" % (len(ids)) 52 print "Using %i IDs from tabular file" % (len(ids))
52 handle.close() 53 handle.close()
53 54
54 #Write filtered FASTA file based on IDs from BLAST file 55 #Write filtered FASTA file based on IDs from tabular file
55 reader = fastaReader(open(in_file, "rU")) 56 reader = fastaReader(open(in_file, "rU"))
56 if out_positive_file != "-" and out_negative_file != "-": 57 if out_positive_file != "-" and out_negative_file != "-":
57 print "Generating two FASTA files" 58 print "Generating two FASTA files"
58 positive_writer = fastaWriter(open(out_positive_file, "w")) 59 positive_writer = fastaWriter(open(out_positive_file, "w"))
59 negative_writer = fastaWriter(open(out_negative_file, "w")) 60 negative_writer = fastaWriter(open(out_negative_file, "w"))
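The listing skips the lines between the docstring and the ID-reading code, so the following is only a minimal sketch of the behaviour the docstring describes, not the tool's actual code. It is written for Python 3 and works on in-memory lists of lines instead of the fastaReader/fastaWriter helpers. The function names read_ids, parse_fasta and filter_fasta are hypothetical, and matching on the first word of the FASTA title line is an assumption.

```python
def read_ids(tabular_lines, columns):
    """Collect IDs from the given zero-based columns, skipping '#' comment lines."""
    ids = set()
    for line in tabular_lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        for col in columns:
            ids.add(fields[col])
    return ids


def _identifier(header):
    """Return the first word after '>' (assumed here to be the record ID)."""
    words = header[1:].split(None, 1)
    return words[0] if words else ""


def parse_fasta(lines):
    """Yield (identifier, record_text) pairs from an iterable of FASTA lines."""
    header, body = None, []
    for line in lines:
        if line.startswith(">"):
            if header is not None:
                yield _identifier(header), header + "".join(body)
            header, body = line, []
        elif header is not None:
            body.append(line)
    if header is not None:
        yield _identifier(header), header + "".join(body)


def filter_fasta(fasta_lines, ids):
    """Split FASTA records into those whose ID is in ids and those whose ID is not."""
    matched, unmatched = [], []
    for identifier, text in parse_fasta(fasta_lines):
        (matched if identifier in ids else unmatched).append(text)
    return matched, unmatched


# Hypothetical inputs: BLAST-style tabular rows and a two-record FASTA file.
tabular = ["# comment line\n", "q1\ts1\t99.0\n", "q2\ts2\t80.0\n"]
fasta = [">q1 first query\n", "ACGT\n", ">q3 third query\n", "GGCC\n"]

# The command line uses one-based column numbers, so column one is index 0 here.
ids = read_ids(tabular, [0])
matched, unmatched = filter_fasta(fasta, ids)
```

Here the record q1 lands in matched and q3 in unmatched. In the real tool, passing a minus sign as one of the output filenames would simply skip writing the corresponding list.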