comparison tools/fasta_tools/fasta_filter_by_id.py @ 1:5cd569750e85
Migrated tool version 0.0.3 from old tool shed archive to new tool shed repository
| author | peterjc |
|---|---|
| date | Tue, 07 Jun 2011 17:22:48 -0400 |
| parents | 2e5f8ad1a096 |
| children | 5b552b3005f2 |
| 0:2e5f8ad1a096 | 1:5cd569750e85 |
|---|---|
| 1 #!/usr/bin/env python | 1 #!/usr/bin/env python |
| 2 """Filter a FASTA file with IDs from a tabular file, e.g. from BLAST. | 2 """Filter a FASTA file with IDs from a tabular file, e.g. from BLAST. |
| 3 | 3 |
| 4 Takes five command line options, tabular filename, ID column numbers | 4 Takes five command line options, tabular filename, ID column numbers |
| 5 (comma separated list using one based counting), input FASTA filename, and | 5 (comma separated list using one based counting), input FASTA filename, and |
| 6 two output FASTA filenames (for records with and without any BLAST hits). | 6 two output FASTA filenames (for records with and without the given IDs). |
| 7 If the either output filename is just a minus sign, that file is not created. | 7 |
| | 8 If either output filename is just a minus sign, that file is not created. |
| 8 This is intended to allow output for just the matched (or just the non-matched) | 9 This is intended to allow output for just the matched (or just the non-matched) |
| 9 records. | 10 records. |
| 10 | 11 |
| 11 Note in the default NCBI BLAST+ tabular output, the query sequence ID is | 12 Note in the default NCBI BLAST+ tabular output, the query sequence ID is |
| 12 in column one, and the ID of the match from the database is in column two. | 13 in column one, and the ID of the match from the database is in column two. |
| 49 if not line.startswith("#"): | 50 if not line.startswith("#"): |
| 50 ids.add(line.rstrip("\n").split("\t")[col]) | 51 ids.add(line.rstrip("\n").split("\t")[col]) |
| 51 print "Using %i IDs from tabular file" % (len(ids)) | 52 print "Using %i IDs from tabular file" % (len(ids)) |
| 52 handle.close() | 53 handle.close() |
| 53 | 54 |
| 54 #Write filtered FASTA file based on IDs from BLAST file | 55 #Write filtered FASTA file based on IDs from tabular file |
| 55 reader = fastaReader(open(in_file, "rU")) | 56 reader = fastaReader(open(in_file, "rU")) |
| 56 if out_positive_file != "-" and out_negative_file != "-": | 57 if out_positive_file != "-" and out_negative_file != "-": |
| 57 print "Generating two FASTA files" | 58 print "Generating two FASTA files" |
| 58 positive_writer = fastaWriter(open(out_positive_file, "w")) | 59 positive_writer = fastaWriter(open(out_positive_file, "w")) |
| 59 negative_writer = fastaWriter(open(out_negative_file, "w")) | 60 negative_writer = fastaWriter(open(out_negative_file, "w")) |
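
The docstring and the visible hunks describe the overall flow: read IDs from the chosen columns of a tabular file (skipping `#` comment lines), then split the FASTA records into those whose ID is in that set and those whose ID is not. A minimal, self-contained sketch of that flow in Python 3 follows. It is not the script's actual code: the helper names `parse_columns`, `load_ids`, `iter_fasta` and `filter_fasta` are hypothetical, the plain-text FASTA parsing stands in for the `fastaReader`/`fastaWriter` classes the script uses, and treating the first whitespace-delimited word of the title line as the record ID is an assumption.

```python
import io


def parse_columns(spec):
    """Turn a one-based, comma-separated column list into zero-based indices."""
    columns = [int(field) - 1 for field in spec.split(",")]
    if min(columns) < 0:
        raise ValueError("Column numbers are one-based, got %r" % spec)
    return columns


def load_ids(handle, columns):
    """Collect the IDs found in the given columns, skipping '#' comment lines."""
    ids = set()
    for line in handle:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        for col in columns:
            ids.add(fields[col])
    return ids


def iter_fasta(handle):
    """Yield (title, sequence_text) pairs from a FASTA handle."""
    title, lines = None, []
    for line in handle:
        if line.startswith(">"):
            if title is not None:
                yield title, "".join(lines)
            title, lines = line[1:].rstrip("\n"), []
        elif title is not None:
            lines.append(line)
    if title is not None:
        yield title, "".join(lines)


def filter_fasta(handle, ids):
    """Split records into (matched, unmatched) lists of FASTA text."""
    positive, negative = [], []
    for title, seq in iter_fasta(handle):
        # Assumption: the record ID is the first word of the title line.
        words = title.split(None, 1)
        identifier = words[0] if words else ""
        record = ">%s\n%s" % (title, seq)
        (positive if identifier in ids else negative).append(record)
    return positive, negative


# Tiny in-memory example: IDs are taken from column one of the tabular data.
tabular = io.StringIO("# comment\nq1\ts1\t99.0\nq2\ts2\t80.0\n")
fasta = io.StringIO(">q1 first\nACGT\n>q3 third\nGGCC\n>q2\nTTAA\n")
ids = load_ids(tabular, parse_columns("1"))
positive, negative = filter_fasta(fasta, ids)
```

With this input, `positive` holds the `q1` and `q2` records and `negative` holds `q3`. Skipping one of the two outputs when its filename is `-`, as the docstring describes, would amount to not writing the corresponding list.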
