--- a/tools/fastq/fastq_filter_by_id.py	Tue Jun 07 17:23:49 2011 -0400
+++ b/tools/fastq/fastq_filter_by_id.py	Tue Jun 07 17:24:08 2011 -0400
@@ -1,6 +1,9 @@
 #!/usr/bin/env python
 """Filter a FASTQ file with IDs from a tabular file, e.g. from BLAST.

+NOTE - This script is now OBSOLETE, having been replaced by a new version
+which handles FASTA, FASTQ and SFF all in one.
+
 Takes five command line options, tabular filename, ID column numbers
 (comma separated list using one-based counting), input FASTQ filename, and
 two output FASTQ filenames (for records with and without the given IDs).
@@ -13,10 +16,10 @@
 in column one, and the ID of the match from the database is in column two.
 Here sensible values for the column numbers would therefore be "1" or "2".

-This script is copyright 2010 by Peter Cock, SCRI, UK. All rights reserved.
+This script is copyright 2010-2011 by Peter Cock, SCRI, UK. All rights reserved.
 See accompanying text file for licence details (MIT/BSD style).

-This is version 0.0.2 of the script.
+This is version 0.0.4 of the script.
 """
 import sys
 from galaxy_utils.sequence.fastq import fastqReader, fastqWriter
@@ -86,7 +89,6 @@
         #The [1:] is because the fastqReader leaves the @ on the identifier.
         if not record.identifier or record.identifier.split()[0][1:] not in ids:
             negative_writer.write(record)
-    positive_writer.close()
     negative_writer.close()
 else:
     stop_err("Neither output file requested")
--- a/tools/fastq/fastq_filter_by_id.txt	Tue Jun 07 17:23:49 2011 -0400
+++ b/tools/fastq/fastq_filter_by_id.txt	Tue Jun 07 17:24:08 2011 -0400
@@ -1,3 +1,11 @@
+Obsolete
+========
+
+This tool is now obsolete, having been replaced by a more general version
+covering the FASTA, FASTQ and SFF sequence formats in a single tool. You
+should only install this tool if you need to support existing workflows
+which used it.
+
 Galaxy tool to filter FASTQ sequences by ID
 ===========================================

@@ -33,13 +41,17 @@
 v0.0.1 - Initial version (not publicly released)
 v0.0.2 - Allow both, just pos or just neg output files
        - Preserve the FASTQ variant in the XML wrapper
+v0.0.3 - Fixed bug when generating non-matching FASTQ file only
+v0.0.4 - Deprecated, marked as hidden in the XML


 Developers
 ==========

-This script and similar versions for FASTA and SFF files are currently being
-developed on the following hg branch:
+This script and related tools are being developed on the following hg branch:
+http://bitbucket.org/peterjc/galaxy-central/src/tools
+
+This incorporates the previously used hg branch:
 http://bitbucket.org/peterjc/galaxy-central/src/fasta_filter

 For making the "Galaxy Tool Shed" http://community.g2.bx.psu.edu/ tarball use
--- a/tools/fastq/fastq_filter_by_id.xml	Tue Jun 07 17:23:49 2011 -0400
+++ b/tools/fastq/fastq_filter_by_id.xml	Tue Jun 07 17:24:08 2011 -0400
@@ -1,4 +1,4 @@
-<tool id="fastq_filter_by_id" name="Filter FASTQ by ID" version="0.0.3">
+<tool id="fastq_filter_by_id" name="Filter FASTQ by ID" version="0.0.4" hidden="true">
 	<description>from a tabular file</description>
 	<command interpreter="python">
 fastq_filter_by_id.py $input_tabular $columns $input_fastq
@@ -54,20 +54,28 @@
 	</tests>
 	<help>

+**Deprecated**
+
+This tool is now obsolete, and should not be used in future. It has been
+replaced by a more general version covering FASTA, FASTQ and SFF in one
+single tool.
+
 **What it does**

 By default it divides a FASTQ file in two, those sequences with or without an
 ID present in the tabular file column(s) specified. You can opt to have a
 single output file of just the matching records, or just the non-matching ones.

 Note that the order of sequences in the original FASTQ file is preserved.
 Also, if any sequences share an identifier, duplicates are not removed.

 **Example Usage**

-You may have mapped your reads against a reference genome, and thus generated
-a tabular file of the mapped reads. You could use this tool to divide the reads
-into those which map onto the genome, and those which don't.
+You may have performed some kind of contamination search, for example running
+BLASTN against a database of cloning vectors or bacteria, giving you a tabular
+file containing read identifiers. You could use this tool to extract only the
+reads without BLAST matches (i.e. those which do not match your contaminant
+database).

 	</help>
 </tool>
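
The filtering behaviour described in the help text above (split a FASTQ file
into records whose ID appears in the chosen tabular column(s), and those whose
ID does not) can be sketched roughly as follows. This is only an illustrative
sketch, not the script in this changeset: it avoids galaxy_utils, assumes
simple four-line FASTQ records, and the helper names read_ids and split_fastq
are hypothetical.

```python
def read_ids(tabular_lines, columns):
    """Collect IDs from the given zero-based columns of tab-separated lines.

    Hypothetical helper: the real tool takes one-based column numbers on the
    command line, so a caller would subtract one first.
    """
    ids = set()
    for line in tabular_lines:
        # Skip blank lines and comment lines (an assumption of this sketch).
        if not line.strip() or line.startswith("#"):
            continue
        parts = line.rstrip("\n").split("\t")
        for col in columns:
            ids.add(parts[col])
    return ids


def split_fastq(fastq_lines, ids):
    """Split four-line FASTQ records into (matching, non_matching) lists.

    Input order is preserved and duplicate identifiers are not removed.
    Assumes each record is exactly four lines (no wrapped sequences).
    """
    lines = [line.rstrip("\n") for line in fastq_lines]
    positive, negative = [], []
    for start in range(0, len(lines) - 3, 4):
        record = lines[start:start + 4]
        title = record[0]
        # The ID is the first word of the title line, minus the leading "@".
        name = title.split()[0][1:] if title.strip() else ""
        if name and name in ids:
            positive.append(record)
        else:
            negative.append(record)
    return positive, negative
```

For a contamination screen like the one in the example usage, the second list
returned by split_fastq would hold the reads without BLAST matches.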