Galaxy tool preview

Divide FASTQ file into paired and unpaired reads (version 0.0.5)

What it does

Using the common read name suffix conventions, it divides a FASTQ file into paired reads, and orphan or single reads.

The input file should be a valid FASTQ file which has been sorted so that any partner forward+reverse reads are consecutive. The output files all preserve this sort order. Pairings are recognised based on standard name suffixes. See below or run the tool with no arguments for more details.

Any reads where the forward/reverse naming suffix used is not recognised are treated as orphan reads. The tool supports the /1 and /2 convention originally used by Illumina, the .f and .r convention, the Sanger convention (see http://staden.sourceforge.net/manual/pregap4_unix_50.html for details), and the current Illumina convention where both reads get the same identifier and the fragment number appears in the description, for example:

  • @HWI-ST916:79:D04M5ACXX:1:1101:10000:100326 1:N:0:TGNCCA
  • @HWI-ST916:79:D04M5ACXX:1:1101:10000:100326 2:N:0:TGNCCA
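
As a rough illustration of how such naming conventions can be told apart, here is a minimal Python sketch. The regular expressions and the function name classify_read are assumptions made for this example; they are not taken from the tool and may not match every name the tool itself accepts.

```python
import re

# Assumed patterns for illustration only; the tool's own rules may differ.
_FORWARD_SUFFIX = re.compile(r"(/1|\.f|\.[sfp]\d\w*)$")
_REVERSE_SUFFIX = re.compile(r"(/2|\.r|\.[rq]\d\w*)$")
# Current Illumina style: the description starts with "1:" or "2:",
# followed by the filter flag and control number, e.g. "1:N:0:TGNCCA".
_ILLUMINA_DESC = re.compile(r"^([12]):[YN]:\d+:")


def classify_read(title):
    """Return (template, direction); direction is 'F', 'R' or None (orphan)."""
    title = title.lstrip("@")
    name, _, description = title.partition(" ")

    # Identifier shared by both reads, fragment number in the description.
    match = _ILLUMINA_DESC.match(description)
    if match:
        return name, "F" if match.group(1) == "1" else "R"

    # Suffix-based conventions: /1 and /2, .f and .r, Sanger-style suffixes.
    match = _FORWARD_SUFFIX.search(name)
    if match:
        return name[:match.start()], "F"
    match = _REVERSE_SUFFIX.search(name)
    if match:
        return name[:match.start()], "R"

    # Unrecognised suffix: treat the read as an orphan.
    return name, None
```

With these assumed patterns, the two Illumina titles above share one template name and are classified as forward and reverse respectively, while a name with no recognised suffix comes back with direction None.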

Note that the tool does support multiple forward and reverse reads per template (which is quite common with Sanger sequencing). For example, this input, which is sorted alphabetically:

  • WTSI_1055_4p17.p1kapIBF
  • WTSI_1055_4p17.p1kpIBF
  • WTSI_1055_4p17.q1kapIBR
  • WTSI_1055_4p17.q1kpIBR

or this where the reads already come in pairs:

  • WTSI_1055_4p17.p1kapIBF
  • WTSI_1055_4p17.q1kapIBR
  • WTSI_1055_4p17.p1kpIBF
  • WTSI_1055_4p17.q1kpIBR

both become:

  • WTSI_1055_4p17.p1kapIBF paired with WTSI_1055_4p17.q1kapIBR
  • WTSI_1055_4p17.p1kpIBF paired with WTSI_1055_4p17.q1kpIBR
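
One way to arrive at this pairing is sketched below in Python. It is only an illustration under stated assumptions: split_name and pair_reads are hypothetical helpers, the direction is guessed from the first letter of a Sanger-style suffix, and forward and reverse reads of the same template are matched up in sorted order. Whether the tool itself works this way is not stated here.

```python
from itertools import groupby


def split_name(name):
    """Hypothetical helper: split a Sanger-style name into (template, direction)."""
    template, sep, suffix = name.rpartition(".")
    if sep and suffix[:1] in ("f", "p", "s"):
        return template, "F"
    if sep and suffix[:1] in ("r", "q"):
        return template, "R"
    return name, None  # unrecognised suffix: orphan


def pair_reads(names):
    """Pair consecutive reads that share a template; return (pairs, orphans)."""
    pairs, orphans = [], []
    # Partner reads are assumed to be consecutive, so groupby is sufficient.
    for _, group in groupby(names, key=lambda n: split_name(n)[0]):
        group = list(group)
        forward = sorted(n for n in group if split_name(n)[1] == "F")
        reverse = sorted(n for n in group if split_name(n)[1] == "R")
        unknown = [n for n in group if split_name(n)[1] is None]
        matched = list(zip(forward, reverse))
        pairs.extend(matched)
        # Reads left over once the shorter list is exhausted become orphans.
        orphans.extend(forward[len(matched):] + reverse[len(matched):] + unknown)
    return pairs, orphans


alphabetical = [
    "WTSI_1055_4p17.p1kapIBF", "WTSI_1055_4p17.p1kpIBF",
    "WTSI_1055_4p17.q1kapIBR", "WTSI_1055_4p17.q1kpIBR",
]
already_paired = [
    "WTSI_1055_4p17.p1kapIBF", "WTSI_1055_4p17.q1kapIBR",
    "WTSI_1055_4p17.p1kpIBF", "WTSI_1055_4p17.q1kpIBR",
]
```

In this sketch, calling pair_reads on either list gives the same two pairs and no orphans, because the reads of a template are collected first and only then matched.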