view tools/fastq/fastq_paired_unpaired.xml @ 1:7ed81e36fc1c

Uploaded v0.0.5 which handles Illumina 1.8 style pair naming.
author peterjc
date Mon, 12 Dec 2011 11:33:10 -0500
parents 72e9fcaec61f
children 95a632a71951

<tool id="fastq_paired_unpaired" name="Divide FASTQ file into paired and unpaired reads" version="0.0.5">
	<description>using the read name suffixes</description>
	<command interpreter="python">
fastq_paired_unpaired.py $input_fastq.extension $input_fastq
#if $output_choice_cond.output_choice=="separate"
 $output_forward $output_reverse
#elif $output_choice_cond.output_choice=="interleaved"
 $output_paired
#end if
$output_singles
	</command>
	<inputs>
		<param name="input_fastq" type="data" format="fastq" label="FASTQ file to divide into paired and unpaired reads"/>
		<conditional name="output_choice_cond">
			<param name="output_choice" type="select" label="How to output paired reads?">
				<option value="separate">Separate (two FASTQ files, for the forward and reverse reads, in matching order).</option>
				<option value="interleaved">Interleaved (one FASTQ file, alternating forward read then partner reverse read).</option>
			</param>
			<!-- These dummy entries seem to be needed here; compare with indels/indel_sam2interval.xml -->
			<when value="separate" />
			<when value="interleaved" />
		</conditional>
	</inputs>
	<outputs>
		<data name="output_singles" format="input" label="Orphan or single reads"/>
		<data name="output_forward" format="input" label="Forward paired reads">
			<filter>output_choice_cond["output_choice"] == "separate"</filter>
		</data>
		<data name="output_reverse" format="input" label="Reverse paired reads">
			<filter>output_choice_cond["output_choice"] == "separate"</filter>
		</data>
		<data name="output_paired" format="input" label="Interleaved paired reads">
			<filter>output_choice_cond["output_choice"] == "interleaved"</filter>
		</data>
	</outputs>
	<tests>
	</tests>
	<requirements>
		<requirement type="python-module">Bio</requirement>
	</requirements>
	<help>

**What it does**

Using the common read name suffix conventions, this tool divides a FASTQ file
into paired reads and orphan (single) reads.

The input file should be a valid FASTQ file which has been sorted so that
any partner forward+reverse reads are consecutive. The output files all
preserve this sort order. Pairings are recognised based on standard name
suffixes. See below or run the tool with no arguments for more details.

Any reads where the forward/reverse naming suffix is not recognised are
treated as orphan reads. The tool supports the /1 and /2 convention
originally used by Illumina, the .f and .r convention, the Sanger convention
(see http://staden.sourceforge.net/manual/pregap4_unix_50.html for details),
and the current Illumina convention where both reads get the same identifier
with the fragment number in the description, for example:

 * @HWI-ST916:79:D04M5ACXX:1:1101:10000:100326 1:N:0:TGNCCA
 * @HWI-ST916:79:D04M5ACXX:1:1101:10000:100326 2:N:0:TGNCCA 
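
The naming conventions above can be illustrated with a minimal Python sketch.
This is not the tool's own code: the function name ``classify_read`` and both
regular expressions are hypothetical, and the Sanger convention is left out
for brevity. It only shows how a read title might be split into a template
name and a direction.

```python
import re

# Hypothetical patterns for illustration only; the tool's own rules may differ.
FORWARD_SUFFIX = re.compile(r"(/1|\.f)$")
REVERSE_SUFFIX = re.compile(r"(/2|\.r)$")


def classify_read(title):
    """Return (template, direction) for a FASTQ title line without the leading '@'.

    direction is 'F', 'R', or None when no convention is recognised.
    """
    identifier, _, description = title.partition(" ")
    # Illumina 1.8 style: shared identifier, read number starts the description.
    if description.startswith("1:"):
        return identifier, "F"
    if description.startswith("2:"):
        return identifier, "R"
    # Older styles: the direction is a suffix on the identifier itself.
    match = FORWARD_SUFFIX.search(identifier)
    if match:
        return identifier[: match.start()], "F"
    match = REVERSE_SUFFIX.search(identifier)
    if match:
        return identifier[: match.start()], "R"
    # Unrecognised naming: such a read would be treated as an orphan.
    return identifier, None
```

Under these assumptions, both example titles above map to the same template
name, one as forward and one as reverse.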

Note that the tool supports multiple forward and reverse reads per template
(which is quite common with Sanger sequencing). For example, this input,
which is sorted alphabetically:

 * WTSI_1055_4p17.p1kapIBF
 * WTSI_1055_4p17.p1kpIBF
 * WTSI_1055_4p17.q1kapIBR
 * WTSI_1055_4p17.q1kpIBR

or this where the reads already come in pairs:

 * WTSI_1055_4p17.p1kapIBF
 * WTSI_1055_4p17.q1kapIBR
 * WTSI_1055_4p17.p1kpIBF
 * WTSI_1055_4p17.q1kpIBR

both become:

 * WTSI_1055_4p17.p1kapIBF paired with WTSI_1055_4p17.q1kapIBR
 * WTSI_1055_4p17.p1kpIBF paired with WTSI_1055_4p17.q1kpIBR
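
The pairing shown above can be sketched in Python as follows. This is an
illustration under stated assumptions, not the tool's actual implementation:
the function ``pair_reads`` is hypothetical, and it assumes each read has
already been reduced to a (name, template, direction) tuple, with reads from
the same template next to each other in the input.

```python
from itertools import groupby, zip_longest


def pair_reads(reads):
    """Split reads into (pairs, orphans).

    reads: iterable of (name, template, direction) tuples, where direction
    is 'F', 'R' or None, and reads sharing a template are consecutive.
    """
    pairs, orphans = [], []
    for _, group in groupby(reads, key=lambda read: read[1]):
        group = list(group)
        forwards = [name for name, _, d in group if d == "F"]
        reverses = [name for name, _, d in group if d == "R"]
        unknown = [name for name, _, d in group if d not in ("F", "R")]
        # Pair the n-th forward read with the n-th reverse read of a template.
        for fwd, rev in zip_longest(forwards, reverses):
            if fwd is not None and rev is not None:
                pairs.append((fwd, rev))
            else:
                # A read without a partner becomes an orphan.
                orphans.append(fwd if rev is None else rev)
        orphans.extend(unknown)
    return pairs, orphans
```

With this scheme, both input orderings in the example give the same two pairs,
because forward and reverse reads are matched by their position within the
template group rather than by strict adjacency.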

	</help>
</tool>