
FROGS Demultiplex reads (version 2.0.0)
This file describes barcodes and samples (one line per sample, with the sample name tab-separated from the barcode sequence(s)). See the Help section.
Select between paired and single-end data
Specify the dataset of your single-end reads
Number of mismatches allowed in barcode
Is the barcode placed at the beginning of the forward end, at the reverse end, or at both ends?

This tool assigns single-end or paired-end reads to samples according to the forward and/or reverse barcode found in the first read or in both reads.

Command line:

demultiplex.py --input-R1 *FQ_INPUT1* [--input-R2 *FQ_INPUT2*] --input-barcode *TXT_BARCODE* --mismatches *MISMATCH* --end *END* --summary *TXT_SUMMARY_OUTPUT* --output-demultiplexed *TARGZ_DEMULT_ARCHIVE_OUTPUT* --output-excluded *TARGZ_UNDEMULT_ARCHIVE_OUTPUT*
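For scripted runs outside Galaxy, the synopsis above can be assembled programmatically. The Python sketch below uses only the options listed in the synopsis; the function name and file names are hypothetical, and the exact value spelling accepted by --end (passed here as "both") should be checked against the installed version of demultiplex.py.

```python
from typing import List, Optional


def build_demultiplex_cmd(
    r1: str,
    barcode_file: str,
    mismatches: int,
    end: str,
    summary: str,
    demultiplexed_archive: str,
    excluded_archive: str,
    r2: Optional[str] = None,
) -> List[str]:
    """Assemble the demultiplex.py argument list shown in the synopsis (hypothetical helper)."""
    cmd = ["demultiplex.py", "--input-R1", r1]
    if r2 is not None:
        # --input-R2 is only given for paired-end data
        cmd += ["--input-R2", r2]
    cmd += [
        "--input-barcode", barcode_file,
        "--mismatches", str(mismatches),
        # assumption: END is passed through unchanged; check accepted values locally
        "--end", end,
        "--summary", summary,
        "--output-demultiplexed", demultiplexed_archive,
        "--output-excluded", excluded_archive,
    ]
    return cmd
```

The resulting list can be passed to subprocess.run(cmd, check=True). Building a list rather than a single shell string avoids quoting problems with file names that contain spaces.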
Inputs
Input name Meaning
FQ_INPUT1 Fastq input file for the first read (single-end or forward read of paired-end sequences)
FQ_INPUT2 Fastq input file for the second read (only for paired-end sequences)
TXT_BARCODE Tab-separated text file that describes the barcode sequences used to multiplex samples: SAMPLE_NAME BARCODE1 [BARCODE2]
Options
Option name Meaning
-m/--mismatches MISMATCH Number of mismatches allowed in each barcode
-e/--end END End at which the barcode must be found: forward (beginning of the (first) read), reverse (end of the (second) read) or both
Outputs
Output name Meaning
TXT_SUMMARY_OUTPUT A tab-separated text file that summarises the number of sequences (single or paired) for each sample
TARGZ_DEMULT_ARCHIVE_OUTPUT A TAR.GZ archive that contains the fastq file(s) of each sample
TARGZ_UNDEMULT_ARCHIVE_OUTPUT A TAR.GZ archive that contains the fastq file(s) of the reads that could not be demultiplexed

Format

BARCODE_FILE :

This file is expected to be tab-separated:

-first column corresponds to the sample name

-second column corresponds to the barcode sequence

-third column (optional) corresponds to the reverse barcode sequence

Take care to write each barcode sequence as it appears on the strand of the read; you may therefore need to reverse-complement the reverse barcode sequence.
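A minimal Python sketch of the reverse complement, assuming the barcodes contain only A, C, G, T and N (other IUPAC ambiguity codes are not handled here):

```python
# Complement table for A, C, G, T and N, upper and lower case (assumption: no other IUPAC codes)
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    # complement each base, then reverse the whole string
    return seq.translate(_COMPLEMENT)[::-1]
```

For example, reverse_complement("GATTACA") returns "TGTAATC", which is the form to write in the barcode file if the reverse barcode appears that way on the read.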

All barcode sequences must have the same length.
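The format rules above can be checked before launching the tool. This Python sketch parses the barcode file into a dictionary; the function name is hypothetical, and skipping blank lines, upper-casing barcodes and rejecting duplicate sample names are our own assumptions rather than documented FROGS behaviour.

```python
from typing import Dict, List, Set


def parse_barcode_file(text: str) -> Dict[str, List[str]]:
    """Parse SAMPLE_NAME <tab> BARCODE1 [<tab> BARCODE2] lines into {sample: [barcodes]}."""
    samples: Dict[str, List[str]] = {}
    lengths: Set[int] = set()
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # assumption: blank lines are ignored
        fields = line.split("\t")
        if len(fields) not in (2, 3):
            raise ValueError(f"line {line_no}: expected 2 or 3 tab-separated columns, got {len(fields)}")
        name = fields[0]
        barcodes = [b.upper() for b in fields[1:]]  # assumption: case-insensitive barcodes
        if name in samples:
            raise ValueError(f"line {line_no}: duplicate sample name {name!r}")  # assumption
        samples[name] = barcodes
        lengths.update(len(b) for b in barcodes)
    # documented rule: all barcode sequences must have the same length
    if len(lengths) > 1:
        raise ValueError(f"all barcodes must have the same length, found {sorted(lengths)}")
    return samples
```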

Example of a barcode file, in which the samples are multiplexed at both fragment ends:

[Figure: example barcode file (demultiplex_barcode.png)]
FASTQ :

A text file describing biological sequences in a 4-line format:

-first line starts with "@", followed by the sequence identifier and, optionally, the sequence description

-second line is the sequence itself

-third line is a "+", optionally followed by the sequence identifier again, depending on the version

-fourth line is the quality string, one character per base. The encoding depends on the FASTQ version and the sequencer.
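A minimal reader for the 4-line layout described above, written in Python. It assumes each sequence and quality string fits on a single line (multi-line FASTQ is not supported) and leaves the quality characters undecoded, since their encoding depends on the version.

```python
from itertools import islice
from typing import Iterable, Iterator, NamedTuple


class FastqRecord(NamedTuple):
    header: str    # line 1 without the leading "@" (identifier + optional description)
    sequence: str  # line 2
    quality: str   # line 4, one quality character per base


def read_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield records from 4-line FASTQ text; works on an open file or a list of lines."""
    it = iter(lines)
    while True:
        chunk = [line.rstrip("\r\n") for line in islice(it, 4)]
        if not chunk:
            return  # clean end of input
        if len(chunk) < 4:
            raise ValueError("truncated FASTQ record")
        header, seq, plus, qual = chunk
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield FastqRecord(header[1:], seq, qual)
```

Usage: with open(path) as fh: for rec in read_fastq(fh): ...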


Example of a fastq read corresponding to the previous barcode file:

[Figure: example fastq read (demultiplex_fastq_ex.png)]

For each sequence or sequence pair, the fragment at the beginning of the (first) read (forward multiplexing) or at the end of the (second) read (reverse multiplexing) is compared to all barcodes of the barcode file.
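The fragment extraction can be sketched in Python as follows (the function name is hypothetical). It assumes the barcode length is known in advance, which the rule that all barcodes share the same length makes possible; which read the function is applied to (first or second) follows the description above.

```python
from typing import Tuple


def split_barcode(seq: str, barcode_len: int, at_start: bool) -> Tuple[str, str]:
    """Return (candidate_barcode, trimmed_sequence).

    at_start=True  -> forward multiplexing: fragment taken from the beginning of the read.
    at_start=False -> reverse multiplexing: fragment taken from the end of the read.
    """
    if barcode_len <= 0 or barcode_len > len(seq):
        raise ValueError("barcode length must be between 1 and the read length")
    if at_start:
        return seq[:barcode_len], seq[barcode_len:]
    return seq[-barcode_len:], seq[:-barcode_len]
```

Returning the trimmed remainder alongside the fragment means the read is already trimmed once the fragment is accepted as a barcode.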

If this fragment matches one and only one barcode (within the mismatch threshold), the fragment is trimmed and the sequence is assigned to the corresponding sample.
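The "one and only one" rule can be sketched as below. This assumes mismatches are counted as substitutions only (Hamming distance); whether FROGS also tolerates insertions or deletions is not stated here. For data multiplexed at both ends, one would presumably apply it to each fragment and keep the pair only if both fragments point to the same sample, which is likewise an assumption.

```python
from typing import Dict, Optional


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def assign_sample(fragment: str, barcodes: Dict[str, str], max_mismatches: int) -> Optional[str]:
    """Return the sample whose barcode matches `fragment` within the threshold,
    or None when no barcode matches or more than one does (ambiguous)."""
    hits = [name for name, bc in barcodes.items() if hamming(fragment, bc) <= max_mismatches]
    return hits[0] if len(hits) == 1 else None
```

Raising the threshold widens every barcode's acceptance zone, so two close barcodes can both match and the read becomes ambiguous: with barcodes ACGT and ACGA, a fragment ACGT is assigned at 0 mismatches but rejected at 1.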

Finally, the fastq file (or pair of fastq files) of each sample is included in an archive, and a report describing how many sequences are assigned to each sample is created.
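The report step amounts to counting assignments per sample. In this Python sketch, None stands for a sequence that could not be demultiplexed; the function name, the column header and the "excluded" label are hypothetical and do not reproduce the exact layout of TXT_SUMMARY_OUTPUT.

```python
from collections import Counter
from typing import Iterable, Optional


def summarise(assignments: Iterable[Optional[str]]) -> str:
    """Build a tab-separated per-sample count table (hypothetical layout)."""
    assigned = Counter()
    excluded = 0
    for sample in assignments:
        if sample is None:
            excluded += 1  # read (or pair) that was not demultiplexed
        else:
            assigned[sample] += 1
    rows = ["#sample\tcount"] + [f"{s}\t{assigned[s]}" for s in sorted(assigned)]
    rows.append(f"excluded\t{excluded}")  # hypothetical label for undemultiplexed reads
    return "\n".join(rows) + "\n"
```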

Do not forget to write the barcode sequences exactly as they appear in the fastq file, especially if your data were multiplexed via the reverse strand.

For the mismatch threshold, we advise leaving it at 0 first. If you are not satisfied with the result, try 1. The appropriate number of mismatches depends on the barcode length, but barcodes are frequently very short, so allowing even 1 mismatch already tolerates more errors than the sequencing error rate would typically produce.
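The reasoning behind this advice can be made concrete with a little arithmetic. Assuming independent per-base errors at rate e, a barcode of length L carries L * e errors on average, and the probability of at least one error is 1 - (1 - e)^L. With a hypothetical 8 bp barcode and an illustrative error rate of 0.1% per base, that gives 0.008 expected errors and roughly a 0.8% chance of any error at all, far below the single mismatch a threshold of 1 would accept.

```python
def expected_barcode_errors(barcode_len: int, error_rate: float) -> float:
    """Expected number of sequencing errors in a barcode of the given length."""
    return barcode_len * error_rate


def prob_at_least_one_error(barcode_len: int, error_rate: float) -> float:
    """Probability that a barcode contains at least one error,
    assuming independent errors at the same rate on every base."""
    return 1.0 - (1.0 - error_rate) ** barcode_len


# Illustrative values only: barcode length and error rate are assumptions.
print(expected_barcode_errors(8, 0.001), prob_at_least_one_error(8, 0.001))
```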

If your barcodes have different lengths, you must demultiplex your data in several steps, beginning with the longest barcode set. Then, to process the shorter barcodes, run the tool again on the "unmatched" or "ambiguous" sequence file with the next (shorter) barcode set, and so on.
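The ordering of these rounds can be sketched in Python as below (the function name is hypothetical). It assumes one barcode per sample; each returned set would be written to its own barcode file and run, in order, on the reads left unassigned by the previous round.

```python
from collections import defaultdict
from typing import Dict, List


def barcode_rounds(barcodes: Dict[str, str]) -> List[Dict[str, str]]:
    """Split {sample: barcode} into one barcode set per length, longest first."""
    by_len: Dict[int, Dict[str, str]] = defaultdict(dict)
    for sample, bc in barcodes.items():
        by_len[len(bc)][sample] = bc
    # longest barcodes first, so that a short barcode cannot claim a read
    # whose leading bases are really the prefix of a longer barcode
    return [by_len[n] for n in sorted(by_len, reverse=True)]
```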

If you have Roche 454 sequences in sff format, you must first convert them to fastq with a program such as sff2fastq or sff_to_fastq (installable in Galaxy).


Contact

Contacts: frogs@inra.fr

Repository: https://github.com/geraldinepascal/FROGS

Please cite the FROGS Publication: Escudie F., Auer L., Bernard M., Cauquil L., Vidal K., Maman S., Mariadassou M., Combes S., Hernandez-Raquet G., Pascal G., 2016. FROGS: Find Rapidly OTU with Galaxy Solution. In: ISME-2016 Montreal, CANADA, http://bioinfo.genotoul.fr/wp-content/uploads/FROGS_ISME2016_poster.pdf

Depending on the help provided you can cite us in acknowledgements, references or both.