
What it does

This program examines raw reads from an Illumina sequencing run in two steps. First, it checks that the barcode and the RAD cutsite are intact and demultiplexes the data. If the barcode or the RAD site contains errors within a certain allowance, process_radtags can correct them. Second, it slides a window down the length of the read and checks the average quality score within the window. If the score drops below a 90% probability of being correct (a raw Phred score of 10), the read is discarded. This allows for some sequencing errors while eliminating reads where the sequence is degrading as it is being sequenced. By default the sliding window is 15% of the length of the read; both the score threshold and the window size can be adjusted on the command line.

The process_radtags program can:

- handle data that is barcoded, either inline or using an index, or unbarcoded.
- use combinatorial barcodes.
- check and correct for a restriction enzyme cutsite for single or double-digested data.
- filter adapter sequence while allowing for sequencing error in the adapter pattern.
- process individual files or whole directories of files.
- directly read gzipped data.
- filter reads based on Illumina's Chastity filter.
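The sliding-window quality check described above can be illustrated with a short sketch. This is not the process_radtags source code: the function names, the Phred+33 quality encoding, and the rule used to round the window length are assumptions made for illustration. Only the 15% window and the Phred 10 threshold come from the text.

```python
def phred_to_error_prob(q):
    """Convert a Phred score to the probability that the base call is wrong."""
    return 10 ** (-q / 10)


def decode_quality(qual_string, offset=33):
    """Decode a FASTQ quality string into Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in qual_string]


def passes_sliding_window(scores, window_frac=0.15, min_score=10):
    """Return True if the mean score of every window is at least min_score.

    The window length is window_frac of the read length, truncated to an
    integer and never smaller than 1 (an assumption of this sketch).
    """
    n = len(scores)
    if n == 0:
        return False
    win = max(1, int(n * window_frac))

    # Mean of the first window.
    window_sum = sum(scores[:win])
    if window_sum / win < min_score:
        return False

    # Slide the window one base at a time, updating the running sum.
    for i in range(win, n):
        window_sum += scores[i] - scores[i - win]
        if window_sum / win < min_score:
            return False
    return True
```

With these definitions, a 20-base read with a single low-quality base in the middle is kept, because the neighbouring bases keep the window mean above 10, whereas a read whose second half has collapsed to very low scores is discarded. This matches the stated intent of tolerating isolated sequencing errors while removing reads that degrade along their length.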

Help

Input files:

FASTQ, FASTA, zip, tar.gz

Barcode File Format

The barcode file has a very simple format: one barcode per line.

CGATA
CGGCG
GAAGC
GAGAT
CGATA
CGGCG
GAAGC
GAGAT

Combinatorial barcodes are specified, one per column, separated by a tab:

CGATA   ACGTA
CGGCG   CGTA
GAAGC   CGTA
GAGAT   CGTA
CGATA   AGCA
CGGCG   AGCA
GAAGC   AGCA
GAGAT   AGCA
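As an illustration of the two layouts above, the following sketch reads barcode file contents into a list of tuples: one element per line for single barcodes, two for combinatorial barcodes. The function name `parse_barcodes` is hypothetical, and the checks (A/C/G/T only, at most two columns) are assumptions of this sketch rather than a description of how process_radtags validates its input. Columns are split on any whitespace so that both tabs and the spaces shown above are accepted.

```python
def parse_barcodes(text):
    """Parse barcode file contents into a list of barcode tuples.

    Each non-empty line yields a 1-tuple (single barcode) or a 2-tuple
    (combinatorial barcodes). Hypothetical helper for illustration only.
    """
    barcodes = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        fields = line.split()  # tolerant of tabs or runs of spaces
        if not fields:
            continue  # skip blank lines
        if len(fields) > 2:
            raise ValueError(f"line {line_no}: expected 1 or 2 columns")
        for barcode in fields:
            if set(barcode.upper()) - set("ACGT"):
                raise ValueError(f"line {line_no}: invalid barcode {barcode!r}")
        barcodes.append(tuple(b.upper() for b in fields))
    return barcodes
```

For example, `parse_barcodes("CGATA\tACGTA\nCGGCG\tCGTA\n")` returns `[("CGATA", "ACGTA"), ("CGGCG", "CGTA")]`, and a line containing characters other than A, C, G, or T raises a `ValueError`.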

Instructions for adding archive-management functionality to Galaxy are available on the eBiogenouest HUB wiki.

Created by:

Stacks was developed by Julian Catchen with contributions from Angel Amores, Paul Hohenlohe, and Bill Cresko.

Project links:

STACKS website.

STACKS manual.

STACKS Google group.

References:

-J. Catchen, P. Hohenlohe, S. Bassham, A. Amores, and W. Cresko. Stacks: an analysis tool set for population genomics. Molecular Ecology. 2013.

-J. Catchen, S. Bassham, T. Wilson, M. Currey, C. O'Brien, Q. Yeates, and W. Cresko. The population structure and recent colonization history of Oregon threespine stickleback determined using restriction-site associated DNA-sequencing. Molecular Ecology. 2013.

-J. Catchen, A. Amores, P. Hohenlohe, W. Cresko, and J. Postlethwait. Stacks: building and genotyping loci de novo from short-read sequences. G3: Genes, Genomes, Genetics, 1:171-182, 2011.

-A. Amores, J. Catchen, A. Ferrara, Q. Fontenot and J. Postlethwait. Genome evolution and meiotic maps by massively parallel DNA sequencing: Spotted gar, an outgroup for the teleost genome duplication. Genetics, 188:799-808, 2011.

-P. Hohenlohe, S. Amish, J. Catchen, F. Allendorf, G. Luikart. RAD sequencing identifies thousands of SNPs for assessing hybridization between rainbow trout and westslope cutthroat trout. Molecular Ecology Resources, 11(s1):117-122, 2011.

-K. Emerson, C. Merz, J. Catchen, P. Hohenlohe, W. Cresko, W. Bradshaw, C. Holzapfel. Resolving postglacial phylogeography using high-throughput sequencing. Proceedings of the National Academy of Sciences, 107(37):16196-200, 2010.

Integrated by:

Yvan Le Bras and Cyril Monjeaud

GenOuest Bio-informatics Core Facility

UMR 6074 IRISA INRIA-CNRS-UR1 Rennes (France)

support@genouest.org

If you use this tool in Galaxy, please cite:

Y. Le Bras, A. Roult, C. Monjeaud, M. Bahin, O. Quenez, C. Heriveau, A. Bretaudeau, O. Sallou, O. Collin. Towards a Life Sciences Virtual Research Environment: an e-Science initiative in Western France. JOBIM 2013.