What it does
This program examines raw reads from an Illumina sequencing run. First, it checks that the barcode and the RAD cutsite are intact and demultiplexes the data. If there are errors in the barcode or the RAD site within a certain allowance, process_radtags can correct them. Second, it slides a window down the length of the read and checks the average quality score within the window. If the score drops below a 90% probability of being correct (a raw Phred score of 10), the read is discarded. This allows for some sequencing errors while eliminating reads where the sequence is degrading as it is being sequenced. By default the sliding window is 15% of the length of the read; both the window size and the score threshold can be adjusted on the command line.
The process_radtags program can:
- handle data that is barcoded, either inline or using an index, or unbarcoded.
- use combinatorial barcodes.
- check and correct for a restriction enzyme cutsite for single or double-digested data.
- filter adapter sequence while allowing for sequencing error in the adapter pattern.
- process individual files or whole directories of files.
- directly read gzipped data.
- filter reads based on Illumina's Chastity filter.
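The sliding-window quality check described above can be illustrated with a short Python sketch. This is not the Stacks source code: the function names are hypothetical, the quality string is assumed to be Phred+33 encoded, and the rounding of the window length is an assumption. Only the 15% default window and the score limit of 10 come from the description above.

```python
def phred_to_error_prob(q):
    """Convert a Phred score to the probability that the base call is wrong."""
    return 10 ** (-q / 10)


def decode_qualities(qual_string, offset=33):
    """Decode a FASTQ quality string into integer Phred scores.

    Assumes Phred+33 encoding; older Illumina files may use a different offset.
    """
    return [ord(ch) - offset for ch in qual_string]


def passes_sliding_window(quals, window_frac=0.15, score_limit=10):
    """Return True if every window along the read has a mean score >= score_limit.

    window_frac and score_limit mirror the defaults described in the text
    (15% of the read length, Phred 10). The rounding rule is an assumption.
    """
    window = max(1, round(len(quals) * window_frac))
    for start in range(len(quals) - window + 1):
        mean_q = sum(quals[start:start + window]) / window
        if mean_q < score_limit:
            return False  # quality has degraded: discard the read
    return True


# A 20-base read with a single poor base is kept, because the window average
# stays above the limit; a read whose tail degrades is discarded.
one_bad_base = decode_qualities("I" * 10 + "#" + "I" * 9)
degraded_tail = decode_qualities("I" * 10 + "#" * 10)
keep_first = passes_sliding_window(one_bad_base)
keep_second = passes_sliding_window(degraded_tail)
```

A Phred score of 10 corresponds to an error probability of 0.1, which is the 90% probability of being correct mentioned above. Averaging over a window rather than testing single bases is what lets isolated sequencing errors through.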
Help
Input files:
The barcode file uses a very simple format: one barcode per line.
CGATA
CGGCG
GAAGC
GAGAT
CGATA
CGGCG
GAAGC
GAGAT
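As a rough illustration of how a one-barcode-per-line file can drive demultiplexing and barcode correction, here is a minimal Python sketch. It is not the process_radtags implementation: the function names are hypothetical, inline barcodes at the start of the read are assumed, and the allowance of one mismatch is an assumption chosen for the example.

```python
def load_barcodes(lines):
    """Read one barcode per line, skipping blank lines.

    Repeated barcodes are collapsed while keeping first-seen order.
    """
    seen = {}
    for line in lines:
        bc = line.strip().upper()
        if bc:
            seen.setdefault(bc, None)
    return list(seen)


def hamming(a, b):
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def assign_barcode(read_seq, barcodes, max_mismatches=1):
    """Return (barcode, trimmed_read), or (None, read_seq) if nothing matches.

    An exact match wins. Otherwise the read is rescued only when exactly one
    barcode lies within max_mismatches (an assumed allowance).
    """
    for bc in barcodes:
        if read_seq.startswith(bc):
            return bc, read_seq[len(bc):]
    candidates = [
        bc for bc in barcodes
        if len(read_seq) >= len(bc)
        and hamming(read_seq[:len(bc)], bc) <= max_mismatches
    ]
    if len(candidates) == 1:
        bc = candidates[0]
        return bc, read_seq[len(bc):]
    return None, read_seq  # ambiguous or too many errors


barcodes = load_barcodes(["CGATA", "CGGCG", "GAAGC", "GAGAT"])
exact = assign_barcode("CGATATGCAGGAAA", barcodes)
rescued = assign_barcode("CGTTATGCAGG", barcodes)   # one error in the barcode
unmatched = assign_barcode("TTTTTACGT", barcodes)
```

Rescue is only safe when the barcodes differ from each other at several positions, so that a read with one error is still closest to a single barcode.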
Combinatorial barcodes are specified, one per column, separated by a tab:
CGATA	ACGTA
CGGCG	CGTA
GAAGC	CGTA
GAGAT	CGTA
CGATA	AGCA
CGGCG	AGCA
GAAGC	AGCA
GAGAT	AGCA
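Before submitting a barcode file, it can help to check that it parses as intended. The following Python sketch reads either format described above (a single barcode per line, or two tab-separated columns for combinatorial barcodes). The function name and the validation rules are assumptions made for illustration and are not part of Stacks.

```python
def parse_barcode_file(text):
    """Parse barcode lines into tuples.

    Returns 1-tuples for single barcodes and 2-tuples for combinatorial
    (tab-separated) barcodes. Raises ValueError on malformed lines.
    """
    entries = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        fields = [f.strip().upper() for f in line.split("\t")]
        if len(fields) not in (1, 2):
            raise ValueError(f"line {lineno}: expected 1 or 2 columns, got {len(fields)}")
        for field in fields:
            if not field or set(field) - set("ACGT"):
                raise ValueError(f"line {lineno}: invalid barcode {field!r}")
        entries.append(tuple(fields))
    return entries


pairs = parse_barcode_file("CGATA\tACGTA\nCGGCG\tCGTA\n")
singles = parse_barcode_file("CGATA\nCGGCG\n")
```

Note that columns separated by spaces rather than a tab are rejected here, since the format above specifies a tab as the separator.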
Instructions for adding archive management functionality to Galaxy are available on the eBiogenouest HUB wiki.
Created by:
Stacks was developed by Julian Catchen with contributions from Angel Amores, Paul Hohenlohe, and Bill Cresko.
Project links:
References:
-J. Catchen, P. Hohenlohe, S. Bassham, A. Amores, and W. Cresko. Stacks: an analysis tool set for population genomics. Molecular Ecology. 2013.
-J. Catchen, S. Bassham, T. Wilson, M. Currey, C. O'Brien, Q. Yeates, and W. Cresko. The population structure and recent colonization history of Oregon threespine stickleback determined using restriction-site associated DNA-sequencing. Molecular Ecology. 2013.
-J. Catchen, A. Amores, P. Hohenlohe, W. Cresko, and J. Postlethwait. Stacks: building and genotyping loci de novo from short-read sequences. G3: Genes, Genomes, Genetics, 1:171-182, 2011.
-A. Amores, J. Catchen, A. Ferrara, Q. Fontenot and J. Postlethwait. Genome evolution and meiotic maps by massively parallel DNA sequencing: Spotted gar, an outgroup for the teleost genome duplication. Genetics, 188:799-808, 2011.
-P. Hohenlohe, S. Amish, J. Catchen, F. Allendorf, G. Luikart. RAD sequencing identifies thousands of SNPs for assessing hybridization between rainbow trout and westslope cutthroat trout. Molecular Ecology Resources, 11(s1):117-122, 2011.
-K. Emerson, C. Merz, J. Catchen, P. Hohenlohe, W. Cresko, W. Bradshaw, C. Holzapfel. Resolving postglacial phylogeography using high-throughput sequencing. Proceedings of the National Academy of Sciences, 107(37):16196-16200, 2010.
Integrated by:
Yvan Le Bras and Cyril Monjeaud
GenOuest Bio-informatics Core Facility
UMR 6074 IRISA INRIA-CNRS-UR1 Rennes (France)
If you use this tool in Galaxy, please cite :