
pal_finder (version 0.02.04.8)

Primer prefix:

This prefix will be added to the beginning of all primer names

Sequencing platform used to generate data:

Currently pal_finder only handles Illumina paired-end reads and 454 single-end reads

Input Type:

Illumina fastq file (read 1):

Illumina fastq file (read 2):

Use all reads for microsatellite detection?:

Filters to apply to the pal_finder results:

Apply none, one or more filters to refine results

Use PANDAseq to assemble paired-end reads and confirm primer sequences are present in high-quality assembly:

Minimum number of 2-mer repeat units to detect:

Must detect at least one repeat of this n-mer unit

Minimum number of 3-mer repeat units:

Set to zero to ignore repeats of this n-mer unit

Minimum number of 4-mer repeat units:

Set to zero to ignore repeats of this n-mer unit

Minimum number of 5-mer repeat units:

Set to zero to ignore repeats of this n-mer unit

Minimum number of 6-mer repeat units:

Set to zero to ignore repeats of this n-mer unit

Mispriming library to use:

Specify file of nucleotide sequences to avoid amplifying (PRIMER_MISPRIMING_LIBRARY)

Primer settings to use:

Advanced users can customise the settings for primer3 for more control

Output IDs for input reads which generate bad primer product size ranges:

Can be used to screen reads in input Fastqs

Output the config file to the history:

Can be used to run pal_finder outside of Galaxy

What it does

This tool runs the pal_finder program, which finds microsatellite repeat elements directly from raw 454 or Illumina paired-end sequencing reads. It then designs PCR primers to amplify these repeat loci (Potentially Amplifiable Loci: PAL).
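The role of the "minimum number of n-mer repeat units" settings can be illustrated with a minimal Python sketch. This is not the pal_finder implementation; the function name `find_repeats` and the regular-expression approach are assumptions made purely for illustration. A minimum of zero means that motif size is ignored.

```python
import re

def find_repeats(seq, min_units):
    """Return (start, motif, unit_count) for tandem repeats found in seq.

    min_units maps a motif length (e.g. 2 to 6) to the minimum number of
    repeat units required; a value of 0 skips that motif length.
    """
    hits = []
    for size, minimum in sorted(min_units.items()):
        if minimum <= 0:
            continue  # zero means "ignore repeats of this n-mer unit"
        # One motif of `size` bases followed by at least (minimum - 1) more copies
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (size, minimum - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. "AA", "ACAC")
            if any(motif == motif[:k] * (size // k)
                   for k in range(1, size) if size % k == 0):
                continue
            hits.append((match.start(), motif, len(match.group(0)) // size))
    return hits

# Four "AC" units are present, so a minimum of 4 finds the repeat
print(find_repeats("GGACACACACTT", {2: 4, 3: 0}))
```

Raising the minimum for a motif size makes detection stricter, so fewer reads are reported as containing a repeat of that size.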

Optionally for Illumina data, one or more filters can be applied to the output from pal_finder to:

Only include loci with designed primers

Exclude loci where the primer sequences occur more than once in the reads

Only include loci with 'perfect' motifs (and rank by motif size, largest to smallest)

Use PANDAseq to assemble paired-end reads and confirm primer sequences are present in high-quality assembly
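As a rough illustration of the "primer sequences occur more than once" filter, the sketch below counts how many reads contain each primer and keeps only loci where both primers are found in at most one read. It is a simplified assumption of how such a filter could work, not the code of the bundled utility: the function names, the dictionary keys `forward` and `reverse`, and the choice to search both strands are all hypothetical.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]

def occurrences(primer, reads):
    """Count the reads that contain the primer on either strand (an assumption)."""
    rc = reverse_complement(primer)
    return sum(1 for read in reads if primer in read or rc in read)

def unique_primer_loci(loci, reads):
    """Keep loci whose forward and reverse primers each occur in at most one read."""
    return [
        locus for locus in loci
        if occurrences(locus["forward"], reads) <= 1
        and occurrences(locus["reverse"], reads) <= 1
    ]

reads = ["AAACCCGGG", "TTTACGTAC", "CCCGGGAAA"]
loci = [
    {"id": "a", "forward": "ACGTAC", "reverse": "TTTAC"},  # each primer seen in one read
    {"id": "b", "forward": "CCCGGG", "reverse": "TTTAC"},  # forward primer seen in two reads
]
kept = unique_primer_loci(loci, reads)
print([locus["id"] for locus in kept])
```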

Pal_finder runs the primer3_core program; information on the settings used in primer3_core can be found in the Primer3 manual at http://primer3.sourceforge.net/primer3_manual.htm

Known issues

Low number of reads used for microsatellite detection/bad primer product size ranges

For some datasets pal_finder may generate 'bad' product size ranges (where the lower limit exceeds the upper limit) for one or more reads, for input into primer3_core. In these cases primer3_core will terminate prematurely, which can result in a substantially lower number of reads being used for microsatellite detection and potentially sub-optimal primer design.

The number of reads generating bad size ranges is reported in the Summary of microsat types output dataset as 'readsWithBadRanges'. Ideally the reported value should be zero.
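A 'bad' range in the sense used here is one whose lower limit exceeds its upper limit. The check can be sketched in Python as below. The sketch assumes the range is written as one or more space-separated `min-max` intervals, which is the format of Primer3's PRIMER_PRODUCT_SIZE_RANGE tag; the function name `is_bad_size_range` is hypothetical and this is not code taken from pal_finder.

```python
def is_bad_size_range(range_text):
    """Return True if any 'min-max' interval has a minimum above its maximum."""
    for interval in range_text.split():
        lower, upper = (int(value) for value in interval.split("-"))
        if lower > upper:
            return True
    return False

print(is_bad_size_range("100-300"))  # valid range
print(is_bad_size_range("150-100"))  # lower limit exceeds upper limit
```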

The conditions which cause this issue within pal_finder are still unclear; however, we believe it to be associated with short or low-quality reads. If this problem affects your data then:

Ensure that the input data are sufficiently trimmed and filtered (using e.g. the Trimmomatic tool) before rerunning pal_finder.
A list of read IDs for which pal_finder generates bad product size ranges can be output by turning on Output IDs for input reads which generate bad primer product size ranges. This outputs an additional dataset with a list of read IDs, which can be used to remove read pairs from the input Fastq files (using e.g. the Filter sequences by ID tool) before rerunning pal_finder.
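Outside of Galaxy, removing the listed reads from a Fastq file could be done with a short script along the following lines. This is a minimal sketch under several assumptions: each Fastq record occupies exactly four lines, the IDs in the list match the first token of the header line with any trailing /1 or /2 removed, and the function names are hypothetical.

```python
def read_fastq(lines):
    """Yield (header, sequence, plus, quality) from an iterable of Fastq lines."""
    stripped = (line.rstrip("\n") for line in lines)
    # Group the lines four at a time; a trailing incomplete record is dropped
    for header, seq, plus, qual in zip(*[stripped] * 4):
        yield header, seq, plus, qual

def read_id(header):
    """Strip the leading '@', any description, and a trailing /1 or /2."""
    name = header[1:].split()[0]
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name

def drop_reads(lines, bad_ids):
    """Return the Fastq records whose read ID is not in bad_ids."""
    return [rec for rec in read_fastq(lines) if read_id(rec[0]) not in bad_ids]

read1_lines = [
    "@read1/1\n", "ACGT\n", "+\n", "IIII\n",
    "@read2/1\n", "GGCC\n", "+\n", "IIII\n",
]
kept = drop_reads(read1_lines, {"read1"})
print([read_id(rec[0]) for rec in kept])
```

The same set of IDs should be applied to both the read 1 and read 2 files so that the read pairs stay in step.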

Pal_finder takes a long time to run for large input datasets

pal_finder was originally developed using MiSeq data, and is not optimised for working with the larger Fastqs that are output from other platforms such as HiSeq and NextSeq. As a consequence pal_finder may take a very long time to complete when operating on larger datasets.

If this is a problem then the tool can be run using a subset of the input reads by unchecking the Use all reads... option and entering either an integer number of reads to use, or a decimal fraction (e.g. 0.5 will select 50% of the reads).
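The two ways of specifying a subset (an integer count or a decimal fraction) can be illustrated with the sketch below. How the tool itself picks the reads is not described here, so the random sampling, the function name `choose_subset` and the `seed` parameter are assumptions for illustration only.

```python
import random

def choose_subset(n_reads, subset, seed=None):
    """Return sorted indices of the reads to keep.

    subset is given as text: an integer count (e.g. "5000") or a
    decimal fraction between 0 and 1 (e.g. "0.5" for 50% of the reads).
    """
    if "." in subset:
        fraction = float(subset)
        if not 0.0 < fraction <= 1.0:
            raise ValueError("fraction must be greater than 0 and at most 1")
        count = int(n_reads * fraction)
    else:
        # Never ask for more reads than are available
        count = min(int(subset), n_reads)
    rng = random.Random(seed)  # seeded so the selection is reproducible
    return sorted(rng.sample(range(n_reads), count))

print(len(choose_subset(10, "0.5", seed=1)))  # half of 10 reads
print(len(choose_subset(10, "3", seed=1)))    # exactly 3 reads
```

For paired-end data the same indices would need to be used for both Fastq files so that the pairs remain matched.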

Credits

This Galaxy tool has been developed by Peter Briggs within the Bioinformatics Core Facility at the University of Manchester. It runs the pal_finder package which can be obtained from http://sourceforge.net/projects/palfinder/:

PLoS One. 2012; 7(2): e30953 "Rapid Microsatellite Identification from Illumina Paired-End Genomic Sequencing in Two Birds and a Snake" Todd A. Castoe, Alexander W. Poole, A. P. Jason de Koning, Kenneth L. Jones, Diana F. Tomback, Sara J. Oyler-McCance, Jennifer A. Fike, Stacey L. Lance, Jeffrey W. Streicher, Eric N. Smith, and David D. Pollock

The paper is available at http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3279355/

This tool is compatible with pal_finder version 0.02.04, which in turn runs the primer3_core program (version 2.0.0-alpha is required, available from http://primer3.sourceforge.net/releases.php):

Steve Rozen and Helen J. Skaletsky (2000) "Primer3 on the WWW for general users and for biologist programmers". In: Krawetz S, Misener S (eds) Bioinformatics Methods and Protocols: Methods in Molecular Biology. Humana Press, Totowa, NJ, pp 365-386

The paper is available at http://purl.com/STEVEROZEN/papers/rozen-and-skaletsky-2000-primer3.pdf

The filtering and assembly of the pal_finder output for Illumina data is performed using a Python utility written by Graeme Fox at the University of Manchester, which is included with this tool; this utility uses the BioPython and PANDAseq packages.

Please acknowledge this Galaxy tool, the pal_finder and primer3 packages, and the utility script and its dependencies if you use them in your work.