Galaxy | Tool Preview

pal_finder (version 0.02.04.8)
This prefix will be added to the beginning of all primer names
Currently pal_finder only handles Illumina paired-end reads and 454 single-end reads
Apply none, one or more filters to refine results
Must detect at least one repeat of this n-mer unit
Set to zero to ignore repeats of this n-mer unit
Set to zero to ignore repeats of this n-mer unit
Set to zero to ignore repeats of this n-mer unit
Set to zero to ignore repeats of this n-mer unit
Specify file of nucleotide sequences to avoid amplifying (PRIMER_MISPRIMING_LIBRARY)
Advanced users can customise the settings for primer3 for more control
Can be used to screen reads in input Fastqs
Can be used to run pal_finder outside of Galaxy

What it does

This tool runs the pal_finder program, which finds microsatellite repeat elements directly from raw 454 or Illumina paired-end sequencing reads. It then designs PCR primers to amplify these repeat loci (Potentially Amplifiable Loci: PAL).
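To illustrate what detecting a repeat of an n-mer unit means, here is a minimal Python sketch. It is not pal_finder's own implementation; the function name `find_repeats` and its parameters are hypothetical, and it assumes reads contain only the characters A, C, G and T. A minimum repeat count of zero disables the search for that unit size, mirroring the "Set to zero to ignore" behaviour of the tool's settings.

```python
import re

def find_repeats(read, unit_len, min_repeats):
    """Return (start, motif, copies) for tandem repeats of a unit_len-mer.

    Hypothetical helper for illustration only; not part of pal_finder.
    """
    if min_repeats <= 0:
        return []  # zero means: ignore this n-mer class entirely
    # Capture one candidate unit, then require enough further copies of it
    pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_repeats - 1))
    hits = []
    for match in pattern.finditer(read):
        motif = match.group(1)
        # Skip units that are themselves repeats of a shorter unit
        # (e.g. "ACAC" is really the 2-mer "AC" repeated)
        if motif in (motif + motif)[1:-1]:
            continue
        copies = len(match.group(0)) // unit_len
        hits.append((match.start(), motif, copies))
    return hits

read = "GGG" + "AC" * 6 + "TTT" + "GATA" * 4 + "CC"
print(find_repeats(read, 2, 6))
print(find_repeats(read, 4, 3))
```

The periodicity check keeps a run of "AC" from being reported a second time as a run of the 4-mer "ACAC".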

Optionally for Illumina data, one or more filters can be applied to the output from pal_finder to:

  • Only include loci with designed primers
  • Exclude loci where the primer sequences occur more than once in the reads
  • Only include loci with 'perfect' motifs (and rank by motif size, largest to smallest)
  • Use PANDAseq to assemble paired-end reads and confirm primer sequences are present in high-quality assembly
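As a rough illustration of the second filter in the list above, the following Python sketch drops loci whose primer sequences appear more than once across the reads. It is not the utility shipped with this tool: the function names, the dictionary keys `forward` and `reverse`, and the choice to count exact forward-strand matches only (no reverse complements) are all assumptions made for the example.

```python
def count_occurrences(primer, reads):
    """Count exact, non-overlapping matches of primer across all reads."""
    return sum(read.count(primer) for read in reads)

def filter_unique_primers(loci, reads):
    """Keep loci whose forward and reverse primers each occur at most once.

    Hypothetical helper; each locus is a dict with 'forward' and 'reverse' keys.
    """
    kept = []
    for locus in loci:
        primers = (locus["forward"], locus["reverse"])
        if all(count_occurrences(p, reads) <= 1 for p in primers):
            kept.append(locus)
    return kept

reads = ["AAACCCGGG", "TTTCCCAAA", "GGGTTT"]
loci = [
    {"id": "a", "forward": "AAACCC", "reverse": "GGGTTT"},
    {"id": "b", "forward": "CCC", "reverse": "TTT"},
]
unique = filter_unique_primers(loci, reads)
print([locus["id"] for locus in unique])
```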

Pal_finder runs the primer3_core program; information on the settings used in primer3_core can be found in the Primer3 manual at http://primer3.sourceforge.net/primer3_manual.htm


Known issues

Low number of reads used for microsatellite detection/bad primer product size ranges

For some datasets pal_finder may generate 'bad' product size ranges (where the lower limit exceeds the upper limit) for one or more reads, for input into primer3_core. In these cases primer3_core will terminate prematurely, which can result in a substantially lower number of reads being used for microsatellite detection and potentially sub-optimal primer design.

The number of reads generating the bad size ranges is reported in the Summary of microsat types output dataset as 'readsWithBadRanges'. Ideally the reported value should be zero.
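To make the notion of a 'bad' range concrete, the sketch below checks a product size range value of the kind Primer3 accepts for its PRIMER_PRODUCT_SIZE_RANGE tag (a space-separated list of `lower-upper` pairs). The function name `bad_size_ranges` is hypothetical, and it is an assumption that the values pal_finder passes to primer3_core have exactly this shape.

```python
def bad_size_ranges(size_range_value):
    """Return the 'lower-upper' entries whose lower limit exceeds the upper limit.

    Hypothetical helper for illustration; not part of pal_finder or Primer3.
    """
    bad = []
    for entry in size_range_value.split():
        lower, upper = (int(part) for part in entry.split("-"))
        if lower > upper:
            bad.append(entry)
    return bad

print(bad_size_ranges("150-250 300-120"))
print(bad_size_ranges("60-500"))
```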

The conditions which cause this issue within pal_finder are still unclear; however, we believe it to be associated with short or low-quality reads. If this problem affects your data then:

Pal_finder takes a long time to run for large input datasets

pal_finder was originally developed using MiSeq data, and is not optimised for working with the larger Fastqs that are output from other platforms such as HiSeq and NextSeq. As a consequence pal_finder may take a very long time to complete when operating on larger datasets.

If this is a problem then the tool can be run using a subset of the input reads by unchecking the Use all reads... option and entering either an integer number of reads to use, or a decimal fraction (e.g. 0.5 will select 50% of the reads).
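The two ways of specifying a subset can be sketched in Python as follows. This is only an illustration: the function names are hypothetical, and how the tool itself chooses which reads to keep is not described here, so the random selection shown is an assumption. Using the same indices for both Fastq files of a pair keeps mates together.

```python
import random

def subset_size(total_reads, value):
    """Interpret value as a read count (integer) or a fraction (decimal)."""
    text = str(value).strip()
    if "." in text:
        fraction = float(text)
        if not 0.0 < fraction <= 1.0:
            raise ValueError("fraction must be in the range (0, 1]")
        return int(total_reads * fraction)
    count = int(text)
    if count <= 0:
        raise ValueError("read count must be positive")
    return min(count, total_reads)  # cannot take more reads than exist

def pick_read_indices(total_reads, value, seed=None):
    """Choose which read positions to keep (assumed random; apply to R1 and R2 alike)."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(total_reads), subset_size(total_reads, value)))

print(subset_size(1000, "0.5"))
print(subset_size(1000, "250"))
```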


Credits

This Galaxy tool has been developed by Peter Briggs within the Bioinformatics Core Facility at the University of Manchester. It runs the pal_finder package which can be obtained from http://sourceforge.net/projects/palfinder/:

  • PLoS One. 2012; 7(2): e30953 "Rapid Microsatellite Identification from Illumina Paired-End Genomic Sequencing in Two Birds and a Snake" Todd A. Castoe, Alexander W. Poole, A. P. Jason de Koning, Kenneth L. Jones, Diana F. Tomback, Sara J. Oyler-McCance, Jennifer A. Fike, Stacey L. Lance, Jeffrey W. Streicher, Eric N. Smith, and David D. Pollock

The paper is available at http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3279355/

This tool is compatible with pal_finder version 0.02.04, which in turn runs the primer3_core program (version 2.0.0-alpha is required, available from http://primer3.sourceforge.net/releases.php):

  • Steve Rozen and Helen J. Skaletsky (2000) "Primer3 on the WWW for general users and for biologist programmers". In: Krawetz S, Misener S (eds) Bioinformatics Methods and Protocols: Methods in Molecular Biology. Humana Press, Totowa, NJ, pp 365-386

The paper is available at http://purl.com/STEVEROZEN/papers/rozen-and-skaletsky-2000-primer3.pdf

The filtering and assembly of the pal_finder output for Illumina data is performed using a Python utility written by Graeme Fox at the University of Manchester, which is included with this tool; this utility uses the BioPython and PANDAseq packages.

Please kindly acknowledge this Galaxy tool, the pal_finder and primer3 packages, and the utility script and its dependencies if you use the tool in your work.