Galaxy | Tool Preview

SHRiMP for Color-space (version 1.0.0)
For most mapping needs, use the Commonly used settings. If you want full control, use the Full List.

To use this tool, your dataset needs to be in csfasta format (ABI SOLiD color-space sequences). Click the pencil icon next to your dataset to set the datatype to csfasta.


What it does

SHRiMP (SHort Read Mapping Package) is a software package for aligning genomic reads against a target genome.


Input formats

A color-space file containing one or more reads, for example:

>2_263_779_F3
T132032030200202202003211302222202230022110222
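
To make the record layout concrete, here is a minimal Python sketch that splits a csfasta record into its read id, leading primer base and color calls. The function name `parse_csfasta` is hypothetical, and the sketch assumes each sequence sits on a single line and that lines starting with `#` are comments; it is not part of SHRiMP or Galaxy.

```python
def parse_csfasta(text):
    """Yield (read_id, primer_base, colors) tuples from csfasta text.

    Assumes one sequence line per header line (an assumption of this sketch).
    """
    read_id = None
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        if line.startswith(">"):
            read_id = line[1:]  # header line: drop the leading '>'
        elif read_id is not None:
            # First character is the primer base, the rest are color calls.
            yield read_id, line[0], line[1:]
            read_id = None


example = """>2_263_779_F3
T132032030200202202003211302222202230022110222
"""

records = list(parse_csfasta(example))
for read_id, primer, colors in records:
    print(read_id, primer, len(colors))
```

For the example read above, the sequence line holds one primer base followed by 45 color calls.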

Outputs

The tool returns the default SHRiMP output:

   1                      2               3         4        5        6       7      8      9      10
--------------------------------------------------------------------------------------------------------------------
  >2_263_779_F3   Streptococcus_suis      +       814344  814388      1      45      45    3660    8x19x3x2x6x4x3

where:

 1. (>2_263_779_F3)        - Read id
 2. (Streptococcus_suis)   - Reference sequence id
 3. (+)                    - Strand of the read
 4. (814344)               - Start position of the alignment in the reference
 5. (814388)               - End position of the alignment in the reference
 6. (1)                    - Start position of the alignment in the read
 7. (45)                   - End position of the alignment in the read
 8. (45)                   - Length of the read
 9. (3660)                 - Score
10. (8x19x3x2x6x4x3)       - Edit string
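
The ten columns can be read into a small record for downstream filtering. The sketch below is one possible way to do that in Python; the names `ShrimpHit` and `parse_hit` are hypothetical, and it assumes fields are separated by whitespace and that reference ids contain no spaces.

```python
from typing import NamedTuple


class ShrimpHit(NamedTuple):
    read_id: str
    ref_id: str
    strand: str
    ref_start: int
    ref_end: int
    read_start: int
    read_end: int
    read_length: int
    score: int
    edit_string: str


def parse_hit(line):
    """Parse one output row into a ShrimpHit (whitespace-separated, 10 fields)."""
    fields = line.split()
    if len(fields) != 10:
        raise ValueError(f"expected 10 fields, got {len(fields)}")
    return ShrimpHit(
        fields[0].lstrip(">"),  # read id without the leading '>'
        fields[1],
        fields[2],
        int(fields[3]),
        int(fields[4]),
        int(fields[5]),
        int(fields[6]),
        int(fields[7]),
        int(fields[8]),
        fields[9],
    )


row = ">2_263_779_F3   Streptococcus_suis      +       814344  814388      1      45      45    3660    8x19x3x2x6x4x3"
hit = parse_hit(row)
print(hit.ref_id, hit.ref_end - hit.ref_start + 1, hit.score)
```

In the example row, the alignment spans 45 reference positions (814344 to 814388 inclusive), which equals the read length.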

SHRiMP parameter list

The commonly used parameters with default value setting:

-s    Spaced Seed                             (default: 111111011111)
      The spaced seed is a single contiguous string of 0's and 1's.
      0's represent wildcards, or positions which will always be
      considered as matching, whereas 1's dictate positions that
      must match. A string of all 1's will result in a simple kmer scan.
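
As an illustration of the spaced seed rule described for -s, the following Python sketch compares two equal-length strings under a seed mask: positions marked 1 must agree, positions marked 0 are ignored. The function `seed_matches` and the sample strings are hypothetical; this shows the matching rule only, not how SHRiMP indexes or scans for seeds.

```python
def seed_matches(seed, a, b):
    """Return True if a and b agree at every position where seed has a '1'."""
    if not (len(seed) == len(a) == len(b)):
        raise ValueError("seed and both strings must have the same length")
    return all(s == "0" or x == y for s, x, y in zip(seed, a, b))


DEFAULT_SEED = "111111011111"  # default shown above: one wildcard at index 6

read_kmer = "132032030200"
wildcard_diff = "132032130200"  # differs from read_kmer only at index 6
required_diff = "132031030200"  # differs from read_kmer only at index 5

print(seed_matches(DEFAULT_SEED, read_kmer, wildcard_diff))
print(seed_matches(DEFAULT_SEED, read_kmer, required_diff))
print(seed_matches("1" * 12, read_kmer, wildcard_diff))
```

The first comparison succeeds because the only difference falls on the wildcard; the other two fail, the last one because an all-1 seed behaves like an exact kmer comparison.
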
-n    Seed Matches per Window                 (default: 2)
      The number of seed matches per window dictates how many seeds
      must match within some window length of the genome before that
      region is considered for Smith-Waterman alignment. A lower
      value will increase sensitivity while drastically increasing
      running time. Higher values will have the opposite effect.
-t    Seed Hit Taboo Length                   (default: 4)
      The seed taboo length specifies how many target genome bases
      or colours must exist prior to a previous seed match in order
      to count another seed match as a hit.
-9    Seed Generation Taboo Length            (default: 0)

-w    Seed Window Length                      (default: 115.00%)
      This parameter specifies the genomic span in bases (or colours)
      in which *seed_matches_per_window* must exist before the read
      is given consideration by the Smith-Waterman alignment machinery.
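
The interaction between -n and -w can be pictured as a sliding-window test over seed hit positions. The Python sketch below is a simplified illustration under stated assumptions: the function `has_enough_seed_hits` is hypothetical, hit positions are plain genome coordinates, and the window length is passed as an absolute number of bases or colours (how the percentage default is converted into a length is not modelled here).

```python
def has_enough_seed_hits(positions, window_len, min_hits=2):
    """Return True if some span shorter than window_len holds >= min_hits hits."""
    pos = sorted(positions)
    left = 0
    for right in range(len(pos)):
        # Shrink the window from the left until it fits within window_len.
        while pos[right] - pos[left] >= window_len:
            left += 1
        if right - left + 1 >= min_hits:
            return True
    return False


print(has_enough_seed_hits([100, 130, 400], window_len=52, min_hits=2))
print(has_enough_seed_hits([100, 200, 400], window_len=52, min_hits=2))
```

With two hits 30 positions apart the region qualifies; with hits at least 100 positions apart it does not. Lowering `min_hits` lets more regions through, which matches the sensitivity and running-time trade-off described for -n.
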
-o    Maximum Hits per Read                   (default: 100)
      This parameter specifies how many hits to remember for each read.
      If more hits are encountered, ones with lower scores are dropped
      to make room.
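
One common way to keep only the best N hits while dropping lower-scoring ones is a bounded min-heap. The Python sketch below illustrates the behaviour described for -o; `keep_top_hits` and the hit labels are hypothetical, and nothing here is claimed to be the data structure SHRiMP itself uses.

```python
import heapq


def keep_top_hits(hits, max_hits=100):
    """Keep the max_hits highest-scoring (score, hit_id) pairs, best first."""
    heap = []  # min-heap: the lowest retained score is always at heap[0]
    for score, hit_id in hits:
        if len(heap) < max_hits:
            heapq.heappush(heap, (score, hit_id))
        elif score > heap[0][0]:
            # Drop the current lowest-scoring hit to make room.
            heapq.heapreplace(heap, (score, hit_id))
    return sorted(heap, reverse=True)


hits = [(3660, "a"), (3000, "b"), (4100, "c"), (3900, "d")]
best = keep_top_hits(hits, max_hits=2)
print(best)
```
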
-r    Maximum Read Length                     (default: 1000)
      This parameter specifies the maximum length of reads that will
      be encountered in the dataset. If larger reads than the default
      are used, an appropriate value must be passed to *rmapper*.
-d    Kmer Std. Deviation Limit               (default: -1 [None])
      This option permits pruning read kmers that occur with
      frequencies greater than *kmer_std_dev_limit* standard
      deviations above the average. This can shorten running
      time at the cost of some sensitivity.
      *Note*: A negative value disables this option.
-m    S-W Match Value                         (default: 100)
      The value applied to matches during the Smith-Waterman score calculation.
-i    S-W Mismatch Value                      (default: -150)
      The value applied to mismatches during the Smith-Waterman
      score calculation.
-g    S-W Gap Open Penalty (Reference)        (default: -400)
      The value applied to gap opens along the reference sequence
      during the Smith-Waterman score calculation.
      *Note*: For backward compatibility, if -g is set
      and -q is not set, the gap open penalty for the query will
      be set to the same value as specified for the reference.
-q    S-W Gap Open Penalty (Query)            (default: -400)
      The value applied to gap opens along the query sequence during
      the Smith-Waterman score calculation.
-e    S-W Gap Extend Penalty (Reference)      (default: -70)
      The value applied to gap extends during the Smith-Waterman score calculation.
      *Note*: For backward compatibility, if -e is set
      and -f is not set, the gap extend penalty for the query will
      be set to the same value as specified for the reference.
-f    S-W Gap Extend Penalty (Query)          (default: -70)
      The value applied to gap extends during the Smith-Waterman score calculation.
-x
-h    S-W Full Hit Threshold                  (default: 68.00%)
      In letter-space, this parameter determines the threshold
      score for both vectored and full Smith-Waterman alignments.
      Any values less than this quantity will be thrown away.
      *Note*: This option differs slightly in meaning between letter-space and color-space.
-v