IMPORTANT: This tool currently supports only data whose quality scores are integers or ASCII characters with base (offset) 64. Click the pencil icon next to your dataset to set the datatype to fastqsolexa.
What it does
SHRiMP (SHort Read Mapping Package) is a software package for aligning genomic reads against a target genome.
This wrapper post-processes the default SHRiMP/rmapper-ls output and generates a table with all information from the reads and the reference for each mapping. The tool takes single- or paired-end reads. For single-end reads, only uniquely mapped alignments are considered. For paired-end reads, only pairs that meet the following criteria are used to generate the table: 1) the ends fall within the insertion size; 2) the ends are mapped in opposite directions. If a pair still has multiple mappings after these criteria are applied, it is discarded.
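The paired-end filtering described above can be sketched as follows. This is a minimal illustration, not the wrapper's actual code: the `Hit` record, the function names, the requirement that both ends map to the same reference, and the definition of the span as the outer distance between the two ends are all assumptions made for the example.

```python
from collections import namedtuple

# Hypothetical record for one alignment of one read end (names are illustrative).
Hit = namedtuple("Hit", "ref strand start end")

def valid_pairs(hits1, hits2, max_insert):
    """Return the (end1, end2) combinations that satisfy both criteria."""
    pairs = []
    for a in hits1:
        for b in hits2:
            # Assumption: both ends must map to the same reference sequence.
            if a.ref != b.ref:
                continue
            # Criterion 2: the ends are mapped in opposite directions.
            if a.strand == b.strand:
                continue
            # Criterion 1: the outer span fits within the insertion size.
            span = max(a.end, b.end) - min(a.start, b.start) + 1
            if span <= max_insert:
                pairs.append((a, b))
    return pairs

def select_pair(hits1, hits2, max_insert):
    """Keep the pair only if exactly one combination survives the criteria."""
    pairs = valid_pairs(hits1, hits2, max_insert)
    return pairs[0] if len(pairs) == 1 else None
```

A pair with no surviving combination, or with more than one, yields `None`, which mirrors the rule that ambiguous pairs are discarded.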
Input formats
A multiple-fastq file, for example:
@seq1 TACCCGATTTTTTGCTTTCCACTTTATCCTACCCTT +seq1 hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh
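As an illustration of the base-64 quality encoding, the sketch below reads four-line FASTQ records and converts each quality character to an integer by subtracting 64 from its ASCII code. The function names are hypothetical, and the parser assumes that every record occupies exactly four lines (no wrapped sequences).

```python
def parse_fastq(lines):
    """Yield (read_id, sequence, quality_string) from 4-line FASTQ records."""
    lines = [ln.strip() for ln in lines if ln.strip()]
    for i in range(0, len(lines) - 3, 4):
        header, seq, plus, qual = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError("malformed FASTQ record at non-blank line %d" % (i + 1))
        yield header[1:], seq, qual

def decode_quality(qual, offset=64):
    """Convert ASCII quality characters to integer scores (offset 64)."""
    return [ord(ch) - offset for ch in qual]
```

With this encoding the character `h` (ASCII 104) decodes to a score of 40.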
Outputs
The tool gives two outputs.
Table output
Table output contains 8 columns:
1     2      3     4  5  6  7   8
----------------------------------------------------
chrM  14711  seq1  0  T  A  40  1
chrM  14712  seq1  1  T  T  40  1
where:
1. (chrM) - Reference sequence id
2. (14711) - Position of the mapping in the reference
3. (seq1) - Read id
4. (0) - Position of the mapping in the read
5. (T) - Nucleotide in the reference
6. (A) - Nucleotide in the read
7. (40) - Quality score for the nucleotide in the position of the read
8. (1) - The number of times this position is covered by reads
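One way to work with the table output is to load each row into a named record and pick out the positions where the read disagrees with the reference. This is a sketch only: the `Row` field names and both functions are hypothetical, and the code assumes whitespace-separated columns with integer quality scores, as in the example rows.

```python
from collections import namedtuple

# Hypothetical field names for the eight columns described above.
Row = namedtuple(
    "Row",
    "ref_id ref_pos read_id read_pos ref_base read_base quality coverage",
)

def parse_row(line):
    """Split one table line into a Row, converting numeric columns to int."""
    f = line.split()
    if len(f) != 8:
        raise ValueError("expected 8 columns, got %d" % len(f))
    return Row(f[0], int(f[1]), f[2], int(f[3]), f[4], f[5], int(f[6]), int(f[7]))

def mismatches(rows):
    """Return the rows where the read base differs from the reference base."""
    return [r for r in rows if r.ref_base.upper() != r.read_base.upper()]
```

Applied to the two example rows, only position 14711 (reference T, read A) is reported as a mismatch.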
SHRiMP output
This is the default output from SHRiMP/rmapper-ls:
1     2     3  4     5     6  7   8   9     10
-------------------------------------------------------------------
seq1  chrM  +  3644  3679  1  36  36  3600  36
where:
1. (seq1) - Read id
2. (chrM) - Reference sequence id
3. (+) - Strand of the read
4. (3644) - Start position of the alignment in the reference
5. (3679) - End position of the alignment in the reference
6. (1) - Start position of the alignment in the read
7. (36) - End position of the alignment in the read
8. (36) - Length of the read
9. (3600) - Score
10. (36) - Edit string
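The ten columns can be read into a named record as sketched below. The `Alignment` field names and `parse_shrimp_line` are hypothetical, and the code assumes a line laid out exactly like the example above (ten whitespace-separated fields); real output files may contain additional header or comment lines that this sketch does not handle.

```python
from collections import namedtuple

# Hypothetical field names for the ten columns described above.
Alignment = namedtuple(
    "Alignment",
    "read_id ref_id strand ref_start ref_end read_start read_end "
    "read_length score edit_string",
)

def parse_shrimp_line(line):
    """Parse one alignment line into an Alignment record."""
    f = line.split()
    if len(f) != 10:
        raise ValueError("expected 10 columns, got %d" % len(f))
    return Alignment(
        f[0], f[1], f[2],
        int(f[3]), int(f[4]), int(f[5]), int(f[6]), int(f[7]), int(f[8]),
        f[9],  # the edit string is kept as text
    )
```

For the example line, the aligned span on the reference is 3679 - 3644 + 1 = 36 bases, which equals the read length.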
SHRiMP parameter list
Commonly used parameters and their default values:
-s  Spaced Seed (default: 111111011111)
    The spaced seed is a single contiguous string of 0's and 1's. 0's represent wildcards, or positions which will always be considered as matching, whereas 1's dictate positions that must match. A string of all 1's will result in a simple kmer scan.

-n  Seed Matches per Window (default: 2)
    The number of seed matches per window dictates how many seeds must match within some window length of the genome before that region is considered for Smith-Waterman alignment. A lower value will increase sensitivity while drastically increasing running time. Higher values will have the opposite effect.

-t  Seed Hit Taboo Length (default: 4)
    The seed taboo length specifies how many target genome bases or colors must exist prior to a previous seed match in order to count another seed match as a hit.

-9  Seed Generation Taboo Length (default: 0)

-w  Seed Window Length (default: 115.00%)
    This parameter specifies the genomic span in bases (or colors) in which *seed_matches_per_window* must exist before the read is given consideration by the Smith-Waterman alignment machinery.

-o  Maximum Hits per Read (default: 100)
    This parameter specifies how many hits to remember for each read. If more hits are encountered, ones with lower scores are dropped to make room.

-r  Maximum Read Length (default: 1000)
    This parameter specifies the maximum length of reads that will be encountered in the dataset. If larger reads than the default are used, an appropriate value must be passed to *rmapper*.

-d  Kmer Std. Deviation Limit (default: -1 [None])
    This option permits pruning read kmers that occur with frequencies greater than *kmer_std_dev_limit* standard deviations above the average. This can shorten running time at the cost of some sensitivity. *Note*: A negative value disables this option.

-m  S-W Match Value (default: 100)
    The value applied to matches during the Smith-Waterman score calculation.

-i  S-W Mismatch Value (default: -150)
    The value applied to mismatches during the Smith-Waterman score calculation.

-g  S-W Gap Open Penalty (Reference) (default: -400)
    The value applied to gap opens along the reference sequence during the Smith-Waterman score calculation. *Note*: For backward compatibility, if -g is set and -q is not set, the gap open penalty for the query will be set to the same value as specified for the reference.

-q  S-W Gap Open Penalty (Query) (default: -400)
    The value applied to gap opens along the query sequence during the Smith-Waterman score calculation.

-e  S-W Gap Extend Penalty (Reference) (default: -70)
    The value applied to gap extends during the Smith-Waterman score calculation. *Note*: For backward compatibility, if -e is set and -f is not set, the gap extend penalty for the query will be set to the same value as specified for the reference.

-f  S-W Gap Extend Penalty (Query) (default: -70)
    The value applied to gap extends during the Smith-Waterman score calculation.

-h  S-W Hit Threshold (default: 68.00%)
    In letter-space, this parameter determines the threshold score for both vectored and full Smith-Waterman alignments. Any values less than this quantity will be thrown away. *Note*: This option differs slightly in meaning between letter-space and color-space.
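To make the spaced seed (-s) and the match and mismatch values (-m, -i) more concrete, here is a small sketch. Both function names are hypothetical, and neither is taken from SHRiMP itself: `seed_matches` only applies the 0/1 rule described under -s to two equal-length kmers, and `ungapped_score` simply adds up per-column match and mismatch values for an alignment without gaps. It is not an implementation of Smith-Waterman.

```python
def seed_matches(seed, read_kmer, ref_kmer):
    """Apply a spaced seed: '1' positions must match, '0' positions are wildcards."""
    if not (len(seed) == len(read_kmer) == len(ref_kmer)):
        raise ValueError("seed and kmers must have the same length")
    return all(
        mask == "0" or a == b
        for mask, a, b in zip(seed, read_kmer, ref_kmer)
    )

def ungapped_score(read, ref, match=100, mismatch=-150):
    """Sum the -m / -i values column by column for a gap-free alignment."""
    return sum(match if a == b else mismatch for a, b in zip(read, ref))
```

With the default values, a 36-base read that matches the reference at every position scores 36 x 100 = 3600, which is the score shown in the example SHRiMP output line.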