What it does
Converts the output of a SOLiD instrument (versions 3.5 and earlier) to FASTQ format suitable for the bowtie, BWA, and PerM mappers.
Input datasets
Below are examples of forward (F3) reads and quality scores:
Reads:
    >1831_573_1004_F3
    T00030133312212111300011021310132222
    >1831_573_1567_F3
    T03330322230322112131010221102122113
Quality scores:
    >1831_573_1004_F3
    4 29 34 34 32 32 24 24 20 17 10 34 29 20 34 13 30 34 22 24 11 28 19 17 34 17 24 17 25 34 7 24 14 12 22
    >1831_573_1567_F3
    8 26 31 31 16 22 30 31 28 29 22 30 30 31 32 23 30 28 28 31 19 32 30 32 19 8 32 10 13 6 32 10 6 16 11
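The quality strings in the output examples further down are consistent with each integer score being stored as a single ASCII character at an offset of 33. The sketch below illustrates that conversion. It is not the tool's actual code: the function name qual_to_fastq and the offset default are assumptions inferred from the examples.

```python
def qual_to_fastq(qual_line, offset=33):
    """Encode space-separated integer qualities as one ASCII character each.

    The offset of 33 is an assumption inferred from the example output,
    not a documented setting of the tool.
    """
    return "".join(chr(int(q) + offset) for q in qual_line.split())


quals = ("4 29 34 34 32 32 24 24 20 17 10 34 29 20 34 13 30 34 22 24 "
         "11 28 19 17 34 17 24 17 25 34 7 24 14 12 22")
print(qual_to_fastq(quals))  # %>CCAA9952+C>5C.?C79,=42C292:C(9/-7
```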
Mate pairs
If your data is from a mate-paired run, you will have additional read and quality datasets that look similar to the ones above, with one exception: the read names end with "_R3". In this case, choose Yes from the Is this a mate-pair run? drop-down and you will be able to select R reads. When processing mate pairs, this tool generates two output files: one for F3 reads and the other for R3 reads. The reads are guaranteed to be paired: mated reads will be at the same position in the F3 and R3 FASTQ files. However, because pairing is verified, processing an entire SOLiD run may take a while (several hours).
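To illustrate what "same position in both files" means, here is a minimal sketch of pairing by read name. It is not the tool's implementation. The function pair_reads is hypothetical, and it assumes that a read whose mate is missing is simply left out, which this help text does not state.

```python
def pair_reads(f3_names, r3_names):
    """Return two equally long lists in which index i holds a mated pair.

    Hypothetical helper: mates are matched on the read name with the
    "_F3" / "_R3" suffix removed, and unmated reads are dropped (an assumption).
    """
    def base(name, suffix):
        return name[:-len(suffix)] if name.endswith(suffix) else name

    r3_by_base = {base(name, "_R3"): name for name in r3_names}
    paired_f3, paired_r3 = [], []
    for name in f3_names:
        mate = r3_by_base.get(base(name, "_F3"))
        if mate is not None:
            paired_f3.append(name)
            paired_r3.append(mate)
    return paired_f3, paired_r3


f3 = ["1831_573_1004_F3", "1831_573_1567_F3"]
r3 = ["1831_573_1567_R3", "1831_573_1004_R3"]
print(pair_reads(f3, r3))
```

Holding all R3 names in a dictionary keeps the lookup simple, but for a full run it would need a lot of memory, so a real implementation would likely use a different strategy.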
Explanation of parameters
Remove reads containing color qualities below this value - any read that contains at least one color call with a quality lower than the specified value will not be reported.
Trim trailing "_F3" and "_R3"? - removes the "_F3" or "_R3" suffix from read names. Not necessary for bowtie. Required for BWA.
Trim first base? - SOLiD reads contain an adapter base such as the first T in this read:
    >1831_573_1004_F3
    T00030133312212111300011021310132222
This option removes that base, leaving only color calls. Not necessary for bowtie. Required for BWA.
Double encode? - converts color calls (0123.) to pseudo-nucleotides (ACGTN). Not necessary for bowtie. Required for BWA.
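The four parameters above can be sketched as small independent functions. This is an illustration under stated assumptions, not the tool's source: all function names are hypothetical. The double-encoding table follows the (0123.) to (ACGTN) mapping given above. Any character outside that table, such as a leading adapter base that has not been trimmed, is passed through unchanged.

```python
# Color call -> pseudo-nucleotide, as described for "Double encode?".
COLOR_TO_BASE = str.maketrans("0123.", "ACGTN")


def passes_quality_filter(quals, min_quality):
    """True if no color call has a quality below min_quality."""
    return all(q >= min_quality for q in quals)


def trim_suffix(name):
    """Drop a trailing "_F3" or "_R3" from a read name."""
    for suffix in ("_F3", "_R3"):
        if name.endswith(suffix):
            return name[:-len(suffix)]
    return name


def trim_first_base(read):
    """Drop the leading adapter base, leaving only color calls."""
    return read[1:]


def double_encode(read):
    """Translate color calls to pseudo-nucleotides; other characters are kept."""
    return read.translate(COLOR_TO_BASE)


read = "T00030133312212111300011021310132222"
print(trim_suffix("1831_573_1004_F3"))  # 1831_573_1004
print(trim_first_base(read))            # 00030133312212111300011021310132222
print(double_encode(read))              # TAAATACTTTCGGCGCCCTAAACCAGCTCACTGGGG
```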
Examples of output
When all parameters are left "as-is" you will get this (using reads and qualities shown above):
    @1831_573_1004
    T00030133312212111300011021310132222
    +
    %>CCAA9952+C>5C.?C79,=42C292:C(9/-7
    @1831_573_1567
    T03330322230322112131010221102122113
    +
    );@@17?@=>7??@A8?==@4A?A4)A+.'A+'1,
Setting Trim first base from reads to Yes will produce this:
    @1831_573_1004
    00030133312212111300011021310132222
    +
    %>CCAA9952+C>5C.?C79,=42C292:C(9/-7
    @1831_573_1567
    03330322230322112131010221102122113
    +
    );@@17?@=>7??@A8?==@4A?A4)A+.'A+'1,
Finally, setting Double encode to Yes will yield:
    @1831_573_1004
    TAAATACTTTCGGCGCCCTAAACCAGCTCACTGGGG
    +
    %>CCAA9952+C>5C.?C79,=42C292:C(9/-7
    @1831_573_1567
    TATTTATGGGTATGGCCGCTCACAGGCCAGCGGCCT
    +
    );@@17?@=>7??@A8?==@4A?A4)A+.'A+'1,