Galaxy | Tool Preview

Select high quality segments (version 1.0.0)
Bases scoring below this value will trigger splitting.
Report all high-quality segments longer than this length. Setting this option to 0 causes the program to return only the single longest run of high-quality bases per read.
If set to 'DO NOT trigger splitting', the program ignores low-quality bases that lie within or adjacent to homonucleotide runs when deciding where to split. This significantly reduces fragmentation of 454 data.
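Deciding whether a low-quality base sits in a homonucleotide run requires the base calls alongside the scores. The sketch below assumes both are available and that a "run" means at least `MIN_RUN` identical consecutive bases; `MIN_RUN = 2`, the one-base "adjacent" window and the function names `homopolymer_mask` and `split_points` are hypothetical choices for illustration, not the tool's documented behavior.

```python
from typing import List

# Hypothetical default: a "homonucleotide run" is taken to be >= 2 identical bases.
MIN_RUN = 2


def homopolymer_mask(seq: str, min_run: int = MIN_RUN) -> List[bool]:
    """Flag positions inside, or immediately next to, runs of >= min_run identical bases."""
    seq = seq.upper()
    n = len(seq)
    mask = [False] * n
    i = 0
    while i < n:
        j = i
        while j < n and seq[j] == seq[i]:
            j += 1  # extend the run of identical bases starting at i
        if j - i >= min_run:
            # Flag the run itself plus one neighbouring base on each side.
            for k in range(max(0, i - 1), min(n, j + 1)):
                mask[k] = True
        i = j
    return mask


def split_points(seq: str, quals: List[int], cutoff: int,
                 ignore_homopolymers: bool = True, min_run: int = MIN_RUN) -> List[int]:
    """Return indices of low-quality bases (score < cutoff) that trigger a split."""
    if len(seq) != len(quals):
        raise ValueError("sequence and quality lengths differ")
    mask = homopolymer_mask(seq, min_run) if ignore_homopolymers else [False] * len(seq)
    # A low-quality base splits the read only if it is not masked as homopolymer-related.
    return [i for i, q in enumerate(quals) if q < cutoff and not mask[i]]
```

For example, in `"ACGTTTTAGC"` with low scores at positions 5 (inside `TTTT`) and 8, only position 8 triggers a split when homopolymers are ignored; with `ignore_homopolymers=False` both positions do.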

To use this tool, your dataset needs to be in the Quality Score format. Click the pencil icon next to your dataset to set the datatype to Quality Score (see below for examples).
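Reading quality data programmatically can help when checking a dataset before upload. A minimal sketch, assuming the common FASTA-style layout used for 454 `.qual` files (a `>` header naming the read, followed by whitespace-separated integer scores); other Quality Score variants may be laid out differently, and `parse_qual` is a hypothetical helper, not part of Galaxy.

```python
from typing import Dict, Iterable, List, Optional


def parse_qual(lines: Iterable[str]) -> Dict[str, List[int]]:
    """Parse FASTA-style quality records into {read name: [scores]}.

    Assumed layout: a '>' header line naming the read, followed by one or
    more lines of whitespace-separated integer quality scores.
    """
    records: Dict[str, List[int]] = {}
    name: Optional[str] = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            fields = line[1:].split()
            name = fields[0] if fields else ""  # read name = first word of the header
            records[name] = []
        elif name is None:
            raise ValueError("quality scores found before the first '>' header")
        else:
            # Scores for one read may wrap over several lines; int() also accepts negatives.
            records[name].extend(int(tok) for tok in line.split())
    return records
```

Because it accepts any iterable of lines, an open file handle (e.g. a hypothetical `reads.qual`) can be passed directly.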


What it does

This tool finds high-quality segments within sequencing reads generated by Roche (454), Illumina (Solexa), or ABI SOLiD machines.


Example

Suppose this is your sequencing read:

5'---------*-------------*------**----3'

where dashes (-) are HIGH quality bases (scoring 20 or above) and asterisks (*) are LOW quality bases (scoring below 20). If the Minimal length of contiguous segment is set to 5 (a value chosen only for this example), the tool returns:

5'---------
            -------------
                          ------
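The splitting step can be sketched as a single scan over the scores. This is a minimal illustration, assuming one integer score per base and half-open `(start, end)` coordinates; `high_quality_segments` is a hypothetical name, and the sketch deliberately leaves out the special meaning the tool gives to a minimal length of 0.

```python
from typing import List, Optional, Tuple


def high_quality_segments(quals: List[int], cutoff: int,
                          min_length: int) -> List[Tuple[int, int]]:
    """Split a read at every base scoring below `cutoff` and return the
    half-open (start, end) coordinates of segments longer than `min_length`."""
    segments: List[Tuple[int, int]] = []
    start: Optional[int] = None  # start of the current high-quality run, if any
    for i, q in enumerate(quals):
        if q >= cutoff:
            if start is None:
                start = i  # a new high-quality run begins
        elif start is not None:
            segments.append((start, i))  # a low-quality base closes the run
            start = None
    if start is not None:
        segments.append((start, len(quals)))  # close a run that reaches the 3' end
    return [(s, e) for s, e in segments if e - s > min_length]


# A read shaped like the example: '-' = high quality (30), '*' = low quality (10).
read = "-" * 9 + "*" + "-" * 13 + "*" + "-" * 6 + "**" + "-" * 4
quals = [30 if c == "-" else 10 for c in read]
print(high_quality_segments(quals, cutoff=20, min_length=5))
# [(0, 9), (10, 23), (24, 30)]
```

The trailing four-base run is dropped because it is not longer than 5.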

The tool splits the read at low-quality bases and returns every segment longer than 5. Note that the output of this tool will therefore likely contain a larger number of shorter sequences than the original input. If the Minimal length of contiguous segment is set to 0, the tool returns only the single longest segment:

-------------
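The longest-segment mode does not need the full list of segments: a single pass that remembers the best run seen so far is enough. A minimal sketch, assuming integer scores and half-open coordinates; `longest_high_quality_segment` is a hypothetical name, and keeping the earliest run on a tie is an assumption, since the tool's tie-breaking is not documented here.

```python
from itertools import chain
from typing import Iterable, Optional, Tuple


def longest_high_quality_segment(quals: Iterable[int],
                                 cutoff: int) -> Optional[Tuple[int, int]]:
    """Return the half-open (start, end) of the longest run of bases scoring
    >= cutoff, or None if every base is below the cutoff. Ties keep the earliest run."""
    best: Optional[Tuple[int, int]] = None
    start: Optional[int] = None
    # Append one sentinel score below the cutoff so the final run is closed inside the loop.
    for i, q in enumerate(chain(quals, [cutoff - 1])):
        if q >= cutoff:
            if start is None:
                start = i
        elif start is not None:
            if best is None or i - start > best[1] - best[0]:
                best = (start, i)  # strictly longer run replaces the current best
            start = None
    return best
```

Applied to the example read, this returns `(10, 23)`, the 13-base middle segment.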