Galaxy | Tool Preview

CREST (version 1.0)
BAM files must contain soft-clipping signatures at the breakpoints. If they do not, you will not get any results.
If your genome of interest is not listed, contact the Galaxy team.
Requires a gene model file
The read length of the sequencing data, default 100.
If enabled, the program will generate more SVs, with a higher false positive rate.
The range where SV will be detected, using chr1:100-200 format
An SV with enough soft-clipped reads on only one side is considered valid. Default is ON.
Remove SV events caused by tandem repeats. Default is ON.

CREST

CREST is an algorithm for detecting genomic structural variations at base-pair resolution using next-generation sequencing data.

CREST uses pieces of DNA called soft clips to find structural variations. Soft clips are the DNA segments produced during sequencing that fail to properly align to the reference genome as the sample genome is reassembled. CREST uses the soft clips to precisely identify sites of chromosomal rearrangement or where pieces of DNA are inserted or deleted.

Please cite the following article:

Wang J, Mullighan CG, Easton J, Roberts S, Heatley SL, Ma J, Rusch MC, Chen K, Harris CC, Ding L, Holmfeldt L, Payne-Turner D, Fan X, Wei L, Zhao D, Obenauer JC, Naeve C, Mardis ER, Wilson RK, Downing JR and Zhang J. CREST maps somatic structural variation in cancer genomes with base-pair resolution (2011). Nature Methods.


Input formats

BAM files must contain soft-clipping signatures at the breakpoints. If they do not, you will not get any results.

CREST uses soft-clipping signatures to identify breakpoints. Soft-clipping is indicated by "S" elements in the CIGAR for SAM/BAM records. Soft-clipping may not occur, depending on the mapping algorithm and parameters and sometimes even the library preparation.
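As a quick check of whether an alignment file carries such signatures, the sketch below inspects CIGAR strings in SAM text (for example, the output of samtools view). It is not part of CREST; the function names are illustrative, and it assumes standard tab-delimited SAM records with the CIGAR in column 6.

```python
import re

# One CIGAR element: a length followed by an operation code.
_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def soft_clip_lengths(cigar):
    """Return (leading, trailing) soft-clipped base counts for a CIGAR string."""
    if cigar == "*":  # CIGAR unavailable, e.g. for an unmapped read
        return (0, 0)
    ops = [(int(n), op) for n, op in _CIGAR_OP.findall(cigar)]
    # Hard clips (H) can sit outside soft clips, so ignore them here.
    ops = [(n, op) for n, op in ops if op != "H"]
    if not ops:
        return (0, 0)
    leading = ops[0][0] if ops[0][1] == "S" else 0
    trailing = ops[-1][0] if len(ops) > 1 and ops[-1][1] == "S" else 0
    return (leading, trailing)

def count_soft_clipped(sam_lines):
    """Return (soft-clipped records, total records) for an iterable of SAM lines."""
    total = clipped = 0
    for line in sam_lines:
        if line.startswith("@"):  # header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # not a complete alignment record
            continue
        total += 1
        if any(soft_clip_lengths(fields[5])):
            clipped += 1
    return clipped, total
```

A very low soft-clipped count relative to the total suggests that the mapping step did not produce the signatures CREST relies on.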

With bwa sampe:

One mapping method that will soft-clip reads is bwa sampe (BWA for paired-end reads). When BWA successfully maps one read in a pair but is not able to map the other, it will attempt a more permissive Smith-Waterman alignment of the unmapped read in the neighborhood of the mapped mate. If it is only able to align part of the read, then it will soft-clip the portion on the end that it could not align. Often this occurs at the breakpoints of structural variations.

In some cases when the insert sizes approach the read length, BWA will not perform Smith-Waterman alignment. Reads from inserts smaller than the read length will contain primer and/or adapter and will often not map. When the insert size is close to the read length, this creates a skewed distribution of inferred insert sizes which may cause BWA to not attempt Smith-Waterman realignment. This is indicated by the error message "weird pairing". Often in these cases there are also unusually low mapping rates.

One way to fix this problem is to remap the unmapped reads with bwa bwasw. To do this, extract the unmapped reads as FASTQ files (this may be done with a combination of samtools view -f 4 and Picard's SamToFastq). Realign them using bwa bwasw and build a BAM file. Then re-run CREST on this new BAM file; you may pick up events that would otherwise have been missed.
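The -f 4 filter mentioned above keeps records whose SAM FLAG has the 0x4 ("segment unmapped") bit set. The sketch below applies the same test to SAM text, which can help estimate how many reads would be passed to bwasw. The function names are illustrative, and it assumes tab-delimited SAM records with the FLAG in column 2.

```python
UNMAPPED = 0x4  # SAM FLAG bit: this segment is unmapped

def is_unmapped(flag):
    """True if the FLAG value has the 'unmapped' bit set (what -f 4 selects)."""
    return bool(flag & UNMAPPED)

def unmapped_records(sam_lines):
    """Yield the SAM alignment lines that samtools view -f 4 would keep."""
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        fields = line.split("\t")
        if len(fields) > 1 and is_unmapped(int(fields[1])):
            yield line
```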


Outputs

The output file *.predSV.txt has the following tab-delimited columns: left_chr, left_pos, left_strand, # of left soft-clipped reads, right_chr, right_pos, right_strand, # of right soft-clipped reads, SV type, coverage at left_pos, coverage at right_pos, assembled length at left_pos, assembled length at right_pos, average percent identity at left_pos, percent of non-unique mapping reads at left_pos, average percent identity at right_pos, percent of non-unique mapping reads at right_pos, start position of consensus mapping to genome, starting chromosome of consensus mapping, position of the genomic mapping of consensus starting position, end position of consensus mapping to genome, ending chromosome of consensus mapping, position of genomic mapping of consensus ending position, and consensus sequences. For an inversion (INV), the last 7 fields are repeated, because two different breakpoints are needed to identify an INV event.

Example of the tumor.predSV.txt file:

4 125893227 + 5 10 66301858 - 4 CTX 29 14 83 71 0.895173453996983 0.230769230769231 0.735384615384615 0.5 1 4 125893135 176 10 66301773 TTATGAATTTTGAAATATATATCATATTTTGAAATATATATCATATTCTAAATTATGAAAAGAGAATATGATTCTCTTTTCAGTAGCTGTCACCTCCTGGGTTCAAGTGATTCTCCTGCCTCTACCTCCCGAGTAGCTGGGATTACAGGTGCCCACCACCATGCCTGGCTAATTTT
5 7052198 - 0 10 66301865 + 8 CTX 0 22 0 81 0.761379310344828 0.482758620689655 0 0 1 5 7052278 164 10 66301947 AGCCATGGACCTTGTGGTGGGTTCTTAACAATGGTGAGTCCGGAGTTCTTAACGATGGTGAGTCCGTAGTTTGTTCCTTCAGGAGTGAGCCAAGATCATGCCACTGCACTCTAGCCTGGGCAACAGAGGAAGACTCCACCTCAAAAAAAAAAAGTGGGAAGAGG
10 66301858 + 4 4 125893225 - 1 CTX 15 28 71 81 0.735384615384615 0.5 0.889507154213037 0.243243243243243 1 10 66301777 153 4 125893154 TTAGCCAGGCATGGTGGTGGGCACCTGTAATCCCAGCTACTCGGGAGGTAGAGGCAGGAGAATCACTTGAACCCAGGAGGTGACAGCTACTGAAAAGAGAATCATATTCTCTTTTCATAATTTAGAATATGATATATATTTCAAAATATGATA
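The column layout described above can be read into named fields with a short script. The following is a minimal sketch, not part of CREST: the field names are shorthand chosen here for the columns listed above, values are kept as strings, and the line is assumed to be tab-delimited with 17 fixed columns followed by one or more groups of 7 consensus columns (two groups for an INV event).

```python
# Shorthand names for the 17 fixed columns (names chosen for this sketch).
SV_FIELDS = [
    "left_chr", "left_pos", "left_strand", "left_soft_clipped_reads",
    "right_chr", "right_pos", "right_strand", "right_soft_clipped_reads",
    "sv_type", "left_coverage", "right_coverage",
    "left_assembled_length", "right_assembled_length",
    "left_percent_identity", "left_percent_non_unique",
    "right_percent_identity", "right_percent_non_unique",
]

# Shorthand names for the 7 consensus columns, repeated once more for INV.
CONSENSUS_FIELDS = [
    "consensus_start", "start_chr", "start_genomic_pos",
    "consensus_end", "end_chr", "end_genomic_pos", "consensus_sequence",
]

def parse_predsv_line(line):
    """Split one tab-delimited predSV line into a dict of named fields."""
    values = line.rstrip("\n").split("\t")
    n_fixed, n_cons = len(SV_FIELDS), len(CONSENSUS_FIELDS)
    extra = len(values) - n_fixed
    if extra < n_cons or extra % n_cons != 0:
        raise ValueError("unexpected number of columns: %d" % len(values))
    record = dict(zip(SV_FIELDS, values[:n_fixed]))
    # One consensus block per breakpoint description.
    record["consensus"] = [
        dict(zip(CONSENSUS_FIELDS, values[i:i + n_cons]))
        for i in range(n_fixed, len(values), n_cons)
    ]
    return record
```

Calling parse_predsv_line on each line of the file gives one dictionary per predicted SV, with record["consensus"] holding one entry for most events and two for inversions.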

If there are no results, or very few, the input BAM file may lack soft-clipping.