Galaxy | Tool Preview

SALSA (version 2.3+galaxy5)
Headers must not contain ':'.
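A quick pre-flight check for this constraint can be sketched in Python. The function name `headers_with_colon` and the sample records are illustrative assumptions and are not part of SALSA or Galaxy.

```python
def headers_with_colon(fasta_text):
    """Return the FASTA header lines (without the leading '>') that contain ':'."""
    bad = []
    for line in fasta_text.splitlines():
        if line.startswith(">") and ":" in line:
            bad.append(line[1:].strip())
    return bad


# Hypothetical example assembly with one offending header.
example = ">contig_1\nACGT\n>contig:2 length=4\nTTGA\n"
offending = headers_with_colon(example)
print(offending)
```

An empty result means no header needs to be renamed before the assembly is passed to the tool.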
To start scaffolding with SALSA, the reads need to be mapped to the assembly. BWA or BOWTIE2 is recommended. SALSA requires a BED file as input. The alignment BAM file can be converted using the bamToBed command from the Bedtools package.
Minimum contig length to scaffold
An assembly graph can optionally be provided to guide the scaffolding, potentially reducing scaffolding errors.
Hi-C experiments can use different restriction enzymes. The enzyme frequency in contigs is used to normalize the Hi-C interaction frequency. Note that you need to specify the actual sequence of the cutting site for a restriction enzyme and not the enzyme name. You can also specify DNASE as an enzyme if you use an enzyme-free prep, e.g. Omni-C.
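To illustrate why the cutting-site sequence is needed rather than the enzyme name, the sketch below counts how often a site occurs in a contig. The helper names (`CUT_SITES`, `count_sites`, `site_frequency`) and the example contig are assumptions made for illustration; SALSA's own normalization is not reproduced here.

```python
# Recognition sequences for a few commonly used Hi-C restriction enzymes.
CUT_SITES = {"MboI": "GATC", "DpnII": "GATC", "HindIII": "AAGCTT"}


def count_sites(sequence, site):
    """Count occurrences (overlapping ones included) of site in sequence."""
    sequence = sequence.upper()
    site = site.upper()
    count = 0
    start = sequence.find(site)
    while start != -1:
        count += 1
        start = sequence.find(site, start + 1)
    return count


def site_frequency(sequence, site):
    """Cut sites per base; returns 0.0 for an empty sequence."""
    if not sequence:
        return 0.0
    return count_sites(sequence, site) / len(sequence)


# Hypothetical 20 bp contig used only for demonstration.
contig = "AAGATCTTGATCGGAAGCTT"
print(count_sites(contig, CUT_SITES["MboI"]))
print(count_sites(contig, CUT_SITES["HindIII"]))
```

In this sketch, the value looked up in `CUT_SITES` (for example `GATC`) is what would be entered as the enzyme parameter, not the key (`MboI`).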
SALSA scaffolds through sequential iterations. The default number of iterations is 3. Increasing the number of iterations can increase the number of joins, but it may also introduce additional misjoins.
Set this option to 'yes' if you want to find misassemblies in the input assembly.
Expected size of the assembled genome. If not set, SALSA will estimate the genome size.

Purpose

SALSA (Simple AssembLy ScAffolder) is a scaffolding tool based on a computational method that exploits the genomic proximity information in Hi-C data sets for long range scaffolding of de novo genome assemblies.


Mapping reads

The first step in scaffolding is to map the Hi-C reads to the assembly. We recommend using the BWA or BOWTIE2 aligner. Read mapping produces a BAM file, but SALSA requires a BED file as input; the conversion can be done with the bamToBed command from the Bedtools package. SALSA also requires the BED file to be sorted by read name rather than by alignment coordinates. Once you have the BAM file, convert it to BED and sort the result by read name to obtain the BED file needed as input to SALSA.
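On the command line, this is typically done with bamToBed followed by a sort on the fourth (name) column. The sorting step can be sketched in Python as follows; the function name `sort_bed_by_read_name` and the example records are hypothetical, and the sketch assumes tab-separated BED lines with the read name in column 4, as written by bamToBed.

```python
def sort_bed_by_read_name(bed_lines):
    """Return non-empty BED lines sorted by the name field (column 4)."""
    records = [line.rstrip("\n") for line in bed_lines if line.strip()]
    return sorted(records, key=lambda rec: rec.split("\t")[3])


# Hypothetical alignments in coordinate-like order; mates are not adjacent.
bed = [
    "contig_2\t100\t250\tread_b/1\t60\t+",
    "contig_1\t10\t160\tread_a/1\t60\t+",
    "contig_1\t900\t1050\tread_b/2\t60\t-",
    "contig_3\t5\t155\tread_a/2\t60\t-",
]
sorted_bed = sort_bed_by_read_name(bed)
for rec in sorted_bed:
    print(rec)
```

After sorting, both mates of each read pair sit on adjacent lines. This sketch holds all records in memory, so for full-size Hi-C data sets an external sort is likely the better choice.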

Since Hi-C reads and alignments contain experimental artifacts, the alignments need some postprocessing. To align the reads and postprocess the alignments, you can use the pipeline released by Arima Genomics, which is available in its GitHub repository.

Additional information on how to generate and filter the BAM file can be found here.