
rnaviralSPAdes (version 3.15.5+galaxy2)

Operation mode:

To run read error correction, reads should be in FASTQ format.

Single-end or paired-end short-reads:

The tool assumes that all samples belong to the same library. To use samples from two different libraries, include the second library as an additional set of short reads.

FASTQ RNA-seq file(s):

Use an additional set of short-reads:

Enable this option if you want to combine two data sources (e.g., single-end and paired-end reads).

Additional read files


Pipeline options:

Error correction requires FASTQ input files.

Select k-mer detection option:

By default, rnaSPAdes uses 2 k-mer sizes, which are detected automatically from the read length (approximately one third and one half of the maximal read length). We recommend not changing this parameter, because smaller k-mer sizes typically produce multiple chimeric (misassembled) transcripts. If you do set k-mer sizes manually, provide a comma-separated list in which all values are odd, less than 128, and listed in ascending order.
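
The constraints on a manual k-mer list can be checked with a short Python sketch. `validate_kmers` and `default_kmers` are hypothetical helpers, not part of SPAdes or Galaxy. `default_kmers` only approximates the automatic choice described above, and its rounding rule (step down to an odd value, cap at 127) is our assumption rather than SPAdes's exact behavior.

```python
def validate_kmers(spec: str) -> list[int]:
    """Parse a comma-separated k-mer list and check the documented constraints."""
    ks = [int(part) for part in spec.split(",")]
    for k in ks:
        if k % 2 == 0:
            raise ValueError(f"k-mer size {k} is not odd")
        if k >= 128:
            raise ValueError(f"k-mer size {k} is not less than 128")
    # Strictly ascending order (assumption: duplicates are not allowed either).
    if any(a >= b for a, b in zip(ks, ks[1:])):
        raise ValueError("k-mer sizes must be listed in ascending order")
    return ks


def make_odd(k: int) -> int:
    # Hypothetical rounding rule: step down to the nearest odd value.
    return k if k % 2 == 1 else k - 1


def default_kmers(max_read_length: int) -> list[int]:
    """Approximate the automatic choice: ~1/3 and ~1/2 of the maximal read length."""
    third = min(max_read_length // 3, 127)  # cap keeps values below 128
    half = min(max_read_length // 2, 127)
    return [make_odd(third), make_odd(half)]


print(validate_kmers("21,33,55"))  # [21, 33, 55]
print(default_kmers(150))          # [49, 75]
```

For 150 bp reads this sketch suggests 49 and 75; values such as "21,32" or "33,21" are rejected with a `ValueError`.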

Set Phred quality offset:

Phred quality offset in the input reads. Default: auto-detect
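
Auto-detection usually works by inspecting the quality characters. The sketch below uses a common heuristic and is not necessarily what SPAdes implements: 64-offset encodings (Phred+64 and the older Solexa+64) never use characters below ';' (ASCII 59), so any lower character implies offset 33. The helper names are hypothetical.

```python
from itertools import islice


def quality_strings(fastq_lines, max_reads=10000):
    """Yield the quality line (every 4th line) of the first max_reads FASTQ records."""
    for i, line in enumerate(islice(fastq_lines, 4 * max_reads)):
        if i % 4 == 3:
            yield line.rstrip("\r\n")


def guess_phred_offset(quality_lines):
    """Guess the Phred offset (33 or 64) from quality strings; None if ambiguous."""
    lowest = min(ord(ch) for q in quality_lines for ch in q)
    if lowest < 59:
        # 64-offset encodings never go below ';' (59), so this must be Phred+33.
        return 33
    if lowest >= 64:
        # Consistent with Phred+64, but very high-quality Phred+33 data
        # could look the same, so treat this as a best guess.
        return 64
    return None  # 59-63: ambiguous, inspect more reads


record = ["@read1\n", "ACGT\n", "+\n", "II#I\n"]
print(guess_phred_offset(quality_strings(record)))  # 33
```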

Select optional output file(s):

Only shown in history if selected here and generated by the specific run.

What it does

SPAdes (St. Petersburg genome assembler) is an assembly toolkit containing various assembly pipelines.

rnaviralSPAdes is a de novo assembly pipeline tailored for RNA viral datasets (transcriptome, metatranscriptome and metavirome).

Input

SPAdes takes as input paired-end reads, mate-pairs and single (unpaired) reads in FASTA and FASTQ formats. For IonTorrent data, SPAdes also supports unpaired reads in unmapped BAM format (like the one produced by Torrent Server). However, to run read error correction, reads should be in FASTQ or BAM format. Sanger, Oxford Nanopore and PacBio CLR reads can be provided in either format, since SPAdes does not run error correction for these types of data.
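
Format requirements like these can be checked before submitting a job. This is a minimal sketch that assumes files are plain or gzip-compressed text and that the first non-empty line identifies the format ('@' for FASTQ, '>' for FASTA). BAM input is not handled, and the helper names are hypothetical.

```python
import gzip


def open_reads(path):
    """Open a read file as text, transparently handling gzip compression."""
    with open(path, "rb") as handle:
        magic = handle.read(2)
    if magic == b"\x1f\x8b":  # gzip magic bytes
        return gzip.open(path, "rt", errors="replace")
    return open(path, "rt", errors="replace")


def detect_format(path):
    """Return 'fastq', 'fasta' or 'unknown' based on the first non-empty line."""
    with open_reads(path) as handle:
        for line in handle:
            if line.strip():
                return {"@": "fastq", ">": "fasta"}.get(line[0], "unknown")
    return "unknown"


def can_error_correct(path):
    # Error correction needs FASTQ (BAM is also accepted, but not detected here).
    return detect_format(path) == "fastq"
```

For example, `detect_format("sample_R1.fastq.gz")` (a hypothetical file name) would return `'fastq'` for a gzipped FASTQ file.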

To run SPAdes 3.15.3 you need at least one library of the following types:

Illumina paired-end/high-quality mate-pairs/unpaired reads
IonTorrent paired-end/high-quality mate-pairs/unpaired reads
PacBio CCS reads
Illumina and IonTorrent libraries should not be assembled together. All other types of input data are compatible. SPAdes should not be used if only PacBio CLR, Oxford Nanopore, Sanger reads or additional contigs are available.
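
The library requirements above can be expressed as a small validation function. This is a sketch only: the platform and library-type labels are made up for illustration, and additional contigs are not modeled.

```python
from dataclasses import dataclass

# Library types that satisfy the "at least one library" requirement above.
PRIMARY = {
    ("illumina", "paired-end"), ("illumina", "hq-mate-pairs"), ("illumina", "unpaired"),
    ("iontorrent", "paired-end"), ("iontorrent", "hq-mate-pairs"), ("iontorrent", "unpaired"),
    ("pacbio", "ccs"),
}


@dataclass
class Library:
    platform: str  # hypothetical labels: "illumina", "iontorrent", "pacbio", "nanopore", "sanger"
    kind: str      # hypothetical labels: "paired-end", "hq-mate-pairs", "unpaired", "ccs", "clr"


def check_libraries(libs):
    """Return a list of problems; an empty list means the combination looks acceptable."""
    problems = []
    platforms = {lib.platform for lib in libs}
    if {"illumina", "iontorrent"} <= platforms:
        problems.append("Illumina and IonTorrent libraries should not be assembled together")
    if not any((lib.platform, lib.kind) in PRIMARY for lib in libs):
        problems.append("at least one Illumina/IonTorrent short-read or PacBio CCS library is required")
    return problems


print(check_libraries([Library("nanopore", "reads")]))
```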

SPAdes supports mate-pair-only assembly. However, in this case we recommend using only high-quality mate-pair libraries (e.g., libraries without a paired-end part). We tested the mate-pair-only pipeline using Illumina Nextera mate-pairs.

Notes:

It is strongly suggested to provide multiple paired-end and mate-pair libraries in order of insert size (from smallest to largest).
It is not recommended to run SPAdes on PacBio reads with low coverage (less than 5).
We suggest not running SPAdes on PacBio reads for large genomes.
SPAdes accepts gzip-compressed files.
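
Ordering libraries by insert size amounts to a simple sort. The library names and insert sizes below are invented for illustration.

```python
from typing import NamedTuple


class ReadLibrary(NamedTuple):
    name: str         # hypothetical label
    kind: str         # "paired-end" or "mate-pairs"
    insert_size: int  # approximate insert size in bp (assumed known)


def order_by_insert_size(libraries):
    """Return libraries from smallest to largest insert size."""
    return sorted(libraries, key=lambda lib: lib.insert_size)


libs = [
    ReadLibrary("mp_5kb", "mate-pairs", 5000),
    ReadLibrary("pe_300", "paired-end", 300),
    ReadLibrary("mp_2kb", "mate-pairs", 2000),
]
print([lib.name for lib in order_by_insert_size(libs)])  # ['pe_300', 'mp_2kb', 'mp_5kb']
```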

A detailed description can be found in the input section of the manual.

Output

Assembly graph
Assembly graph with scaffolds
Contigs
Contigs paths in the assembly graph
Corrected reads by BayesHammer
Contigs stats
Log file
Scaffolds (recommended for use as resulting sequences)
Scaffolds paths in the assembly graph
Scaffolds stats

IonTorrent data

The selection of k-mer length is non-trivial for IonTorrent. If the dataset is more or less conventional (good coverage, not high GC, etc.), use our recommendation for long reads (e.g., assemble using k-mer lengths 21,33,55,77,99,127). However, due to the increased error rate, some changes to the k-mer lengths (e.g., selecting shorter ones) may be required. For example, if you ran SPAdes with k-mer lengths 21,33,55,77 and then decided to assemble the same dataset using more iterations and larger values of K, you can run SPAdes again, specifying the same output folder and the following options: --restart-from k77 -k 21,33,55,77,99,127 --mismatch-correction -o <previous_output_dir>. Do not forget to copy contigs and scaffolds from the previous run. We plan to address the selection of k-mer lengths for IonTorrent reads in future versions.
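
The restart procedure can be scripted. This is a minimal sketch that copies the previous contigs and scaffolds, then builds the argument list from the options quoted above. The `spades.py` entry point and the output file names `contigs.fasta` and `scaffolds.fasta` are assumptions about a command-line SPAdes installation, and the command is only printed here, not executed.

```python
import os
import shlex
import shutil


def backup_previous_results(output_dir, backup_dir):
    """Copy contigs and scaffolds from a previous run (file names assumed)."""
    os.makedirs(backup_dir, exist_ok=True)
    for name in ("contigs.fasta", "scaffolds.fasta"):
        src = os.path.join(output_dir, name)
        if os.path.exists(src):
            shutil.copy2(src, os.path.join(backup_dir, name))


def restart_command(output_dir, restart_from="k77",
                    kmers=(21, 33, 55, 77, 99, 127)):
    """Build the restart command line quoted in the text above."""
    return [
        "spades.py",  # assumed entry point of a command-line SPAdes install
        "--restart-from", restart_from,
        "-k", ",".join(str(k) for k in kmers),
        "--mismatch-correction",
        "-o", output_dir,  # must be the same folder as the previous run
    ]


print(shlex.join(restart_command("previous_output_dir")))
# spades.py --restart-from k77 -k 21,33,55,77,99,127 --mismatch-correction -o previous_output_dir
```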

With the Hi-Q enzyme, you may not need error correction at all. However, we suggest assembling your data both with and without error correction and selecting the better result.

For non-trivial datasets (e.g., with high GC, or low or uneven coverage), we suggest enabling single-cell mode (the --sc option) and using k-mer lengths of 21,33,55.
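
The IonTorrent recommendations above can be summarized as option sets, including the suggestion to compare assemblies with and without error correction. This sketch assumes the command-line SPAdes flags `--iontorrent` (marks IonTorrent data) and `--only-assembler` (skips read error correction), which are not mentioned in this help text. Whether a dataset is non-trivial is a judgement left to the user.

```python
def iontorrent_variants(non_trivial: bool) -> dict[str, list[str]]:
    """Return option sets for assembling IonTorrent data with and without error correction.

    non_trivial: True for high-GC, low- or uneven-coverage datasets (user's judgement).
    """
    if non_trivial:
        # Single-cell mode with shorter k-mers for difficult datasets.
        base = ["--iontorrent", "--sc", "-k", "21,33,55"]
    else:
        # Long-read recommendation for conventional datasets.
        base = ["--iontorrent", "-k", "21,33,55,77,99,127"]
    return {
        "with_error_correction": base,
        # Assumed flag: --only-assembler skips the read error correction step.
        "without_error_correction": base + ["--only-assembler"],
    }


for name, options in iontorrent_variants(non_trivial=True).items():
    print(name, " ".join(options))
```

Running both variants on the same data and comparing the resulting contigs follows the suggestion to pick the better of the two assemblies.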

References

More information is available on GitHub.