PIPELINE:
This tool is the first step of the fosmid assembly and annotation pipeline. It assembles raw reads, looks for and removes the cloning vector, and extracts the longest and most covered contigs. It has been built to handle two types of raw reads as input: single reads (454, Ion Torrent, ...) or paired reads (Illumina, ...). This tool cannot process PacBio or Oxford Nanopore reads.

Raw read fastq file organization and naming
The raw read files are organized in directories, one per sample. Inside the directories, each fastq file has to be gzipped. All the sample directories must be zipped into a single file. The input file must be named "MiSeq.zip" (Paired.zip) for paired reads and "Proton.zip" (Single.zip) for single reads. Even if you have only one fastq file, the gzipped fastq file must still be placed in a zipped directory. When you upload your input files, choose the "no_unzip.zip" format.

The assembly is performed by SPAdes.

Assembly and post-processing steps :
1 ) Read sub-selection : the reads are sub-selected down to 100x before assembly.
2 ) Assembly : SPAdes (more information on http://bioinf.spbau.ru/en/spades).
3 ) Vector detection : the vector is searched for and masked in the assembled contigs using cross-match (http://www.phrap.org/phredphrapconsed.html).
4 ) Contig filtering : the result file only includes contigs containing the vector and having an average depth of XXX.

Warning
Input fastq file names for paired data must contain "R1.f*" (R1.fq... or R1.fastq...) and "R2.f*" to be identified as paired data by this tool.

Outputs :
The outputs are presented in a table containing one line per assembled sample. Each line contains the name of the sample, the number of resulting contigs, the length of the longest contig, the total length of the contigs and a link to the contig fasta file. Underneath the table there is a link to the complete result file, which contains all the resulting fasta files organized in directories named after the samples.

Version
Galaxy Tool : V1.0
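Example of input preparation
The input layout described under "Raw read fastq file organization and naming" can be prepared by hand, but a short script may help avoid naming mistakes. The following Python sketch is only an illustration and is not part of the tool. The function names (has_both_mates, build_input_archive) are hypothetical, and it assumes that the archive should be called "MiSeq.zip" or "Proton.zip" and that a plain substring check for "R1.f" and "R2.f" is close enough to the tool's own paired-file detection.

```python
import gzip
import shutil
import zipfile
from pathlib import Path


def has_both_mates(files):
    """Return True if the file names contain both an "R1.f*" and an "R2.f*" file.

    Assumption: a simple substring check approximates the tool's detection.
    """
    names = [Path(f).name for f in files]
    return any("R1.f" in n for n in names) and any("R2.f" in n for n in names)


def build_input_archive(samples, out_dir, paired):
    """Build the zip archive expected as input (hypothetical helper).

    samples: dict mapping a sample name to a list of fastq file paths.
    out_dir: directory in which the archive is written.
    paired:  True for paired reads (MiSeq.zip), False for single reads (Proton.zip).
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    archive = out_dir / ("MiSeq.zip" if paired else "Proton.zip")

    with zipfile.ZipFile(archive, "w") as zf:
        for sample, files in samples.items():
            if paired and not has_both_mates(files):
                raise ValueError(
                    f"sample {sample!r}: paired data needs R1.f* and R2.f* files"
                )
            for fastq in map(Path, files):
                if fastq.suffix == ".gz":
                    # Already gzipped: store as-is in the sample directory.
                    zf.write(fastq, arcname=f"{sample}/{fastq.name}")
                    continue
                # Not gzipped yet: stream it through gzip into the archive.
                arcname = f"{sample}/{fastq.name}.gz"
                with open(fastq, "rb") as src, \
                        zf.open(arcname, "w", force_zip64=True) as member, \
                        gzip.GzipFile(fileobj=member, mode="wb") as gz:
                    shutil.copyfileobj(src, gz)
    return archive
```

A call such as build_input_archive({"sample1": ["s1_R1.fastq", "s1_R2.fastq"]}, "upload", paired=True) would write upload/MiSeq.zip with one directory per sample, each holding gzipped fastq files. The archive still has to be uploaded with the "no_unzip.zip" format.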