<tool id="cluster" name="FosBin" version="0.">
  <command> /home/inmare/galaxy/tools/fosm_cluster $f1 $l $o1 $o2 </command>
  <description>k-means clustering of assembled fosmids</description>
  <help>The tool was designed to tentatively assign contigs from incomplete fosmid assemblies to clusters, ideally corresponding to single fosmids. Clustering is based on the tetra-nucleotide frequencies and coverage of the contigs. The current version is only compatible with SPAdes output, as coverage is recovered from the fasta headers. Future versions might require a different set of input files. Full details are in Chiara et al. #paper id. Clustering of contigs is performed by a custom script based on the R implementation of the k-means algorithm, using 1500 starting positions for the centroids. The clustering uses metrics based on coverage, GC composition and tetra-nucleotide composition of each contig, which are computed directly from the fasta file. The user must input the desired number of clusters; contigs are partitioned accordingly.</help>
  <inputs>
    <param name="f1" type="data" format="fasta" label="fasta file with contigs" help="currently needs to be in SPAdes format"/>
    <param name="l" type="integer" label="number of clusters" value="5" help="should correspond to the number of fosmids"/>
  </inputs>
  <outputs>
    <data name="o1" ftype="tabular" format="txt" label="fosmids to cluster table"/>
    <data name="o2" ftype="fasta" format="fasta" label="modified fasta file, containing cluster identifiers in the header"/>
  </outputs>
  <test>
    <param name="f1" value="sim1_galaxy.fasta"/>
    <param name="l" value="9" />
    <o1 name="outfile1" value="res"/>
    <o2 name="outfile2" value="fasta.fas"/>
  </test>
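  <!-- Sketch only, not part of the original wrapper. Galaxy's tool schema normally
       expects test cases inside a <tests> element, with each expected output declared
       as <output name="..." file="..."/>. Assuming "res" and "fasta.fas" are the
       expected test-data files for outputs o1 and o2 (an assumption, not stated in
       this file), the test above might be written as:

  <tests>
    
  </tests>
  -->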
</tool>