comparison Nb_cluster.xml @ 0:0f6542d0986e draft
planemo upload for repository https://github.com/galaxyecology/tools-ecology/tree/master/tools/Ecoregionalization_workflow commit 2a2ae892fa2dbc1eff9c6a59c3ad8f3c27c1c78d
author | ecology |
---|---|
date | Wed, 18 Oct 2023 09:59:06 +0000 |
parents | |
children | e94a25eed489 |
-1:000000000000 | 0:0f6542d0986e |
---|---|
1 <tool id="ecoregion_cluster_estimate" name="ClusterEstimate" version="0.1.0+galaxy0" profile="22.05"> | |
2 <description>Find an optimal number of clusters with the SIH index</description> | |
3 <requirements> | |
4 <requirement type="package" version="4.2.3">r-base</requirement> | |
5 <requirement type="package" version="2.1.4">r-cluster</requirement> | |
6 <requirement type="package" version="1.1.1">r-dplyr</requirement> | |
7 <requirement type="package" version="2.0.0">r-tidyverse</requirement> | |
8 </requirements> | |
9 <command detect_errors="exit_code"><![CDATA[ | |
10 Rscript | |
11 '$__tool_directory__/nb_clust_G.R' | |
12 '$envfile' | |
13 '$taxafile' | |
14 '$predictionfile' | |
15 '$max_k' | |
16 '$metric' | |
17 '$sample' | |
18 '$output1' | |
19 '$output2' | |
20 '$output3' | |
21 ]]> | |
22 </command> | |
23 <inputs> | |
24 <param name="envfile" type="data" format="txt,csv,tabular" label="Environment file"/> | |
25 <param name="taxafile" type="data" format="txt" label="Taxa selected file (List of taxa from TaxaSeeker tool)"/> | |
26 <param name="predictionfile" type="data" format="txt" multiple="true" label="Prediction files"/> | |
27 <param name="max_k" type="integer" value="2" min="1" label="Maximum number of clusters to test"/> | |
28 <param name="metric" type="select" label="Metric used to calculate dissimilarities between observations"> | |
29 <option value="manhattan">manhattan</option> | |
30 <option value="euclidean">euclidean</option> | |
31 <option value="jaccard">jaccard</option> | |
32 </param> | |
33 <param name="sample" type="integer" label="The number of samples to be drawn from the dataset" min="5" value="10"/> | |
34 </inputs> | |
35 <outputs> | |
36 <data name="output1" from_work_dir="Indices_SIH.png" format="png" label="SIH index plot"/> | |
37 <data name="output2" from_work_dir="data_to_clus.tsv" format="tsv" label="Data to cluster"/> | |
38 <data name="output3" from_work_dir="data_bio.tsv" format="tsv" label="Data.bio table"/> | |
39 </outputs> | |
40 <tests> | |
41 <test> | |
42 <param name="envfile" value="ceamarc_env.csv"/> | |
43 <param name="taxafile" value="List_of_taxa.txt"/> | |
44 <param name="predictionfile" value="1_brts_pred_ceamarc.txt"/> | |
45 <param name='max_k' value="2"/> | |
46 <param name='metric' value="manhattan"/> | |
47 <param name='sample' value="10"/> | |
48 <output name='output1' value="SIH_index_plot.png"/> | |
49 <output name='output2' value="Data_to_cluster.tsv"/> | |
50 <output name='output3' value="Data.bio_table.tsv"/> | |
51 </test> | |
52 </tests> | |
53 <help><![CDATA[ | |
54 ================ | |
55 **What it does** | |
56 ================ | |
57 | |
58 This tool estimates the optimal number of clusters for partition-based clustering and generates the files used in the rest of the ecoregionalization workflow. | |
59 | |
60 ================= | |
61 **How to use it** | |
62 ================= | |
63 | |
64 The tool takes three input files: a file containing the environmental parameter values for each pixel (latitude-longitude) of the environment layers, a file containing the list of taxa selected in the previous step of the workflow, and the file(s) containing the BRT predictions. See the examples below. | |
65 | |
66 The tool then takes three parameters: | |
67 | |
68 - the maximum number of clusters to test (at least two) | |
69 | |
70 - the metric used to calculate the dissimilarities between observations: Manhattan, Euclidean or Jaccard | |
71 | |
72 - the sample size used for clustering. The clara function clusters large datasets by working on representative samples rather than on the entire dataset, which speeds up the computation. A fairly large sample, representative of the data, is recommended: a sample that is too small may lose information compared with using the entire dataset. | |
73 | |
74 The tool produces three outputs. Two are files used in the rest of the workflow: a file containing four pieces of information (latitude, longitude, presence prediction and corresponding taxon) and a file containing the data to be partitioned. The main output is a graph showing the value of the SIH index as a function of the number of clusters. The silhouette index measures both the separation between clusters and the compactness within each cluster. It ranges from -1 to 1. Values close to 1 indicate that objects are well grouped and well separated from other clusters, while values close to -1 indicate that objects are poorly grouped and may be closer to another cluster than to their own. A value close to 0 indicates that objects lie on the border between two neighboring clusters. | |
75 | |
76 **Example of the environmental file:** | |
77 | |
78 +------+------+---------+------+--------------+-----+ | |
79 | long | lat | Carbo | Grav | Maxbearing | ... | | |
80 +------+------+---------+------+--------------+-----+ | |
81 |139.22|-65.57| 0.88 |28.59 | 3.67 | ... | | |
82 +------+------+---------+------+--------------+-----+ | |
83 |139.22|-65.57| 0.88 |28.61 | 3.64 | ... | | |
84 +------+------+---------+------+--------------+-----+ | |
85 | ... | ... | ... | ... | ... | ... | | |
86 +------+------+---------+------+--------------+-----+ | |
87 | |
88 **Example of the BRT prediction file:** | |
89 | |
90 +-----------+----------+-----------------------+-------------+ | |
91 | lat | long | Prediction.index | spe | | |
92 +-----------+----------+-----------------------+-------------+ | |
93 | -65.57 | 139.22 | 0.122438487221909 | Acarnidae | | |
94 +-----------+----------+-----------------------+-------------+ | |
95 | -65.57 | 139.32 | 0.119154535627801 | Acarnidae | | |
96 +-----------+----------+-----------------------+-------------+ | |
97 | ... | ... | ... | ... | | |
98 +-----------+----------+-----------------------+-------------+ | |
99 | |
100 ]]> | |
101 </help> | |
102 </tool> | |
103 |
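
The help text above describes the silhouette index only in words. As a rough illustration, the sketch below computes a mean silhouette width for a given labelling using the Manhattan metric. It is not the code in `nb_clust_G.R`, which is not shown here and, per the help text, relies on R's clara function; the names `manhattan` and `mean_silhouette`, the toy points, and the choice of a width of 0 for single-member clusters are all assumptions made for this example.

```python
from statistics import mean


def manhattan(a, b):
    """Manhattan (L1) distance between two equal-length points."""
    return sum(abs(x - y) for x, y in zip(a, b))


def mean_silhouette(points, labels, dist=manhattan):
    """Mean silhouette width of a labelled set of points (hypothetical helper)."""
    # Group point indices by cluster label.
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("the silhouette index needs at least two clusters")

    widths = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            # Assumed convention: a single-member cluster gets a width of 0.
            widths.append(0.0)
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = mean(dist(points[i], points[j]) for j in own)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            mean(dist(points[i], points[j]) for j in members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        widths.append(0.0 if denom == 0 else (b - a) / denom)
    return mean(widths)


# Toy data: two tight, well-separated groups.
points = [(0, 0), (0, 1), (10, 10), (10, 11)]
good = mean_silhouette(points, [0, 0, 1, 1])  # labelling that matches the groups
bad = mean_silhouette(points, [0, 1, 0, 1])   # labelling that mixes the groups
print(good > bad)
```

Evaluating this score for each candidate number of clusters and keeping the highest one mirrors, in spirit, how the SIH index plot is meant to be read.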