comparison accuracy.xml @ 2:6169ba9ed42a draft

Uploaded
author testtool
date Fri, 13 Oct 2017 10:10:32 -0400
comparison
1:a3a8499f0f95 2:6169ba9ed42a
1 <tool id="accuracy" name="accuracy" version="1.0.0">
2 <description>model creation and accuracy estimation</description>
3 <requirements>
4 <requirement type="package" version="6.0_76">r-caret</requirement>
5 </requirements>
6 <command detect_errors="aggressive">
7 Rscript '$__tool_directory__/accuracy.R' '$input' '$p' '$output1' '$output2'
8 </command>
9 <inputs>
10 <param format="csv" type="data" name="input" value="" label="Input dataset" help="
11 e.g. iris species table
12 Sepal.Length,Sepal.Width,Petal.Length,Petal.Width,Species
13 5.1,3.5,1.4,0.2,Iris-setosa
14 4.9,3,1.4,0.2,Iris-setosa
15 4.7,3.2,1.3,0.2,Iris-setosa
16 4.6,3.1,1.5,0.2,Iris-setosa"/>
17 <param name="p" type="float" value="0.80" label="Fraction of the data used for training the models; the rest is used for testing"/>
18 </inputs>
19 <outputs>
20 <data format="csv" name="output1" label="dataset_summary.csv" />
21 <data format="csv" name="output2" label="accuracy_summary.csv" />
22 </outputs>
23 <tests>
24 <test>
25 <!-- Input files are resolved relative to
26 the tool's test-data directory. -->
27 <param name="input" value="input.csv" ftype="csv"/>
28 <!-- Passed to accuracy.R as its second argument. -->
29 <param name="p" value="0.80"/>
30 <!-- Expected outputs, matched by the names
31 declared in the outputs section. -->
32 <output name="output1" file="dataset_summary.csv" ftype="csv"/>
33 <output name="output2" file="accuracy_summary.csv" ftype="csv"/>
34 </test>
35 </tests>
36 <help>
37 This tool builds five different models to predict a class label, e.g. species from flower measurements.
38 The accuracy summary can then be used to select the best model for further analysis.
39
40 The tool evaluates five algorithms:
41
42 - **Linear Discriminant Analysis (LDA)**
43 - **Classification and Regression Trees (CART)**
44 - **k-Nearest Neighbors (kNN)**
45 - **Support Vector Machines (SVM) with a linear kernel**
46 - **Random Forest (RF)**
47
48 This is a good mixture of simple linear (LDA), nonlinear (CART, kNN) and complex nonlinear methods (SVM, RF).
49 The random number seed is reset before each run so that every algorithm is evaluated
50 on exactly the same data splits, which makes the results directly comparable.
51
52 </help>
53 <citations>
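54 <citation type="bibtex">@Manual{caret, title = {caret: Classification and Regression Training}, author = {Max Kuhn}, url = {https://CRAN.R-project.org/package=caret}}</citation>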
55 </citations>
56 </tool>
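
The help text says the seed is reset before each run so that all five algorithms see the same data splits. The script accuracy.R is not part of this listing, so the following is only a minimal Python sketch of that idea, not the tool's implementation. The seed value (7), the fold count (10), the row count (150) and the plain unstratified shuffle are all assumptions made for illustration; the function names are hypothetical.

```python
import random


def train_test_split(n_rows, p, seed):
    """Return (train, test) row indices, with a fraction p of the rows in train."""
    rng = random.Random(seed)
    indices = list(range(n_rows))
    rng.shuffle(indices)
    cut = round(n_rows * p)
    return indices[:cut], indices[cut:]


def kfold_indices(rows, k, seed):
    """Shuffle rows with a fixed seed and deal them into k folds."""
    rng = random.Random(seed)  # reseeding on every call makes runs comparable
    shuffled = list(rows)
    rng.shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


# Hold out 20% of a 150-row table (assumed size) for final testing.
train, test = train_test_split(n_rows=150, p=0.80, seed=7)

# One call per algorithm; each call starts from the same seed,
# so every algorithm is evaluated on identical folds.
folds = {
    name: kfold_indices(train, k=10, seed=7)
    for name in ("lda", "cart", "knn", "svm", "rf")
}
```

Because each call builds its own generator from the same seed, the fold assignment does not depend on how many random numbers an earlier model consumed. That is the property the help text relies on when it calls the results directly comparable.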