annotate accuracy.xml @ 2:6169ba9ed42a draft

Uploaded
author testtool
date Fri, 13 Oct 2017 10:10:32 -0400
parents
children
<tool id="accuracy" name="accuracy" version="1.0.0">
    <description>model creation and accuracy estimation</description>
    <requirements>
        <requirement type="package" version="6.0_76">r-caret</requirement>
    </requirements>
    <command detect_errors="aggressive">
        Rscript '$__tool_directory__/accuracy.R' '$input' '$p' '$output1' '$output2'
    </command>
    <inputs>
        <param format="csv" type="data" name="input" value="" label="Input dataset" help="
e.g. iris species table
Sepal.Length,Sepal.Width,Petal.Length,Petal.Width,Species
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
4.6,3.1,1.5,0.2,Iris-setosa"/>
        <param name="p" type="float" value="0.80" label="Proportion of the data (0 to 1) used to split it into training and testing sets"/>
    </inputs>
    <outputs>
        <data format="csv" name="output1" label="dataset_summary.csv" />
        <data format="csv" name="output2" label="accuracy_summary.csv" />
    </outputs>
    <tests>
        <test>
            <param name="input" value="input.csv" ftype="csv"/>
            <output name="output1" file="dataset_summary.csv" ftype="csv"/>
            <output name="output2" file="accuracy_summary.csv" ftype="csv"/>
        </test>
    </tests>
    <help>
This tool builds 5 different models to predict, e.g., species from flower measurements.
In the end, the best model can be selected for further analysis.

The following 5 algorithms are evaluated:

- **Linear Discriminant Analysis (LDA)**
- **Classification and Regression Trees (CART)**
- **k-Nearest Neighbors (kNN)**
- **Support Vector Machines (SVM) with a linear kernel**
- **Random Forest (RF)**

This is a good mixture of simple linear (LDA), nonlinear (CART, kNN) and complex nonlinear methods (SVM, RF).
The random number seed is reset before each run so that every algorithm is evaluated
on exactly the same data splits, which makes the results directly comparable.

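The seed reset described above can be illustrated with a minimal Python sketch. This is not the tool's own R code: the helper name make_folds, the seed value, the fold count and the row count are all assumptions chosen for the example. Rebuilding the folds from the same seed for every model gives each model identical data splits.

```python
import random

# Hypothetical values for illustration only (not taken from accuracy.R).
SEED = 7
N_ROWS = 150
N_FOLDS = 10
MODELS = ["lda", "cart", "knn", "svm", "rf"]


def make_folds(n_rows, k, seed):
    """Shuffle the row indices with a fixed seed and deal them into k folds."""
    rng = random.Random(seed)  # fresh generator, so the seed is reset on every call
    indices = list(range(n_rows))
    rng.shuffle(indices)
    return [indices[i::k] for i in range(k)]


# Re-seed before each model's run, so every model is evaluated on the same folds.
folds_per_model = {name: make_folds(N_ROWS, N_FOLDS, SEED) for name in MODELS}

same_splits = all(folds_per_model[m] == folds_per_model["lda"] for m in MODELS)
print(same_splits)  # True
```

Without the reset, each model would draw its folds from a different point in the random stream, and differences in accuracy could come from the splits rather than from the algorithms.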
    </help>
    <citations>
        <citation type="bibtex">@Manual{caret,
            title = {caret: Classification and Regression Training},
            author = {Max Kuhn},
            url = {https://CRAN.R-project.org/package=caret}
        }</citation>
    </citations>
</tool>