annotate README.rst @ 5:8608ee21a49b draft
planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit cd4a8b019168acd5a513c57a1b1f380622f230f6
author | bgruening |
---|---|
date | Sun, 01 Jul 2018 03:20:47 -0400 |
parents | e1a494495d9f |
children | 6fc4d26e35e0 |
rev 0 (e1a494495d9f, bgruening): planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 2e1e78576b38110cf5b1f2ed83b08b9c3a6cbfee
***************************************
Galaxy wrapper for scikit-learn library
***************************************

Contents
========
- `What is scikit-learn?`_
- `Scikit-learn main package groups`_
- `Tools offered by this wrapper`_

- `Machine learning workflows`_

  - `Supervised learning workflows`_
  - `Unsupervised learning workflows`_


____________________________


.. _What is scikit-learn?:

What is scikit-learn?
===========================

Scikit-learn is an open-source machine learning library for the Python programming language. It offers various algorithms for supervised and unsupervised learning, as well as data preprocessing and transformation, model selection and evaluation, and dataset utilities. It is built on the SciPy (Scientific Python) library.

The scikit-learn source code is available at https://github.com/scikit-learn/scikit-learn.
Detailed installation instructions can be found at http://scikit-learn.org/stable/install.html


.. _Scikit-learn main package groups:

================================
Scikit-learn main package groups
================================

Scikit-learn provides several main groups of related operations.
These are:

- Classification

  - Identifying to which category an object belongs.

- Regression

  - Predicting a continuous-valued attribute associated with an object.

- Clustering

  - Automatic grouping of similar objects into sets.

- Preprocessing

  - Feature extraction and normalization.

- Model selection and evaluation

  - Comparing, validating and choosing parameters and models.

- Dimensionality reduction

  - Reducing the number of random variables to consider.

Each group contains a number of well-known algorithms from its category. For example, hierarchical, spectral, k-means, and other clustering methods can be found in the ``sklearn.cluster`` package.
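As a minimal sketch of how one package group is used, the snippet below runs two of the clustering methods from ``sklearn.cluster`` on the same data. The toy points and the choice of two clusters are assumptions made only for illustration; they are not part of the wrapper.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans

# Toy data (made up for illustration): two well-separated groups of 2-D points.
X = np.array([
    [0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.0, 0.3],
    [5.0, 5.0], [5.1, 5.2], [5.2, 4.9], [4.9, 5.1],
])

# Two estimators from the same package group, both asked for two clusters.
models = {
    "kmeans": KMeans(n_clusters=2, n_init=10, random_state=0),
    "hierarchical": AgglomerativeClustering(n_clusters=2),
}

# fit_predict() trains the estimator and returns one cluster label per sample.
labels = {name: model.fit_predict(X) for name, model in models.items()}
for name, assigned in labels.items():
    print(name, assigned)
```

Both estimators expose the same ``fit_predict`` method, which is what makes it easy to swap one clustering method for another. The numeric label given to each group is arbitrary and may differ between methods.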


.. _Tools offered by this wrapper:

======================================
Available tools in the current wrapper
======================================

The current release of the wrapper offers a subset of the packages from the scikit-learn library. You can find:

- A subset of classification metric functions
- Linear and quadratic discriminant classifiers
- Random forest and AdaBoost classifiers and regressors
- All the clustering methods
- All support vector machine classifiers
- A subset of data preprocessing estimator classes
- Pairwise metric measurement functions

In addition, several tools for performing matrix operations, generating problem-specific datasets, encoding text, and extracting features have been prepared to help the user with more advanced operations.
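For orientation, here is a small sketch of the kind of scikit-learn functions that the metric tools build on. It calls scikit-learn directly and does not show the Galaxy tool interface; the label vectors and points are invented for the example.

```python
import numpy as np
from sklearn.metrics import accuracy_score, pairwise_distances

# Classification metric: fraction of predictions that match the true labels.
y_true = [0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1]
accuracy = accuracy_score(y_true, y_pred)

# Pairwise metric: Euclidean distance between every pair of rows.
points = np.array([[0.0, 0.0], [3.0, 4.0]])
distances = pairwise_distances(points, metric="euclidean")

print(accuracy)
print(distances)
```

Four of the five predictions match, so the accuracy is 0.8, and the two points are 5 units apart, which appears in the off-diagonal entries of the distance matrix.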

.. _Machine learning workflows:

Machine learning workflows
==========================

Machine learning is about processes. Whichever machine learning algorithm we use, we can apply typical workflows and dataflows to produce more robust models and better predictions.
Here we discuss supervised and unsupervised learning workflows.

.. _Supervised learning workflows:

=====================================
Supervised machine learning workflows
=====================================

**What is supervised learning?**

In this machine learning task, given labeled sample data, the aim is to build a model that can predict the labels of new observations.
In practice, there are five steps that take us from raw input data to reasonable predictions for new samples:

1. Preprocess the data::

    * Change the collected data into the proper format and datatype.
    * Adjust the data quality by filling the missing values, performing
      required scaling and normalizations, etc.
    * Extract features which are the most meaningful for the learning task.
    * Split the ready dataset into training and test samples.

2. Choose an algorithm::

    * These factors help one to choose a learning algorithm:
        - Nature of the data (e.g. linear vs. nonlinear data)
        - Structure of the predicted output (e.g. binary vs. multilabel classification)
        - Memory and time usage of the training
        - Predictive accuracy on new data
        - Interpretability of the predictions

3. Choose a validation method

   Every machine learning model should be evaluated before being put into practical use.
   There are numerous performance metrics to evaluate machine learning models.
   For supervised learning, classification or regression metrics are usually used.

   A validation method helps to evaluate the performance metrics of a trained model in order
   to optimize its performance or ultimately switch to a more efficient model.
   Cross-validation is a well-known validation method.

4. Fit a model

   Given the learning algorithm, validation method, and performance metric(s),
   repeat the following steps::

       * Train the model.
       * Evaluate based on metrics.
       * Optimize until satisfied.

5. Use the fitted model for prediction::

    This is a final evaluation in which the optimized model is used to make predictions
    on unseen (here test) samples. After this, the model is put into production.
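The five steps above can be sketched directly with scikit-learn. This is only an illustration under stated assumptions: the bundled iris dataset, the random forest classifier, accuracy as the metric, and the tiny parameter grid are all choices made for the example, not something prescribed by the wrapper.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# 1. Preprocess: load a labeled dataset and hold out a test set.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# 2. Choose an algorithm (a random forest, picked only for illustration).
#    Scaling lives inside the pipeline so it is fitted on training folds only.
pipeline = make_pipeline(StandardScaler(), RandomForestClassifier(random_state=0))

# 3. Choose a validation method: 5-fold cross-validation scored by accuracy.
# 4. Fit a model: train, evaluate, and optimize over a small hypothetical grid.
search = GridSearchCV(
    pipeline,
    param_grid={"randomforestclassifier__n_estimators": [10, 50]},
    cv=5,
    scoring="accuracy",
)
search.fit(X_train, y_train)

# 5. Use the fitted model for prediction on the unseen test samples.
y_pred = search.predict(X_test)
test_accuracy = accuracy_score(y_test, y_pred)
print(search.best_params_, test_accuracy)
```

The test samples are touched only in the last step, so the reported accuracy is an estimate of how the optimized model behaves on data it has never seen.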

.. _Unsupervised learning workflows:

=======================================
Unsupervised machine learning workflows
=======================================

**What is unsupervised learning?**

Unlike in supervised learning, and as is more likely in real life, the initial data here is not labeled.
The task is to extract the structure from the data and group the samples based on their similarities.
Clustering and dimensionality reduction are two well-known examples of unsupervised learning tasks.
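As a small illustration of dimensionality reduction, the sketch below applies principal component analysis (PCA) to unlabeled data. The synthetic data, in which the third feature is nearly a copy of the first, is an assumption made so that two components are enough.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)

# Made-up data: 100 samples with 3 features, where the third feature is
# almost a copy of the first, so the data is close to two-dimensional.
base = rng.normal(size=(100, 2))
noise = 0.01 * rng.normal(size=100)
X = np.column_stack([base[:, 0], base[:, 1], base[:, 0] + noise])

# Project the three features onto the two directions of largest variance.
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)

# Share of the total variance kept by the two components.
explained = pca.explained_variance_ratio_.sum()
print(X_reduced.shape, explained)
```

No labels are involved at any point: the structure is extracted from the feature values alone.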

In this case, the workflow is as follows::

    * Preprocess the data (without splitting into training and test sets).
    * Train a model.
    * Evaluate and tune parameters.
    * Analyse the model and test on real data.
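The workflow above might look as follows for a clustering task. This is a sketch under assumptions: the synthetic blobs, k-means as the model, and the silhouette score as the evaluation metric are illustrative choices, not requirements of the wrapper.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic unlabeled data with three groups (the generated labels are discarded).
X, _ = make_blobs(
    n_samples=150,
    centers=[[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]],
    cluster_std=0.5,
    random_state=0,
)

# Preprocess the whole dataset; there is no train/test split.
X_scaled = StandardScaler().fit_transform(X)

# Train one model per candidate number of clusters and evaluate each one.
scores = {}
for k in range(2, 7):
    model = KMeans(n_clusters=k, n_init=10, random_state=0)
    cluster_labels = model.fit_predict(X_scaled)
    scores[k] = silhouette_score(X_scaled, cluster_labels)

# Tune: keep the number of clusters with the highest silhouette score.
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

Because there are no true labels to compare against, the evaluation relies on an internal measure of how compact and well separated the clusters are.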