
What it does

Searches for optimized parameter settings for an estimator or pipeline using either an exhaustive grid search or a randomized search, both with cross-validation. Please refer to Scikit-learn model_selection GridSearchCV, Scikit-learn model_selection RandomizedSearchCV and Tuning hyper-parameters.
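The two search strategies can be sketched directly in scikit-learn. This is a minimal illustration, assuming scikit-learn and scipy are installed; the dataset, the estimator and the `max_depth` values are illustrative choices and not the tool's defaults.

```python
from scipy.stats import randint
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
estimator = DecisionTreeClassifier(random_state=0)

# Exhaustive search: every value in the grid is cross-validated.
grid = GridSearchCV(estimator, param_grid={"max_depth": [3, 5, 7, 9]}, cv=5)
grid.fit(X, y)

# Randomized search: n_iter settings are sampled from the distribution.
rand = RandomizedSearchCV(
    estimator,
    param_distributions={"max_depth": randint(1, 11)},  # integers 1..10
    n_iter=5,
    cv=5,
    random_state=0,
)
rand.fit(X, y)

print(grid.best_params_, rand.best_params_)
```

The grid search evaluates all four settings, while the randomized search evaluates only the five it samples, which is usually the cheaper option when the search space is large.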

Return

Outputs cv_results_ from the SearchCV object as a tabular dataset if train_test_split is not used; otherwise outputs the test score(s). The fitted SearchCV object can optionally be output as well.

How to choose the search parameter grid?

Please refer to svm, linear_model, ensemble, naive_bayes, tree, neighbors and xgboost for estimator parameters. Refer to sklearn.preprocessing, feature_selection, decomposition, kernel_approximation, cluster.FeatureAgglomeration and skrebate for the parameters of the pre-processing steps.

A search parameter list can be a list, a numpy array, or a distribution. Settings are evaluated as expressions that support math operations, list comprehensions, numpy.arange (np_arange), most numpy.random functions (e.g., np_random_uniform), some scipy.stats classes or functions (e.g., scipy_stats_zipf), and others.

Examples:

[3, 5, 7, 9]
list(range(50, 1001, 50))
np_arange(0.01, 1, 0.1)
np_random_choice(list(range(1, 51)) + [None], size=20)
scipy_stats_randint(1, 11)
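To make the examples above concrete, here is a sketch of what they would evaluate to in plain numpy and scipy. It assumes that the underscore aliases map one-to-one onto the library functions (`np_arange` to `numpy.arange`, `np_random_choice` to `numpy.random.choice`, `scipy_stats_randint` to `scipy.stats.randint`); that mapping is inferred from the names, and the variable names are hypothetical.

```python
import numpy as np
from scipy import stats

# Plain list built from a range: 50, 100, ..., 1000.
n_estimators = list(range(50, 1001, 50))

# Assumed equivalent of np_arange(0.01, 1, 0.1): 0.01, 0.11, ..., 0.91.
learning_rates = np.arange(0.01, 1, 0.1)

# Assumed equivalent of np_random_choice(...): 20 draws (with replacement)
# from the integers 1..50 plus None. A seeded generator keeps it repeatable.
rng = np.random.RandomState(0)
candidates = np.array(list(range(1, 51)) + [None], dtype=object)
max_depths = rng.choice(candidates, size=20)

# Assumed equivalent of scipy_stats_randint(1, 11): a frozen distribution
# over the integers 1..10 (the upper bound is exclusive).
depth_dist = stats.randint(1, 11)
samples = depth_dist.rvs(size=5, random_state=0)

print(len(n_estimators), len(learning_rates), len(max_depths), samples)
```

Lists and arrays are enumerated as fixed candidates, whereas a distribution such as `depth_dist` is only meaningful for a randomized search, which samples from it.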

Estimator / preprocessor search (prefix the list with an additional `:`):

: [sklearn_tree.DecisionTreeRegressor(), sklearn_tree.ExtraTreeRegressor()]

: [sklearn_feature_selection.SelectKBest(), sklearn_feature_selection.VarianceThreshold(),
   skrebate_ReliefF(), sklearn_preprocessing.RobustScaler()]
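In scikit-learn terms, searching over estimators or preprocessors corresponds to placing whole step objects in the parameter grid of a Pipeline. The following is a rough sketch, assuming scikit-learn is installed; the step names `preprocessor` and `regressor`, the dataset and `k=5` are illustrative and not taken from the tool, and skrebate is left out to keep the dependencies minimal.

```python
from sklearn.datasets import load_diabetes
from sklearn.feature_selection import SelectKBest, VarianceThreshold, f_regression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import RobustScaler
from sklearn.tree import DecisionTreeRegressor, ExtraTreeRegressor

X, y = load_diabetes(return_X_y=True)

# Hypothetical two-step pipeline; the initial steps are placeholders that
# the search replaces with each candidate object.
pipe = Pipeline([
    ("preprocessor", RobustScaler()),
    ("regressor", DecisionTreeRegressor(random_state=0)),
])

# Using a step name as the key swaps the whole step, not one of its parameters.
param_grid = {
    "preprocessor": [
        SelectKBest(f_regression, k=5),
        VarianceThreshold(),
        RobustScaler(),
    ],
    "regressor": [
        DecisionTreeRegressor(random_state=0),
        ExtraTreeRegressor(random_state=0),
    ],
}

search = GridSearchCV(pipe, param_grid, cv=3)
search.fit(X, y)
print(search.best_params_)
```

Every preprocessor is paired with every regressor, so the number of cross-validated candidates grows multiplicatively with each list.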

Hot number/keyword for preprocessors:

0   sklearn_preprocessing.StandardScaler(copy=True, with_mean=True, with_std=True)
1   sklearn_preprocessing.Binarizer(copy=True, threshold=0.0)
2   sklearn_preprocessing.MaxAbsScaler(copy=True)
3   sklearn_preprocessing.Normalizer(copy=True, norm='l2')
4   sklearn_preprocessing.MinMaxScaler(copy=True, feature_range=(0, 1))
5   sklearn_preprocessing.PolynomialFeatures(degree=2, include_bias=True, interaction_only=False)
6   sklearn_preprocessing.RobustScaler(copy=True, quantile_range=(25.0, 75.0), with_centering=True, with_scaling=True)
7   sklearn_feature_selection.SelectKBest(k=10, score_func=<function f_classif at 0x113806d90>)
8   sklearn_feature_selection.GenericUnivariateSelect(mode='percentile', param=1e-05, score_func=<function f_classif at 0x113806d90>)
9   sklearn_feature_selection.SelectPercentile(percentile=10, score_func=<function f_classif at 0x113806d90>)
10  sklearn_feature_selection.SelectFpr(alpha=0.05, score_func=<function f_classif at 0x113806d90>)
11  sklearn_feature_selection.SelectFdr(alpha=0.05, score_func=<function f_classif at 0x113806d90>)
12  sklearn_feature_selection.SelectFwe(alpha=0.05, score_func=<function f_classif at 0x113806d90>)
13  sklearn_feature_selection.VarianceThreshold(threshold=0.0)
14  sklearn_decomposition.FactorAnalysis(copy=True, iterated_power=3, max_iter=1000, n_components=None,
    noise_variance_init=None, random_state=0, svd_method='randomized', tol=0.01)
15  sklearn_decomposition.FastICA(algorithm='parallel', fun='logcosh', fun_args=None,
    max_iter=200, n_components=None, random_state=0, tol=0.0001, w_init=None, whiten=True)
16  sklearn_decomposition.IncrementalPCA(batch_size=None, copy=True, n_components=None, whiten=False)
17  sklearn_decomposition.KernelPCA(alpha=1.0, coef0=1, copy_X=True, degree=3, eigen_solver='auto',
    fit_inverse_transform=False, gamma=None, kernel='linear', kernel_params=None, max_iter=None,
    n_components=None, random_state=0, remove_zero_eig=False, tol=0)
18  sklearn_decomposition.LatentDirichletAllocation(batch_size=128, doc_topic_prior=None, evaluate_every=-1, learning_decay=0.7,
    learning_method=None, learning_offset=10.0, max_doc_update_iter=100, max_iter=10, mean_change_tol=0.001, n_components=10,
    n_topics=None, perp_tol=0.1, random_state=0, topic_word_prior=None, total_samples=1000000.0, verbose=0)
19  sklearn_decomposition.MiniBatchDictionaryLearning(alpha=1, batch_size=3, dict_init=None, fit_algorithm='lars',
    n_components=None, n_iter=1000, random_state=0, shuffle=True, split_sign=False, transform_algorithm='omp',
    transform_alpha=None, transform_n_nonzero_coefs=None, verbose=False)
20  sklearn_decomposition.MiniBatchSparsePCA(alpha=1, batch_size=3, callback=None, method='lars', n_components=None,
    n_iter=100, random_state=0, ridge_alpha=0.01, shuffle=True, verbose=False)
21  sklearn_decomposition.NMF(alpha=0.0, beta_loss='frobenius', init=None, l1_ratio=0.0, max_iter=200,
    n_components=None, random_state=0, shuffle=False, solver='cd', tol=0.0001, verbose=0)
22  sklearn_decomposition.PCA(copy=True, iterated_power='auto', n_components=None, random_state=0, svd_solver='auto', tol=0.0, whiten=False)
23  sklearn_decomposition.SparsePCA(U_init=None, V_init=None, alpha=1, max_iter=1000, method='lars',
    n_components=None, random_state=0, ridge_alpha=0.01, tol=1e-08, verbose=False)
24  sklearn_decomposition.TruncatedSVD(algorithm='randomized', n_components=2, n_iter=5, random_state=0, tol=0.0)
25  sklearn_kernel_approximation.Nystroem(coef0=None, degree=None, gamma=None, kernel='rbf',
    kernel_params=None, n_components=100, random_state=0)
26  sklearn_kernel_approximation.RBFSampler(gamma=1.0, n_components=100, random_state=0)
27  sklearn_kernel_approximation.AdditiveChi2Sampler(sample_interval=None, sample_steps=2)
28  sklearn_kernel_approximation.SkewedChi2Sampler(n_components=100, random_state=0, skewedness=1.0)
29  sklearn_cluster.FeatureAgglomeration(affinity='euclidean', compute_full_tree='auto', connectivity=None,
    linkage='ward', memory=None, n_clusters=2, pooling_func=<function mean at 0x113078ae8>)
30  skrebate_ReliefF(discrete_threshold=10, n_features_to_select=10, n_neighbors=100, verbose=False)
31  skrebate_SURF(discrete_threshold=10, n_features_to_select=10, verbose=False)
32  skrebate_SURFstar(discrete_threshold=10, n_features_to_select=10, verbose=False)
33  skrebate_MultiSURF(discrete_threshold=10, n_features_to_select=10, verbose=False)
34  skrebate_MultiSURFstar(discrete_threshold=10, n_features_to_select=10, verbose=False)
'sk_prep_all':   All sklearn preprocessing estimators, i.e., 0-6
'fs_all':        All feature_selection estimators, i.e., 7-13
'decomp_all':    All decomposition estimators, i.e., 14-24
'k_appr_all':    All kernel_approximation estimators, i.e., 25-28
'reb_all':       All skrebate estimators, i.e., 30-34
'all_0':         All except the imbalanced-learn samplers, i.e., 0-34
'imb_all':       All imbalanced-learn sampling methods, i.e., 35-53.
                 **CAUTION**: Mix of imblearn and other preprocessors may not work.
 None:           opt out of preprocessor

Hot numbers, keywords and estimator objects can be mixed in one list (CAUTION: mixing imblearn samplers with other preprocessors may not work), e.g.:

: [None, 'sk_prep_all', 21, 'k_appr_all', sklearn_feature_selection.SelectKBest(k=50)]
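One way to picture how such a mixed list could be resolved is to expand each keyword into its range of hot numbers and pass everything else through. The sketch below is only an illustration of that idea: `expand_preprocessors` is a hypothetical helper, not the tool's actual code, and the estimator object is represented by a placeholder string so the example needs no third-party packages. The index ranges are copied from the keyword list above.

```python
# Inclusive hot-number ranges per keyword, as listed in the keyword table.
KEYWORD_RANGES = {
    "sk_prep_all": (0, 6),
    "fs_all": (7, 13),
    "decomp_all": (14, 24),
    "k_appr_all": (25, 28),
    "reb_all": (30, 34),
    "all_0": (0, 34),
    "imb_all": (35, 53),
}


def expand_preprocessors(items):
    """Hypothetical helper: expand keywords to hot numbers, keep the rest."""
    expanded = []
    for item in items:
        if isinstance(item, str) and item in KEYWORD_RANGES:
            low, high = KEYWORD_RANGES[item]
            expanded.extend(range(low, high + 1))
        else:
            # None, a single hot number, or an estimator object.
            expanded.append(item)
    return expanded


# Placeholder string standing in for an estimator object.
estimator_placeholder = "sklearn_feature_selection.SelectKBest(k=50)"
mixed = [None, "sk_prep_all", 21, "k_appr_all", estimator_placeholder]
resolved = expand_preprocessors(mixed)
print(resolved)
```

Under these assumptions the five-item list resolves to fourteen candidates: None, hot numbers 0-6, 21, 25-28, and the explicit estimator.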

Whether to do train_test_split?

Please refer to https://scikit-learn.org/stable/modules/cross_validation.html#cross-validation

https://scikit-learn.org/stable/_images/grid_search_cross_validation.png
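The effect of holding out a test set can be sketched as follows, assuming scikit-learn is installed. The dataset, estimator, `C` values and the 25% test size are illustrative choices, not the tool's settings.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Hold out a test set that the search never sees.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Cross-validated search on the training portion only.
search = GridSearchCV(SVC(), {"C": [0.1, 1, 10]}, cv=5)
search.fit(X_train, y_train)

# Without a split, the per-setting cross-validation table is the main result.
cv_table_columns = sorted(search.cv_results_.keys())

# With a split, the refitted best estimator is scored once on the held-out data.
test_score = search.score(X_test, y_test)
print(search.best_params_, test_score)
```

Scoring on held-out data gives an estimate that is not biased by the parameter selection itself, at the cost of fewer samples for the search.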