Hyperparameter Search (version 1.0.11.0)
Advanced Options for SearchCV
For a non-deep-learning model, the save option outputs the fitted best_estimator_, or a list of cv_results_ from each outer split when in nested CV mode. For a deep learning model, checking the boolean option below the model input saves the output in two parts: the model skeleton and the weights. Saving a deep learning model is not supported in nested CV mode.
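
For illustration, assuming a Keras-style deep learning model (the layer sizes and filenames below are hypothetical), a minimal sketch of what a two-part "skeleton and weights" save looks like:

    # Minimal sketch of a two-part deep learning model save: architecture
    # (skeleton) as JSON plus fitted weights. Filenames are illustrative and
    # file-extension conventions vary across Keras versions.
    from tensorflow import keras

    model = keras.Sequential([
        keras.Input(shape=(10,)),
        keras.layers.Dense(32, activation="relu"),
        keras.layers.Dense(1),
    ])

    # Part 1: the model skeleton (architecture only, no weights).
    with open("model_skeleton.json", "w") as f:
        f.write(model.to_json())

    # Part 2: the fitted weights.
    model.save_weights("model_weights.h5")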

What it does

Searches for optimized parameter settings for an estimator or pipeline through either exhaustive grid cross-validation search or randomized cross-validation search. Please refer to Scikit-learn model_selection GridSearchCV, Scikit-learn model_selection RandomizedSearchCV and Tuning hyper-parameters.
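
As a rough sketch of the two strategies in plain scikit-learn (the estimator and parameter values are illustrative only):

    # Exhaustive grid search evaluates every combination in the grid;
    # randomized search samples n_iter settings from lists or distributions.
    from scipy.stats import loguniform  # available in scipy >= 1.4
    from sklearn.datasets import make_classification
    from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=200, random_state=0)

    grid = GridSearchCV(SVC(), {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}, cv=5)
    grid.fit(X, y)

    rand = RandomizedSearchCV(SVC(), {"C": loguniform(1e-2, 1e2)},
                              n_iter=10, cv=5, random_state=0)
    rand.fit(X, y)

    print(grid.best_params_, rand.best_params_)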

Return

Outputs cv_results_ from SearchCV as a tabular dataset if train_test_split is not used; otherwise, outputs the test score(s). In addition, output of the fitted SearchCV object itself is optional.
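
The tabular output corresponds directly to the cv_results_ attribute, a dict of arrays with one entry per parameter setting; a minimal sketch of that mapping (estimator and grid are illustrative):

    # cv_results_ maps naturally onto a table: one row per parameter setting.
    import pandas as pd
    from sklearn.datasets import make_classification
    from sklearn.model_selection import GridSearchCV
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=200, random_state=0)
    search = GridSearchCV(SVC(), {"C": [0.1, 1, 10]}, cv=5).fit(X, y)

    df = pd.DataFrame(search.cv_results_)
    print(df[["params", "mean_test_score", "rank_test_score"]])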

How to choose the search parameters grid?

Please refer to svm, linear_model, ensemble, naive_bayes, tree, neighbors and xgboost for estimator parameters. Refer to sklearn.preprocessing, feature_selection, decomposition, kernel_approximation, cluster.FeatureAgglomeration and skrebate for parameters in the pre-processing steps.

A search parameter list can be a list, a numpy array, or a distribution. The evaluation of settings supports math operations, list comprehensions, numpy.arange (np_arange), most numpy.random classes and functions (e.g., np_random_uniform), some scipy.stats classes and functions (e.g., scipy_stats_zipf), and others.
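
In plain Python terms, the three kinds of settings look like this (parameter names are illustrative; the tool's np_arange and scipy_stats_zipf spellings map onto numpy.arange and scipy.stats.zipf):

    # Three kinds of search-parameter settings. Note that distributions are
    # only valid for randomized search; grid search needs lists or arrays.
    import numpy as np
    from scipy import stats

    settings = {
        "C": [0.1, 1.0, 10.0],                # a plain list
        "gamma": np.arange(0.01, 0.1, 0.01),  # a numpy array
        "degree": stats.zipf(2),              # a scipy.stats distribution
    }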

Examples:

Estimator / Preprocessor search (note the additional `:` at the front):

: [sklearn_tree.DecisionTreeRegressor(), sklearn_tree.ExtraTreeRegressor()]

: [sklearn_feature_selection.SelectKBest(), sklearn_feature_selection.VarianceThreshold(),
   skrebate_ReliefF(), sklearn_preprocessing.RobustScaler()]

Shortcut ("hot") numbers and keywords for preprocessors:

0   sklearn_preprocessing.StandardScaler(copy=True, with_mean=True, with_std=True)
1   sklearn_preprocessing.Binarizer(copy=True, threshold=0.0)
2   sklearn_preprocessing.MaxAbsScaler(copy=True)
3   sklearn_preprocessing.Normalizer(copy=True, norm='l2')
4   sklearn_preprocessing.MinMaxScaler(copy=True, feature_range=(0, 1))
5   sklearn_preprocessing.PolynomialFeatures(degree=2, include_bias=True, interaction_only=False)
6   sklearn_preprocessing.RobustScaler(copy=True, quantile_range=(25.0, 75.0), with_centering=True, with_scaling=True)
7   sklearn_feature_selection.SelectKBest(k=10, score_func=f_classif)
8   sklearn_feature_selection.GenericUnivariateSelect(mode='percentile', param=1e-05, score_func=f_classif)
9   sklearn_feature_selection.SelectPercentile(percentile=10, score_func=f_classif)
10  sklearn_feature_selection.SelectFpr(alpha=0.05, score_func=f_classif)
11  sklearn_feature_selection.SelectFdr(alpha=0.05, score_func=f_classif)
12  sklearn_feature_selection.SelectFwe(alpha=0.05, score_func=f_classif)
13  sklearn_feature_selection.VarianceThreshold(threshold=0.0)
14  sklearn_decomposition.FactorAnalysis(copy=True, iterated_power=3, max_iter=1000, n_components=None,
    noise_variance_init=None, random_state=0, svd_method='randomized', tol=0.01)
15  sklearn_decomposition.FastICA(algorithm='parallel', fun='logcosh', fun_args=None,
    max_iter=200, n_components=None, random_state=0, tol=0.0001, w_init=None, whiten=True)
16  sklearn_decomposition.IncrementalPCA(batch_size=None, copy=True, n_components=None, whiten=False)
17  sklearn_decomposition.KernelPCA(alpha=1.0, coef0=1, copy_X=True, degree=3, eigen_solver='auto',
    fit_inverse_transform=False, gamma=None, kernel='linear', kernel_params=None, max_iter=None,
    n_components=None, random_state=0, remove_zero_eig=False, tol=0)
18  sklearn_decomposition.LatentDirichletAllocation(batch_size=128, doc_topic_prior=None, evaluate_every=-1, learning_decay=0.7,
    learning_method=None, learning_offset=10.0, max_doc_update_iter=100, max_iter=10, mean_change_tol=0.001, n_components=10,
    n_topics=None, perp_tol=0.1, random_state=0, topic_word_prior=None, total_samples=1000000.0, verbose=0)
19  sklearn_decomposition.MiniBatchDictionaryLearning(alpha=1, batch_size=3, dict_init=None, fit_algorithm='lars',
    n_components=None, n_iter=1000, random_state=0, shuffle=True, split_sign=False, transform_algorithm='omp',
    transform_alpha=None, transform_n_nonzero_coefs=None, verbose=False)
20  sklearn_decomposition.MiniBatchSparsePCA(alpha=1, batch_size=3, callback=None, method='lars', n_components=None,
    n_iter=100, random_state=0, ridge_alpha=0.01, shuffle=True, verbose=False)
21  sklearn_decomposition.NMF(alpha=0.0, beta_loss='frobenius', init=None, l1_ratio=0.0, max_iter=200,
    n_components=None, random_state=0, shuffle=False, solver='cd', tol=0.0001, verbose=0)
22  sklearn_decomposition.PCA(copy=True, iterated_power='auto', n_components=None, random_state=0, svd_solver='auto', tol=0.0, whiten=False)
23  sklearn_decomposition.SparsePCA(U_init=None, V_init=None, alpha=1, max_iter=1000, method='lars',
    n_components=None, random_state=0, ridge_alpha=0.01, tol=1e-08, verbose=False)
24  sklearn_decomposition.TruncatedSVD(algorithm='randomized', n_components=2, n_iter=5, random_state=0, tol=0.0)
25  sklearn_kernel_approximation.Nystroem(coef0=None, degree=None, gamma=None, kernel='rbf',
    kernel_params=None, n_components=100, random_state=0)
26  sklearn_kernel_approximation.RBFSampler(gamma=1.0, n_components=100, random_state=0)
27  sklearn_kernel_approximation.AdditiveChi2Sampler(sample_interval=None, sample_steps=2)
28  sklearn_kernel_approximation.SkewedChi2Sampler(n_components=100, random_state=0, skewedness=1.0)
29  sklearn_cluster.FeatureAgglomeration(affinity='euclidean', compute_full_tree='auto', connectivity=None,
    linkage='ward', memory=None, n_clusters=2, pooling_func=np.mean)
30  skrebate_ReliefF(discrete_threshold=10, n_features_to_select=10, n_neighbors=100, verbose=False)
31  skrebate_SURF(discrete_threshold=10, n_features_to_select=10, verbose=False)
32  skrebate_SURFstar(discrete_threshold=10, n_features_to_select=10, verbose=False)
33  skrebate_MultiSURF(discrete_threshold=10, n_features_to_select=10, verbose=False)
34  skrebate_MultiSURFstar(discrete_threshold=10, n_features_to_select=10, verbose=False)
'sk_prep_all':   All sklearn preprocessing estimators, i.e., 0-6
'fs_all':        All feature_selection estimators, i.e., 7-13
'decomp_all':    All decomposition estimators, i.e., 14-24
'k_appr_all':    All kernel_approximation estimators, i.e., 25-28
'reb_all':       All skrebate estimators, i.e., 30-34
'all_0':         All except the imbalanced-learn samplers, i.e., 0-34
'imb_all':       All imbalanced-learn sampling methods, i.e., 35-53.
                 **CAUTION**: Mix of imblearn and other preprocessors may not work.
 None:           opt out of the preprocessor

Mixed lists are supported (CAUTION: mixing imblearn and other preprocessors may not work), e.g.:

: [None, 'sk_prep_all', 21, 'k_appr_all', sklearn_feature_selection.SelectKBest(k=50)]
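
In scikit-learn terms, such a preprocessor search makes the pipeline step itself a searchable parameter; a minimal sketch (the step names and final estimator are illustrative):

    # The "prep" step is swapped among several transformers during the
    # search; "passthrough" plays the role of None (no preprocessor).
    from sklearn.datasets import make_classification
    from sklearn.feature_selection import SelectKBest
    from sklearn.model_selection import GridSearchCV
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=200, n_features=60, random_state=0)

    pipe = Pipeline([("prep", "passthrough"), ("clf", SVC())])
    param_grid = {"prep": ["passthrough", StandardScaler(), SelectKBest(k=50)]}
    search = GridSearchCV(pipe, param_grid, cv=5).fit(X, y)
    print(search.best_params_)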

Whether to do train_test_split?

Please refer to https://scikit-learn.org/stable/modules/cross_validation.html#cross-validation

Figure: grid search with cross-validation (https://scikit-learn.org/stable/_images/grid_search_cross_validation.png)
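
For orientation, a minimal sketch of what enabling train_test_split amounts to in plain scikit-learn (split size and estimator are illustrative): hold out a test set, run the search on the training portion only, then report the score on the held-out data:

    # Hold out a test split; tune on the training split; score once on test.
    from sklearn.datasets import make_classification
    from sklearn.model_selection import GridSearchCV, train_test_split
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=300, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=0)

    search = GridSearchCV(SVC(), {"C": [0.1, 1, 10]}, cv=5).fit(X_train, y_train)
    print("held-out test score:", search.score(X_test, y_test))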