1.0.0.4
python
scikit-learn
pandas
xgboost
asteval
skrebate
imbalanced-learn
mlxtend
Train a model
Load a model and predict
Predict class labels
Include advanced options
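The train/save/load workflow named above can be sketched with scikit-learn and pickle; the synthetic dataset and the RandomForestClassifier here are illustrative choices, not the tool's defaults:

```python
import pickle
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Train a model on a small synthetic dataset
X, y = make_classification(n_samples=100, n_features=5, random_state=0)
model = RandomForestClassifier(n_estimators=10, random_state=0)
model.fit(X, y)

# Save the fitted model, then load it back and predict class labels
blob = pickle.dumps(model)
loaded = pickle.loads(blob)
labels = loaded.predict(X[:5])
print(len(labels))  # five predicted class labels
```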
squared loss
huber
epsilon insensitive
squared epsilon insensitive
l2
l1
elastic net
none
optimal
constant
inverse scaling
auto
svd
cholesky
lsqr
sparse_cg
sag
Gini impurity
Information gain
mse - mean squared error
mae - mean absolute error
auto - max_features=n_features
sqrt - max_features=sqrt(n_features)
log2 - max_features=log2(n_features)
I want to type in a number or enter None
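As a sketch of how the criterion and max_features choices above are passed to a tree estimator (the iris data and the specific parameter values are illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# criterion='gini' selects Gini impurity; 'entropy' selects information gain.
# max_features='sqrt' considers sqrt(n_features) candidates at each split.
tree = DecisionTreeClassifier(criterion='gini', max_features='sqrt',
                              random_state=0)
tree.fit(X, y)
print(tree.score(X, y))  # training accuracy
```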
auto
true
false
k-means++
random
Calculate metrics globally by counting the total true positives, false negatives and false positives. (micro)
Calculate metrics for each instance, and find their average. Only meaningful for multilabel. (samples)
Calculate metrics for each label, and find their unweighted mean. This does not take label imbalance into account. (macro)
Calculate metrics for each label, and find their average, weighted by support (the number of true instances for each label). This alters ‘macro’ to account for label imbalance; it can result in an F-score that is not between precision and recall. (weighted)
None
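The averaging strategies above can be compared directly with f1_score; the labels below are a toy example (on this data, weighted and micro happen to coincide):

```python
from sklearn.metrics import f1_score

y_true = [0, 1, 1, 2, 2, 2]
y_pred = [0, 1, 2, 2, 2, 1]

# micro: pool true positives, false negatives, false positives across labels
micro = f1_score(y_true, y_pred, average='micro')
# macro: unweighted mean of per-label F1 scores
macro = f1_score(y_true, y_pred, average='macro')
# weighted: per-label F1 averaged by support
weighted = f1_score(y_true, y_pred, average='weighted')
print(round(micro, 4), round(macro, 4), round(weighted, 4))
```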
Select columns by column index number(s)
All columns BUT by column index number(s)
Select columns by column header name(s)
All columns BUT by column header name(s)
All columns
Tabular
Sparse
Tabular
Sparse
tabular data
sparse matrix
Uniform weights. All points in each neighborhood are weighted equally. (Uniform)
Weight points by the inverse of their distance. (Distance)
Auto
BallTree
KDTree
Brute-force
rbf
linear
poly
sigmoid
precomputed
arpack
lobpcg
amg
RBF
precomputed
Nearest neighbors
kmeans
discretize
auto
full
elkan
auto
ball_tree
kd_tree
brute
KMeans
Spectral Clustering
Mini Batch KMeans
DBSCAN
Birch
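A minimal sketch of the k-means++ initialization option with KMeans from the list above (two toy blobs; n_init=10 is set explicitly to be stable across scikit-learn versions):

```python
import numpy as np
from sklearn.cluster import KMeans

# Two well-separated blobs
X = np.array([[0, 0], [0, 1], [1, 0],
              [10, 10], [10, 11], [11, 10]], dtype=float)

# init='k-means++' (the default); init='random' is the other option
km = KMeans(n_clusters=2, init='k-means++', n_init=10, random_state=0)
labels = km.fit_predict(X)
print(labels)  # one cluster id per row
```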
euclidean
cityblock
cosine
l1
l2
manhattan
braycurtis
canberra
chebyshev
correlation
dice
hamming
jaccard
kulsinski
mahalanobis
matching
minkowski
rogerstanimoto
russellrao
seuclidean
sokalmichener
sokalsneath
sqeuclidean
yule
rbf
sigmoid
polynomial
linear
chi2
additive_chi2
Euclidean distance matrix
Distance matrix
Minimum distances between one point and a set of points
Additive chi-squared kernel
Exponential chi-squared kernel
Linear kernel
L1 distances
Kernel
Polynomial kernel
Gaussian (rbf) kernel
Laplacian kernel
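The metric names above are passed as strings to pairwise_distances; a minimal sketch with two points whose distances are easy to verify by hand:

```python
import numpy as np
from sklearn.metrics import pairwise_distances

X = np.array([[0.0, 0.0], [3.0, 4.0]])

# The metric name selects the distance: 'euclidean', 'manhattan', 'cosine', ...
d_euc = pairwise_distances(X, metric='euclidean')
d_man = pairwise_distances(X, metric='manhattan')
print(d_euc[0, 1], d_man[0, 1])  # 5.0 and 7.0 for (0,0) vs (3,4)
```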
Standard Scaler (Standardizes features by removing the mean and scaling to unit variance)
Binarizer (Binarizes data)
Imputer (Completes missing values)
Max Abs Scaler (Scales features by their maximum absolute value)
Normalizer (Normalizes samples individually to unit norm)
Kernel Centerer (Centers a kernel matrix)
Minmax Scaler (Scales features to a range)
Polynomial Features (Generates polynomial and interaction features)
Robust Scaler (Scales features using statistics that are robust to outliers)
Replace missing values using the mean along the axis
Replace missing values using the median along the axis
Replace missing values using the most frequent value along the axis
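A minimal sketch of the Standard Scaler option above on a toy matrix; the other preprocessors follow the same fit_transform pattern:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])

# Standardize: remove the per-column mean and scale to unit variance
scaled = StandardScaler().fit_transform(X)
print(scaled.mean(axis=0), scaled.std(axis=0))  # ~0 means, ~1 stds
```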
default splitter
KFold
StratifiedKFold
LeaveOneOut
LeavePOut
RepeatedKFold
RepeatedStratifiedKFold
ShuffleSplit
StratifiedShuffleSplit
TimeSeriesSplit
PredefinedSplit
OrderedKFold
RepeatedOrderedKFold
GroupKFold
GroupShuffleSplit
LeaveOneGroupOut
LeavePGroupsOut
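A sketch of how one splitter from the list above behaves: StratifiedKFold preserves the class ratio of y in every fold (toy data, 5 samples per class):

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

X = np.zeros((10, 2))
y = np.array([0] * 5 + [1] * 5)

# With n_splits=5 and a 50/50 class balance, every test fold
# contains exactly one sample of each class
skf = StratifiedKFold(n_splits=5)
folds = list(skf.split(X, y))
for train_idx, test_idx in folds:
    print(sorted(y[test_idx]))
```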
SelectKBest - Select features according to the k highest scores
GenericUnivariateSelect - Univariate feature selector with configurable strategy
SelectPercentile - Select features according to a percentile of the highest scores
SelectFpr - Filter: Select the p-values below alpha based on a FPR test
SelectFdr - Filter: Select the p-values for an estimated false discovery rate
SelectFwe - Filter: Select the p-values corresponding to Family-wise error rate
VarianceThreshold - Feature selector that removes all low-variance features
SelectFromModel - Meta-transformer for selecting features based on importance weights
RFE - Feature ranking with recursive feature elimination
RFECV - Feature ranking with recursive feature elimination and cross-validated selection of the best number of features
percentile
k_best
fpr
fdr
fwe
Yes
No. Load a prefitted estimator
Yes
DyRFECV - Extended RFECV with changeable steps
chi2 - Compute chi-squared stats between each non-negative feature and class
f_classif - Compute the ANOVA F-value for the provided sample
f_regression - Univariate linear regression tests
mutual_info_classif - Estimate mutual information for a discrete target variable
mutual_info_regression - Estimate mutual information for a continuous target variable
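The score functions above plug into the univariate selectors; a minimal SelectKBest sketch with chi2 on the iris data (k=2 is an illustrative choice):

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2

X, y = load_iris(return_X_y=True)  # 4 non-negative features

# Keep the k=2 features with the highest chi-squared scores
selector = SelectKBest(score_func=chi2, k=2)
X_new = selector.fit_transform(X, y)
print(X.shape, X_new.shape)  # (150, 4) reduced to (150, 2)
```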
default with estimator
Classification -- 'accuracy'
Classification -- 'balanced_accuracy'
Classification -- 'average_precision'
Classification -- 'f1'
Classification -- 'f1_micro'
Classification -- 'f1_macro'
Classification -- 'f1_weighted'
Classification -- 'f1_samples'
Classification -- 'neg_log_loss'
Classification -- 'precision'
Classification -- 'precision_micro'
Classification -- 'precision_macro'
Classification -- 'precision_weighted'
Classification -- 'precision_samples'
Classification -- 'recall'
Classification -- 'recall_micro'
Classification -- 'recall_macro'
Classification -- 'recall_weighted'
Classification -- 'recall_samples'
Classification -- 'roc_auc'
Regression -- 'explained_variance'
Regression -- 'neg_mean_absolute_error'
Regression -- 'neg_mean_squared_error'
Regression -- 'neg_mean_squared_log_error'
Regression -- 'neg_median_absolute_error'
Regression -- 'r2'
Anomaly detection -- binarize_auc_scorer
Anomaly detection -- binarize_average_precision_scorer
Classification -- 'accuracy'
Classification -- 'balanced_accuracy'
Classification -- 'average_precision'
Classification -- 'f1'
Classification -- 'f1_micro'
Classification -- 'f1_macro'
Classification -- 'f1_weighted'
Classification -- 'f1_samples'
Classification -- 'neg_log_loss'
Classification -- 'precision'
Classification -- 'precision_micro'
Classification -- 'precision_macro'
Classification -- 'precision_weighted'
Classification -- 'precision_samples'
Classification -- 'recall'
Classification -- 'recall_micro'
Classification -- 'recall_macro'
Classification -- 'recall_weighted'
Classification -- 'recall_samples'
Classification -- 'roc_auc'
Regression -- 'explained_variance'
Regression -- 'neg_mean_absolute_error'
Regression -- 'neg_mean_squared_error'
Regression -- 'neg_mean_squared_log_error'
Regression -- 'neg_median_absolute_error'
Regression -- 'r2'
Anomaly detection -- binarize_auc_scorer
Anomaly detection -- binarize_average_precision_scorer
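Any scorer name from the list above can be passed as the scoring string to cross-validation utilities; a minimal sketch (synthetic data and LogisticRegression are illustrative choices):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=100, random_state=0)

# scoring='f1_macro' selects the f1_macro scorer by name
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y,
                         cv=5, scoring='f1_macro')
print(len(scores))  # one score per fold
```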
sklearn.svm
sklearn.linear_model
sklearn.ensemble
sklearn.naive_bayes
sklearn.tree
sklearn.neighbors
xgboost
LinearSVC
LinearSVR
NuSVC
NuSVR
OneClassSVM
SVC
SVR
ARDRegression
BayesianRidge
ElasticNet
ElasticNetCV
HuberRegressor
Lars
LarsCV
Lasso
LassoCV
LassoLars
LassoLarsCV
LassoLarsIC
LinearRegression
LogisticRegression
LogisticRegressionCV
MultiTaskLasso
MultiTaskElasticNet
MultiTaskLassoCV
MultiTaskElasticNetCV
OrthogonalMatchingPursuit
OrthogonalMatchingPursuitCV
PassiveAggressiveClassifier
PassiveAggressiveRegressor
Perceptron
RANSACRegressor
Ridge
RidgeClassifier
RidgeClassifierCV
RidgeCV
SGDClassifier
SGDRegressor
TheilSenRegressor
AdaBoostClassifier
AdaBoostRegressor
BaggingClassifier
BaggingRegressor
ExtraTreesClassifier
ExtraTreesRegressor
GradientBoostingClassifier
GradientBoostingRegressor
IsolationForest
RandomForestClassifier
RandomForestRegressor
RandomTreesEmbedding
BernoulliNB
GaussianNB
MultinomialNB
DecisionTreeClassifier
DecisionTreeRegressor
ExtraTreeClassifier
ExtraTreeRegressor
KNeighborsClassifier
KNeighborsRegressor
KernelDensity
LocalOutlierFactor
RadiusNeighborsClassifier
RadiusNeighborsRegressor
NearestCentroid
NearestNeighbors
XGBRegressor
XGBClassifier
Load a custom estimator
Nystroem
RBFSampler
AdditiveChi2Sampler
SkewedChi2Sampler
DictionaryLearning
FactorAnalysis
FastICA
IncrementalPCA
KernelPCA
LatentDirichletAllocation
MiniBatchDictionaryLearning
MiniBatchSparsePCA
NMF
PCA
SparsePCA
TruncatedSVD
FeatureAgglomeration
ReliefF
SURF
SURFstar
MultiSURF
MultiSURFstar
under_sampling.ClusterCentroids
under_sampling.CondensedNearestNeighbour
under_sampling.EditedNearestNeighbours
under_sampling.RepeatedEditedNearestNeighbours
under_sampling.AllKNN
under_sampling.InstanceHardnessThreshold
under_sampling.NearMiss
under_sampling.NeighbourhoodCleaningRule
under_sampling.OneSidedSelection
under_sampling.RandomUnderSampler
under_sampling.TomekLinks
over_sampling.ADASYN
over_sampling.RandomOverSampler
over_sampling.SMOTE
over_sampling.SVMSMOTE
over_sampling.BorderlineSMOTE
over_sampling.SMOTENC
combine.SMOTEENN
combine.SMOTETomek
Z_RandomOverSampler - for regression
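Random oversampling, as performed by over_sampling.RandomOverSampler, can be sketched with sklearn.utils.resample alone (toy imbalanced data; this shows the idea, not the imbalanced-learn implementation):

```python
import numpy as np
from sklearn.utils import resample

# Imbalanced toy data: 6 majority (class 0) vs 2 minority (class 1) samples
X = np.arange(16).reshape(8, 2)
y = np.array([0, 0, 0, 0, 0, 0, 1, 1])

# Randomly resample the minority class (with replacement)
# up to the majority count
X_min, y_min = X[y == 1], y[y == 1]
X_up, y_up = resample(X_min, y_min, replace=True, n_samples=6, random_state=0)
X_bal = np.vstack([X[y == 0], X_up])
y_bal = np.concatenate([y[y == 0], y_up])
print(np.bincount(y_bal))  # balanced class counts
```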
Load a custom estimator
selected_tasks['selected_task'] == 'load'
selected_tasks['selected_task'] == 'train'
10.5281/zenodo.15094
@article{scikit-learn,
title={Scikit-learn: Machine Learning in {P}ython},
author={Pedregosa, F. and Varoquaux, G. and Gramfort, A. and Michel, V.
and Thirion, B. and Grisel, O. and Blondel, M. and Prettenhofer, P.
and Weiss, R. and Dubourg, V. and Vanderplas, J. and Passos, A. and
Cournapeau, D. and Brucher, M. and Perrot, M. and Duchesnay, E.},
journal={Journal of Machine Learning Research},
volume={12},
pages={2825--2830},
year={2011}
}
@Misc{,
author = {Eric Jones and Travis Oliphant and Pearu Peterson and others},
title = {{SciPy}: Open source scientific tools for {Python}},
year = {2001--},
url = {http://www.scipy.org/},
note = {[Online; accessed 2016-04-09]}
}
@article{DBLP:journals/corr/abs-1711-08477,
author = {Ryan J. Urbanowicz and
Randal S. Olson and
Peter Schmitt and
Melissa Meeker and
Jason H. Moore},
title = {Benchmarking Relief-Based Feature Selection Methods},
journal = {CoRR},
volume = {abs/1711.08477},
year = {2017},
url = {http://arxiv.org/abs/1711.08477},
archivePrefix = {arXiv},
eprint = {1711.08477},
timestamp = {Mon, 13 Aug 2018 16:46:04 +0200},
biburl = {https://dblp.org/rec/bib/journals/corr/abs-1711-08477},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
@inproceedings{Chen:2016:XST:2939672.2939785,
author = {Chen, Tianqi and Guestrin, Carlos},
title = {{XGBoost}: A Scalable Tree Boosting System},
booktitle = {Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining},
series = {KDD '16},
year = {2016},
isbn = {978-1-4503-4232-2},
location = {San Francisco, California, USA},
pages = {785--794},
numpages = {10},
url = {http://doi.acm.org/10.1145/2939672.2939785},
doi = {10.1145/2939672.2939785},
acmid = {2939785},
publisher = {ACM},
address = {New York, NY, USA},
keywords = {large-scale machine learning},
}
@article{JMLR:v18:16-365,
author = {Guillaume Lema{{\^i}}tre and Fernando Nogueira and Christos K. Aridas},
title = {Imbalanced-learn: A Python Toolbox to Tackle the Curse of Imbalanced Datasets in Machine Learning},
journal = {Journal of Machine Learning Research},
year = {2017},
volume = {18},
number = {17},
pages = {1-5},
url = {http://jmlr.org/papers/v18/16-365.html}
}