AutoML capabilities

FEDOT lets you adjust its degree of automation by omitting some of its parameters. For example, if you only create a FEDOT instance and call the fit method on it with an appropriate dataset, the learning process is fully automated; see automated composing.

Conversely, if you pass some of the parameters yourself, the automation is only partial; see manual composing.
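The two modes can be sketched with the main API class. This is a minimal sketch, not a definitive recipe: the helper `fit_fedot` and the `timeout` value are illustrative assumptions, and the FEDOT import is deferred so the helper can be defined before FEDOT is actually needed.

```python
def fit_fedot(features, target, problem='classification', **settings):
    """Fit a FEDOT model; `settings` are optional constructor parameters.

    Hypothetical helper for illustration, not part of FEDOT itself.
    """
    # Deferred import: FEDOT is only required when the helper is called.
    from fedot.api.main import Fedot

    model = Fedot(problem=problem, **settings)
    model.fit(features=features, target=target)
    return model


# Full automation: only the task type and the data are supplied.
#   model = fit_fedot(train_features, train_target)

# Partial automation: some parameters are fixed by the user.
PARTIAL_SETTINGS = {'timeout': 2}  # assumed time budget, in minutes
#   model = fit_fedot(train_features, train_target, **PARTIAL_SETTINGS)
```

The fewer settings are passed, the more decisions are left to FEDOT.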

Beyond that, FEDOT provides further AutoML capabilities, which are described below.

Data nature

FEDOT applies specific data processing depending on the source of the input data (a pandas DataFrame, a numpy array, a path to the dataset, etc.).

Note

Be careful with datetime features: they are cast to a float type with a millisecond unit.
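To show what such a value looks like, here is a small standard-library sketch. It assumes the float counts milliseconds since the Unix epoch; the exact conversion FEDOT performs internally is not shown here and may differ. The helper name `datetime_to_float_ms` is hypothetical.

```python
from datetime import datetime, timezone


def datetime_to_float_ms(moment):
    """Return the milliseconds since the Unix epoch as a float."""
    return moment.timestamp() * 1000.0


stamp = datetime(2021, 1, 1, tzinfo=timezone.utc)
as_float = datetime_to_float_ms(stamp)
print(as_float)  # 1609459200000.0
```

After the cast the feature is an ordinary number, so calendar structure such as weekday or month is no longer explicit.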

In addition, FEDOT can work with multi-modal data. This means you can pass it datasets of different types (tables, texts, images, etc.) and it will extract information from each of them.

Dimensional operations

FEDOT supports a number of dimensionality preprocessing operations that can be added to the pipeline as nodes.

Feature transformation operations definitions

API name                Definition                                                        Problem
----------------------  ----------------------------------------------------------------  --------------------------------
rfe_lin_reg             Linear Regression Recursive Feature Elimination                   Feature extraction
rfe_non_lin_reg         Decision Tree Recursive Feature Elimination                       Feature extraction
rfe_lin_class           Logistic Regression Recursive Feature Elimination                 Feature extraction
rfe_non_lin_class       Decision Tree Recursive Feature Elimination                       Feature extraction
isolation_forest_reg    Regression Isolation Forest                                       Regression anomaly detection
isolation_forest_class  Classification Isolation Forest                                   Classification anomaly detection
ransac_lin_reg          Regression Random Sample Consensus                                Outlier detection
ransac_non_lin_reg      Decision Tree Random Sample Consensus                             Outlier detection
pca                     Principal Component Analysis                                      Dimensionality reduction
kernel_pca              Kernel Principal Component Analysis                               Dimensionality reduction
fast_ica                Independent Component Analysis                                    Feature extraction
poly_features           Polynomial Features                                               Feature engineering
decompose               Difference of regression prediction and target as new target      Feature extraction
class_decompose         Difference of classification prediction and target as new target  Feature extraction
cntvect                 Count Vectorizer                                                  Text feature extraction
text_clean              Lemmatization and Stemming                                        Text data processing
tfidf                   TF-IDF Vectorizer                                                 Text feature extraction
word2vec_pretrained     Text vectorization                                                Text feature extraction
lagged                  Time series to the Hankel matrix transformation                   Time series transformation
sparse_lagged           Same as lagged, but with sparsing                                 Time series transformation
smoothing               Moving average                                                    Time series transformation
gaussian_filter         Gaussian Filter                                                   Time series transformation
diff_filter             Derivative Filter                                                 Time series transformation
cut                     Cut time series                                                   Time series transformation
scaling                 Scaling                                                           Feature scaling
normalization           Normalization                                                     Feature normalization
simple_imputation       Imputation                                                        Data imputation
one_hot_encoding        One-Hot Encoder                                                   Feature encoding
label_encoding          Label Encoder                                                     Feature encoding
resample                Imbalanced binary class transformation in classification          Data transformation
topological_features    Calculation of topological features                               Time series transformation
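The Hankel matrix transformation listed for lagged can be illustrated in plain Python. This is only a sketch of the idea, not FEDOT's implementation: the function name `lagged_matrix` and its `window_size` argument are assumptions made for the example.

```python
def lagged_matrix(series, window_size):
    """Stack sliding windows of a series as the rows of a matrix.

    Row i holds series[i : i + window_size], so element (i, j) equals
    series[i + j] and is constant along anti-diagonals (Hankel structure).
    """
    if not 1 <= window_size <= len(series):
        raise ValueError('window_size must be between 1 and len(series)')
    n_rows = len(series) - window_size + 1
    return [list(series[i:i + window_size]) for i in range(n_rows)]


rows = lagged_matrix([1, 2, 3, 4, 5], window_size=3)
for row in rows:
    print(row)
# [1, 2, 3]
# [2, 3, 4]
# [3, 4, 5]
```

Each row can then serve as a feature vector for an ordinary regression model, which is how a forecasting problem is reduced to a tabular one.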

Feature transformation operations implementations

API name                   Model used                                              Presets
-------------------------  ------------------------------------------------------  ---------------------
rfe_lin_reg                sklearn.feature_selection.RFE
rfe_non_lin_reg            sklearn.feature_selection.RFE
rfe_lin_class              sklearn.feature_selection.RFE
rfe_non_lin_class          sklearn.feature_selection.RFE
isolation_forest_reg       sklearn.ensemble.IsolationForest
isolation_forest_class     sklearn.ensemble.IsolationForest
ransac_lin_reg             sklearn.linear_model.RANSACRegressor                    fast_train, *tree
ransac_non_lin_reg         sklearn.linear_model.RANSACRegressor                    *tree
pca                        sklearn.decomposition.PCA                               fast_train, ts, *tree
kernel_pca                 sklearn.decomposition.KernelPCA                         ts, *tree
fast_ica                   sklearn.decomposition.FastICA                           ts, *tree
poly_features              sklearn.preprocessing.PolynomialFeatures
decompose                  FEDOT model                                             fast_train, ts, *tree
class_decompose            FEDOT model                                             fast_train, *tree
cntvect                    sklearn.feature_extraction.text.CountVectorizer
text_clean                 nltk.stem.WordNetLemmatizer, nltk.stem.SnowballStemmer
tfidf                      sklearn.feature_extraction.text.TfidfVectorizer
word2vec_pretrained        Gensim-data model
lagged                     FEDOT model                                             fast_train, ts
sparse_lagged              FEDOT model                                             fast_train, ts
smoothing                  FEDOT model                                             fast_train, ts
gaussian_filter            FEDOT model                                             fast_train, ts
diff_filter                FEDOT model                                             ts
cut                        FEDOT model                                             fast_train, ts
scaling                    sklearn.preprocessing.StandardScaler                    fast_train, ts, *tree
normalization              sklearn.preprocessing.MinMaxScaler                      fast_train, ts, *tree
simple_imputation          sklearn.impute.SimpleImputer                            fast_train, *tree
one_hot_encoding           sklearn.preprocessing.OneHotEncoder
label_encoding             sklearn.preprocessing.LabelEncoder                      fast_train, *tree
resample                   FEDOT model using sklearn.utils.resample
topological_features       FEDOT model                                             ts
fast_topological_features  FEDOT model                                             ts
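The API names in the tables above are the strings used to refer to operations when a pipeline is assembled by hand. The sketch below assumes a FEDOT release that ships `PipelineBuilder` with `add_node` and `build` methods; the helper `build_linear_pipeline` and the chosen node names are illustrative only.

```python
# Two preprocessing operations from the tables above, followed by 'rf',
# a classifier from the list of available models further below.
NODE_NAMES = ['scaling', 'pca', 'rf']


def build_linear_pipeline(node_names):
    """Chain the named operations one after another into a pipeline.

    Hypothetical helper for illustration, not part of FEDOT itself.
    """
    # Deferred import: FEDOT is only required when the helper is called.
    from fedot.core.pipelines.pipeline_builder import PipelineBuilder

    builder = PipelineBuilder()
    for name in node_names:
        builder = builder.add_node(name)  # each call appends one node
    return builder.build()


# Usage (requires FEDOT to be installed):
#   pipeline = build_linear_pipeline(NODE_NAMES)
```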

Models used

Using the preset parameter of the main API, you can specify which models are available during the learning process.

It influences:

  • composing speed and quality

  • computational behaviour

  • task relevance

For example, the 'best_quality' option allows FEDOT to use the entire list of models available for the specified task. In contrast, 'fast_train' ensures that only fast-learning models are used.

Apart from that, there are other options whose names speak for themselves: 'stable', 'auto', 'gpu', 'ts', 'automl' (the latter uses only AutoML models as pipeline nodes).

Note

To keep things simple, FEDOT uses 'auto' by default to identify the best choice for you.
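Choosing a preset might look like the sketch below. The tuple of names is taken from the options mentioned above; the helper `make_model` and its validation step are assumptions added for illustration, and FEDOT may accept or reject values differently.

```python
# Preset names mentioned in the text above.
KNOWN_PRESETS = ('best_quality', 'fast_train', 'stable',
                 'auto', 'gpu', 'ts', 'automl')


def make_model(problem, preset='auto'):
    """Create a FEDOT instance restricted to the models of one preset.

    Hypothetical helper for illustration, not part of FEDOT itself.
    """
    if preset not in KNOWN_PRESETS:
        raise ValueError(f'unknown preset: {preset!r}')
    # Deferred import: FEDOT is only required for a valid preset.
    from fedot.api.main import Fedot

    return Fedot(problem=problem, preset=preset)


# Usage (requires FEDOT to be installed):
#   model = make_model('classification', preset='fast_train')
```

Checking the name up front gives an immediate error for a typo instead of a failure later in the run.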

Available models definitions

API name          Definition                                       Problem
----------------  -----------------------------------------------  --------------
adareg            AdaBoost regressor                               Regression
catboostreg       CatBoost regressor                               Regression
dtreg             Decision Tree regressor                          Regression
gbr               Gradient Boosting regressor                      Regression
knnreg            K-nearest neighbors regressor                    Regression
lasso             Lasso Linear regressor                           Regression
lgbmreg           Light Gradient Boosting Machine regressor        Regression
linear            Linear Regression regressor                      Regression
rfr               Random Forest regressor                          Regression
ridge             Ridge Linear regressor                           Regression
sgdr              Stochastic Gradient Descent regressor            Regression
svr               Linear Support Vector regressor                  Regression
treg              Extra Trees regressor                            Regression
xgbreg            Extreme Gradient Boosting regressor              Regression
bernb             Naive Bayes classifier (multivariate Bernoulli)  Classification
catboost          CatBoost classifier                              Classification
cnn               Convolutional Neural Network                     Classification
dt                Decision Tree classifier                         Classification
knn               K-nearest neighbors classifier                   Classification
lda               Linear Discriminant Analysis                     Classification
lgbm              Light Gradient Boosting Machine classifier       Classification
logit             Logistic Regression classifier                   Classification
mlp               Multi-layer Perceptron classifier                Classification
multinb           Naive Bayes classifier (multinomial)             Classification
qda               Quadratic Discriminant Analysis                  Classification
rf                Random Forest classifier                         Classification
svc               Support Vector classifier                        Classification
xgboost           Extreme Gradient Boosting classifier             Classification
kmeans            K-Means clustering                               Clustering
ar                AutoRegression                                   Forecasting
arima             ARIMA                                            Forecasting
cgru              Convolutional Gated Recurrent Unit               Forecasting
ets               Exponential Smoothing                            Forecasting
glm               Generalized Linear Models                        Forecasting
locf              Last Observation Carried Forward                 Forecasting
polyfit           Polynomial approximation                         Forecasting
stl_arima         STL Decomposition with ARIMA                     Forecasting
ts_naive_average  Naive Average                                    Forecasting

Available models implementations

API name          Model used                                                   Presets
----------------  -----------------------------------------------------------  ---------------------
adareg            sklearn.ensemble.AdaBoostRegressor                           fast_train, ts, *tree
catboostreg       catboost.CatBoostRegressor                                   *tree
dtreg             sklearn.tree.DecisionTreeRegressor                           fast_train, ts, *tree
gbr               sklearn.ensemble.GradientBoostingRegressor                   *tree
knnreg            sklearn.neighbors.KNeighborsRegressor                        fast_train, ts
lasso             sklearn.linear_model.Lasso                                   fast_train, ts
lgbmreg           lightgbm.sklearn.LGBMRegressor                               *tree
linear            sklearn.linear_model.LinearRegression                        fast_train, ts
rfr               sklearn.ensemble.RandomForestRegressor                       fast_train, *tree
ridge             sklearn.linear_model.Ridge                                   fast_train, ts
sgdr              sklearn.linear_model.SGDRegressor                            fast_train, ts
svr               sklearn.svm.LinearSVR
treg              sklearn.ensemble.ExtraTreesRegressor                         *tree
xgbreg            xgboost.XGBRegressor                                         *tree
bernb             sklearn.naive_bayes.BernoulliNB                              fast_train
catboost          catboost.CatBoostClassifier                                  *tree
cnn               FEDOT model
dt                sklearn.tree.DecisionTreeClassifier                          fast_train, *tree
knn               sklearn.neighbors.KNeighborsClassifier                       fast_train
lda               sklearn.discriminant_analysis.LinearDiscriminantAnalysis     fast_train
lgbm              lightgbm.sklearn.LGBMClassifier
logit             sklearn.linear_model.LogisticRegression                      fast_train
mlp               sklearn.neural_network.MLPClassifier
multinb           sklearn.naive_bayes.MultinomialNB                            fast_train
qda               sklearn.discriminant_analysis.QuadraticDiscriminantAnalysis  fast_train
rf                sklearn.ensemble.RandomForestClassifier                      fast_train, *tree
svc               sklearn.svm.SVC
xgboost           xgboost.XGBClassifier                                        *tree
kmeans            sklearn.cluster.KMeans                                       fast_train
ar                statsmodels.tsa.ar_model.AutoReg                             fast_train, ts
arima             statsmodels.tsa.arima.model.ARIMA                            ts
cgru              FEDOT model                                                  ts
ets               statsmodels.tsa.exponential_smoothing.ets.ETSModel           fast_train, ts
glm               statsmodels.genmod.generalized_linear_model.GLM              fast_train, ts
locf              FEDOT model                                                  fast_train, ts
polyfit           FEDOT model                                                  fast_train, ts
stl_arima         statsmodels.tsa.api.STLForecast                              ts
ts_naive_average  FEDOT model                                                  fast_train, ts