Builder for FEDOT API

class fedot.api.builder.FedotBuilder(problem)[source]

Bases: object

An alternative FEDOT API version with its optional attributes documented and grouped by meaning. Each group has a corresponding setter method whose name starts with setup_*. Use these methods to set the corresponding API attributes:

  • setup_composition()

  • setup_parallelization()

  • setup_output()

  • setup_evolution()

  • setup_pipeline_structure()

  • setup_pipeline_evaluation()

  • setup_data_preprocessing()

After all desired attributes are set, use build() to get a parametrized instance of Fedot.

Examples

Example 1:

from fedot import FedotBuilder
from fedot.core.utils import fedot_project_root


if __name__ == '__main__':
    train_data_path = f'{fedot_project_root()}/examples/real_cases/data/scoring/scoring_train.csv'
    test_data_path = f'{fedot_project_root()}/examples/real_cases/data/scoring/scoring_test.csv'

    fedot = (FedotBuilder(problem='classification')
             .setup_composition(timeout=10, with_tuning=True, preset='best_quality')
             .setup_pipeline_evaluation(max_pipeline_fit_time=5, metric=['roc_auc', 'precision'])
             .build())
    fedot.fit(features=train_data_path, target='target')
    fedot.predict_proba(features=test_data_path)
    fedot.plot_prediction()

Example 2:

from fedot import FedotBuilder
from fedot.core.utils import fedot_project_root


if __name__ == '__main__':
    SEED = 42

    builder = (FedotBuilder('ts_forecasting')
               .setup_composition(preset='fast_train', timeout=0.5, with_tuning=True, seed=SEED)
               .setup_evolution(num_of_generations=3)
               .setup_pipeline_evaluation(metric='mae'))

    datasets_path = fedot_project_root() / 'examples/data/ts'
    resulting_models = {}
    for data_path in datasets_path.iterdir():
        if data_path.name == 'ts_sea_level.csv':
            continue
        fedot = builder.build()
        fedot.fit(data_path, target='value')
        fedot.predict(features=fedot.train_data, validation_blocks=2)
        fedot.plot_prediction()
        fedot.current_pipeline.show()
        resulting_models[data_path.stem] = fedot
Parameters

problem (str) –

name of a modelling problem to solve.

Possible options:
  • classification -> for classification task

  • regression -> for regression task

  • ts_forecasting -> for time series forecasting task

setup_composition(timeout=<default value>, task_params=<default value>, seed=<default value>, preset=<default value>, with_tuning=<default value>, use_meta_rules=<default value>)[source]

Sets general AutoML parameters.

Parameters
  • timeout (Optional[float]) – time for model design (in minutes): None or -1 means infinite time.

  • task_params (TaskParams) – additional parameters of a task.

  • seed (int) – value for a fixed random seed.

  • preset (str) –

    name of the preset for model building (e.g. best_quality, fast_train, gpu). Default value is auto.

    Possible options:
    • best_quality -> All models that are available for this data type and task are used

    • fast_train -> Models that learn quickly. This includes preprocessing operations (data operations) that only reduce the dimensionality of the data but cannot increase it. For example, there are no polynomial features or one-hot encoding operations.

    • stable -> The most reliable preset in which the most stable operations are included

    • auto -> Automatically determine which preset should be used

    • gpu -> Models that use GPU resources for computation

    • ts -> A special preset with models for time series forecasting task

    • automl -> A special preset with only AutoML libraries such as TPOT and H2O as operations

  • with_tuning (bool) – flag for tuning hyperparameters of the final evolved Pipeline. Defaults to True.

  • use_meta_rules (bool) – indicates whether to change set parameters according to FEDOT meta rules.

Returns

FedotBuilder instance.

Return type

FedotBuilder
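
For illustration only, a minimal sketch of this setter; the timeout, preset, and seed values below are arbitrary:

from fedot import FedotBuilder

fedot = (FedotBuilder(problem='classification')
         .setup_composition(timeout=5, preset='auto', with_tuning=True, seed=42)
         .build())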

setup_parallelization(n_jobs=<default value>, parallelization_mode=<default value>)[source]

Sets parameters of computational parallelization by CPU jobs.

Parameters
  • n_jobs (int) – number of jobs for parallelization (set to -1 to use all CPUs). Defaults to -1.

  • parallelization_mode (str) – type of evaluation for groups of individuals (populational or sequential). Default value is populational.

Returns

FedotBuilder instance.

Return type

FedotBuilder
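
A short sketch of configuring parallelization; the values shown simply repeat the documented defaults:

from fedot import FedotBuilder

fedot = (FedotBuilder(problem='regression')
         .setup_parallelization(n_jobs=-1, parallelization_mode='populational')
         .build())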

setup_output(logging_level=<default value>, show_progress=<default value>, keep_history=<default value>, history_dir=<default value>, cache_dir=<default value>)[source]

Sets parameters of outputs: logging, cache directories, etc.

Parameters
  • logging_level (int) –

    logging levels are the same as in the built-in logging library.

    Possible options:
    • 50 -> critical

    • 40 -> error

    • 30 -> warning

    • 20 -> info

    • 10 -> debug

    • 0 -> notset

  • show_progress (bool) – indicates whether to show progress using tqdm/tuner. Defaults to True.

  • keep_history (bool) – indicates if the framework should track evolutionary optimization history for possible further analysis. Defaults to True.

  • history_dir (Optional[str]) – relative or absolute path of a folder for composing history. By default, a folder named “FEDOT” is created in the OS temporary directory. A relative path is resolved relative to that default location.

  • cache_dir (Optional[str]) – path to a directory containing cache files (if any cache is enabled). By default, a folder named “FEDOT” is created in the OS temporary directory.

Returns

FedotBuilder instance.

Return type

FedotBuilder
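
As a sketch, logging_level can be passed either as a plain integer or via the standard logging constants, which map to the same values; the history_dir and cache_dir paths below are hypothetical examples:

import logging

from fedot import FedotBuilder

fedot = (FedotBuilder(problem='classification')
         .setup_output(logging_level=logging.INFO,  # same as passing 20
                       show_progress=False,
                       keep_history=True,
                       history_dir='fedot_history',   # hypothetical folder name
                       cache_dir='fedot_cache')       # hypothetical folder name
         .build())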

setup_evolution(initial_assumption=<default value>, num_of_generations=<default value>, early_stopping_iterations=<default value>, early_stopping_timeout=<default value>, pop_size=<default value>, keep_n_best=<default value>, genetic_scheme=<default value>, use_pipelines_cache=<default value>, optimizer=<default value>)[source]

Sets parameters of ML pipelines evolutionary optimization.

Parameters
  • initial_assumption (Union[fedot.core.pipelines.pipeline.Pipeline, List[fedot.core.pipelines.pipeline.Pipeline]]) – initial assumption(s) for the composer. Can be a single Pipeline or a sequence of Pipelines. Default values are task-specific and selected by the method for_task().

  • num_of_generations (Optional[int]) – number of evolutionary generations for composer. Defaults to None - no limit.

  • early_stopping_iterations (int) – composer will stop after n generations without improving.

  • early_stopping_timeout (int) – stagnation timeout in minutes: composer will stop after n minutes without improving. Defaults to 10.

  • pop_size (int) – size of population (generation) during composing. Defaults to 20.

  • keep_n_best (int) – number of the best individuals in generation that survive during the evolution. Defaults to 1.

  • genetic_scheme (str) – name of the genetic scheme. Defaults to steady_state.

  • use_pipelines_cache (bool) – indicates whether to use pipeline structures caching. Defaults to True.

  • optimizer (Optional[Type[GraphOptimizer]]) – inherit from golem.core.optimisers.optimizer.GraphOptimizer to specify a custom optimizer. Default optimizer is golem.core.optimisers.genetic.gp_optimizer.EvoGraphOptimizer. See the example.

Returns

FedotBuilder instance.

Return type

FedotBuilder
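
A minimal sketch of tweaking the evolutionary settings; all numeric values below are arbitrary and chosen only for illustration, and the genetic_scheme simply repeats the documented default:

from fedot import FedotBuilder

fedot = (FedotBuilder('ts_forecasting')
         .setup_composition(timeout=2)
         .setup_evolution(num_of_generations=5,
                          pop_size=10,
                          early_stopping_iterations=3,
                          genetic_scheme='steady_state')
         .build())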

setup_pipeline_structure(available_operations=<default value>, max_depth=<default value>, max_arity=<default value>)[source]

Sets constraints on ML pipeline structure.

Parameters
  • available_operations (List[str]) –

    list of model names to use. Pick the names according to the operations repository.

    Possible options:
    • adareg -> AdaBoost Regressor

    • ar -> AutoRegression

    • arima -> ARIMA

    • cgru -> Convolutional Gated Recurrent Unit

    • bernb -> Naive Bayes Classifier (multivariate Bernoulli)

    • catboost -> CatBoost Classifier

    • catboostreg -> CatBoost Regressor

    • dt -> Decision Tree Classifier

    • dtreg -> Decision Tree Regressor

    • gbr -> Gradient Boosting Regressor

    • kmeans -> K-Means clustering

    • knn -> K-nearest neighbors Classifier

    • knnreg -> K-nearest neighbors Regressor

    • lasso -> Lasso Linear Regressor

    • lda -> Linear Discriminant Analysis

    • lgbm -> Light Gradient Boosting Machine Classifier

    • lgbmreg -> Light Gradient Boosting Machine Regressor

    • linear -> Linear Regression Regressor

    • logit -> Logistic Regression Classifier

    • mlp -> Multi-layer Perceptron Classifier

    • multinb -> Naive Bayes Classifier (multinomial)

    • qda -> Quadratic Discriminant Analysis

    • rf -> Random Forest Classifier

    • rfr -> Random Forest Regressor

    • ridge -> Ridge Linear Regressor

    • polyfit -> Polynomial fitter

    • sgdr -> Stochastic Gradient Descent Regressor

    • stl_arima -> STL Decomposition with ARIMA

    • glm -> Generalized Linear Models

    • ets -> Exponential Smoothing

    • locf -> Last Observation Carried Forward

    • ts_naive_average -> Naive Average

    • svc -> Support Vector Classifier

    • svr -> Linear Support Vector Regressor

    • treg -> Extra Trees Regressor

    • xgboost -> Extreme Gradient Boosting Classifier

    • xgbreg -> Extreme Gradient Boosting Regressor

    • cnn -> Convolutional Neural Network

    • scaling -> Scaling

    • normalization -> Normalization

    • simple_imputation -> Imputation

    • pca -> Principal Component Analysis

    • kernel_pca -> Kernel Principal Component Analysis

    • fast_ica -> Independent Component Analysis

    • poly_features -> Polynomial Features

    • one_hot_encoding -> One-Hot Encoder

    • label_encoding -> Label Encoder

    • rfe_lin_reg -> Linear Regression Recursive Feature Elimination

    • rfe_non_lin_reg -> Decision Tree Recursive Feature Elimination

    • rfe_lin_class -> Logistic Regression Recursive Feature Elimination

    • rfe_non_lin_class -> Decision Tree Recursive Feature Elimination

    • isolation_forest_reg -> Regression Isolation Forest

    • isolation_forest_class -> Classification Isolation Forest

    • decompose -> Regression Decomposition

    • class_decompose -> Classification Decomposition

    • resample -> Resample features

    • ransac_lin_reg -> Regression Random Sample Consensus

    • ransac_non_lin_reg -> Decision Tree Random Sample Consensus

    • cntvect -> Count Vectorizer

    • text_clean -> Lemmatization and Stemming

    • tfidf -> TF-IDF Vectorizer

    • word2vec_pretrained -> Word2Vec

    • lagged -> Lagged Transformation

    • sparse_lagged -> Sparse Lagged Transformation

    • smoothing -> Smoothing Transformation

    • gaussian_filter -> Gaussian Filter Transformation

    • diff_filter -> Derivative Filter Transformation

    • cut -> Cut Transformation

    • exog_ts -> Exogenous Transformation

    • topological_features -> Topological features

  • max_depth (int) – max depth of a pipeline. Defaults to 6.

  • max_arity (int) – max arity of pipeline nodes. Defaults to 3.

Returns

FedotBuilder instance.

Return type

FedotBuilder
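
For illustration, a sketch restricting the search space to a few of the operations listed above; the particular selection and the depth/arity limits are arbitrary:

from fedot import FedotBuilder

fedot = (FedotBuilder(problem='classification')
         .setup_pipeline_structure(available_operations=['scaling', 'logit', 'rf', 'knn'],
                                   max_depth=3,
                                   max_arity=2)
         .build())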

setup_pipeline_evaluation(metric=<default value>, cv_folds=<default value>, max_pipeline_fit_time=<default value>, collect_intermediate_metric=<default value>)[source]

Sets parameters of ML pipelines quality evaluation.

Parameters
  • metric (Union[str, MetricsEnum, QualityMetricCallable, ComplexityMetricCallable, Sequence[Union[str, MetricsEnum, QualityMetricCallable, ComplexityMetricCallable]]]) –

    metric for quality calculation during composing; also used for tuning if with_tuning=True.

    Default value depends on a given task:
    • roc_auc_pen -> for classification

    • rmse -> for regression & time series forecasting

    Available metrics are listed in the corresponding FEDOT metric enumerations.

  • cv_folds (int) –

    number of folds for cross-validation.

    Default value depends on a given problem:
    • 5 -> for classification and regression tasks

    • 3 -> for time series forecasting task

  • max_pipeline_fit_time (Optional[int]) – time constraint for operation fitting (in minutes). Once the limit is reached, a candidate pipeline will be dropped. Defaults to None - no limit.

  • collect_intermediate_metric (bool) – indicates whether to save metrics for intermediate (non-root) nodes in the composed Pipeline.

Returns

FedotBuilder instance.

Return type

FedotBuilder
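
A short sketch of the evaluation settings; the metric name and number of folds repeat the documented defaults for regression, and the time limit is an arbitrary example:

from fedot import FedotBuilder

fedot = (FedotBuilder(problem='regression')
         .setup_pipeline_evaluation(metric='rmse',
                                    cv_folds=5,
                                    max_pipeline_fit_time=3,
                                    collect_intermediate_metric=False)
         .build())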

setup_data_preprocessing(safe_mode=<default value>, use_input_preprocessing=<default value>, use_preprocessing_cache=<default value>, use_auto_preprocessing=<default value>)[source]

Sets parameters of input data preprocessing.

Parameters
  • safe_mode (bool) – if True, cuts large datasets to prevent memory overflow and uses a label encoder instead of a one-hot encoder when the total cardinality of categorical features is high. Defaults to False.

  • use_input_preprocessing (bool) – indicates whether to preprocess the input data passed later to the API. Defaults to True.

  • use_preprocessing_cache (bool) – indicates whether to use optional preprocessor caching. Defaults to True.

  • use_auto_preprocessing (bool) –

Returns

FedotBuilder instance.

Return type

FedotBuilder
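
A hedged sketch of the preprocessing options; the Boolean values below are arbitrary examples:

from fedot import FedotBuilder

fedot = (FedotBuilder(problem='classification')
         .setup_data_preprocessing(safe_mode=True,
                                   use_input_preprocessing=True,
                                   use_preprocessing_cache=True)
         .build())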

build()[source]

Initializes an instance of Fedot with accumulated parameters.

Returns

Fedot instance.

Return type

Fedot
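
As Example 2 above shows, a single configured builder can produce several independent Fedot instances; a minimal sketch (the timeout value is arbitrary):

from fedot import FedotBuilder

builder = FedotBuilder(problem='classification').setup_composition(timeout=1)
fedot_a = builder.build()  # first Fedot instance
fedot_b = builder.build()  # a separate instance with the same accumulated parameters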