Builder for FEDOT API

class fedot.api.builder.FedotBuilder(problem)[source]

Bases: object

An alternative FEDOT API in which the optional attributes are documented and grouped by meaning. Each group has a corresponding setter method whose name starts with setup_*. Use these methods to set the corresponding API attributes:
  • setup_composition()

  • setup_parallelization()

  • setup_output()

  • setup_evolution()

  • setup_pipeline_structure()

  • setup_pipeline_evaluation()

  • setup_data_preprocessing()

Once all the desired attributes are set, use build() to get a parametrized instance of Fedot.

Examples

Example 1:

from fedot import FedotBuilder
from fedot.core.utils import fedot_project_root


if __name__ == '__main__':
    train_data_path = f'{fedot_project_root()}/cases/data/scoring/scoring_train.csv'
    test_data_path = f'{fedot_project_root()}/cases/data/scoring/scoring_test.csv'

    fedot = (FedotBuilder(problem='classification')
             .setup_composition(timeout=10, with_tuning=True, preset='best_quality')
             .setup_pipeline_evaluation(max_pipeline_fit_time=5, metric=['roc_auc', 'precision'])
             .build())
    fedot.fit(features=train_data_path, target='target')
    fedot.predict_proba(features=test_data_path)
    fedot.plot_prediction()

Example 2:

from fedot import FedotBuilder
from fedot.core.utils import fedot_project_root


if __name__ == '__main__':
    SEED = 42

    builder = (FedotBuilder('ts_forecasting')
               .setup_composition(preset='fast_train', timeout=0.5, with_tuning=True, seed=SEED)
               .setup_evolution(num_of_generations=3)
               .setup_pipeline_evaluation(metric='mae'))

    datasets_path = fedot_project_root() / 'examples/data/ts'
    resulting_models = {}
    for data_path in datasets_path.iterdir():
        if data_path.name == 'ts_sea_level.csv':
            continue
        fedot = builder.build()
        fedot.fit(data_path, target='value')
        fedot.predict(features=fedot.train_data, validation_blocks=2)
        fedot.plot_prediction()
        fedot.current_pipeline.show()
        resulting_models[data_path.stem] = fedot
Parameters

problem (str) –

name of a modelling problem to solve.

Possible options:
  • classification -> for classification task

  • regression -> for regression task

  • ts_forecasting -> for time series forecasting task

setup_composition(timeout=<default value>, task_params=<default value>, seed=<default value>, preset=<default value>, with_tuning=<default value>, use_meta_rules=<default value>)[source]

Sets general AutoML parameters.

Parameters
  • timeout (Optional[float]) – time for model design (in minutes): None or -1 means infinite time.

  • task_params (TaskParams) – additional parameters of a task.

  • seed (int) – value for a fixed random seed.

  • preset (str) –

    name of the preset for model building (e.g. best_quality, fast_train, gpu). Default value is auto.

    Possible options:
    • best_quality -> All models that are available for this data type and task are used

    • fast_train -> Models that learn quickly. This includes preprocessing operations (data operations) that only reduce the dimensionality of the data but cannot increase it. For example, there are no polynomial features or one-hot encoding operations.

    • stable -> The most reliable preset in which the most stable operations are included

    • auto -> Automatically determine which preset should be used

    • gpu -> Models that use GPU resources for computation

    • ts -> A special preset with models for time series forecasting task

    • automl -> A special preset with only AutoML libraries such as TPOT and H2O as operations

  • with_tuning (bool) – flag for tuning hyperparameters of the final evolved Pipeline. Defaults to True.

  • use_meta_rules (bool) – indicates whether to change set parameters according to FEDOT meta rules.

Returns

FedotBuilder instance.

Return type

FedotBuilder
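
A minimal usage sketch (the specific timeout, preset, and seed values below are illustrative, not recommendations):

from fedot import FedotBuilder

fedot = (FedotBuilder(problem='classification')
         # a 5-minute search with the fast_train preset and a fixed seed (illustrative values)
         .setup_composition(timeout=5, preset='fast_train', with_tuning=True, seed=42)
         .build())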

setup_parallelization(n_jobs=<default value>, parallelization_mode=<default value>)[source]

Sets parameters of computational parallelization by CPU jobs.

Parameters
  • n_jobs (int) – number of jobs for parallelization (set to -1 to use all CPUs). Defaults to -1.

  • parallelization_mode (str) – type of evaluation for groups of individuals (populational or sequential). Default value is populational.

Returns

FedotBuilder instance.

Return type

FedotBuilder
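
A minimal usage sketch (the chosen problem and parallelization settings are illustrative):

from fedot import FedotBuilder

fedot = (FedotBuilder(problem='regression')
         # use all available CPUs and evaluate individuals population-wise
         .setup_parallelization(n_jobs=-1, parallelization_mode='populational')
         .build())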

setup_output(logging_level=<default value>, show_progress=<default value>, keep_history=<default value>, history_dir=<default value>, cache_dir=<default value>)[source]

Sets parameters of outputs: logging, cache directories, etc.

Parameters
  • logging_level (int) –

    logging levels are the same as in built-in logging library.

    Possible options:
    • 50 -> critical

    • 40 -> error

    • 30 -> warning

    • 20 -> info

    • 10 -> debug

    • 0 -> notset

  • show_progress (bool) – indicates whether to show progress using tqdm/tuner. Defaults to True.

  • keep_history (bool) – indicates if the framework should track evolutionary optimization history for possible further analysis. Defaults to True.

  • history_dir (Optional[str]) – relative or absolute path of the folder for storing the composing history. By default, a folder named “FEDOT” is created in the system temporary directory. A relative path is resolved against this default location.

  • cache_dir (Optional[str]) – path to the directory containing cache files (if any cache is enabled). By default, a folder named “FEDOT” is created in the system temporary directory.

Returns

FedotBuilder instance.

Return type

FedotBuilder
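
A minimal usage sketch; the logging level matches the numeric codes above (logging.INFO == 20), and the history directory name is a hypothetical path:

import logging

from fedot import FedotBuilder

fedot = (FedotBuilder(problem='classification')
         # log at INFO level, hide progress bars, and keep the optimization history
         .setup_output(logging_level=logging.INFO, show_progress=False,
                       keep_history=True, history_dir='composing_history')
         .build())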

setup_evolution(initial_assumption=<default value>, num_of_generations=<default value>, early_stopping_iterations=<default value>, early_stopping_timeout=<default value>, pop_size=<default value>, keep_n_best=<default value>, genetic_scheme=<default value>, use_pipelines_cache=<default value>, optimizer=<default value>)[source]

Sets parameters of ML pipelines evolutionary optimization.

Parameters
  • initial_assumption (Union[fedot.core.pipelines.pipeline.Pipeline, List[fedot.core.pipelines.pipeline.Pipeline]]) – initial assumption(s) for the composer. Can be either a single Pipeline or a sequence of them. Default values are task-specific and selected by the method for_task().

  • early_stopping_iterations (int) – the composer will stop after n generations without improvement.

  • num_of_generations (Optional[int]) – number of evolutionary generations for composer. Defaults to None - no limit.

  • early_stopping_timeout (int) – stagnation timeout in minutes: the composer will stop after n minutes without improvement. Defaults to 10.

  • pop_size (int) – size of population (generation) during composing. Defaults to 20.

  • keep_n_best (int) – number of the best individuals in generation that survive during the evolution. Defaults to 1.

  • genetic_scheme (str) – name of the genetic scheme. Defaults to steady_state.

  • use_pipelines_cache (bool) – indicates whether to use pipeline structures caching. Defaults to True.

  • optimizer (Optional[Type[GraphOptimizer]]) – inherit from golem.core.optimisers.optimizer.GraphOptimizer to specify a custom optimizer. Default optimizer is golem.core.optimisers.genetic.gp_optimizer.EvoGraphOptimizer. See the example.

Returns

FedotBuilder instance.

Return type

FedotBuilder
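
A minimal usage sketch (the generation and population limits are illustrative):

from fedot import FedotBuilder

fedot = (FedotBuilder(problem='classification')
         .setup_composition(timeout=10)
         # bound the evolution by generations and stagnation, with a modest population
         .setup_evolution(num_of_generations=20, early_stopping_iterations=5,
                          pop_size=15, genetic_scheme='steady_state')
         .build())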

setup_pipeline_structure(available_operations=<default value>, max_depth=<default value>, max_arity=<default value>)[source]

Sets constraints on the ML pipeline structure.

Parameters
  • available_operations (List[str]) –

    list of model names to use. Pick the names according to the operations repository.

    Possible options:
    • adareg -> AdaBoost Regressor

    • ar -> AutoRegression

    • arima -> ARIMA

    • cgru -> Convolutional Gated Recurrent Unit

    • bernb -> Naive Bayes Classifier (multivariate Bernoulli)

    • catboost -> Catboost Classifier

    • catboostreg -> Catboost Regressor

    • dt -> Decision Tree Classifier

    • dtreg -> Decision Tree Regressor

    • gbr -> Gradient Boosting Regressor

    • kmeans -> K-Means clustering

    • knn -> K-nearest neighbors Classifier

    • knnreg -> K-nearest neighbors Regressor

    • lasso -> Lasso Linear Regressor

    • lda -> Linear Discriminant Analysis

    • lgbm -> Light Gradient Boosting Machine Classifier

    • lgbmreg -> Light Gradient Boosting Machine Regressor

    • linear -> Linear Regression Regressor

    • logit -> Logistic Regression Classifier

    • mlp -> Multi-layer Perceptron Classifier

    • multinb -> Naive Bayes Classifier (multinomial)

    • qda -> Quadratic Discriminant Analysis

    • rf -> Random Forest Classifier

    • rfr -> Random Forest Regressor

    • ridge -> Ridge Linear Regressor

    • polyfit -> Polynomial fitter

    • sgdr -> Stochastic Gradient Descent Regressor

    • stl_arima -> STL Decomposition with ARIMA

    • glm -> Generalized Linear Models

    • ets -> Exponential Smoothing

    • locf -> Last Observation Carried Forward

    • ts_naive_average -> Naive Average

    • svc -> Support Vector Classifier

    • svr -> Linear Support Vector Regressor

    • treg -> Extra Trees Regressor

    • xgboost -> Extreme Gradient Boosting Classifier

    • xgbreg -> Extreme Gradient Boosting Regressor

    • cnn -> Convolutional Neural Network

    • scaling -> Scaling

    • normalization -> Normalization

    • simple_imputation -> Imputation

    • pca -> Principal Component Analysis

    • kernel_pca -> Kernel Principal Component Analysis

    • fast_ica -> Independent Component Analysis

    • poly_features -> Polynomial Features

    • one_hot_encoding -> One-Hot Encoder

    • label_encoding -> Label Encoder

    • rfe_lin_reg -> Linear Regression Recursive Feature Elimination

    • rfe_non_lin_reg -> Decision Tree Recursive Feature Elimination

    • rfe_lin_class -> Logistic Regression Recursive Feature Elimination

    • rfe_non_lin_class -> Decision Tree Recursive Feature Elimination

    • isolation_forest_reg -> Regression Isolation Forest

    • isolation_forest_class -> Classification Isolation Forest

    • decompose -> Regression Decomposition

    • class_decompose -> Classification Decomposition

    • resample -> Resample features

    • ransac_lin_reg -> Regression Random Sample Consensus

    • ransac_non_lin_reg -> Decision Tree Random Sample Consensus

    • cntvect -> Count Vectorizer

    • text_clean -> Lemmatization and Stemming

    • tfidf -> TF-IDF Vectorizer

    • word2vec_pretrained -> Word2Vec

    • lagged -> Lagged Transformation

    • sparse_lagged -> Sparse Lagged Transformation

    • smoothing -> Smoothing Transformation

    • gaussian_filter -> Gaussian Filter Transformation

    • diff_filter -> Derivative Filter Transformation

    • cut -> Cut Transformation

    • exog_ts -> Exogenous Transformation

    • topological_features -> Topological features

  • max_depth (int) – max depth of a pipeline. Defaults to 6.

  • max_arity (int) – max arity of pipeline nodes. Defaults to 3.

Returns

FedotBuilder instance.

Return type

FedotBuilder
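
A minimal usage sketch restricting the search space to a few of the operations listed above (the particular subset is illustrative):

from fedot import FedotBuilder

fedot = (FedotBuilder(problem='classification')
         # allow only scaling plus three classifiers, and keep pipelines shallow
         .setup_pipeline_structure(available_operations=['scaling', 'logit', 'rf', 'knn'],
                                   max_depth=4, max_arity=2)
         .build())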

setup_pipeline_evaluation(metric=<default value>, cv_folds=<default value>, max_pipeline_fit_time=<default value>, collect_intermediate_metric=<default value>)[source]

Sets parameters of ML pipelines quality evaluation.

Parameters
  • metric (Union[str, MetricsEnum, QualityMetricCallable, ComplexityMetricCallable, Sequence[Union[str, MetricsEnum, QualityMetricCallable, ComplexityMetricCallable]]]) –

    metric for quality calculation during composing; it is also used for tuning if with_tuning=True.

    Default value depends on a given task:
    • roc_auc_pen -> for classification

    • rmse -> for regression & time series forecasting

    Available metrics are listed in FEDOT's quality and complexity metric enumerations.

  • cv_folds (int) –

    number of folds for cross-validation.

    Default value depends on a given problem:
    • 5 -> for classification and regression tasks

    • 3 -> for time series forecasting task

  • max_pipeline_fit_time (Optional[int]) – time constraint for operation fitting (in minutes). Once the limit is reached, a candidate pipeline will be dropped. Defaults to None - no limit.

  • collect_intermediate_metric (bool) – indicates whether to save metrics for intermediate (non-root) nodes of the composed Pipeline.

Returns

FedotBuilder instance.

Return type

FedotBuilder
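
A minimal usage sketch using metric names from this section (the cross-validation and time-limit values are illustrative):

from fedot import FedotBuilder

fedot = (FedotBuilder(problem='classification')
         # optimize for two quality metrics with 5-fold CV and a 3-minute fit limit per pipeline
         .setup_pipeline_evaluation(metric=['roc_auc', 'precision'], cv_folds=5,
                                    max_pipeline_fit_time=3)
         .build())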

setup_data_preprocessing(safe_mode=<default value>, use_input_preprocessing=<default value>, use_preprocessing_cache=<default value>, use_auto_preprocessing=<default value>)[source]

Sets parameters of input data preprocessing.

Parameters
  • safe_mode (bool) – if set to True, cuts large datasets to prevent memory overflow and uses a label encoder instead of a one-hot encoder when the total cardinality of categorical features is high. Defaults to False.

  • use_input_preprocessing (bool) – indicates whether to preprocess the data subsequently passed to the API. Defaults to True.

  • use_preprocessing_cache (bool) – indicates whether to use optional preprocessor caching. Defaults to True.

  • use_auto_preprocessing (bool) –

Returns

FedotBuilder instance.

Return type

FedotBuilder
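
A minimal usage sketch with the documented preprocessing flags:

from fedot import FedotBuilder

fedot = (FedotBuilder(problem='classification')
         # guard against memory overflow on large data and reuse cached preprocessors
         .setup_data_preprocessing(safe_mode=True, use_input_preprocessing=True,
                                   use_preprocessing_cache=True)
         .build())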

build()[source]

Initializes an instance of Fedot with accumulated parameters.

Returns

Fedot instance.

Return type

Fedot