Tuning of Hyperparameters

To tune pipeline hyperparameters you can use GOLEM. There are two approaches:

  1. Tuning the hyperparameters of all models simultaneously. Implemented via the SimultaneousTuner, OptunaTuner and IOptTuner classes.

  2. Tuning model hyperparameters sequentially, node by node, optimizing the metric value for the whole pipeline, or tuning the hyperparameters of a single node only. Implemented via the SequentialTuner class.
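The difference between the two approaches can be illustrated with a toy random search over a two-node pipeline (a schematic sketch in plain Python; the parameter ranges and the error function are invented for illustration and are not the actual GOLEM implementation):

```python
import random

random.seed(0)


def pipeline_error(scaler_param: float, model_param: int) -> float:
    """Toy objective for the whole pipeline: minimal when
    scaler_param is near 0.3 and model_param is near 7."""
    return abs(scaler_param - 0.3) + abs(model_param - 7) * 0.1


def tune_simultaneously(n_iter: int = 200):
    """Sample the hyperparameters of all nodes jointly (approach 1)."""
    best_params, best_err = None, float('inf')
    for _ in range(n_iter):
        candidate = (random.uniform(0.0, 1.0), random.randint(1, 20))
        err = pipeline_error(*candidate)
        if err < best_err:
            best_params, best_err = candidate, err
    return best_params, best_err


def tune_sequentially(n_iter_per_node: int = 100):
    """Tune one node at a time with the others frozen (approach 2),
    still scoring the metric for the whole pipeline."""
    scaler_param, model_param = 0.5, 10  # starting point
    for _ in range(n_iter_per_node):  # tune node 1; node 2 frozen
        candidate = random.uniform(0.0, 1.0)
        if pipeline_error(candidate, model_param) < pipeline_error(scaler_param, model_param):
            scaler_param = candidate
    for _ in range(n_iter_per_node):  # tune node 2; node 1 frozen
        candidate = random.randint(1, 20)
        if pipeline_error(scaler_param, candidate) < pipeline_error(scaler_param, model_param):
            model_param = candidate
    return (scaler_param, model_param), pipeline_error(scaler_param, model_param)


best_sim = tune_simultaneously()
best_seq = tune_sequentially()
```

Both strategies drive the same whole-pipeline metric down; they differ only in how the search space is traversed.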

More information about these approaches can be found here.

If the with_tuning flag is set to True when using the FEDOT API, simultaneous hyperparameter tuning with SimultaneousTuner is applied to the composed pipeline, and the chosen metric value is used as the tuning objective.

FEDOT uses the tuner implementations from GOLEM; see the GOLEM documentation for more information.

Tuners comparison

|                        | SimultaneousTuner | SequentialTuner | IOptTuner | OptunaTuner |
|------------------------|-------------------|-----------------|-----------|-------------|
| Based on               | Hyperopt | Hyperopt | iOpt | Optuna |
| Type of tuning         | Simultaneous | Sequential or for one node only | Simultaneous | Simultaneous |
| Optimized parameters   | categorical, discrete, continuous | categorical, discrete, continuous | discrete, continuous | categorical, discrete, continuous |
| Algorithm type         | stochastic | stochastic | deterministic | stochastic |
| Supported constraints  | timeout, iterations, early_stopping_rounds, eval_time_constraint | timeout, iterations, early_stopping_rounds, eval_time_constraint | iterations, eval_time_constraint | timeout, iterations, early_stopping_rounds, eval_time_constraint |
| Supports initial point | Yes | No | No | Yes |
| Supports multi-objective tuning | No | No | No | Yes |

Hyperopt-based tuners usually take less time per iteration, but IOptTuner is able to obtain much more stable results.

Simple example

To initialize a tuner, use the TunerBuilder class.

from fedot.core.data.data import InputData
from fedot.core.pipelines.pipeline_builder import PipelineBuilder
from fedot.core.pipelines.tuning.tuner_builder import TunerBuilder
from fedot.core.repository.tasks import TaskTypesEnum, Task

task = Task(TaskTypesEnum.classification)
train_data = InputData.from_csv('train_file.csv')
pipeline = PipelineBuilder().add_node('knn', branch_idx=0).add_branch('logit', branch_idx=1) \
    .grow_branches('logit', 'rf').join_branches('knn').build()

pipeline_tuner = TunerBuilder(task).build(train_data)

tuned_pipeline = pipeline_tuner.tune(pipeline)

TunerBuilder methods

Tuner class

Use .with_tuner() to specify the tuner class to use. SimultaneousTuner is used by default.

from golem.core.tuning.sequential import SequentialTuner

tuner = SequentialTuner

pipeline_tuner = TunerBuilder(Task(TaskTypesEnum.classification)) \
    .with_tuner(tuner) \
    .build(train_data)

tuned_pipeline = pipeline_tuner.tune(pipeline)

Evaluation

Use .with_requirements() to set the number of cv_folds and n_jobs.

from fedot.core.pipelines.pipeline_composer_requirements import PipelineComposerRequirements
from fedot.core.repository.tasks import TsForecastingParams

requirements = PipelineComposerRequirements(cv_folds=2, n_jobs=2)

pipeline_tuner = TunerBuilder(Task(TaskTypesEnum.ts_forecasting, TsForecastingParams(forecast_length=10))) \
    .with_requirements(requirements) \
    .build(train_data)

tuned_pipeline = pipeline_tuner.tune(pipeline)

Alternatively, use the methods .with_cv_folds() and .with_n_jobs() to set the corresponding values separately.

pipeline_tuner = TunerBuilder(Task(TaskTypesEnum.ts_forecasting, TsForecastingParams(forecast_length=10))) \
    .with_cv_folds(3) \
    .with_n_jobs(-1) \
    .build(train_data)

tuned_pipeline = pipeline_tuner.tune(pipeline)

Metric

Specify the metric to optimize using .with_metric().

  1. The metric can be chosen from ClassificationMetricsEnum or RegressionMetricsEnum.

from fedot.core.repository.metrics_repository import ClassificationMetricsEnum

metric = ClassificationMetricsEnum.ROCAUC

pipeline_tuner = TunerBuilder(Task(TaskTypesEnum.classification)) \
    .with_metric(metric) \
    .build(train_data)

tuned_pipeline = pipeline_tuner.tune(pipeline)
  2. You can pass a custom metric. To do so, implement the abstract class QualityMetric and pass CustomMetric.get_value as the metric. Note that the tuner will minimize the metric.

import sys
from copy import deepcopy
from sklearn.metrics import mean_squared_error as mse
from fedot.core.composer.metrics import QualityMetric
from fedot.core.data.data import InputData, OutputData
from fedot.core.pipelines.tuning.tuner_builder import TunerBuilder
from fedot.core.repository.tasks import TaskTypesEnum, Task


class CustomMetric(QualityMetric):
    default_value = sys.maxsize

    @staticmethod
    def metric(reference: InputData, predicted: OutputData) -> float:
        mse_value = mse(reference.target, predicted.predict, squared=False)
        return (mse_value + 2) * 0.5


pipeline_tuner = TunerBuilder(Task(TaskTypesEnum.regression)) \
    .with_metric(CustomMetric.get_value) \
    .build(train_data)

tuned_pipeline = pipeline_tuner.tune(pipeline)
  3. Another way to pass a custom metric is to implement a function with the signature Callable[[G], Real]. Note that the tuner will minimize the metric.

from sklearn.metrics import mean_squared_error as mse
from golem.core.dag.graph import Graph
from fedot.core.data.data import InputData
from fedot.core.pipelines.tuning.tuner_builder import TunerBuilder
from fedot.core.repository.tasks import Task, TaskTypesEnum


def custom_metric(graph: Graph, reference_data: InputData, **kwargs):
    result = graph.predict(reference_data)
    # mse with squared=False actually returns RMSE
    rmse_value = mse(reference_data.target, result.predict, squared=False)
    return (rmse_value + 2) * 0.5


pipeline_tuner = TunerBuilder(Task(TaskTypesEnum.regression)) \
    .with_metric(custom_metric) \
    .build(train_data)

tuned_pipeline = pipeline_tuner.tune(pipeline)
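Because the tuner always minimizes, a "higher is better" score has to be returned with its sign flipped. A toy illustration in plain Python (accuracy is used here only as an example; this is not the FEDOT metric API):

```python
def accuracy(y_true, y_pred) -> float:
    """Fraction of matching labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def neg_accuracy(y_true, y_pred) -> float:
    """Minimising -accuracy is equivalent to maximising accuracy,
    which is the convention a minimising tuner expects."""
    return -accuracy(y_true, y_pred)


# the minimising search correctly prefers the more accurate prediction
preds = {'a': [1, 0, 0], 'b': [1, 0, 1]}
best = min(preds, key=lambda name: neg_accuracy([1, 0, 1], preds[name]))
# best == 'b'
```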

Search Space

To set the search space, use .with_search_space(). By default, the tuner uses the search space specified in fedot/core/pipelines/tuning/search_space.py. To customize the search space, use the PipelineSearchSpace class.

from hyperopt import hp
from fedot.core.pipelines.tuning.search_space import PipelineSearchSpace

custom_search_space = {
    'logit': {
        'C': {
            'hyperopt-dist': hp.uniform,
            'sampling-scope': [1e-1, 5.0],
            'type': 'continuous'}
    },
    'pca': {
        'n_components': {
            'hyperopt-dist': hp.uniform,
            'sampling-scope': [0.1, 0.5],
            'type': 'continuous'}
    },
    'knn': {
        'n_neighbors': {
            'hyperopt-dist': hp.uniformint,
            'sampling-scope': [1, 20],
            'type': 'discrete'},
        'weights': {
            'hyperopt-dist': hp.choice,
            'sampling-scope': [["uniform", "distance"]],
            'type': 'categorical'},
        'p': {
            'hyperopt-dist': hp.choice,
            'sampling-scope': [[1, 2]],
            'type': 'categorical'}
    }
}
search_space = PipelineSearchSpace(custom_search_space=custom_search_space, replace_default_search_space=True)

pipeline_tuner = TunerBuilder(Task(TaskTypesEnum.classification)) \
        .with_search_space(search_space) \
        .build(train_data)

tuned_pipeline = pipeline_tuner.tune(pipeline)
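Every parameter entry in such a dictionary follows the same three-key schema: a hyperopt distribution, a sampling scope, and a parameter type. A small sketch (a hypothetical helper, not part of FEDOT) that checks this structure:

```python
def validate_search_space(space: dict) -> None:
    """Check that every parameter spec carries the three keys used by
    PipelineSearchSpace-style dictionaries."""
    required = {'hyperopt-dist', 'sampling-scope', 'type'}
    allowed_types = {'continuous', 'discrete', 'categorical'}
    for operation, params in space.items():
        for name, spec in params.items():
            missing = required - spec.keys()
            if missing:
                raise ValueError(f'{operation}.{name} is missing {sorted(missing)}')
            param_type = spec['type']
            if param_type not in allowed_types:
                raise ValueError(f'{operation}.{name} has unknown type {param_type!r}')


# a well-formed single-parameter space passes silently
validate_search_space({
    'logit': {
        'C': {'hyperopt-dist': None,  # stands in for hp.uniform here
              'sampling-scope': [1e-1, 5.0],
              'type': 'continuous'}
    }
})
```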

Additional parameters

If there is no TunerBuilder method to set a specific parameter of a tuner, use .with_additional_params().

The possible additional parameters can be found in the GOLEM documentation.

For example, for SimultaneousTuner or SequentialTuner you can set an algorithm with a signature similar to hyperopt.tpe.suggest.

By default, hyperopt.tpe.suggest is used.

import hyperopt

pipeline_tuner = TunerBuilder(Task(TaskTypesEnum.classification)) \
    .with_additional_params(algo=hyperopt.rand.suggest) \
    .build(train_data)

tuned_pipeline = pipeline_tuner.tune(pipeline)

For IOptTuner, parameters such as r, evolvent_density and eps_r can be set.

from golem.core.tuning.iopt_tuner import IOptTuner

pipeline_tuner = TunerBuilder(Task(TaskTypesEnum.classification)) \
    .with_tuner(IOptTuner) \
    .with_additional_params(r=1, evolvent_density=5) \
    .build(train_data)

tuned_pipeline = pipeline_tuner.tune(pipeline)

Constraints

  • Use .with_timeout() to set a timeout for tuning.

  • Use .with_iterations() to set the maximum number of tuning iterations.

  • Use .with_early_stopping_rounds() to specify the number of iterations without metric improvement after which tuning stops.

  • Use .with_eval_time_constraint() to set a time constraint for fitting the pipeline during its evaluation.

import datetime

timeout = datetime.timedelta(minutes=1)

iterations = 500

early_stopping_rounds = 50

eval_time_constraint = datetime.timedelta(seconds=30)

pipeline_tuner = TunerBuilder(task) \
    .with_timeout(timeout) \
    .with_iterations(iterations) \
    .with_early_stopping_rounds(early_stopping_rounds) \
    .with_eval_time_constraint(eval_time_constraint) \
    .build(input_data)

tuned_pipeline = pipeline_tuner.tune(pipeline)

Examples

Tuning all hyperparameters simultaneously

Example for SimultaneousTuner:

import datetime
import hyperopt
from golem.core.tuning.simultaneous import SimultaneousTuner
from hyperopt import hp
from fedot.core.pipelines.pipeline_composer_requirements import PipelineComposerRequirements
from fedot.core.data.data import InputData
from fedot.core.pipelines.pipeline_builder import PipelineBuilder
from fedot.core.pipelines.tuning.search_space import PipelineSearchSpace
from fedot.core.pipelines.tuning.tuner_builder import TunerBuilder
from fedot.core.repository.metrics_repository import ClassificationMetricsEnum
from fedot.core.repository.tasks import TaskTypesEnum, Task

task = Task(TaskTypesEnum.classification)

tuner = SimultaneousTuner

requirements = PipelineComposerRequirements(cv_folds=2, n_jobs=2)

metric = ClassificationMetricsEnum.ROCAUC

iterations = 500

early_stopping_rounds = 50

timeout = datetime.timedelta(minutes=1)

eval_time_constraint = datetime.timedelta(seconds=30)

custom_search_space = {
    'logit': {
        'C': {
            'hyperopt-dist': hp.uniform,
            'sampling-scope': [0.01, 5.0],
            'type': 'continuous'}
    },
    'knn': {
        'n_neighbors': {
            'hyperopt-dist': hp.uniformint,
            'sampling-scope': [1, 20],
            'type': 'discrete'},
        'weights': {
            'hyperopt-dist': hp.choice,
            'sampling-scope': [["uniform", "distance"]],
            'type': 'categorical'},
        'p': {
            'hyperopt-dist': hp.choice,
            'sampling-scope': [[1, 2]],
            'type': 'categorical'}}
}
search_space = PipelineSearchSpace(custom_search_space=custom_search_space, replace_default_search_space=True)

algo = hyperopt.rand.suggest

train_data = InputData.from_csv('train_file.csv')

pipeline = PipelineBuilder().add_node('knn', branch_idx=0).add_branch('logit', branch_idx=1) \
    .grow_branches('logit', 'rf').join_branches('knn').build()

pipeline_tuner = TunerBuilder(task) \
    .with_tuner(tuner) \
    .with_requirements(requirements) \
    .with_metric(metric) \
    .with_iterations(iterations) \
    .with_early_stopping_rounds(early_stopping_rounds) \
    .with_timeout(timeout) \
    .with_search_space(search_space) \
    .with_additional_params(algo=algo) \
    .with_eval_time_constraint(eval_time_constraint) \
    .build(train_data)

tuned_pipeline = pipeline_tuner.tune(pipeline)

tuned_pipeline.print_structure()

Tuned pipeline structure:

Pipeline structure:
{'depth': 3, 'length': 5, 'nodes': [knn, logit, knn, rf, logit]}
knn - {'n_neighbors': 3, 'p': 2, 'weights': 'uniform'}
logit - {'C': 4.564184562288343}
knn - {'n_neighbors': 6, 'p': 2, 'weights': 'uniform'}
rf - {'n_jobs': 1, 'bootstrap': True, 'criterion': 'entropy', 'max_features': 0.46348491415788157, 'min_samples_leaf': 11, 'min_samples_split': 2, 'n_estimators': 100}
logit - {'C': 3.056080157518786}

Example for IOptTuner:

import datetime
from golem.core.tuning.iopt_tuner import IOptTuner
from fedot.core.data.data import InputData
from fedot.core.pipelines.pipeline_builder import PipelineBuilder
from fedot.core.pipelines.pipeline_composer_requirements import PipelineComposerRequirements
from fedot.core.pipelines.tuning.tuner_builder import TunerBuilder
from fedot.core.repository.metrics_repository import RegressionMetricsEnum
from fedot.core.repository.tasks import TaskTypesEnum, Task

task = Task(TaskTypesEnum.regression)

tuner = IOptTuner

requirements = PipelineComposerRequirements(cv_folds=2, n_jobs=2)

metric = RegressionMetricsEnum.MSE

iterations = 100

eval_time_constraint = datetime.timedelta(seconds=30)

train_data = InputData.from_csv('train_data.csv', task='regression')

pipeline = PipelineBuilder().add_node('knnreg', branch_idx=0).add_branch('rfr', branch_idx=1) \
    .join_branches('knnreg').build()

pipeline_tuner = TunerBuilder(task) \
    .with_tuner(tuner) \
    .with_requirements(requirements) \
    .with_metric(metric) \
    .with_iterations(iterations) \
    .with_additional_params(eps=0.02, r=1, refine_solution=True) \
    .with_eval_time_constraint(eval_time_constraint) \
    .build(train_data)

tuned_pipeline = pipeline_tuner.tune(pipeline)

tuned_pipeline.print_structure()

Tuned pipeline structure:

Pipeline structure:
{'depth': 2, 'length': 3, 'nodes': [knnreg, knnreg, rfr]}
knnreg - {'n_neighbors': 51}
knnreg - {'n_neighbors': 40}
rfr - {'n_jobs': 1, 'max_features': 0.05324, 'min_samples_split': 12, 'min_samples_leaf': 11}

Example for OptunaTuner:

from golem.core.tuning.optuna_tuner import OptunaTuner
from fedot.core.data.data import InputData
from fedot.core.pipelines.pipeline_builder import PipelineBuilder
from fedot.core.pipelines.tuning.tuner_builder import TunerBuilder
from fedot.core.repository.metrics_repository import RegressionMetricsEnum
from fedot.core.repository.tasks import TaskTypesEnum, Task

task = Task(TaskTypesEnum.regression)

tuner = OptunaTuner

metric = RegressionMetricsEnum.MSE

iterations = 100

train_data = InputData.from_csv('train_data.csv', task='regression')

pipeline = PipelineBuilder().add_node('knnreg', branch_idx=0).add_branch('rfr', branch_idx=1) \
    .join_branches('knnreg').build()

pipeline_tuner = TunerBuilder(task) \
    .with_tuner(tuner) \
    .with_metric(metric) \
    .with_iterations(iterations) \
    .build(train_data)

tuned_pipeline = pipeline_tuner.tune(pipeline)

tuned_pipeline.print_structure()

Tuned pipeline structure:

Pipeline structure:
{'depth': 2, 'length': 3, 'nodes': [knnreg, knnreg, rfr]}
knnreg - {'n_neighbors': 51}
knnreg - {'n_neighbors': 40}
rfr - {'n_jobs': 1, 'max_features': 0.05, 'min_samples_split': 12, 'min_samples_leaf': 11}

Multi-objective tuning

Multi-objective tuning is available only for OptunaTuner. Pass a list of metrics to .with_metric() and obtain a list of tuned pipelines representing a Pareto front after tuning.

from typing import Iterable
from golem.core.tuning.optuna_tuner import OptunaTuner
from fedot.core.data.data import InputData
from fedot.core.pipelines.pipeline import Pipeline
from fedot.core.pipelines.pipeline_builder import PipelineBuilder
from fedot.core.pipelines.tuning.tuner_builder import TunerBuilder
from fedot.core.repository.metrics_repository import RegressionMetricsEnum
from fedot.core.repository.tasks import TaskTypesEnum, Task

task = Task(TaskTypesEnum.regression)

tuner = OptunaTuner

metric = [RegressionMetricsEnum.MSE, RegressionMetricsEnum.MAE]

iterations = 100

train_data = InputData.from_csv('train_data.csv', task='regression')

pipeline = PipelineBuilder().add_node('knnreg', branch_idx=0).add_branch('rfr', branch_idx=1) \
    .join_branches('knnreg').build()

pipeline_tuner = TunerBuilder(task) \
    .with_tuner(tuner) \
    .with_metric(metric) \
    .with_iterations(iterations) \
    .build(train_data)

pareto_front: Iterable[Pipeline] = pipeline_tuner.tune(pipeline)
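The returned collection forms a Pareto front: no pipeline in it is dominated, i.e. none is matched or beaten by another pipeline on every metric at once. The idea in miniature, for two objectives that are both minimized (toy numbers, plain Python):

```python
def is_dominated(p, others):
    """p is dominated if some other point is no worse on every
    objective (and is a different point)."""
    return any(
        all(o <= v for o, v in zip(q, p)) and q != p
        for q in others
    )


def pareto_front(points):
    """Keep only the non-dominated points (all objectives minimised)."""
    return [p for p in points if not is_dominated(p, points)]


# e.g. (MSE, MAE) pairs of four tuned candidate pipelines
candidates = [(1.0, 5.0), (2.0, 2.0), (5.0, 1.0), (3.0, 3.0)]
front = pareto_front(candidates)
# (3.0, 3.0) is dominated by (2.0, 2.0); the rest form the front
```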

Sequential tuning

import datetime
from golem.core.tuning.sequential import SequentialTuner
from fedot.core.data.data import InputData
from fedot.core.pipelines.pipeline_builder import PipelineBuilder
from fedot.core.pipelines.tuning.tuner_builder import TunerBuilder
from fedot.core.repository.metrics_repository import RegressionMetricsEnum
from fedot.core.repository.tasks import TaskTypesEnum, Task, TsForecastingParams

task = Task(TaskTypesEnum.ts_forecasting, TsForecastingParams(forecast_length=10))

tuner = SequentialTuner

cv_folds = 3

metric = RegressionMetricsEnum.RMSE

iterations = 1000

early_stopping_rounds = 50

timeout = datetime.timedelta(minutes=1)

train_data = InputData.from_csv_time_series(file_path='train_file.csv',
                                            task=task,
                                            target_column='target_name')

pipeline = PipelineBuilder() \
    .add_sequence('locf', branch_idx=0) \
    .add_sequence('lagged', branch_idx=1) \
    .join_branches('ridge') \
    .build()

pipeline_tuner = TunerBuilder(task) \
    .with_tuner(tuner) \
    .with_cv_folds(cv_folds) \
    .with_metric(metric) \
    .with_iterations(iterations) \
    .with_early_stopping_rounds(early_stopping_rounds) \
    .with_timeout(timeout) \
    .build(train_data)

tuned_pipeline = pipeline_tuner.tune(pipeline)

tuned_pipeline.print_structure()

Tuned pipeline structure:

Pipeline structure:
{'depth': 2, 'length': 3, 'nodes': [ridge, locf, lagged]}
ridge - {'alpha': 9.335457825369645}
locf - {'part_for_repeat': 0.34751615772622124}
lagged - {'window_size': 127}

Tuning of a node

import datetime
from golem.core.tuning.sequential import SequentialTuner
from fedot.core.pipelines.pipeline_composer_requirements import PipelineComposerRequirements
from fedot.core.pipelines.pipeline_builder import PipelineBuilder
from fedot.core.pipelines.tuning.tuner_builder import TunerBuilder
from fedot.core.repository.metrics_repository import RegressionMetricsEnum
from fedot.core.repository.tasks import TaskTypesEnum, Task
from test.integration.quality.test_synthetic_tasks import get_regression_data

task = Task(TaskTypesEnum.regression)

tuner = SequentialTuner

requirements = PipelineComposerRequirements(cv_folds=2, n_jobs=-1)

metric = RegressionMetricsEnum.SMAPE

timeout = datetime.timedelta(minutes=5)

train_data = get_regression_data()

pipeline = PipelineBuilder().add_node('dtreg').grow_branches('lasso').build()


pipeline_tuner = TunerBuilder(task) \
    .with_tuner(tuner) \
    .with_requirements(requirements) \
    .with_metric(metric) \
    .with_timeout(timeout) \
    .build(train_data)

pipeline_with_tuned_node = pipeline_tuner.tune_node(pipeline, node_index=1)

print('Node name: ', pipeline_with_tuned_node.nodes[1].content['name'])
print('Node parameters: ', pipeline_with_tuned_node.nodes[1].custom_params)

Output:

Node name:  dtreg
Node parameters:  {'max_depth': 2, 'min_samples_leaf': 6, 'min_samples_split': 21}

More examples can be found here:

Regression

Classification

Forecasting

Multitask