Tuning of Hyperparameters
To tune pipeline hyperparameters you can use GOLEM. There are two approaches:

- Tuning the hyperparameters of all models simultaneously. Implemented via the SimultaneousTuner, OptunaTuner and IOptTuner classes.
- Tuning the hyperparameters of models sequentially, node by node, optimizing the metric value for the whole pipeline, or tuning the hyperparameters of a single node only. Implemented via the SequentialTuner class.

More information about these approaches can be found in the GOLEM documentation.
If the with_tuning flag is set to True when using the FEDOT API, simultaneous hyperparameter tuning with SimultaneousTuner is applied to the composed pipeline, and the same metric that guided composition is used as the tuning objective.
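For example, a minimal sketch of enabling tuning through the main API (the dataset path and target column name below are placeholder assumptions):

from fedot.api.main import Fedot

# with_tuning=True tunes the hyperparameters of the composed pipeline after composition
model = Fedot(problem='classification', timeout=5, with_tuning=True)
model.fit(features='train_file.csv', target='target')  # placeholder dataset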
FEDOT uses the tuner implementations from GOLEM; see the GOLEM documentation for more information.
|                                  | SimultaneousTuner | SequentialTuner | IOptTuner | OptunaTuner |
|----------------------------------|-------------------|-----------------|-----------|-------------|
| Based on                         | Hyperopt | Hyperopt | iOpt | Optuna |
| Type of tuning                   | Simultaneous | Sequential or for one node only | Simultaneous | Simultaneous |
| Optimized parameters             | categorical, discrete, continuous | categorical, discrete, continuous | discrete, continuous | categorical, discrete, continuous |
| Algorithm type                   | stochastic | stochastic | deterministic | stochastic |
| Supported constraints            | timeout, iterations, early_stopping_rounds, eval_time_constraint | timeout, iterations, early_stopping_rounds, eval_time_constraint | iterations, eval_time_constraint | timeout, iterations, early_stopping_rounds, eval_time_constraint |
| Supports initial point           | Yes | No | No | Yes |
| Supports multi-objective tuning  | No | No | No | Yes |
Hyperopt-based tuners usually take less time per iteration, but IOptTuner is able to obtain much more stable results.
Simple example
To initialize a tuner you can use TunerBuilder.
from fedot.core.data.data import InputData
from fedot.core.pipelines.pipeline_builder import PipelineBuilder
from fedot.core.pipelines.tuning.tuner_builder import TunerBuilder
from fedot.core.repository.tasks import TaskTypesEnum, Task

task = Task(TaskTypesEnum.classification)

train_data = InputData.from_csv('train_file.csv')

pipeline = PipelineBuilder().add_node('knn', branch_idx=0).add_branch('logit', branch_idx=1) \
    .grow_branches('logit', 'rf').join_branches('knn').build()

pipeline_tuner = TunerBuilder(task).build(train_data)

tuned_pipeline = pipeline_tuner.tune(pipeline)
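The returned pipeline carries the best hyperparameters found. A short sketch of using it afterwards (the hold-out file name is a placeholder assumption):

test_data = InputData.from_csv('test_file.csv')  # placeholder hold-out file

tuned_pipeline.fit(train_data)                   # fit with the tuned hyperparameters
prediction = tuned_pipeline.predict(test_data)   # OutputData with a .predict array
print(prediction.predict[:5])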
TunerBuilder methods
Tuner class
Use .with_tuner() to specify the tuner class to use. SimultaneousTuner is used by default.
from golem.core.tuning.sequential import SequentialTuner

tuner = SequentialTuner

pipeline_tuner = TunerBuilder(Task(TaskTypesEnum.classification)) \
    .with_tuner(tuner) \
    .build(train_data)

tuned_pipeline = pipeline_tuner.tune(pipeline)
Evaluation
Use .with_requirements() to set the number of cv_folds and n_jobs.
from fedot.core.pipelines.pipeline_composer_requirements import PipelineComposerRequirements
from fedot.core.repository.tasks import TsForecastingParams

requirements = PipelineComposerRequirements(cv_folds=2, n_jobs=2)

pipeline_tuner = TunerBuilder(Task(TaskTypesEnum.ts_forecasting, TsForecastingParams(forecast_length=10))) \
    .with_requirements(requirements) \
    .build(train_data)

tuned_pipeline = pipeline_tuner.tune(pipeline)
Or use the .with_cv_folds() and .with_n_jobs() methods to set the corresponding values separately.
pipeline_tuner = TunerBuilder(Task(TaskTypesEnum.ts_forecasting, TsForecastingParams(forecast_length=10))) \
    .with_cv_folds(3) \
    .with_n_jobs(-1) \
    .build(train_data)

tuned_pipeline = pipeline_tuner.tune(pipeline)
Metric
Specify the metric to optimize using .with_metric(). The metric can be chosen from ClassificationMetricsEnum or RegressionMetricsEnum.
metric = ClassificationMetricsEnum.ROCAUC

pipeline_tuner = TunerBuilder(Task(TaskTypesEnum.classification)) \
    .with_metric(metric) \
    .build(train_data)

tuned_pipeline = pipeline_tuner.tune(pipeline)
You can also pass a custom metric. To do so, implement the abstract class QualityMetric and pass CustomMetric.get_value as the metric. Note that the tuner will minimize the metric.
import sys
from copy import deepcopy

from sklearn.metrics import mean_squared_error as mse

from fedot.core.composer.metrics import QualityMetric
from fedot.core.data.data import InputData, OutputData
from fedot.core.pipelines.tuning.tuner_builder import TunerBuilder
from fedot.core.repository.tasks import TaskTypesEnum, Task


class CustomMetric(QualityMetric):
    default_value = sys.maxsize

    @staticmethod
    def metric(reference: InputData, predicted: OutputData) -> float:
        mse_value = mse(reference.target, predicted.predict, squared=False)
        return (mse_value + 2) * 0.5


pipeline_tuner = TunerBuilder(Task(TaskTypesEnum.regression)) \
    .with_metric(CustomMetric.get_value) \
    .build(train_data)

tuned_pipeline = pipeline_tuner.tune(pipeline)
Another way to pass a custom metric is to implement a function with the following signature: Callable[[G], Real]. Note that the tuner will minimize the metric.
from sklearn.metrics import mean_squared_error as mse

from golem.core.dag.graph import Graph

from fedot.core.data.data import InputData
from fedot.core.pipelines.tuning.tuner_builder import TunerBuilder
from fedot.core.repository.tasks import Task, TaskTypesEnum


def custom_metric(graph: Graph, reference_data: InputData, **kwargs):
    result = graph.predict(reference_data)
    mse_value = mse(reference_data.target, result.predict, squared=False)
    return (mse_value + 2) * 0.5


pipeline_tuner = TunerBuilder(Task(TaskTypesEnum.regression)) \
    .with_metric(custom_metric) \
    .build(train_data)

tuned_pipeline = pipeline_tuner.tune(pipeline)
Search Space
To set the search space use .with_search_space(). By default, the tuner uses the search space specified in fedot/core/pipelines/tuning/search_space.py. To customize the search space, use the PipelineSearchSpace class.
from hyperopt import hp

from fedot.core.pipelines.tuning.search_space import PipelineSearchSpace

custom_search_space = {
    'logit': {
        'C': {
            'hyperopt-dist': hp.uniform,
            'sampling-scope': [1e-1, 5.0],
            'type': 'continuous'}
    },
    'pca': {
        'n_components': {
            'hyperopt-dist': hp.uniform,
            'sampling-scope': [0.1, 0.5],
            'type': 'continuous'}
    },
    'knn': {
        'n_neighbors': {
            'hyperopt-dist': hp.uniformint,
            'sampling-scope': [1, 20],
            'type': 'discrete'},
        'weights': {
            'hyperopt-dist': hp.choice,
            'sampling-scope': [["uniform", "distance"]],
            'type': 'categorical'},
        'p': {
            'hyperopt-dist': hp.choice,
            'sampling-scope': [[1, 2]],
            'type': 'categorical'}
    }
}

search_space = PipelineSearchSpace(custom_search_space=custom_search_space, replace_default_search_space=True)

pipeline_tuner = TunerBuilder(Task(TaskTypesEnum.classification)) \
    .with_search_space(search_space) \
    .build(train_data)

tuned_pipeline = pipeline_tuner.tune(pipeline)
Additional parameters
If there is no TunerBuilder method to set a specific tuner parameter, use .with_additional_params(). The possible additional parameters are listed in the GOLEM documentation.
For example, for SimultaneousTuner or SequentialTuner you can set an algorithm with a signature similar to hyperopt.tpe.suggest. By default, hyperopt.tpe.suggest is used.
import hyperopt

pipeline_tuner = TunerBuilder(Task(TaskTypesEnum.classification)) \
    .with_additional_params(algo=hyperopt.rand.suggest) \
    .build(train_data)

tuned_pipeline = pipeline_tuner.tune(pipeline)
For IOptTuner, parameters such as r, evolvent_density and eps_r can be set.
from golem.core.tuning.iopt_tuner import IOptTuner

pipeline_tuner = TunerBuilder(Task(TaskTypesEnum.classification)) \
    .with_tuner(IOptTuner) \
    .with_additional_params(r=1, evolvent_density=5) \
    .build(train_data)

tuned_pipeline = pipeline_tuner.tune(pipeline)
Constraints
- Use .with_timeout() to set a timeout for tuning.
- Use .with_iterations() to set the maximal number of tuning iterations.
- Use .with_early_stopping_rounds() to specify after how many iterations without metric improvement tuning will be stopped.
- Use .with_eval_time_constraint() to set a time constraint for fitting a pipeline during its evaluation.
import datetime

timeout = datetime.timedelta(minutes=1)
iterations = 500
early_stopping_rounds = 50
eval_time_constraint = datetime.timedelta(seconds=30)

pipeline_tuner = TunerBuilder(task) \
    .with_timeout(timeout) \
    .with_iterations(iterations) \
    .with_early_stopping_rounds(early_stopping_rounds) \
    .with_eval_time_constraint(eval_time_constraint) \
    .build(input_data)

tuned_pipeline = pipeline_tuner.tune(pipeline)
Examples
Tuning all hyperparameters simultaneously
Example for SimultaneousTuner:
import datetime

import hyperopt
from golem.core.tuning.simultaneous import SimultaneousTuner
from hyperopt import hp

from fedot.core.pipelines.pipeline_composer_requirements import PipelineComposerRequirements
from fedot.core.data.data import InputData
from fedot.core.pipelines.pipeline_builder import PipelineBuilder
from fedot.core.pipelines.tuning.search_space import PipelineSearchSpace
from fedot.core.pipelines.tuning.tuner_builder import TunerBuilder
from fedot.core.repository.metrics_repository import ClassificationMetricsEnum
from fedot.core.repository.tasks import TaskTypesEnum, Task

task = Task(TaskTypesEnum.classification)

tuner = SimultaneousTuner

requirements = PipelineComposerRequirements(cv_folds=2, n_jobs=2)

metric = ClassificationMetricsEnum.ROCAUC

iterations = 500
early_stopping_rounds = 50
timeout = datetime.timedelta(minutes=1)
eval_time_constraint = datetime.timedelta(seconds=30)

custom_search_space = {
    'logit': {
        'C': {
            'hyperopt-dist': hp.uniform,
            'sampling-scope': [0.01, 5.0],
            'type': 'continuous'}
    },
    'knn': {
        'n_neighbors': {
            'hyperopt-dist': hp.uniformint,
            'sampling-scope': [1, 20],
            'type': 'discrete'},
        'weights': {
            'hyperopt-dist': hp.choice,
            'sampling-scope': [["uniform", "distance"]],
            'type': 'categorical'},
        'p': {
            'hyperopt-dist': hp.choice,
            'sampling-scope': [[1, 2]],
            'type': 'categorical'}}
}
search_space = PipelineSearchSpace(custom_search_space=custom_search_space, replace_default_search_space=True)

algo = hyperopt.rand.suggest

train_data = InputData.from_csv('train_file.csv')

pipeline = PipelineBuilder().add_node('knn', branch_idx=0).add_branch('logit', branch_idx=1) \
    .grow_branches('logit', 'rf').join_branches('knn').build()

pipeline_tuner = TunerBuilder(task) \
    .with_tuner(tuner) \
    .with_requirements(requirements) \
    .with_metric(metric) \
    .with_iterations(iterations) \
    .with_early_stopping_rounds(early_stopping_rounds) \
    .with_timeout(timeout) \
    .with_search_space(search_space) \
    .with_additional_params(algo=algo) \
    .with_eval_time_constraint(eval_time_constraint) \
    .build(train_data)

tuned_pipeline = pipeline_tuner.tune(pipeline)

tuned_pipeline.print_structure()
Tuned pipeline structure:
Pipeline structure:
{'depth': 3, 'length': 5, 'nodes': [knn, logit, knn, rf, logit]}
knn - {'n_neighbors': 3, 'p': 2, 'weights': 'uniform'}
logit - {'C': 4.564184562288343}
knn - {'n_neighbors': 6, 'p': 2, 'weights': 'uniform'}
rf - {'n_jobs': 1, 'bootstrap': True, 'criterion': 'entropy', 'max_features': 0.46348491415788157, 'min_samples_leaf': 11, 'min_samples_split': 2, 'n_estimators': 100}
logit - {'C': 3.056080157518786}
Example for IOptTuner:
import datetime

from golem.core.tuning.iopt_tuner import IOptTuner

from fedot.core.data.data import InputData
from fedot.core.pipelines.pipeline_builder import PipelineBuilder
from fedot.core.pipelines.pipeline_composer_requirements import PipelineComposerRequirements
from fedot.core.pipelines.tuning.tuner_builder import TunerBuilder
from fedot.core.repository.metrics_repository import RegressionMetricsEnum
from fedot.core.repository.tasks import TaskTypesEnum, Task

task = Task(TaskTypesEnum.regression)

tuner = IOptTuner

requirements = PipelineComposerRequirements(cv_folds=2, n_jobs=2)

metric = RegressionMetricsEnum.MSE

iterations = 100

eval_time_constraint = datetime.timedelta(seconds=30)

train_data = InputData.from_csv('train_data.csv', task='regression')

pipeline = PipelineBuilder().add_node('knnreg', branch_idx=0).add_branch('rfr', branch_idx=1) \
    .join_branches('knnreg').build()

pipeline_tuner = TunerBuilder(task) \
    .with_tuner(tuner) \
    .with_requirements(requirements) \
    .with_metric(metric) \
    .with_iterations(iterations) \
    .with_additional_params(eps=0.02, r=1, refine_solution=True) \
    .with_eval_time_constraint(eval_time_constraint) \
    .build(train_data)

tuned_pipeline = pipeline_tuner.tune(pipeline)

tuned_pipeline.print_structure()
Tuned pipeline structure:
Pipeline structure:
{'depth': 2, 'length': 3, 'nodes': [knnreg, knnreg, rfr]}
knnreg - {'n_neighbors': 51}
knnreg - {'n_neighbors': 40}
rfr - {'n_jobs': 1, 'max_features': 0.05324, 'min_samples_split': 12, 'min_samples_leaf': 11}
Example for OptunaTuner:
from golem.core.tuning.optuna_tuner import OptunaTuner

from fedot.core.data.data import InputData
from fedot.core.pipelines.pipeline_builder import PipelineBuilder
from fedot.core.pipelines.tuning.tuner_builder import TunerBuilder
from fedot.core.repository.metrics_repository import RegressionMetricsEnum
from fedot.core.repository.tasks import TaskTypesEnum, Task

task = Task(TaskTypesEnum.regression)

tuner = OptunaTuner

metric = RegressionMetricsEnum.MSE

iterations = 100

train_data = InputData.from_csv('train_data.csv', task='regression')

pipeline = PipelineBuilder().add_node('knnreg', branch_idx=0).add_branch('rfr', branch_idx=1) \
    .join_branches('knnreg').build()

pipeline_tuner = TunerBuilder(task) \
    .with_tuner(tuner) \
    .with_metric(metric) \
    .with_iterations(iterations) \
    .build(train_data)

tuned_pipeline = pipeline_tuner.tune(pipeline)

tuned_pipeline.print_structure()
Tuned pipeline structure:
Pipeline structure:
{'depth': 2, 'length': 3, 'nodes': [knnreg, knnreg, rfr]}
knnreg - {'n_neighbors': 51}
knnreg - {'n_neighbors': 40}
rfr - {'n_jobs': 1, 'max_features': 0.05, 'min_samples_split': 12, 'min_samples_leaf': 11}
Multi objective tuning
Multi-objective tuning is available only for OptunaTuner. Pass a list of metrics to .with_metric() and obtain a list of tuned pipelines representing a Pareto front after tuning.
from typing import Iterable

from golem.core.tuning.optuna_tuner import OptunaTuner

from fedot.core.data.data import InputData
from fedot.core.pipelines.pipeline import Pipeline
from fedot.core.pipelines.pipeline_builder import PipelineBuilder
from fedot.core.pipelines.tuning.tuner_builder import TunerBuilder
from fedot.core.repository.metrics_repository import RegressionMetricsEnum
from fedot.core.repository.tasks import TaskTypesEnum, Task

task = Task(TaskTypesEnum.regression)

tuner = OptunaTuner

metric = [RegressionMetricsEnum.MSE, RegressionMetricsEnum.MAE]

iterations = 100

train_data = InputData.from_csv('train_data.csv', task='regression')

pipeline = PipelineBuilder().add_node('knnreg', branch_idx=0).add_branch('rfr', branch_idx=1) \
    .join_branches('knnreg').build()

pipeline_tuner = TunerBuilder(task) \
    .with_tuner(tuner) \
    .with_metric(metric) \
    .with_iterations(iterations) \
    .build(train_data)

pareto_front: Iterable[Pipeline] = pipeline_tuner.tune(pipeline)
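Each element of the returned collection is a regular Pipeline, so any of them can be inspected or reused, for example:

# print the structure of every pipeline on the Pareto front
for front_pipeline in pareto_front:
    front_pipeline.print_structure()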
Sequential tuning
import datetime

from golem.core.tuning.sequential import SequentialTuner

from fedot.core.data.data import InputData
from fedot.core.pipelines.pipeline_builder import PipelineBuilder
from fedot.core.pipelines.tuning.tuner_builder import TunerBuilder
from fedot.core.repository.metrics_repository import RegressionMetricsEnum
from fedot.core.repository.tasks import TaskTypesEnum, Task, TsForecastingParams

task = Task(TaskTypesEnum.ts_forecasting, TsForecastingParams(forecast_length=10))

tuner = SequentialTuner

cv_folds = 3

metric = RegressionMetricsEnum.RMSE

iterations = 1000
early_stopping_rounds = 50
timeout = datetime.timedelta(minutes=1)

train_data = InputData.from_csv_time_series(file_path='train_file.csv',
                                            task=task,
                                            target_column='target_name')

pipeline = PipelineBuilder() \
    .add_sequence('locf', branch_idx=0) \
    .add_sequence('lagged', branch_idx=1) \
    .join_branches('ridge') \
    .build()

pipeline_tuner = TunerBuilder(task) \
    .with_tuner(tuner) \
    .with_cv_folds(cv_folds) \
    .with_metric(metric) \
    .with_iterations(iterations) \
    .with_early_stopping_rounds(early_stopping_rounds) \
    .with_timeout(timeout) \
    .build(train_data)

tuned_pipeline = pipeline_tuner.tune(pipeline)

tuned_pipeline.print_structure()
Tuned pipeline structure:
Pipeline structure:
{'depth': 2, 'length': 3, 'nodes': [ridge, locf, lagged]}
ridge - {'alpha': 9.335457825369645}
locf - {'part_for_repeat': 0.34751615772622124}
lagged - {'window_size': 127}
Tuning of a node
import datetime

from golem.core.tuning.sequential import SequentialTuner

from fedot.core.pipelines.pipeline_composer_requirements import PipelineComposerRequirements
from fedot.core.pipelines.pipeline_builder import PipelineBuilder
from fedot.core.pipelines.tuning.tuner_builder import TunerBuilder
from fedot.core.repository.metrics_repository import RegressionMetricsEnum
from fedot.core.repository.tasks import TaskTypesEnum, Task

from test.integration.quality.test_synthetic_tasks import get_regression_data

task = Task(TaskTypesEnum.regression)

tuner = SequentialTuner

requirements = PipelineComposerRequirements(cv_folds=2, n_jobs=-1)

metric = RegressionMetricsEnum.SMAPE

timeout = datetime.timedelta(minutes=5)

train_data = get_regression_data()

pipeline = PipelineBuilder().add_node('dtreg').grow_branches('lasso').build()

pipeline_tuner = TunerBuilder(task) \
    .with_tuner(tuner) \
    .with_requirements(requirements) \
    .with_metric(metric) \
    .with_timeout(timeout) \
    .build(train_data)

pipeline_with_tuned_node = pipeline_tuner.tune_node(pipeline, node_index=1)

print('Node name: ', pipeline_with_tuned_node.nodes[1].content['name'])
print('Node parameters: ', pipeline_with_tuned_node.nodes[1].custom_params)
Output:
Node name: dtreg
Node parameters: {'max_depth': 2, 'min_samples_leaf': 6, 'min_samples_split': 21}
More examples can be found here:
Regression
Classification
Forecasting
Multitask