Builder for FEDOT API
- class fedot.api.builder.FedotBuilder(problem)[source]
Bases: object
An alternative FEDOT API version with optional attributes documented and separated into groups by meaning. Each group has a corresponding setter method whose name starts with setup_*. Use these methods to set the corresponding API attributes:
setup_composition() -> general AutoML parameters
setup_parallelization() -> parameters of computational parallelization by CPU jobs
setup_output() -> parameters of outputs: logging, cache directories, etc.
setup_evolution() -> parameters of ML pipelines evolutionary optimization
setup_pipeline_structure() -> constraints on ML pipeline structure
setup_pipeline_evaluation() -> parameters of ML pipelines quality evaluation
setup_data_preprocessing() -> parameters of input data preprocessing
After all required attributes are set, use build() to get a parametrized instance of Fedot.
Examples
Example 1:
from fedot import FedotBuilder
from fedot.core.utils import fedot_project_root

if __name__ == '__main__':
    train_data_path = f'{fedot_project_root()}/examples/real_cases/data/scoring/scoring_train.csv'
    test_data_path = f'{fedot_project_root()}/examples/real_cases/data/scoring/scoring_test.csv'

    fedot = (FedotBuilder(problem='classification')
             .setup_composition(timeout=10, with_tuning=True, preset='best_quality')
             .setup_pipeline_evaluation(max_pipeline_fit_time=5, metric=['roc_auc', 'precision'])
             .build())
    fedot.fit(features=train_data_path, target='target')
    fedot.predict_proba(features=test_data_path)
    fedot.plot_prediction()
Example 2:
from fedot import FedotBuilder
from fedot.core.utils import fedot_project_root

if __name__ == '__main__':
    SEED = 42

    builder = (FedotBuilder('ts_forecasting')
               .setup_composition(preset='fast_train', timeout=0.5, with_tuning=True, seed=SEED)
               .setup_evolution(num_of_generations=3)
               .setup_pipeline_evaluation(metric='mae'))

    datasets_path = fedot_project_root() / 'examples/data/ts'
    resulting_models = {}
    for data_path in datasets_path.iterdir():
        if data_path.name == 'ts_sea_level.csv':
            continue
        fedot = builder.build()
        fedot.fit(data_path, target='value')
        fedot.predict(features=fedot.train_data, validation_blocks=2)
        fedot.plot_prediction()
        fedot.current_pipeline.show()
        resulting_models[data_path.stem] = fedot
- Parameters
problem (str) – name of a modelling problem to solve.
Possible options:
classification -> for classification task
regression -> for regression task
ts_forecasting -> for time series forecasting task
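The chained calls in the examples above rely on each setup_* method returning the builder itself, and Example 2 calls build() repeatedly on one configured builder. The following is a minimal, standard-library-only sketch of that pattern. The class MiniBuilder and the constant ALLOWED_PROBLEMS are hypothetical names introduced for illustration; this is not FEDOT's implementation, and build() here returns a plain dict rather than a Fedot instance.

```python
from copy import deepcopy

# The three problem names listed in the documentation above.
ALLOWED_PROBLEMS = ('classification', 'regression', 'ts_forecasting')


class MiniBuilder:
    """Hypothetical stand-in for a builder with grouped setters (not FEDOT code)."""

    def __init__(self, problem: str):
        if problem not in ALLOWED_PROBLEMS:
            raise ValueError(f'Unknown problem: {problem!r}')
        self._params = {'problem': problem}

    def setup_composition(self, **kwargs) -> 'MiniBuilder':
        self._params.update(kwargs)
        return self  # returning self is what makes chaining possible

    def setup_pipeline_evaluation(self, **kwargs) -> 'MiniBuilder':
        self._params.update(kwargs)
        return self

    def build(self) -> dict:
        # Each call hands out an independent copy of the collected parameters.
        return deepcopy(self._params)


builder = (MiniBuilder('ts_forecasting')
           .setup_composition(preset='fast_train', timeout=0.5)
           .setup_pipeline_evaluation(metric='mae'))
first, second = builder.build(), builder.build()
```

In this sketch, first and second hold equal but separate parameter sets, which mirrors how Example 2 reuses one builder for several datasets.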
- setup_composition(timeout=<default value>, task_params=<default value>, seed=<default value>, preset=<default value>, with_tuning=<default value>, use_meta_rules=<default value>)[source]
Sets general AutoML parameters.
- Parameters
timeout (Optional[float]) – time for model design (in minutes): None or -1 means infinite time.
task_params (TaskParams) – additional parameters of a task.
seed (int) – value for a fixed random seed.
preset (str) – name of the preset for model building (e.g. best_quality, fast_train, gpu). Default value is auto.
Possible options:
best_quality -> All models that are available for this data type and task are used
fast_train -> Models that learn quickly. This includes preprocessing operations (data operations) that only reduce the dimensionality of the data, but cannot increase it. For example, there are no polynomial features and one-hot encoding operations
stable -> The most reliable preset in which the most stable operations are included
auto -> Automatically determine which preset should be used
gpu -> Models that use GPU resources for computation
ts -> A special preset with models for time series forecasting task
automl -> A special preset with only AutoML libraries such as TPOT and H2O as operations
with_tuning (bool) – flag for tuning hyperparameters of the final evolved Pipeline. Defaults to True.
use_meta_rules (bool) – indicates whether to change set parameters according to FEDOT meta rules.
- Returns
FedotBuilder instance.
- Return type
FedotBuilder
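The timeout parameter is given in minutes, with None or -1 meaning no time limit. The sketch below shows one way to read that convention; timeout_to_seconds is a hypothetical helper written for illustration and is not part of FEDOT.

```python
from typing import Optional


def timeout_to_seconds(timeout: Optional[float]) -> Optional[float]:
    """Hypothetical helper: convert minutes to seconds, None meaning 'no limit'."""
    if timeout is None or timeout == -1:
        return None  # both values are documented as "infinite time"
    return timeout * 60


half_minute = timeout_to_seconds(0.5)   # the value used in Example 2
unlimited = timeout_to_seconds(-1)
```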
- setup_parallelization(n_jobs=<default value>, parallelization_mode=<default value>)[source]
Sets parameters of computational parallelization by CPU jobs.
- Parameters
n_jobs (int) – number of jobs for parallelization (set to -1 to use all CPUs). Defaults to -1.
parallelization_mode (str) – type of evaluation for groups of individuals (populational or sequential). Default value is populational.
- Returns
FedotBuilder instance.
- Return type
FedotBuilder
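A value of -1 for n_jobs is documented as "use all CPUs". A minimal sketch of how such a value could be resolved with the standard library follows; resolve_n_jobs is a hypothetical helper, and FEDOT may resolve the value differently.

```python
import os


def resolve_n_jobs(n_jobs: int = -1) -> int:
    """Hypothetical helper: map -1 to the number of CPUs reported by the OS."""
    if n_jobs == -1:
        # os.cpu_count() may return None on some platforms, so fall back to 1.
        return os.cpu_count() or 1
    return n_jobs


all_cpus = resolve_n_jobs()      # default -1 -> every available CPU
two_jobs = resolve_n_jobs(2)
```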
- setup_output(logging_level=<default value>, show_progress=<default value>, keep_history=<default value>, history_dir=<default value>, cache_dir=<default value>)[source]
Sets parameters of outputs: logging, cache directories, etc.
- Parameters
logging_level (int) – logging levels are the same as in the built-in logging library.
Possible options:
50 -> critical
40 -> error
30 -> warning
20 -> info
10 -> debug
0 -> notset
show_progress (bool) – indicates whether to show progress using tqdm/tuner or not. Defaults to True.
keep_history (bool) – indicates if the framework should track evolutionary optimization history for possible further analysis. Defaults to True.
history_dir (Optional[str]) – relative or absolute path of a folder for composing history. By default, creates a folder named “FEDOT” in temporary system files of an OS. A relative path is relative to the default value.
cache_dir (Optional[str]) – path to a directory containing cache files (if any cache is enabled). By default, creates a folder named “FEDOT” in temporary system files of an OS.
- Returns
FedotBuilder instance.
- Return type
FedotBuilder
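The numeric logging levels match the constants of Python's logging module, and a relative history_dir is described as relative to the default “FEDOT” folder. The sketch below illustrates both points. It assumes the default folder sits under tempfile.gettempdir(), which is a guess at what "temporary system files" means here; DEFAULT_DIR and resolve_history_dir are hypothetical names, not FEDOT API.

```python
import logging
import tempfile
from pathlib import Path
from typing import Optional

# Names that the standard library assigns to the documented numeric levels.
level_names = {level: logging.getLevelName(level) for level in (50, 40, 30, 20, 10, 0)}

# Assumption: the default folder is "<system temp dir>/FEDOT".
DEFAULT_DIR = Path(tempfile.gettempdir()) / 'FEDOT'


def resolve_history_dir(history_dir: Optional[str] = None) -> Path:
    """Hypothetical helper: relative paths are placed under the default folder."""
    if history_dir is None:
        return DEFAULT_DIR
    path = Path(history_dir)
    return path if path.is_absolute() else DEFAULT_DIR / path


relative_result = resolve_history_dir('run_1')
```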
- setup_evolution(initial_assumption=<default value>, num_of_generations=<default value>, early_stopping_iterations=<default value>, early_stopping_timeout=<default value>, pop_size=<default value>, keep_n_best=<default value>, genetic_scheme=<default value>, use_pipelines_cache=<default value>, optimizer=<default value>)[source]
Sets parameters of ML pipelines evolutionary optimization.
- Parameters
initial_assumption (Union[fedot.core.pipelines.pipeline.Pipeline, List[fedot.core.pipelines.pipeline.Pipeline]]) – initial assumption(s) for composer. Can be either a single Pipeline or a sequence of them. Default values are task-specific and selected by the method for_task().
num_of_generations (Optional[int]) – number of evolutionary generations for composer. Defaults to None - no limit.
early_stopping_iterations (int) – composer will stop after n generations without improvement.
early_stopping_timeout (int) – stagnation timeout in minutes: composer will stop after n minutes without improvement. Defaults to 10.
pop_size (int) – size of population (generation) during composing. Defaults to 20.
keep_n_best (int) – number of the best individuals in generation that survive during the evolution. Defaults to 1.
genetic_scheme (str) – name of the genetic scheme. Defaults to steady_state.
use_pipelines_cache (bool) – indicates whether to use pipeline structures caching. Defaults to True.
optimizer (Optional[Type[GraphOptimizer]]) – inherit from golem.core.optimisers.optimizer.GraphOptimizer to specify a custom optimizer. Default optimizer is golem.core.optimisers.genetic.gp_optimizer.EvoGraphOptimizer. See the example.
- Returns
FedotBuilder instance.
- Return type
FedotBuilder
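The parameters num_of_generations and early_stopping_iterations describe two stopping rules: a hard cap on generations and a stagnation counter. The sketch below shows how the two rules could interact on a recorded sequence of best scores. It assumes that a lower score is better and that an equal score counts as no improvement; generations_run is a hypothetical function and does not reproduce the actual optimizer.

```python
from typing import Optional, Sequence


def generations_run(best_scores: Sequence[float],
                    num_of_generations: Optional[int] = None,
                    early_stopping_iterations: Optional[int] = None) -> int:
    """Count generations executed before a stopping rule fires (lower score = better)."""
    best = float('inf')
    stagnation = 0
    done = 0
    for score in best_scores:
        # Rule 1: hard cap on the number of generations (None means no cap).
        if num_of_generations is not None and done >= num_of_generations:
            break
        done += 1
        if score < best:
            best, stagnation = score, 0
        else:
            stagnation += 1
        # Rule 2: stop after n consecutive generations without improvement.
        if early_stopping_iterations is not None and stagnation >= early_stopping_iterations:
            break
    return done


scores = [0.9, 0.8, 0.8, 0.8, 0.7]
stopped_early = generations_run(scores, early_stopping_iterations=2)
capped = generations_run(scores, num_of_generations=3)
```

With these scores, the stagnation rule fires before the last improvement at 0.7 is reached, which is the trade-off a small early_stopping_iterations value accepts.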
- setup_pipeline_structure(available_operations=<default value>, max_depth=<default value>, max_arity=<default value>)[source]
Sets constraints on ML pipeline structure.
- Parameters
available_operations (List[str]) – list of model names to use. Pick the names according to operations repository.
Possible options:
adareg -> AdaBoost Regressor
ar -> AutoRegression
arima -> ARIMA
cgru -> Convolutional Gated Recurrent Unit
bernb -> Naive Bayes Classifier (multivariate Bernoulli)
catboost -> Catboost Classifier
catboostreg -> Catboost Regressor
dt -> Decision Tree Classifier
dtreg -> Decision Tree Regressor
gbr -> Gradient Boosting Regressor
kmeans -> K-Means clustering
knn -> K-nearest neighbors Classifier
knnreg -> K-nearest neighbors Regressor
lasso -> Lasso Linear Regressor
lda -> Linear Discriminant Analysis
lgbm -> Light Gradient Boosting Machine Classifier
lgbmreg -> Light Gradient Boosting Machine Regressor
linear -> Linear Regression Regressor
logit -> Logistic Regression Classifier
mlp -> Multi-layer Perceptron Classifier
multinb -> Naive Bayes Classifier (multinomial)
qda -> Quadratic Discriminant Analysis
rf -> Random Forest Classifier
rfr -> Random Forest Regressor
ridge -> Ridge Linear Regressor
polyfit -> Polynomial fitter
sgdr -> Stochastic Gradient Descent Regressor
stl_arima -> STL Decomposition with ARIMA
glm -> Generalized Linear Models
ets -> Exponential Smoothing
locf -> Last Observation Carried Forward
ts_naive_average -> Naive Average
svc -> Support Vector Classifier
svr -> Linear Support Vector Regressor
treg -> Extra Trees Regressor
xgboost -> Extreme Gradient Boosting Classifier
xgbreg -> Extreme Gradient Boosting Regressor
cnn -> Convolutional Neural Network
scaling -> Scaling
normalization -> Normalization
simple_imputation -> Imputation
pca -> Principal Component Analysis
kernel_pca -> Kernel Principal Component Analysis
fast_ica -> Independent Component Analysis
poly_features -> Polynomial Features
one_hot_encoding -> One-Hot Encoder
label_encoding -> Label Encoder
rfe_lin_reg -> Linear Regression Recursive Feature Elimination
rfe_non_lin_reg -> Decision Tree Recursive Feature Elimination
rfe_lin_class -> Logistic Regression Recursive Feature Elimination
rfe_non_lin_class -> Decision Tree Recursive Feature Elimination
isolation_forest_reg -> Regression Isolation Forest
isolation_forest_class -> Classification Isolation Forest
decompose -> Regression Decomposition
class_decompose -> Classification Decomposition
resample -> Resample features
ransac_lin_reg -> Regression Random Sample Consensus
ransac_non_lin_reg -> Decision Tree Random Sample Consensus
cntvect -> Count Vectorizer
text_clean -> Lemmatization and Stemming
tfidf -> TF-IDF Vectorizer
word2vec_pretrained -> Word2Vec
lagged -> Lagged Transformation
sparse_lagged -> Sparse Lagged Transformation
smoothing -> Smoothing Transformation
gaussian_filter -> Gaussian Filter Transformation
diff_filter -> Derivative Filter Transformation
cut -> Cut Transformation
exog_ts -> Exogenous Transformation
topological_features -> Topological features
max_depth (int) – max depth of a pipeline. Defaults to 6.
max_arity (int) – max arity of pipeline nodes. Defaults to 3.
- Returns
FedotBuilder instance.
- Return type
FedotBuilder
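The three structural constraints (available_operations, max_depth, max_arity) can be pictured on a toy graph. The sketch below uses a hypothetical Node dataclass rather than FEDOT's Pipeline class. It assumes that depth is the number of nodes on the longest path from the root to a leaf and that arity is the number of direct parents of a node; the library's exact definitions may differ.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Set


@dataclass
class Node:
    """Hypothetical pipeline node: an operation name plus the nodes feeding into it."""
    operation: str
    parents: List['Node'] = field(default_factory=list)


def depth(node: Node) -> int:
    return 1 + max((depth(p) for p in node.parents), default=0)


def arity(node: Node) -> int:
    return max([len(node.parents)] + [arity(p) for p in node.parents])


def operations(node: Node) -> Set[str]:
    names = {node.operation}
    for parent in node.parents:
        names |= operations(parent)
    return names


def fits_constraints(root: Node, available_operations: Iterable[str],
                     max_depth: int = 6, max_arity: int = 3) -> bool:
    return (operations(root) <= set(available_operations)
            and depth(root) <= max_depth
            and arity(root) <= max_arity)


# rf receives the output of scaling directly and of scaling followed by pca.
root = Node('rf', [Node('scaling'), Node('pca', [Node('scaling')])])
ok = fits_constraints(root, ['rf', 'scaling', 'pca'])
too_deep = fits_constraints(root, ['rf', 'scaling', 'pca'], max_depth=2)
```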
- setup_pipeline_evaluation(metric=<default value>, cv_folds=<default value>, max_pipeline_fit_time=<default value>, collect_intermediate_metric=<default value>)[source]
Sets parameters of ML pipelines quality evaluation.
- Parameters
metric (Union[str, MetricsEnum, QualityMetricCallable, ComplexityMetricCallable, Sequence[Union[str, MetricsEnum, QualityMetricCallable, ComplexityMetricCallable]]]) – metric for quality calculation during composing; it is also used for tuning if with_tuning=True.
Default value depends on a given task:
roc_auc_pen -> for classification
rmse -> for regression & time series forecasting
Available metrics are listed in the following enumerations:
classification -> ClassificationMetricsEnum
regression -> RegressionMetricsEnum
time series forecasting -> TimeSeriesForecastingMetricsEnum
pipeline complexity (task-independent) -> ComplexityMetricsEnum
cv_folds (int) – number of folds for cross-validation.
Default value depends on a given problem:
5 -> for classification and regression tasks
3 -> for time series forecasting task
max_pipeline_fit_time (Optional[int]) – time constraint for operation fitting (in minutes). Once the limit is reached, a candidate pipeline will be dropped. Defaults to None - no limit.
collect_intermediate_metric (bool) – save metrics for intermediate (non-root) nodes in composed Pipeline.
- Returns
FedotBuilder instance.
- Return type
FedotBuilder
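Both metric and cv_folds have defaults that depend on the problem, and metric accepts either a single value or a sequence. The sketch below restates the documented defaults as lookup tables and normalizes string metrics to a list. The names DEFAULT_METRIC, DEFAULT_CV_FOLDS and resolve_evaluation are hypothetical, only string metric names are handled, and nothing here is FEDOT code.

```python
from typing import List, Optional, Sequence, Tuple, Union

# Defaults as listed in the parameter descriptions above.
DEFAULT_METRIC = {
    'classification': 'roc_auc_pen',
    'regression': 'rmse',
    'ts_forecasting': 'rmse',
}
DEFAULT_CV_FOLDS = {
    'classification': 5,
    'regression': 5,
    'ts_forecasting': 3,
}


def resolve_evaluation(problem: str,
                       metric: Optional[Union[str, Sequence[str]]] = None,
                       cv_folds: Optional[int] = None) -> Tuple[List[str], int]:
    """Hypothetical helper: fill in problem-specific defaults, return (metrics, folds)."""
    if metric is None:
        metric = DEFAULT_METRIC[problem]
    metrics = [metric] if isinstance(metric, str) else list(metric)
    folds = DEFAULT_CV_FOLDS[problem] if cv_folds is None else cv_folds
    return metrics, folds


defaults = resolve_evaluation('classification')
custom = resolve_evaluation('ts_forecasting', metric='mae')
```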
- setup_data_preprocessing(safe_mode=<default value>, use_input_preprocessing=<default value>, use_preprocessing_cache=<default value>, use_auto_preprocessing=<default value>)[source]
Sets parameters of input data preprocessing.
- Parameters
safe_mode (bool) – if set to True, large datasets are cut to prevent memory overflow, and a label encoder is used instead of a one-hot encoder if the summary cardinality of categorical features is high. Default value is False.
use_input_preprocessing (bool) – indicates whether to do preprocessing of further given data. Defaults to True.
use_preprocessing_cache (bool) – indicates whether to use optional preprocessors caching. Defaults to True.
use_auto_preprocessing (bool) –
- Returns
FedotBuilder instance.
- Return type
FedotBuilder
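The safe_mode description mentions switching from one-hot to label encoding when the summary cardinality of categorical features is high. The sketch below illustrates only the idea of that switch. The threshold CARDINALITY_LIMIT is an arbitrary illustrative number, the assumption that one-hot encoding is the choice outside safe mode is mine, and choose_encoder is a hypothetical function; none of these come from FEDOT.

```python
from typing import Dict

# Arbitrary illustrative threshold, NOT a value taken from FEDOT.
CARDINALITY_LIMIT = 100


def choose_encoder(cardinalities: Dict[str, int], safe_mode: bool = False) -> str:
    """Hypothetical helper: pick an encoder name from per-column category counts."""
    total = sum(cardinalities.values())  # "summary cardinality" read as a plain sum
    if safe_mode and total > CARDINALITY_LIMIT:
        return 'label_encoding'
    return 'one_hot_encoding'


columns = {'city': 250, 'gender': 2}
safe_choice = choose_encoder(columns, safe_mode=True)
default_choice = choose_encoder(columns)
```

One-hot encoding adds one column per category, so a high total cardinality inflates the feature matrix, while label encoding keeps one column per feature.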