FEDOT API

class fedot.api.main.Fedot(problem, timeout=5.0, task_params=None, seed=None, logging_level=40, safe_mode=False, n_jobs=-1, **composer_tuner_params)[source]

Bases: object

The main class for FEDOT AutoML API.

Alternatively, may be initialized using the class FedotBuilder, where all the optional AutoML parameters are documented and separated by meaning.

Parameters

problem (str) –
name of the modelling problem to solve. Possible options:
- classification -> for classification task
- regression -> for regression task
- ts_forecasting -> for time series forecasting task
timeout (Optional[float]) – time limit for model design, in minutes. None or -1 means no time limit.
task_params (TaskParams) – additional parameters of the task.
seed (Optional[int]) – value for a fixed random seed.
logging_level (int) –
logging levels are the same as in Python's built-in logging library.
Possible options:
- 50 -> critical
- 40 -> error
- 30 -> warning
- 20 -> info
- 10 -> debug
- 0 -> notset
safe_mode (bool) – if True, cuts large datasets to prevent memory overflow and uses a label encoder instead of a one-hot encoder when the total cardinality of categorical features is high. Defaults to False.
n_jobs (int) – number of jobs to run in parallel (-1 uses all CPUs). Defaults to -1.
composer_tuner_params – Additional optional parameters. See their documentation at the methods of FedotBuilder.
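
Because the logging_level values match those of the standard logging module, the named constants can be used instead of raw integers. The sketch below resolves the listed levels to their standard names and collects a set of constructor arguments in a plain dictionary. The argument values are illustrative assumptions, and the commented-out lines show how they would be passed if FEDOT is installed.

```python
import logging

# Resolve the numeric levels listed above to their standard-library names.
LEVELS = [50, 40, 30, 20, 10, 0]
level_names = {level: logging.getLevelName(level) for level in LEVELS}

# Illustrative constructor arguments (assumed values, not recommendations).
fedot_kwargs = {
    "problem": "classification",
    "timeout": 2.0,                  # minutes
    "seed": 42,
    "logging_level": logging.ERROR,  # equals 40, the documented default
    "n_jobs": -1,                    # use all CPUs
}

# Assuming FEDOT is installed:
# from fedot.api.main import Fedot
# model = Fedot(**fedot_kwargs)
```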

fit(features, target='target', predefined_model=None)[source]

Composes and fits a new pipeline, or fits a predefined one.

Parameters

features (Union[str, PathLike, ndarray, DataFrame, InputData, MultiModalData, dict, tuple]) – train data feature values in one of the supported features formats.
target (Union[str, PathLike, ndarray, Series, dict]) – train data target values in one of the supported target formats.
predefined_model (Optional[Union[str, Pipeline]]) – the name of a single model, a Pipeline instance, or auto. If any value is specified, the method skips composing and tuning. With auto, the method generates a single initial assumption and fits the resulting pipeline.

Returns

Pipeline object.

Return type

Pipeline
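
As a minimal sketch of the two fitting modes, the helper below forwards to fit() using only the parameters documented above. The helper name fit_pipeline and its quick flag are hypothetical, and model is assumed to be a Fedot instance.

```python
def fit_pipeline(model, features, target="target", quick=False):
    """Fit `model` (assumed to be a Fedot instance) and return the pipeline.

    With quick=True, predefined_model="auto" is passed, so composing and
    tuning are skipped and a single initial assumption is fitted.
    """
    if quick:
        return model.fit(features=features, target=target,
                         predefined_model="auto")
    # Full AutoML run: compose and fit a new pipeline.
    return model.fit(features=features, target=target)
```

A call might look like fit_pipeline(Fedot(problem="regression"), "train.csv"), where train.csv is a placeholder path.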

tune(input_data=None, target='target', metric_name=None, iterations=100000, timeout=None, cv_folds=None, n_jobs=None, show_progress=False)[source]

Tunes the hyperparameters of the current pipeline.

Parameters

input_data (Optional[Union[str, PathLike, ndarray, DataFrame, InputData, MultiModalData, dict, tuple]]) – data for tuning pipeline in one of the supported formats.
target (Union[str, PathLike, ndarray, Series, dict]) – data target values in one of the supported target formats.
metric_name (Optional[Union[str, QualityMetricCallable, ComplexityMetricCallable]]) – name of metric for quality tuning.
iterations (int) – number of tuning iterations.
timeout (Optional[float]) – time limit for tuning, in minutes. None or -1 means tuning continues until the maximum number of iterations is reached.
cv_folds (Optional[int]) – number of folds on data for cross-validation.
n_jobs (Optional[int]) – number of jobs to run in parallel (-1 uses all CPUs).
show_progress (bool) – shows progress of tuning if True.

Returns

Pipeline object.

Return type

Pipeline
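
The interaction between iterations and timeout can be sketched as follows. The helper name tune_with_budget and the default values chosen here are hypothetical, and model is assumed to be a Fedot instance that has already been fitted.

```python
def tune_with_budget(model, input_data=None, target="target",
                     iterations=100, timeout_minutes=None, cv_folds=3):
    """Tune `model` (assumed to be a fitted Fedot instance).

    With timeout_minutes=None the only stopping criterion is `iterations`;
    otherwise tuning is also limited by the time budget, given in minutes.
    """
    return model.tune(
        input_data=input_data,
        target=target,
        iterations=iterations,
        timeout=timeout_minutes,
        cv_folds=cv_folds,
        show_progress=True,
    )
```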

predict(features, in_sample=True, validation_blocks=None, path_to_save=None)[source]

Predicts new target values using an already fitted model.

For time series, performs a forecast of depth forecast_length if in_sample=False. If in_sample=True, performs an in-sample forecast using features as the sample.

Parameters

features (Union[str, PathLike, ndarray, DataFrame, InputData, MultiModalData, dict, tuple]) – an array with features of test data.
in_sample (bool) – used for time-series prediction. If in_sample=True, performs an in-sample forecast using features, with the number of iterations specified in validation_blocks.
validation_blocks (Optional[int]) – number of validation blocks for in-sample forecast.
path_to_save (Optional[Union[PathLike, str]]) – if specified, path to save prediction to.

Returns

An array with prediction values.

Return type

ndarray

predict_proba(features, probs_for_all_classes=False, path_to_save=None)[source]

Predicts class probabilities for new data using an already fitted classification model.

Parameters

features (Union[str, PathLike, ndarray, DataFrame, InputData, MultiModalData, dict, tuple]) – an array with features of test data.
probs_for_all_classes (bool) – if True, returns the probability of each class, even for binary classification.
path_to_save (Optional[Union[PathLike, str]]) – if specified, path to save prediction to.

Returns

An array with predicted probabilities.

Return type

ndarray
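
To illustrate what probs_for_all_classes changes for a binary problem, the sketch below expands one probability per sample into one probability per class. It assumes the single value is the probability of the positive class, which is not stated above. The function name is hypothetical, and plain lists stand in for the returned ndarray.

```python
def expand_binary_probs(positive_probs):
    """Turn positive-class probabilities into [P(class 0), P(class 1)] rows.

    Assumption: each input value is the probability of the positive class.
    """
    return [[1.0 - p, p] for p in positive_probs]


rows = expand_binary_probs([0.25, 0.5])
# rows == [[0.75, 0.25], [0.5, 0.5]]
```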

forecast(pre_history=None, horizon=None, path_to_save=None)[source]

Forecasts new values of a time series. If horizon is greater than the forecast length of the fitted model, an out-of-sample forecast is applied (not supported for multi-modal data).

Parameters

pre_history (Optional[Union[str, Tuple[ndarray, ndarray], InputData, dict]]) – an array with features for pre-history of the forecast.
horizon (Optional[int]) – number of steps to forecast.
path_to_save (Optional[Union[PathLike, str]]) – if specified, path to save prediction to.

Returns

An array with prediction values.

Return type

ndarray
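
The general idea behind an out-of-sample forecast can be sketched in plain Python: a model that predicts only forecast_length steps at a time is applied repeatedly, and its own predictions are appended to the history until the horizon is covered. This is a generic illustration with hypothetical names and a toy model, not FEDOT's implementation.

```python
def out_of_sample_forecast(history, predict_block, forecast_length, horizon):
    """Forecast `horizon` steps with a model limited to `forecast_length` steps."""
    extended = list(history)
    forecast = []
    while len(forecast) < horizon:
        block = predict_block(extended, forecast_length)
        forecast.extend(block)
        extended.extend(block)  # feed predictions back in as new history
    return forecast[:horizon]   # trim the last block to the requested horizon


def linear_block(series, steps):
    """Toy model: continue the last observed difference."""
    step = series[-1] - series[-2]
    return [series[-1] + step * (i + 1) for i in range(steps)]


result = out_of_sample_forecast([1, 2, 3], linear_block,
                                forecast_length=2, horizon=5)
# result == [4, 5, 6, 7, 8]
```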

load(path)[source]

Loads a saved graph from disk.

Parameters: path – path to the JSON file with the model.

plot_prediction(in_sample=None, target=None)[source]

Plots prediction obtained from a graph.

Parameters

in_sample (Optional[bool]) – if current prediction is in_sample (for time-series forecasting), plots predictions as future values.
target (Optional[Any]) – user-specified name of target variable for MultiModalData.

get_metrics(target=None, metric_names=None, in_sample=None, validation_blocks=None, rounding_order=3)[source]

Gets quality metrics for a fitted graph.

Parameters

target (Optional[Union[ndarray, Series]]) – an array with target values of test data. If None, target specified for fit is used.
metric_names (Optional[Union[str, List[str]]]) – names of required metrics.
in_sample (Optional[bool]) – used for time series forecasting. If True, the prediction is obtained as .predict(..., in_sample=True).
validation_blocks (Optional[int]) – number of validation blocks for time series in-sample forecast.
rounding_order (int) – number of decimal places for metrics.

Returns

Values of quality metrics.

Return type

dict
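
Since the result is a dict of metric values, it can be turned into printable lines directly. The helper below is a hypothetical sketch that forwards the documented parameters and assumes model is a fitted Fedot instance.

```python
def format_metrics(model, target=None, metric_names=None, rounding_order=3):
    """Return "name: value" lines for the metrics of a fitted model."""
    metrics = model.get_metrics(
        target=target,
        metric_names=metric_names,
        rounding_order=rounding_order,
    )
    # Sort by metric name so the output order is stable.
    return [f"{name}: {value}" for name, value in sorted(metrics.items())]
```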

save_predict(predicted_data, path_to_save)[source]

Saves pipeline forecasts to a CSV file.

Parameters

predicted_data (OutputData) –
path_to_save (Union[PathLike, str]) –

explain(features=None, method='surrogate_dt', visualization=True, **kwargs)[source]

Creates explanation for current_pipeline according to the selected method.

Returns an Explainer instance.

Parameters

features (Optional[Union[str, PathLike, ndarray, DataFrame, InputData, MultiModalData, dict, tuple]]) – samples to be explained. If None, train_data from last fit will be used.
method (str) – explanation method. Defaults to surrogate_dt.
visualization (bool) – whether to print and plot the explanation immediately. Defaults to True.

Return type

Explainer

Notes

An explanation can be retrieved later by executing Explainer.visualize().

return_report()[source]

Returns a report on time consumption.

The following steps are presented in this report:
- ‘Data Definition (fit)’: Time spent on data definition in fit().
- ‘Data Preprocessing’: Total time spent on preprocessing data, includes fitting and predicting stages.
- ‘Fitting (summary)’: Total time spent on Composing, Tuning and Training Inference.
- ‘Composing’: Time spent on searching for the best pipeline.
- ‘Train Inference’: Time spent on training the pipeline found during composing.
- ‘Tuning (composing)’: Time spent on hyperparameters tuning in the whole fitting, if with_tune is True.
- ‘Tuning (after)’: Time spent on .tune() (hyperparameters tuning) after composing.
- ‘Data Definition (predict)’: Time spent on data definition in predict().
- ‘Predicting’: Time spent on predicting (inference).

Return type: DataFrame

_train_pipeline_on_full_dataset(recommendations, full_train_not_preprocessed)[source]

Trains the obtained pipeline on the full dataset if the dataset was clipped.

Parameters

recommendations (Optional[dict]) –
full_train_not_preprocessed (Union[InputData, MultiModalData]) –