Classification example
==============================================

This example explains how to solve a classification task using FEDOT.

FEDOT provides a high-level API that lets you use the common fit/predict interface. To use the API, import the ``Fedot`` class:

.. code-block:: python

    from fedot import Fedot

Then we have to load the data and split it into train and test sets. FEDOT uses its own data object notation (``InputData``). It contains the index, features and target for each sample. You can create it from pandas dataframes using the ``InputData.from_dataframe()`` method. You need to provide a ``Task`` object with the type of task you want to solve.
You can also find other ways of passing data in `this example `.

.. code-block:: python

    from fedot.core.data.data import InputData
    from fedot.core.data.data_split import train_test_data_setup
    from fedot.core.repository.tasks import Task, TaskTypesEnum

    # features_df and target_df are pandas dataframes loaded beforehand
    data = InputData.from_dataframe(features_df, target_df, task=Task(TaskTypesEnum.classification))
    train, test = train_test_data_setup(data)

.. note::

    There are 3 possible values for the task type:

    * TaskTypesEnum.classification
    * TaskTypesEnum.regression
    * TaskTypesEnum.ts_forecasting

Initialize the ``Fedot`` object and define the type of modeling problem. In this case, the problem is ``classification``. You can also define the ``metric`` parameter (ROC-AUC in this example) and the ``timeout`` in minutes (in this example we limit FEDOT to 5 minutes).

.. code-block:: python

    model = Fedot(problem='classification', metric='roc_auc', timeout=5)

.. note::

    ``Fedot.__init__()`` accepts more parameters, e.g. ``n_jobs`` for parallelization. For more details, see the :doc:`FEDOT API ` section in our documentation.

To train the model, call the ``fit()`` method. It returns the best pipeline obtained during optimization.

.. code-block:: python

    best_pipeline = model.fit(features=train)

After fitting is completed, you can look at the structure of the resulting pipeline. For example, let the best pipeline consist of two nodes: a resampling operation (*resample*) and Random Forest (*rf*). Let's see what it looks like.

In text format:

.. code-block:: python

    best_pipeline.print_structure()

.. code-block:: text

    Pipeline structure:
    {'depth': 2, 'length': 2, 'nodes': [rf, resample]}
    rf - {'n_jobs': -1, 'bootstrap': False, 'criterion': 'entropy', 'max_features': 0.2452946642710205, 'min_samples_leaf': 6, 'min_samples_split': 4, 'n_estimators': 100}
    resample - {'balance': 'expand_minority', 'replace': False, 'balance_ratio': 0.5984630982827773}

And in plot format:

.. code-block:: python

    best_pipeline.show()

|pipeline_structure|

.. |pipeline_structure| image:: img_utilities/pipeline_example.png
   :width: 80%

To obtain predictions for the test data, call the ``predict()`` or ``predict_proba()`` method of the ``Fedot`` class. Since we want to calculate the ROC-AUC metric on the test data, we should use ``predict_proba()``.

.. code-block:: python

    prediction = model.predict_proba(features=test)
    if visualization:  # ``visualization`` is a user-defined boolean flag
        model.plot_prediction()

.. hint::

    If you want to predict target labels, use the ``predict()`` method.

The ``get_metrics()`` method estimates the quality of the predictions according to the selected metrics.

.. code-block:: python

    print(model.get_metrics(rounding_order=4))  # we can control the rounding of metrics

.. note::

    You may notice that ``get_metrics()`` returned more than just the ROC-AUC metric. You can pass the names of the metrics you are interested in through the ``metric_names`` parameter, e.g. ``get_metrics(metric_names=['roc_auc', 'f1'])``.

Once you have a prediction and calculated metrics, you can plot the prediction by calling the ``plot_prediction()`` method. For a classification task it plots the ROC curve.

.. code-block:: python

    model.plot_prediction()

You may want to save the model. To do that, call ``best_pipeline.save()``:

.. code-block:: python

    best_pipeline.save(path='path_to_save_and_load', create_subdir=False, is_datetime_in_path=False)

To load a fitted pipeline, call ``load()`` on a newly initialised ``Pipeline`` object, passing the path to your pipeline.

.. code-block:: python

    from fedot.core.pipelines.pipeline import Pipeline

    loaded_pipeline = Pipeline().load('path_to_save_and_load')

And you can do inference:

.. code-block:: python

    import pandas as pd
    from fedot.core.data.data import InputData
    from fedot.core.repository.dataset_types import DataTypesEnum
    from fedot.core.repository.tasks import Task, TaskTypesEnum

    new_features = pd.read_csv('new_data.csv')
    new_data_to_predict = InputData(features=new_features.values,
                                    target=None,  # if you don't know your target
                                    idx=new_features.index.values,
                                    task=Task(TaskTypesEnum.classification),
                                    data_type=DataTypesEnum.table)

    prediction = loaded_pipeline.predict(new_data_to_predict).predict  # Note that we should take .predict field for prediction

.. note::

    ``Pipeline`` doesn't have a ``predict_proba()`` method. Instead, use the ``output_mode`` parameter of ``predict()``, which sets the desired form of output for operations.

.. details:: possible ``output_mode`` options:

    - ``default`` -> (as is, default)
    - ``labels`` -> (numbers of classes - for classification)
    - ``probs`` -> (probabilities - for classification == default)
    - ``full_probs`` -> (return all probabilities - for binary classification)

.. note::

    For more detail about saving and loading pipelines, please visit `this section `.

In this example, we learned how to solve a classification task with FEDOT.
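
As a side note on the metric used in this example: ROC-AUC is computed from the class probabilities returned by ``predict_proba()``, not from hard labels. The sketch below is a plain-Python illustration of that idea and is not FEDOT's implementation. The function names ``roc_auc`` and ``probs_to_labels``, the 0.5 threshold and the toy data are all assumptions introduced only for this illustration. It treats ROC-AUC as the fraction of (positive, negative) sample pairs in which the positive sample gets the higher score.

```python
def roc_auc(y_true, y_score):
    """Pairwise ROC-AUC: share of (positive, negative) pairs ranked correctly.

    Hypothetical helper for illustration only; ties count as half a win.
    """
    positives = [s for t, s in zip(y_true, y_score) if t == 1]
    negatives = [s for t, s in zip(y_true, y_score) if t == 0]
    if not positives or not negatives:
        raise ValueError("ROC-AUC needs at least one sample of each class")

    wins = 0.0
    for pos_score in positives:
        for neg_score in negatives:
            if pos_score > neg_score:
                wins += 1.0
            elif pos_score == neg_score:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def probs_to_labels(probs, threshold=0.5):
    """Turn positive-class probabilities into 0/1 labels (assumed cut-off)."""
    return [1 if p >= threshold else 0 for p in probs]


# Toy binary problem: true labels and predicted positive-class probabilities
y_true = [0, 0, 1, 1]
y_prob = [0.1, 0.4, 0.35, 0.8]

print(roc_auc(y_true, y_prob))                   # metric on probabilities
print(probs_to_labels(y_prob))                   # hard labels lose the ranking
print(roc_auc(y_true, probs_to_labels(y_prob)))  # metric on hard labels
```

Thresholding collapses the scores into two values, so the ranking information the metric relies on is partly lost. This is why the example above calls ``predict_proba()`` rather than ``predict()`` before evaluating ROC-AUC.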