Tabular Data Prediction

Introduction

Like most AutoML frameworks, FEDOT works with data represented as tables. It automates machine learning pipeline design for tabular data in classification and regression problems.

It also provides a high-level API with the familiar fit/predict interface. To use the API, import the Fedot class:

from fedot.api.main import Fedot

Load the training and test data from CSV files as Pandas dataframes (pd.DataFrame):

import pandas as pd

train = pd.read_csv('train.csv')
test = pd.read_csv('test.csv')
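
Before fitting, it can help to confirm that the training table contains the target column and that the test table has the same feature columns. A minimal sketch, assuming the target column is named target (as in the fit() call in this guide); the in-memory CSV text and the feature names f1 and f2 are made up for illustration so that the snippet runs without any files:

```python
import io

import pandas as pd

# Hypothetical CSV contents standing in for train.csv and test.csv.
train_csv = "f1,f2,target\n0.1,1.0,0\n0.4,0.5,1\n0.9,0.2,1\n"
test_csv = "f1,f2,target\n0.3,0.7,0\n0.8,0.1,1\n"

train = pd.read_csv(io.StringIO(train_csv))
test = pd.read_csv(io.StringIO(test_csv))

target_column = 'target'  # assumed name of the target column

# The training table must contain the target column.
assert target_column in train.columns

# Every feature column of the training table should also be in the test table.
feature_columns = [c for c in train.columns if c != target_column]
missing = [c for c in feature_columns if c not in test.columns]

print(train.shape, test.shape, missing)
```

An empty missing list means the test table can be passed to the fitted model with the same features it was trained on.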

Initialize the FEDOT object and define the type of modeling problem. In this case, the problem is classification.

model = Fedot(problem='classification', metric='roc_auc')

Note

Fedot.__init__() accepts more than these two parameters, e.g. timeout for setting a time limit or n_jobs for parallelization. For more details, see the FEDOT API section of the documentation.
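
The parameters mentioned in this note are passed as keyword arguments. A minimal sketch: the values are illustrative, and the assumptions that timeout is measured in minutes and that n_jobs=-1 uses all available cores should be checked against the FEDOT API section. The helper name build_model is hypothetical, and the import sits inside it so that defining the helper does not start any work:

```python
def build_model(timeout=5, n_jobs=-1):
    """Create a Fedot object for classification with explicit resource limits."""
    # Imported here so that defining the helper does not require FEDOT to be loaded.
    from fedot.api.main import Fedot

    return Fedot(
        problem='classification',
        metric='roc_auc',
        timeout=timeout,  # time budget for the search (assumed to be in minutes)
        n_jobs=n_jobs,    # number of parallel jobs (assumed: -1 means all cores)
    )
```

Under these assumptions, build_model(timeout=2) would create a model whose search is limited to about two minutes.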

The fit() method begins the optimization and returns the resulting composite pipeline.

best_pipeline = model.fit(features=train, target='target')

After fitting is completed, you can inspect the structure of the resulting pipeline. For example, suppose the best pipeline consists of two nodes: a resampling operation (resample) and a Random Forest (rf). Let's see what it looks like.

In text format:

best_pipeline.print_structure()
Pipeline structure:
{'depth': 2, 'length': 2, 'nodes': [rf, resample]}
rf - {'n_jobs': -1, 'bootstrap': False, 'criterion': 'entropy', 'max_features': 0.2452946642710205, 'min_samples_leaf': 6, 'min_samples_split': 4, 'n_estimators': 100}
resample - {'balance': 'expand_minority', 'replace': False, 'balance_ratio': 0.5984630982827773}

And in plot format:

best_pipeline.show()

Figure: pipeline structure

The predict() method, which uses an already fitted pipeline, returns values for the target.

prediction = model.predict(features=test)

Hint

If you want to predict target probabilities, use the predict_proba() method.

The get_metrics() method estimates the quality of predictions according to the selected metrics.

metrics = model.get_metrics()
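
To make the roc_auc metric used in this guide more concrete: for binary classification it equals the fraction of (positive, negative) pairs in which the positive object receives the higher score, with ties counted as half. The sketch below computes that quantity directly in plain Python. It illustrates the definition of the metric and is not FEDOT's implementation; the function name and sample values are made up:

```python
def roc_auc_pairwise(y_true, y_score):
    """ROC AUC as the share of correctly ordered (positive, negative) pairs."""
    positives = [s for t, s in zip(y_true, y_score) if t == 1]
    negatives = [s for t, s in zip(y_true, y_score) if t == 0]
    if not positives or not negatives:
        raise ValueError('ROC AUC needs at least one object of each class')

    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5  # ties count as half a correct pair
    return correct / (len(positives) * len(negatives))


labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc_pairwise(labels, scores))  # 0.75
```

Here three of the four (positive, negative) pairs are ordered correctly, which gives 0.75. A value of 0.5 corresponds to random ordering and 1.0 to perfect ranking.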

Note

FEDOT can be used for regression problems in the same way. You only need to change the parameters of the main class object to match the problem:

model = Fedot(problem='regression', metric='rmse')
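
For reference, rmse is the square root of the mean squared difference between true and predicted values. A minimal plain-Python sketch of that definition (illustrative only, not FEDOT's implementation; the function name and numbers are made up):

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equally long sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError('inputs must be non-empty and of equal length')
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Every prediction is off by exactly 2, so the RMSE is 2.0.
print(rmse([1.0, 2.0, 3.0, 4.0], [3.0, 4.0, 5.0, 6.0]))  # 2.0
```

Lower values are better, and the result is expressed in the same units as the target.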

Examples

You can find more details at the following links:

Simple

Advanced

Cases