How to save fitted models and interpret the pipeline in JSON format

FEDOT works mainly with the ‘Pipeline’ object, which represents a pipeline of models. For more convenient use of the framework, we provide the ability to export and import pipelines together with their fitted models for further editing, visual representation, or data transfer. Here are some simple steps to export and import a pipeline structure.

Exporting a model pipeline

The Pipeline object has a ‘save_pipeline’ method that takes a single argument: the path where the JSON object and fitted models will be saved. You can specify the path as a folder name:

  • ~/project/model/my_pipeline,

this way, your pipeline and trained models will be saved in a folder with the following hierarchy:

  • ~/project/model/my_pipeline:
    • my_pipeline.json

    • fitted_models:
      • model_0.pkl

      • model_1.pkl

      • ...

Example of use:

from cases.data.data_utils import get_scoring_case_data_paths
from fedot.core.pipelines.pipeline import Pipeline
from fedot.core.pipelines.node import PipelineNode
from fedot.core.data.data import InputData

train_file_path, test_file_path = get_scoring_case_data_paths()
train_data = InputData.from_csv(train_file_path)

pipeline = Pipeline()

# Primary nodes that receive the input data directly
node_logit = PipelineNode('logit')

node_lda = PipelineNode('lda')
node_lda.parameters = {'n_components': 1}

node_xgboost = PipelineNode('xgboost')

# Secondary node that combines the predictions of the three primary nodes
node_knn_second = PipelineNode('knn')
node_knn_second.parameters = {'n_neighbors': 5}
node_knn_second.nodes_from = [node_logit, node_lda, node_xgboost]

# Adding the root node brings its parent nodes into the pipeline as well
pipeline.add_node(node_knn_second)
pipeline.fit(train_data)

pipeline.save_pipeline("data/my_pipeline")

The ‘save_pipeline’ method:

  1. saves the pipeline’s fitted models to the path data/my_pipeline/fitted_models,

  2. saves the JSON object to the file data/my_pipeline/my_pipeline.json,

  3. returns a JSON-like object:

{
    "total_pipeline_models": {
        "logit": 1,
        "lda": 1,
        "xgboost": 1,
        "knn": 1
    },
    "depth": 2,
    "nodes": [
        {
            "model_id": 1,
            "model_type": "logit",
            "model_name": "LogisticRegression",
            "custom_params": "default_params",
            "params": {

            },
            "nodes_from": [],
            "fitted_model_path": "fitted_models/model_1.pkl",
            "preprocessor": "scaling_with_imputation"
        },
        {
            "model_id": 2,
            "model_type": "lda",
            "model_name": "LinearDiscriminantAnalysis",
            "custom_params": {
                "n_components": 1
            },
            "params": {

            },
            "nodes_from": [],
            "fitted_model_path": "fitted_models/model_2.pkl",
            "preprocessor": "scaling_with_imputation"
        },
        {
            "model_id": 3,
            "model_type": "xgboost",
            "model_name": "XGBClassifier",
            "custom_params": "default_params",
            "params": {

            },
            "nodes_from": [],
            "fitted_model_path": "fitted_models/model_3.pkl",
            "preprocessor": "scaling_with_imputation"
        },
        {
            "model_id": 0,
            "model_type": "knn",
            "model_name": "KNeighborsClassifier",
            "custom_params": {
                "n_neighbors": 5
            },
            "params": {

            },
            "nodes_from": [
                1,
                2,
                3
            ],
            "fitted_model_path": "fitted_models/model_0.pkl",
            "preprocessor": "scaling_with_imputation"
        }
    ]
}
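
Since this description is the method’s return value, it can also be captured and inspected in code. A minimal sketch, assuming ‘save_pipeline’ returns the description shown above (as a dict or as a serialized JSON string):

import json

# Capture the value returned by 'save_pipeline'
# (assumed to be the description shown above)
pipeline_json = pipeline.save_pipeline("data/my_pipeline")

# Parse it if it comes back as a JSON string; keep it as-is if it is a dict
description = json.loads(pipeline_json) if isinstance(pipeline_json, str) else pipeline_json

print(description["depth"])                  # 2
print(description["total_pipeline_models"])  # {'logit': 1, 'lda': 1, ...}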

NOTE: ‘params’ holds the full set of model parameters, combining:

  • the parameters set for tuning (custom_params),

  • the framework’s standard (default) model parameters

Importing a model pipeline

To import a pipeline, create an empty ‘Pipeline’ object (you can also reuse an existing one, but all of its data will be overwritten during import). The ‘load_pipeline’ method takes the path to a file with the JSON extension as an argument.

Example of use:

from sklearn.metrics import mean_squared_error

test_data = InputData.from_csv(test_file_path)

# Load the previously saved pipeline together with its fitted models
pipeline = Pipeline()
pipeline.load_pipeline("data/my_pipeline/my_pipeline.json")

predicted_values = pipeline.predict(test_data).predict
actual_values = test_data.target

# Note: sklearn expects the ground truth first, then the predictions
mean_squared_error(actual_values, predicted_values)
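
The loaded object behaves like any other ‘Pipeline’, so it can be edited and saved again. A minimal sketch of such editing, assuming the same node API used above (access via ‘root_node’ is an assumption about the loaded object):

# Change a hyperparameter of the root node of the loaded pipeline
# ('root_node' and 'parameters' are assumptions based on the API used above)
pipeline.root_node.parameters = {'n_neighbors': 10}

# Refit with the new parameters and save the edited version
pipeline.fit(train_data)
pipeline.save_pipeline("data/my_pipeline_edited")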

NOTE: The required fields for loading a model are ‘model_id’, ‘model_type’, ‘preprocessor’, ‘params’, and ‘nodes_from’. As a consequence, you can compose an unusual pipeline by writing such a description by hand, as shown in the sketch below.
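
For instance, a hand-written description in which each node carries only the required fields can define a minimal one-node pipeline. A hypothetical sketch (the file name and the exact top-level structure are assumptions based on the format shown above):

import json

# A hypothetical minimal description: each node carries only the required
# fields ('model_id', 'model_type', 'preprocessor', 'params', 'nodes_from')
minimal_description = {
    "nodes": [
        {
            "model_id": 0,
            "model_type": "logit",
            "params": {},
            "nodes_from": [],
            "preprocessor": "scaling_with_imputation"
        }
    ]
}

with open("data/custom_pipeline.json", "w") as f:
    json.dump(minimal_description, f)

# Load the hand-written description as a regular pipeline
custom_pipeline = Pipeline()
custom_pipeline.load_pipeline("data/custom_pipeline.json")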

Now you can export models, share them, and edit them in a convenient JSON format.