Multi-Modal Tasks

FEDOT can solve not only classical tabular data problems, but also problems with multimodal data. In this section, we will consider the main features of the framework for solving such problems.

Multimodal data combines sources of a different nature, such as tables, text, images, and time series. Humans perceive the world in a multimodal way, which motivates using the same approach in machine learning. Combining several types of data can improve model quality, because one modality may carry information that is absent from another.

FEDOT’s API supports multimodal data out of the box. All you need to do is load the data using the MultiModalData class:

from fedot.api.main import Fedot
from fedot.core.data.data_split import train_test_data_setup
from fedot.core.data.multi_modal import MultiModalData

data = MultiModalData.from_csv(file_path='multimodal_dataset.csv', task='classification', target_columns='target_column',
                               text_columns=['text_col1', 'text_col2'], columns_to_drop=['col_to_drop1', 'col_to_drop2'], index_col=None)
fit_data, predict_data = train_test_data_setup(data, shuffle=True, split_ratio=0.7)
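
The effect of shuffle=True and split_ratio=0.7 can be pictured with a small plain-Python sketch. It is not FEDOT's implementation: train_test_data_setup works on FEDOT data objects and may do more than this. The helper name shuffled_split and the seed argument are assumptions made for illustration only.

```python
import random


def shuffled_split(rows, split_ratio=0.7, seed=42):
    """Shuffle row indices and split them into train and test parts.

    Hypothetical helper: it only illustrates the idea of a shuffled
    70/30 split and is not part of FEDOT.
    """
    indices = list(range(len(rows)))
    # A dedicated Random instance keeps the shuffle reproducible.
    random.Random(seed).shuffle(indices)
    # The first `split_ratio` share of the shuffled rows goes to training.
    cut = round(len(rows) * split_ratio)
    train = [rows[i] for i in indices[:cut]]
    test = [rows[i] for i in indices[cut:]]
    return train, test


rows = list(range(10))
train_rows, test_rows = shuffled_split(rows)
print(len(train_rows), len(test_rows))  # 7 3
```

Every row ends up in exactly one of the two parts, so the test part can serve as unseen data for predict().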

When calling the from_csv() method, you must define the task type and the target columns. FEDOT can find text columns automatically, but you can also set them manually with text_columns. Use columns_to_drop to select the columns that will be dropped from the original dataset. By default, FEDOT reads the first column of every dataset as an index column; if the dataset has no index column, set index_col=None.

Initialize the FEDOT object and define the type of modeling problem:

model = Fedot(problem='classification', timeout=10)


Fedot.__init__() accepts more than these two parameters, e.g. preset for choosing the set of models or n_jobs for parallelization. For more details, see the FEDOT API section in our documentation.

The fit() method begins the optimization and returns the resulting composite pipeline.
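
A call to fit() could look roughly as follows. This is a minimal sketch under two assumptions: that fit() takes the multimodal container through a features argument and the labels through target, and that the container exposes its labels as a target attribute. The helper name fit_on_multimodal is hypothetical and only wraps the call so that the assumptions are visible in one place.

```python
def fit_on_multimodal(model, fit_data):
    """Start the optimization and return the resulting pipeline.

    Hypothetical helper. Assumes `model` is a Fedot-like object whose
    fit() accepts `features` and `target`, and that `fit_data` carries
    its labels in a `target` attribute.
    """
    # The whole container is passed as features; the labels are passed
    # separately through the target argument.
    return model.fit(features=fit_data, target=fit_data.target)
```

With the objects created earlier in this section, the call would be pipeline = fit_on_multimodal(model, fit_data).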

After the fitting is completed, you can look at the structure of the resulting pipeline.

In text format:



Pipeline structure:
{'depth': 3, 'length': 4, 'nodes': [rf, data_source_table, tfidf, data_source_text/description]}
rf - {'n_jobs': -1, 'bootstrap': False, 'criterion': 'gini', 'max_features': 0.09622420420481334, 'min_samples_leaf': 1, 'min_samples_split': 8}
data_source_table - {}
tfidf - {'min_df': 0.026548403557843454, 'max_df': 0.9547108243944858, 'ngram_range': (1, 2)}
data_source_text/description - {}
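
The depth and length values in this output can be checked by hand. The sketch below rebuilds the pipeline as a plain dictionary and recomputes both numbers. The edges are an assumption inferred from the node names and the reported depth (the text source feeds tfidf, and both tfidf and the table source feed rf); the printed structure itself lists only the nodes.

```python
# Assumed structure: each node maps to the nodes that feed into it.
PIPELINE = {
    "rf": ["data_source_table", "tfidf"],
    "tfidf": ["data_source_text/description"],
    "data_source_table": [],
    "data_source_text/description": [],
}


def pipeline_length(graph):
    """Length is the total number of nodes."""
    return len(graph)


def pipeline_depth(graph, node):
    """Depth is the number of nodes on the longest path from `node` down to a data source."""
    parents = graph[node]
    if not parents:
        return 1
    return 1 + max(pipeline_depth(graph, parent) for parent in parents)


summary = {"depth": pipeline_depth(PIPELINE, "rf"), "length": pipeline_length(PIPELINE)}
print(summary)  # {'depth': 3, 'length': 4}
```

The longest path is rf, tfidf, data_source_text/description, which gives the depth of 3; the table branch is one node shorter.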

And in plot format:


The predict() method, which uses an already fitted pipeline, returns values for the target.

prediction = model.predict(predict_data)

The get_metrics() method estimates the quality of predictions according to the selected metrics.

metrics = model.get_metrics()
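
For classification, F1 is one metric that is often reported; the Wine Reviews example in this section returns it. As a rough illustration of what such a score measures, here is a macro-averaged F1 in plain Python. The averaging scheme and the function names are assumptions made for this sketch; they do not describe how FEDOT computes its metrics.

```python
def f1_for_label(y_true, y_pred, label):
    """F1 score of a single class treated as the positive one."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    if tp == 0:
        # No correct positive predictions: precision or recall is zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_label(y_true, y_pred, label) for label in labels) / len(labels)


# Toy labels, invented for the illustration.
true_varieties = ["merlot", "merlot", "riesling", "riesling"]
predicted_varieties = ["merlot", "riesling", "riesling", "riesling"]
score = macro_f1(true_varieties, predicted_varieties)
print(round(score, 3))
```

Macro averaging gives every class the same weight, which matters when some classes are rare.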

Example of using FEDOT for multimodal data classification on Wine Reviews dataset:

examples.advanced.multimodal_text_num_example.run_multi_modal_example(file_path, visualization=False, with_tuning=True, timeout=10.0)

Runs FEDOT on multimodal data from the Wine Reviews dataset. The dataset contains information about wine country, region, price, etc. with text features in the description column and other columns containing numerical and categorical features. It is a classification task for wine variety prediction.

Parameters

  • file_path (str) – path to the file with multimodal data.

  • visualization (bool) – if True, the final pipeline will be visualised.

  • with_tuning (bool) – if True, the pipeline will be tuned.

  • timeout (float) – overall fitting duration, in minutes.

Returns

F1 metrics of the model.

Return type