Data

fedot.core.data.data.POSSIBLE_TABULAR_IDX_KEYWORDS = ['idx', 'index', 'id', 'unnamed: 0']

The list of keywords for auto-detecting the index column of tabular CSV data. Used in Data.from_csv() and MultiModalData.from_csv().

fedot.core.data.data.POSSIBLE_TS_IDX_KEYWORDS = ['datetime', 'date', 'time', 'unnamed: 0']

The list of keywords for auto-detecting the index column of time-series CSV data. Used in Data.from_csv_time_series(), Data.from_csv_multi_time_series() and MultiModalData.from_csv_time_series().
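The keyword-based detection can be illustrated with a minimal sketch. This is not FEDOT's implementation: the helper name detect_index_column is hypothetical, and exact matching of the lowercased column name is an assumption of the sketch (the library may match differently). Only the two keyword lists are taken from the constants above.

```python
import csv
import io
from typing import List, Optional

# Default keywords copied from the two constants documented above.
POSSIBLE_TABULAR_IDX_KEYWORDS = ['idx', 'index', 'id', 'unnamed: 0']
POSSIBLE_TS_IDX_KEYWORDS = ['datetime', 'date', 'time', 'unnamed: 0']


def detect_index_column(csv_text: str, keywords: List[str],
                        delimiter: str = ',') -> Optional[str]:
    """Return the first column's name if it looks like an index, else None.

    Hypothetical helper: only the first column is inspected, and its
    lowercased name must equal one of the keywords (an assumption).
    """
    header = next(csv.reader(io.StringIO(csv_text), delimiter=delimiter), [])
    if not header:
        return None
    first_column = header[0]
    # Lowercasing lets names such as 'ID' or 'Date' match the lowercase keywords.
    if first_column.strip().lower() in keywords:
        return first_column
    return None


tabular_csv = "ID,age,income\n1,34,50000\n2,41,62000\n"
ts_csv = "Date,value\n2021-01-01,1.5\n2021-01-02,1.7\n"
no_index_csv = "age,income\n34,50000\n"

print(detect_index_column(tabular_csv, POSSIBLE_TABULAR_IDX_KEYWORDS))
print(detect_index_column(ts_csv, POSSIBLE_TS_IDX_KEYWORDS))
print(detect_index_column(no_index_csv, POSSIBLE_TABULAR_IDX_KEYWORDS))
```

The 'unnamed: 0' keyword corresponds to the name pandas assigns to a first column whose header is empty, which is typical for a CSV written together with its DataFrame index.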

class fedot.core.data.data.Data(idx, task, data_type, features, categorical_features=None, categorical_idx=None, numerical_idx=None, encoded_idx=None, features_names=None, target=None, supplementary_data=<factory>)[source]

Bases: object

Base Data type class

Parameters
  • idx (np.ndarray) –

  • task (Task) –

  • data_type (DataTypesEnum) –

  • features (Union[np.ndarray, pd.DataFrame]) –

  • categorical_features (Optional[np.ndarray]) –

  • categorical_idx (Optional[np.ndarray]) –

  • numerical_idx (Optional[np.ndarray]) –

  • encoded_idx (Optional[np.ndarray]) –

  • features_names (Optional[np.ndarray[str]]) –

  • target (Optional[np.ndarray]) –

  • supplementary_data (SupplementaryData) –

Return type

None

classmethod from_numpy(features_array, target_array, features_names=None, categorical_idx=None, idx=None, task='classification', data_type=DataTypesEnum.table)[source]

Import data from numpy array.

Parameters
  • features_array (np.ndarray) – numpy array with features.

  • target_array (np.ndarray) – numpy array with target.

  • features_names (np.ndarray[str]) – numpy array with names of features

  • categorical_idx (Union[list[int, str], np.ndarray[int, str]]) – a list or numpy array with the indexes or names of the categorical features. Names can be used only if features_names is provided.

  • idx (Optional[np.ndarray]) – indices of arrays.

  • task (Union[Task, str]) – the Task to solve with the data.

  • data_type (Optional[DataTypesEnum]) – the type of the data. Possible values are listed at DataTypesEnum.

Returns

representation of data in an internal data structure.

Return type

InputData
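Because categorical_idx accepts both positions and names, the two forms have to be reconciled somewhere. A minimal sketch of one way to do that follows; the helper resolve_categorical_idx is hypothetical and does not reproduce FEDOT's internal logic.

```python
from typing import List, Optional, Sequence, Union


def resolve_categorical_idx(categorical_idx: Sequence[Union[int, str]],
                            features_names: Optional[Sequence[str]] = None
                            ) -> List[int]:
    """Map a mix of column positions and feature names to sorted positions.

    Hypothetical helper for illustration only.
    """
    positions = set()
    for item in categorical_idx:
        if isinstance(item, str):
            # A name can only be resolved when the feature names are known.
            if features_names is None:
                raise ValueError(
                    f"Name {item!r} was given, but features_names is missing")
            positions.add(list(features_names).index(item))
        else:
            positions.add(int(item))
    return sorted(positions)


names = ['size', 'color', 'brand']
print(resolve_categorical_idx([2, 'color'], names))
```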

classmethod from_numpy_time_series(features_array, target_array=None, idx=None, task='ts_forecasting', data_type=DataTypesEnum.ts)[source]

Import time series from numpy array.

Parameters
  • features_array (numpy.ndarray) – numpy array with features time series.

  • target_array (Optional[numpy.ndarray]) – numpy array with target time series (if None same as features).

  • idx (Optional[numpy.ndarray]) – indices of arrays.

  • task (Union[Task, str]) – the Task to solve with the data.

  • data_type (Optional[DataTypesEnum]) – the type of the data. Possible values are listed at DataTypesEnum.

Returns

representation of data in an internal data structure.

Return type

InputData
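The default for target_array (reuse the features when it is None) can be sketched in plain Python. The helper fill_time_series_defaults is hypothetical, and the default index 0..n-1 is an assumption of this sketch, not something the documentation states.

```python
from typing import List, Optional, Sequence, Tuple


def fill_time_series_defaults(features: Sequence[float],
                              target: Optional[Sequence[float]] = None,
                              idx: Optional[Sequence[int]] = None
                              ) -> Tuple[List[int], List[float], List[float]]:
    """Fill in a missing target and index for a univariate time series."""
    features = list(features)
    # Documented behaviour: a missing target is the same series as the features.
    target = list(features) if target is None else list(target)
    # Assumption: a missing index becomes consecutive integers.
    idx = list(range(len(features))) if idx is None else list(idx)
    return idx, features, target


series = [1.0, 1.5, 2.0]
print(fill_time_series_defaults(series))
```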

classmethod from_dataframe(features_df, target_df, categorical_idx=None, task='classification', data_type=DataTypesEnum.table)[source]

Import data from pandas DataFrame.

Parameters
  • features_df (Union[pd.DataFrame, pd.Series]) – loaded pandas DataFrame or Series with features.

  • target_df (Union[pd.DataFrame, pd.Series]) – loaded pandas DataFrame or Series with target.

  • categorical_idx (Union[list[int, str], np.ndarray[int, str]]) – a list or numpy array with indexes or names of features that indicate that the feature is categorical.

  • task (Union[Task, str]) – the Task to solve with the data.

  • data_type (DataTypesEnum) – the type of the data. Possible values are listed at DataTypesEnum.

Returns

representation of data in an internal data structure.

Return type

InputData

classmethod from_csv(file_path, delimiter=',', task='classification', data_type=DataTypesEnum.table, columns_to_drop=None, target_columns='', categorical_idx=None, index_col=None, possible_idx_keywords=None)[source]

Import data from csv.

Parameters
  • file_path (PathType) – the path to the CSV with data.

  • columns_to_drop (Optional[List[Union[str, int]]]) – the names of columns that should be dropped.

  • delimiter (str) – the delimiter to separate the columns.

  • task (Union[Task, str]) – the Task to solve with the data.

  • data_type (DataTypesEnum) – the type of the data. Possible values are listed at DataTypesEnum.

  • target_columns (Union[str, List[Union[str, int]], None]) – name of the target column (the last column if empty and no target if None).

  • categorical_idx (Union[list[int, str], np.ndarray[int, str]]) – a list or numpy array with indexes or names of features that indicate that the feature is categorical.

  • index_col (Optional[Union[str, int]]) –

    name or index of the column to use as the Data.idx.

    If None, the name of the first column is checked and the column is used as the index if the check succeeds (see the param possible_idx_keywords).

    Set to False to skip the check and create a new integer index.

  • possible_idx_keywords (Optional[List[str]]) – lowercase keys to find. If the first data column contains one of the keys, it is used as index. See the POSSIBLE_TABULAR_IDX_KEYWORDS for the list of default keywords.

Returns

data

Return type

InputData
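The three documented cases of target_columns (empty string, None, explicit columns) can be sketched as a split of the column names. The helper split_feature_and_target_columns is hypothetical, and treating integers as column positions is an assumption of the sketch.

```python
from typing import List, Sequence, Tuple, Union


def split_feature_and_target_columns(
        columns: Sequence[str],
        target_columns: Union[str, List[Union[str, int]], None] = ''
) -> Tuple[List[str], List[str]]:
    """Split column names into (feature columns, target columns)."""
    columns = list(columns)
    if target_columns is None:
        # No target at all: every column is a feature.
        return columns, []
    if target_columns == '':
        # Empty string: the last column is the target.
        return columns[:-1], columns[-1:]
    if isinstance(target_columns, (str, int)):
        target_columns = [target_columns]
    # Assumption: integers are positions, strings are column names.
    targets = [columns[t] if isinstance(t, int) else t for t in target_columns]
    unknown = [t for t in targets if t not in columns]
    if unknown:
        raise KeyError(f"Unknown target columns: {unknown}")
    features = [c for c in columns if c not in targets]
    return features, targets


header = ['a', 'b', 'y']
print(split_feature_and_target_columns(header))
print(split_feature_and_target_columns(header, None))
print(split_feature_and_target_columns(header, 'b'))
```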

classmethod from_csv_time_series(file_path, delimiter=',', task='ts_forecasting', is_predict=False, columns_to_drop=None, target_column='', index_col=None, possible_idx_keywords=None)[source]

Forms InputData of ts type from columns with different variants of the same variable.

Parameters
  • file_path (Union[os.PathLike, str]) – path to the source csv file.

  • delimiter (str) – delimiter for pandas DataFrame.

  • task (Union[Task, str]) – the Task that should be solved with data.

  • is_predict (bool) – the stage the data is prepared for. False means fit, True means predict.

  • columns_to_drop (Optional[List]) – list with names of columns to ignore.

  • target_column (Optional[str]) – string with name of target column, used for predict stage.

  • index_col (Optional[Union[str, int]]) –

    name or index of the column to use as the Data.idx.

    If None, the name of the first column is checked and the column is used as the index if the check succeeds (see the param possible_idx_keywords).

    Set to False to skip the check and create a new integer index.

  • possible_idx_keywords (Optional[List[str]]) – lowercase keys to find. If the first data column contains one of the keys, it is used as index. See the POSSIBLE_TS_IDX_KEYWORDS for the list of default keywords.

Returns

An instance of InputData.

Return type

InputData

classmethod from_csv_multi_time_series(file_path, delimiter=',', task='ts_forecasting', is_predict=False, columns_to_use=None, target_column='', index_col=None, possible_idx_keywords=None)[source]

Forms InputData of multi_ts type from columns with different variants of the same variable.

Parameters
  • file_path (Union[os.PathLike, str]) – path to csv file.

  • delimiter (str) – delimiter for pandas df.

  • task (Union[Task, str]) – the Task that should be solved with data.

  • is_predict (bool) – the stage the data is prepared for. False means fit, True means predict.

  • columns_to_use (Optional[list]) – list with names of columns that hold different variants of the same variable.

  • target_column (Optional[str]) – string with name of target column, used for predict stage.

  • index_col (Optional[Union[str, int]]) –

    name or index of the column to use as the Data.idx.

    If None, the name of the first column is checked and the column is used as the index if the check succeeds (see the param possible_idx_keywords).

    Set to False to skip the check and create a new integer index.

  • possible_idx_keywords (Optional[List[str]]) – lowercase keys to find. If the first data column contains one of the keys, it is used as index. See the POSSIBLE_TS_IDX_KEYWORDS for the list of default keywords.

Returns

An instance of InputData.

Return type

InputData

static from_image(images=None, labels=None, task=Task(task_type=<TaskTypesEnum.classification: 'classification'>, task_params=None), target_size=None)[source]

Import data from images.

Parameters
  • images (Optional[Union[str, numpy.ndarray]]) – the path to the directory with image data in np.ndarray format, or the np.ndarray itself

  • labels (Optional[Union[str, numpy.ndarray]]) – the path to the directory with image labels in np.ndarray format, or the np.ndarray itself

  • task (Task) – the Task that should be solved with data

  • target_size (Optional[Tuple[int, int]]) – the size to resize the images to (if necessary)

Returns

An instance of InputData.

Return type

InputData

static from_json_files(files_path, fields_to_use, label='label', task=Task(task_type=<TaskTypesEnum.classification: 'classification'>, task_params=None), data_type=DataTypesEnum.table, export_to_meta=False, is_multilabel=False, shuffle=True)[source]

Generates InputData from a set of JSON files with different fields

Parameters
  • files_path (str) – path to the folder with JSON files

  • fields_to_use (List) – list of fields that will be considered as features

  • label (str) – name of field with target variable

  • task (Task) – Task to solve

  • data_type (DataTypesEnum) – data type in fields (as well as type for obtained InputData)

  • export_to_meta – if True, combines the extracted fields and saves them to CSV

  • is_multilabel – if True, creates multilabel target

  • shuffle – if True, shuffles data

Returns

An instance of InputData.

Return type

InputData

class fedot.core.data.data.InputData(idx, task, data_type, features, categorical_features=None, categorical_idx=None, numerical_idx=None, encoded_idx=None, features_names=None, target=None, supplementary_data=<factory>)[source]

Bases: fedot.core.data.data.Data

Data class for input data for the nodes

Parameters
  • idx (np.ndarray) –

  • task (Task) –

  • data_type (DataTypesEnum) –

  • features (Union[np.ndarray, pd.DataFrame]) –

  • categorical_features (Optional[np.ndarray]) –

  • categorical_idx (Optional[np.ndarray]) –

  • numerical_idx (Optional[np.ndarray]) –

  • encoded_idx (Optional[np.ndarray]) –

  • features_names (Optional[np.ndarray[str]]) –

  • target (Optional[np.ndarray]) –

  • supplementary_data (SupplementaryData) –

Return type

None

property num_classes: Optional[int]

Returns the number of classes that are present in the target. NB: if some labels are not present in this data, the number of classes can be less than in the full dataset!

property class_labels: Optional[int]

Returns unique class labels that are present in the target
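The caveat about num_classes is easy to reproduce with plain Python. The sketch below only mirrors the documented behaviour on a list of labels; the function names are reused for readability and are not FEDOT's implementation. Returning None for a missing target is an assumption suggested by the Optional return type.

```python
from typing import Hashable, List, Optional, Sequence


def class_labels(target: Optional[Sequence[Hashable]]) -> Optional[List[Hashable]]:
    """Unique labels present in the target, or None if there is no target."""
    if target is None:
        return None
    return sorted(set(target))


def num_classes(target: Optional[Sequence[Hashable]]) -> Optional[int]:
    """Number of distinct labels present in the target."""
    labels = class_labels(target)
    return None if labels is None else len(labels)


full_target = ['cat', 'dog', 'bird', 'dog', 'cat']
subset_target = full_target[:2]  # 'bird' never occurs in this part

print(num_classes(full_target), num_classes(subset_target))
print(class_labels(subset_target))
```

A split that happens to miss a rare class therefore reports fewer classes than the full dataset, which matters when the class count is used to size a model's output.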

subset_indices(selected_idx)[source]

Get subset from InputData to extract all items with specified indices

Parameters

selected_idx (List) – list of indices for extraction

Returns

InputData
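A minimal sketch of selecting rows by index label, using parallel lists in place of InputData fields. The helper subset_by_indices is hypothetical; returning the rows in the order of selected_idx and assuming unique index labels are both assumptions of the sketch.

```python
from typing import Any, List, Sequence, Tuple


def subset_by_indices(idx: Sequence[Any],
                      features: Sequence[Sequence[Any]],
                      target: Sequence[Any],
                      selected_idx: Sequence[Any]
                      ) -> Tuple[List[Any], List[Sequence[Any]], List[Any]]:
    """Extract the rows whose index label is listed in selected_idx."""
    # Map each index label to its row position (labels assumed unique).
    position_of = {label: pos for pos, label in enumerate(idx)}
    rows = [position_of[label] for label in selected_idx]
    return ([idx[r] for r in rows],
            [features[r] for r in rows],
            [target[r] for r in rows])


idx = [10, 20, 30]
features = [[1, 2], [3, 4], [5, 6]]
target = [0, 1, 0]

print(subset_by_indices(idx, features, target, [30, 10]))
```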

subset_features(feature_ids)[source]

Returns a new InputData with the subset of features selected by feature_ids if the list is non-empty, or None otherwise

Parameters

feature_ids (numpy.array) –

Return type

Optional[InputData]
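The "None for an empty selection" contract can be sketched on a list-of-rows table. The helper select_feature_columns is hypothetical and works on plain lists rather than on InputData.

```python
from typing import Any, List, Optional, Sequence


def select_feature_columns(features: Sequence[Sequence[Any]],
                           feature_ids: Optional[Sequence[int]]
                           ) -> Optional[List[List[Any]]]:
    """Keep only the listed columns; return None when nothing is selected."""
    if feature_ids is None or len(feature_ids) == 0:
        return None
    return [[row[i] for i in feature_ids] for row in features]


table = [[1, 'a', 0.5],
         [2, 'b', 0.7]]

print(select_feature_columns(table, [0, 2]))
print(select_feature_columns(table, []))
```

Callers need to handle the None case explicitly, since an empty selection does not produce an empty data object.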

shuffle()[source]

Shuffles features and target if possible

convert_non_int_indexes_for_fit(pipeline)[source]

Converts non-integer (datetime, string, etc.) indexes to integer form at the fit stage

convert_non_int_indexes_for_predict(pipeline)[source]

Converts non-integer (datetime, string, etc.) indexes to integer form at the predict stage

class fedot.core.data.data.OutputData(idx, task, data_type, features=None, categorical_features=None, categorical_idx=None, numerical_idx=None, encoded_idx=None, features_names=None, target=None, supplementary_data=<factory>, predict=None)[source]

Bases: fedot.core.data.data.Data

Data type for data prediction in the node

Parameters
  • idx (np.ndarray) –

  • task (Task) –

  • data_type (DataTypesEnum) –

  • features (Optional[Union[np.ndarray, pd.DataFrame]]) –

  • categorical_features (Optional[np.ndarray]) –

  • categorical_idx (Optional[np.ndarray]) –

  • numerical_idx (Optional[np.ndarray]) –

  • encoded_idx (Optional[np.ndarray]) –

  • features_names (Optional[np.ndarray[str]]) –

  • target (Optional[np.ndarray]) –

  • supplementary_data (SupplementaryData) –

  • predict (Optional[np.ndarray]) –

Return type

None

fedot.core.data.data._resize_image(file_path, target_size)[source]

Resizes the input image and rewrites it

Parameters
  • file_path (str) –

  • target_size (Tuple[int, int]) –

fedot.core.data.data.process_target_and_features(data_frame, target_column)[source]

Processes a pandas DataFrame with a single column

Parameters
  • data_frame (pandas.core.frame.DataFrame) – loaded pandas DataFrame

  • target_column (Optional[Union[str, List[str]]]) – names of columns with target or None

Returns

(np.array (table) with features, np.array (column) with target)

Return type

Tuple[numpy.ndarray, Optional[numpy.ndarray]]

fedot.core.data.data.np_datetime_to_numeric(data)[source]

Changes the datetime type of the data to integer, in milliseconds.

Parameters

data (numpy.ndarray) – table data for converting.

Returns

The same table data with datetimes (if any) converted to integers

Return type

numpy.ndarray
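The conversion can be illustrated without numpy. The sketch below is not the library function: the helper names are hypothetical, it works on nested lists of naive datetime objects, and counting milliseconds from the Unix epoch (1970-01-01) is an assumption that matches how numpy stores datetime64 values.

```python
from datetime import datetime, timedelta
from typing import Any, List, Sequence

EPOCH = datetime(1970, 1, 1)  # assumed reference point (Unix epoch)


def datetime_to_milliseconds(value: datetime) -> int:
    """Whole milliseconds between the epoch and a naive datetime."""
    # Integer division of timedeltas avoids floating-point rounding.
    return (value - EPOCH) // timedelta(milliseconds=1)


def table_datetimes_to_numeric(table: Sequence[Sequence[Any]]) -> List[List[Any]]:
    """Replace datetime cells with integer milliseconds; keep other cells."""
    return [[datetime_to_milliseconds(cell) if isinstance(cell, datetime) else cell
             for cell in row]
            for row in table]


rows = [[datetime(1970, 1, 1, 0, 0, 1), 3.5],
        [datetime(2021, 1, 1), 4.0]]

print(table_datetimes_to_numeric(rows))
```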