Data
- fedot.core.data.data.POSSIBLE_TABULAR_IDX_KEYWORDS = ['idx', 'index', 'id', 'unnamed: 0']
The list of keywords for auto-detecting the index column of tabular CSV data. Used in Data.from_csv() and MultiModalData.from_csv().
- fedot.core.data.data.POSSIBLE_TS_IDX_KEYWORDS = ['datetime', 'date', 'time', 'unnamed: 0']
The list of keywords for auto-detecting the index column of time-series CSV data. Used in Data.from_csv_time_series(), Data.from_csv_multi_time_series() and MultiModalData.from_csv_time_series().
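A rough sketch of how these keyword lists are presumably applied when reading a CSV (a simplified, hypothetical reimplementation using pandas only; `detect_index_column` is not part of FEDOT, and the actual logic lives inside the `from_csv*` methods):

```python
import pandas as pd

POSSIBLE_TABULAR_IDX_KEYWORDS = ['idx', 'index', 'id', 'unnamed: 0']

def detect_index_column(df: pd.DataFrame, keywords) -> bool:
    # Per the docs, the first column is used as the index if its
    # lowercased name contains one of the known index keywords.
    first_col = str(df.columns[0]).lower()
    return any(key in first_col for key in keywords)

df = pd.DataFrame({'ID': [0, 1, 2], 'feature': [0.1, 0.2, 0.3]})
print(detect_index_column(df, POSSIBLE_TABULAR_IDX_KEYWORDS))  # prints True: 'id' matches
```

Passing `index_col=False` to the loading methods skips this check entirely and a new integer index is created instead.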
- class fedot.core.data.data.Data(idx, task, data_type, features, categorical_features=None, categorical_idx=None, numerical_idx=None, encoded_idx=None, features_names=None, target=None, supplementary_data=<factory>)[source]
Bases:
object
Base Data type class
- Parameters
idx (np.ndarray) –
task (Task) –
data_type (DataTypesEnum) –
features (Union[np.ndarray, pd.DataFrame]) –
categorical_features (Optional[np.ndarray]) –
categorical_idx (Optional[np.ndarray]) –
numerical_idx (Optional[np.ndarray]) –
encoded_idx (Optional[np.ndarray]) –
features_names (Optional[np.ndarray[str]]) –
target (Optional[np.ndarray]) –
supplementary_data (SupplementaryData) –
- Return type
None
- classmethod from_numpy(features_array, target_array, features_names=None, categorical_idx=None, idx=None, task='classification', data_type=DataTypesEnum.table)[source]
Import data from numpy array.
- Parameters
features_array (np.ndarray) – numpy array with features.
target_array (np.ndarray) – numpy array with target.
features_names (np.ndarray[str]) – numpy array with names of features.
categorical_idx (Union[list[int, str], np.ndarray[int, str]]) – a list or numpy array with indexes (or names, if features_names is provided) of categorical features.
idx (Optional[np.ndarray]) – indices of arrays.
task (Union[Task, str]) – the Task to solve with the data.
data_type (Optional[DataTypesEnum]) – the type of the data. Possible values are listed at DataTypesEnum.
- Returns
InputData representation of the data in an internal data structure.
- Return type
InputData
- classmethod from_numpy_time_series(features_array, target_array=None, idx=None, task='ts_forecasting', data_type=DataTypesEnum.ts)[source]
Import time series from numpy array.
- Parameters
features_array (numpy.ndarray) – numpy array with features time series.
target_array (Optional[numpy.ndarray]) – numpy array with target time series (if None same as features).
idx (Optional[numpy.ndarray]) – indices of arrays.
task (Union[Task, str]) – the Task to solve with the data.
data_type (Optional[DataTypesEnum]) – the type of the data. Possible values are listed at DataTypesEnum.
- Returns
InputData representation of the data in an internal data structure.
- Return type
InputData
- classmethod from_dataframe(features_df, target_df, categorical_idx=None, task='classification', data_type=DataTypesEnum.table)[source]
Import data from pandas DataFrame.
- Parameters
features_df (Union[pd.DataFrame, pd.Series]) – loaded pandas DataFrame or Series with features.
target_df (Union[pd.DataFrame, pd.Series]) – loaded pandas DataFrame or Series with target.
categorical_idx (Union[list[int, str], np.ndarray[int, str]]) – a list or numpy array with indexes or names of categorical features.
task (Union[Task, str]) – the Task to solve with the data.
data_type (DataTypesEnum) – the type of the data. Possible values are listed at DataTypesEnum.
- Returns
InputData representation of the data in an internal data structure.
- Return type
InputData
- classmethod from_csv(file_path, delimiter=',', task='classification', data_type=DataTypesEnum.table, columns_to_drop=None, target_columns='', categorical_idx=None, index_col=None, possible_idx_keywords=None)[source]
Import data from a CSV file.
- Parameters
file_path (PathType) – the path to the CSV file with data.
columns_to_drop (Optional[List[Union[str, int]]]) – the names of columns that should be dropped.
delimiter (str) – the delimiter to separate the columns.
task (Union[Task, str]) – the Task to solve with the data.
data_type (DataTypesEnum) – the type of the data. Possible values are listed at DataTypesEnum.
target_columns (Union[str, List[Union[str, int]], None]) – name of the target column (the last column if empty, and no target if None).
categorical_idx (Union[list[int, str], np.ndarray[int, str]]) – a list or numpy array with indexes or names of categorical features.
index_col (Optional[Union[str, int]]) – name or index of the column to use as Data.idx. If None, the first column's name is checked and the column is used as the index on a match (see the possible_idx_keywords param). Set to False to skip the check and create a new integer index.
possible_idx_keywords (Optional[List[str]]) – lowercase keys to look for. If the first data column's name contains one of the keys, that column is used as the index. See POSSIBLE_TABULAR_IDX_KEYWORDS for the list of default keywords.
- Returns
InputData representation of the data in an internal data structure.
- Return type
InputData
- classmethod from_csv_time_series(file_path, delimiter=',', task='ts_forecasting', is_predict=False, columns_to_drop=None, target_column='', index_col=None, possible_idx_keywords=None)[source]
Forms InputData of the ts type from the columns of a CSV file.
- Parameters
file_path (Union[os.PathLike, str]) – path to the source CSV file.
delimiter (str) – delimiter for the pandas DataFrame.
task (Union[Task, str]) – the Task that should be solved with the data.
is_predict (bool) – indicator of the stage to prepare the data for: False means fit, True means predict.
columns_to_drop (Optional[List]) – list with names of columns to ignore.
target_column (Optional[str]) – name of the target column, used for the predict stage.
index_col (Optional[Union[str, int]]) – name or index of the column to use as Data.idx. If None, the first column's name is checked and the column is used as the index on a match (see the possible_idx_keywords param). Set to False to skip the check and create a new integer index.
possible_idx_keywords (Optional[List[str]]) – lowercase keys to look for. If the first data column's name contains one of the keys, that column is used as the index. See POSSIBLE_TS_IDX_KEYWORDS for the list of default keywords.
- Returns
An instance of InputData.
- Return type
InputData
- classmethod from_csv_multi_time_series(file_path, delimiter=',', task='ts_forecasting', is_predict=False, columns_to_use=None, target_column='', index_col=None, possible_idx_keywords=None)[source]
Forms InputData of the multi_ts type from columns containing different variants of the same variable.
- Parameters
file_path (Union[os.PathLike, str]) – path to the CSV file.
delimiter (str) – delimiter for the pandas DataFrame.
task (Union[Task, str]) – the Task that should be solved with the data.
is_predict (bool) – indicator of the stage to prepare the data for: False means fit, True means predict.
columns_to_use (Optional[list]) – list with names of the columns containing different variants of the same variable.
target_column (Optional[str]) – name of the target column, used for the predict stage.
index_col (Optional[Union[str, int]]) – name or index of the column to use as Data.idx. If None, the first column's name is checked and the column is used as the index on a match (see the possible_idx_keywords param). Set to False to skip the check and create a new integer index.
possible_idx_keywords (Optional[List[str]]) – lowercase keys to look for. If the first data column's name contains one of the keys, that column is used as the index. See POSSIBLE_TS_IDX_KEYWORDS for the list of default keywords.
- Returns
An instance of InputData.
- Return type
InputData
- static from_image(images=None, labels=None, task=Task(task_type=<TaskTypesEnum.classification: 'classification'>, task_params=None), target_size=None)[source]
Input data from images.
- Parameters
images (Optional[Union[str, numpy.ndarray]]) – the path to the directory with image data in np.ndarray format, or an array in np.ndarray format.
labels (Optional[Union[str, numpy.ndarray]]) – the path to the directory with image labels in np.ndarray format, or an array in np.ndarray format.
task (Task) – the Task that should be solved with the data.
target_size (Optional[Tuple[int, int]]) – size for resizing the images (if necessary).
- Returns
An instance of InputData.
- Return type
InputData
- static from_json_files(files_path, fields_to_use, label='label', task=Task(task_type=<TaskTypesEnum.classification: 'classification'>, task_params=None), data_type=DataTypesEnum.table, export_to_meta=False, is_multilabel=False, shuffle=True)[source]
Generates InputData from a set of JSON files with different fields.
- Parameters
files_path (str) – path to the folder with JSON files.
fields_to_use (List) – list of fields that will be considered as features.
label (str) – name of the field with the target variable.
task (Task) – the Task to solve.
data_type (DataTypesEnum) – data type of the fields (and of the obtained InputData).
export_to_meta – whether to combine the extracted fields and save them to CSV.
is_multilabel – if True, creates a multilabel target.
shuffle – if True, shuffles the data.
- Returns
An instance of InputData.
- Return type
InputData
- class fedot.core.data.data.InputData(idx, task, data_type, features, categorical_features=None, categorical_idx=None, numerical_idx=None, encoded_idx=None, features_names=None, target=None, supplementary_data=<factory>)[source]
Bases:
fedot.core.data.data.Data
Data class for input data for the nodes
- Parameters
idx (np.ndarray) –
task (Task) –
data_type (DataTypesEnum) –
features (Union[np.ndarray, pd.DataFrame]) –
categorical_features (Optional[np.ndarray]) –
categorical_idx (Optional[np.ndarray]) –
numerical_idx (Optional[np.ndarray]) –
encoded_idx (Optional[np.ndarray]) –
features_names (Optional[np.ndarray[str]]) –
target (Optional[np.ndarray]) –
supplementary_data (SupplementaryData) –
- Return type
None
- property num_classes: Optional[int]
Returns the number of classes present in the target. NB: if some labels are missing from this data, the number of classes can be smaller than in the full dataset!
- property class_labels: Optional[int]
Returns unique class labels that are present in the target
- subset_indices(selected_idx)[source]
Get a subset of InputData containing all items with the specified indices.
- Parameters
selected_idx (List) – list of indices for extraction.
- Returns
- subset_features(feature_ids)[source]
Return a new InputData with a subset of features based on a non-empty feature_ids list, or None otherwise.
- Parameters
feature_ids (numpy.array) –
- Return type
Optional[InputData]
- class fedot.core.data.data.OutputData(idx, task, data_type, features=None, categorical_features=None, categorical_idx=None, numerical_idx=None, encoded_idx=None, features_names=None, target=None, supplementary_data=<factory>, predict=None)[source]
Bases:
fedot.core.data.data.Data
Data type for predictions in a node.
- Parameters
idx (np.ndarray) –
task (Task) –
data_type (DataTypesEnum) –
features (Optional[Union[np.ndarray, pd.DataFrame]]) –
categorical_features (Optional[np.ndarray]) –
categorical_idx (Optional[np.ndarray]) –
numerical_idx (Optional[np.ndarray]) –
encoded_idx (Optional[np.ndarray]) –
features_names (Optional[np.ndarray[str]]) –
target (Optional[np.ndarray]) –
supplementary_data (SupplementaryData) –
predict (Optional[np.ndarray]) –
- Return type
None
- fedot.core.data.data._resize_image(file_path, target_size)[source]
Function resizes and rewrites the input image
- Parameters
file_path (str) –
target_size (Tuple[int, int]) –
- fedot.core.data.data.process_target_and_features(data_frame, target_column)[source]
Function that processes a pandas DataFrame with a single target column.
- Parameters
data_frame (pandas.core.frame.DataFrame) – loaded pandas DataFrame.
target_column (Optional[Union[str, List[str]]]) – names of the columns with the target, or None.
- Returns
(np.array (table) with features, np.array (column) with target)
- Return type
Tuple[numpy.ndarray, Optional[numpy.ndarray]]