Data

class fedot.core.data.data.Data(idx, features, task, data_type, target=None, supplementary_data=<factory>)

Bases: object

Base Data type class

Parameters
Return type

None

idx: np.array = None
features: np.array = None
task: Task = None
data_type: DataTypesEnum = None
target: Optional[numpy.array] = None
supplementary_data: SupplementaryData = None
static from_csv(file_path=None, delimiter=', ', task=Task(task_type=<TaskTypesEnum.classification: 'classification'>, task_params=None), data_type=<DataTypesEnum.table: 'feature_table'>, columns_to_drop=None, target_columns='', index_col=0)
Parameters
  • file_path – the path to the CSV with data

  • columns_to_drop (Optional[List]) – the names of columns that should be dropped

  • delimiter – the delimiter to separate the columns

  • task (fedot.core.repository.tasks.Task) – the task that should be solved with data

  • data_type (fedot.core.repository.dataset_types.DataTypesEnum) – the type of data interpretation

  • target_columns (Union[str, List]) – name of target column (last column if empty and no target if None)

  • index_col (Union[str, int, None]) – column name or index to use as the Data.idx; if None, a new unique index is generated
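
The rules for target_columns and index_col described above can be sketched with the standard csv module. This is a minimal illustration under stated assumptions, not FEDOT's implementation: the function name split_csv is hypothetical, values are kept as strings, and the index column is assumed to be resolved before columns_to_drop is applied.

```python
import csv
import io
from typing import List, Optional, Tuple, Union


def split_csv(text: str,
              delimiter: str = ',',
              target_columns: Union[str, List[str], None] = '',
              index_col: Union[str, int, None] = 0,
              columns_to_drop: Optional[List[str]] = None
              ) -> Tuple[list, List[list], Optional[List[list]]]:
    """Hypothetical helper: return (idx, features, target) as plain lists."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    header, body = rows[0], rows[1:]

    # Resolve the index column by name or position; None means "generate one".
    idx_pos = None
    if index_col is not None:
        idx_pos = header.index(index_col) if isinstance(index_col, str) else index_col
    idx = [row[idx_pos] for row in body] if idx_pos is not None else list(range(len(body)))

    # Remaining data columns: everything except the index and dropped columns.
    drop = set(columns_to_drop or [])
    data_pos = [i for i, name in enumerate(header) if i != idx_pos and name not in drop]

    # Resolve the target: None -> no target, '' -> last column, else by name.
    if target_columns is None:
        target_pos = []
    elif target_columns == '':
        target_pos = data_pos[-1:]
    else:
        names = [target_columns] if isinstance(target_columns, str) else target_columns
        target_pos = [header.index(name) for name in names]

    feature_pos = [i for i in data_pos if i not in target_pos]
    features = [[row[i] for i in feature_pos] for row in body]
    target = [[row[i] for i in target_pos] for row in body] if target_pos else None
    return idx, features, target


csv_text = "id,a,b,label\n10,1,2,x\n11,3,4,y\n"
idx, features, target = split_csv(csv_text)                     # last column is the target
_, all_columns, no_target = split_csv(csv_text, target_columns=None)
new_idx, _, _ = split_csv(csv_text, index_col=None)             # generated index
```

With the defaults, the first column becomes the index and the last remaining column becomes the target; passing target_columns=None keeps every data column as a feature.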

Returns

static from_csv_time_series(task, file_path=None, delimiter=', ', is_predict=False, target_column='')
Parameters
static from_csv_multi_time_series(task, file_path=None, delimiter=', ', is_predict=False, columns_to_use=None, target_column='')

Forms InputData of multi_ts type from columns that hold different variants of the same variable

Parameters
  • task (fedot.core.repository.tasks.Task) – the task that should be solved with data

  • file_path – path to csv file

  • delimiter – delimiter for pandas df

  • is_predict – whether the data is prepared for the predict stage (True) or the fit stage (False)

  • columns_to_use (Optional[list]) – list of names of the columns that hold different variants of the same variable

  • target_column (Optional[str]) – string with name of target column, used for predict stage

static from_image(images=None, labels=None, task=Task(task_type=<TaskTypesEnum.classification: 'classification'>, task_params=None), target_size=None)
Parameters
  • images (Union[str, numpy.ndarray]) – the path to the directory with image data in np.ndarray format, or the images themselves as an np.ndarray

  • labels (Union[str, numpy.ndarray]) – the path to the directory with image labels in np.ndarray format, or the labels themselves as an np.ndarray

  • task (fedot.core.repository.tasks.Task) – the task that should be solved with data

  • target_size (Optional[Tuple[int, int]]) – size for the images resizing (if necessary)

Returns

static from_text_meta_file(meta_file_path=None, label='label', task=Task(task_type=<TaskTypesEnum.classification: 'classification'>, task_params=None), data_type=<DataTypesEnum.text: 'text'>)
Parameters
static from_text_files(files_path, label='label', task=Task(task_type=<TaskTypesEnum.classification: 'classification'>, task_params=None), data_type=<DataTypesEnum.text: 'text'>)
Parameters
static from_json_files(files_path, fields_to_use, label='label', task=Task(task_type=<TaskTypesEnum.classification: 'classification'>, task_params=None), data_type=<DataTypesEnum.table: 'feature_table'>, export_to_meta=False, is_multilabel=False, shuffle=True)

Generates InputData from a set of JSON files with different fields

Parameters
  • files_path – path to the folder with JSON files

  • fields_to_use – list of fields that will be used as features

  • label – name of the field with the target variable

  • task – the task to solve

  • data_type – data type of the fields (and of the resulting InputData)

  • export_to_meta – if True, combines the extracted fields and saves them to CSV

  • is_multilabel – if True, creates a multilabel target

  • shuffle – if True, shuffles the data

Returns

combined dataset

Return type

fedot.core.data.data.InputData
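
The way fields_to_use, label and shuffle interact can be sketched with the standard library. This is an illustration only, not FEDOT's code: the function collect_json_records and its seed parameter are hypothetical, and each file is assumed to contain a single JSON object.

```python
import json
import random
import tempfile
from pathlib import Path
from typing import List, Tuple


def collect_json_records(files_path: str,
                         fields_to_use: List[str],
                         label: str = 'label',
                         shuffle: bool = True,
                         seed: int = 0) -> Tuple[List[list], list]:
    """Hypothetical helper: read every *.json file in a folder into (features, target)."""
    features, target = [], []
    for file in sorted(Path(files_path).glob('*.json')):
        record = json.loads(file.read_text(encoding='utf-8'))
        # Only the requested fields become features; the label field is the target.
        features.append([record.get(field) for field in fields_to_use])
        target.append(record.get(label))

    if shuffle:
        # One shared permutation keeps features and target aligned.
        order = list(range(len(features)))
        random.Random(seed).shuffle(order)
        features = [features[i] for i in order]
        target = [target[i] for i in order]
    return features, target


with tempfile.TemporaryDirectory() as folder:
    for i, (text, lab) in enumerate([('good film', 1), ('bad film', 0)]):
        Path(folder, f'{i}.json').write_text(
            json.dumps({'text': text, 'year': 2000 + i, 'label': lab}),
            encoding='utf-8')
    feats, labels = collect_json_records(folder, ['text', 'year'], shuffle=False)
```
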

to_csv(path_to_save)
class fedot.core.data.data.InputData(idx, features, task, data_type, target=None, supplementary_data=<factory>)

Bases: fedot.core.data.data.Data

Data class for input data for the nodes

Parameters
Return type

None

property num_classes
subset_range(start, end)
Parameters
  • start (int) –

  • end (int) –

subset_indices(selected_idx)

Gets a subset of InputData containing all items with the specified indices

Parameters

selected_idx (List) – list of indices for extraction

subset_features(features_ids)

Returns a new InputData with the subset of features given by the features_ids list

Parameters

features_ids (list) –

shuffle()

Shuffles features and target if possible
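
The three operations above (subset_indices, subset_features and shuffle) can be illustrated on a small stand-in class. TinyData is hypothetical and uses plain lists instead of numpy arrays. The sketch assumes that selected_idx holds values of idx rather than row positions, and that shuffle works in place. Neither assumption is confirmed by this page.

```python
import random
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass
class TinyData:
    """Hypothetical stand-in for InputData: one row per index entry."""
    idx: list
    features: List[list]
    target: Optional[list] = None

    def subset_indices(self, selected_idx: list) -> 'TinyData':
        # Assumption: selected_idx holds values of idx, not row positions.
        position = {value: row for row, value in enumerate(self.idx)}
        return self._take_rows([position[value] for value in selected_idx])

    def subset_features(self, features_ids: list) -> 'TinyData':
        # Keep only the listed feature columns; idx and target are unchanged.
        features = [[row[i] for i in features_ids] for row in self.features]
        return replace(self, features=features)

    def shuffle(self, seed: Optional[int] = None) -> None:
        # One shared permutation keeps idx, features and target aligned.
        order = list(range(len(self.idx)))
        random.Random(seed).shuffle(order)
        shuffled = self._take_rows(order)
        self.idx, self.features, self.target = (
            shuffled.idx, shuffled.features, shuffled.target)

    def _take_rows(self, rows: list) -> 'TinyData':
        target = [self.target[r] for r in rows] if self.target is not None else None
        return TinyData([self.idx[r] for r in rows],
                        [self.features[r] for r in rows],
                        target)


data = TinyData(idx=['a', 'b', 'c'],
                features=[[1, 10], [2, 20], [3, 30]],
                target=[0, 1, 0])
picked = data.subset_indices(['c', 'a'])   # rows for 'c' and 'a', in that order
narrow = data.subset_features([1])         # second feature column only
data.shuffle(seed=1)                       # rows move together with their idx
```
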

convert_non_int_indexes_for_fit(pipeline)

Converts non-integer indexes (datetime, string, etc.) to integer form at the fit stage

convert_non_int_indexes_for_predict(pipeline)

Converts non-integer indexes (datetime, string, etc.) to integer form at the predict stage
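
One simple way to convert non-integer indexes to integer form is to replace the labels with consecutive row numbers. The sketch below is an assumption about the general idea, not FEDOT's implementation. Both function names are hypothetical, the pipeline argument of the real methods is left out, and the predict rows are assumed to directly follow the training rows.

```python
from datetime import date
from typing import List, Sequence


def to_int_index_for_fit(idx: Sequence) -> List[int]:
    # Replace arbitrary labels (dates, strings, ...) with row positions.
    return list(range(len(idx)))


def to_int_index_for_predict(idx: Sequence, train_length: int) -> List[int]:
    # Assumption: predict rows follow the training rows, so numbering continues.
    return list(range(train_length, train_length + len(idx)))


train_idx = [date(2021, 1, 1), date(2021, 1, 2), date(2021, 1, 3)]
test_idx = [date(2021, 1, 4), date(2021, 1, 5)]

fit_index = to_int_index_for_fit(train_idx)
predict_index = to_int_index_for_predict(test_idx, train_length=len(train_idx))
```
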

data_type = None
features = None
idx = None
supplementary_data = None
task = None
class fedot.core.data.data.OutputData(idx, features, task, data_type, target=None, supplementary_data=<factory>, predict=None)

Bases: fedot.core.data.data.Data

Data type for the predictions made in the node

Parameters
Return type

None

predict: numpy.ndarray = None
target: Optional[numpy.ndarray] = None
fedot.core.data.data.process_target_and_features(data_frame, target_column)

Processes a pandas DataFrame with a single column

Parameters
  • data_frame (pandas.core.frame.DataFrame) – loaded pandas DataFrame

  • target_column (Union[str, List[str], None]) – names of columns with target or None

Return features

numpy array (table) with features

Return target

numpy array (column) with target

Return type

Tuple[numpy.ndarray, Optional[numpy.ndarray]]
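
The split into features and an optional target can be sketched without pandas. This is an illustration only: split_target_and_features is a hypothetical name, and a list of dicts stands in for the DataFrame.

```python
from typing import List, Optional, Tuple, Union


def split_target_and_features(rows: List[dict],
                              target_column: Union[str, List[str], None]
                              ) -> Tuple[List[list], Optional[List[list]]]:
    """Hypothetical helper: split rows into a feature table and an optional target."""
    # Normalise the target specification to a list of column names.
    if target_column is None:
        names: List[str] = []
    elif isinstance(target_column, str):
        names = [target_column]
    else:
        names = list(target_column)

    columns = list(rows[0].keys()) if rows else []
    feature_names = [c for c in columns if c not in names]

    features = [[row[c] for c in feature_names] for row in rows]
    # No target columns requested -> target is None, mirroring the Optional return type.
    target = [[row[c] for c in names] for row in rows] if names else None
    return features, target


table = [{'a': 1, 'b': 2, 'y': 0}, {'a': 3, 'b': 4, 'y': 1}]
table_features, table_target = split_target_and_features(table, 'y')
every_column, missing_target = split_target_and_features(table, None)
```
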

fedot.core.data.data.data_type_is_table(data)
Parameters

data (Union[fedot.core.data.data.InputData, fedot.core.data.data.OutputData]) –

Return type

bool

fedot.core.data.data.data_type_is_ts(data)
Parameters

data (fedot.core.data.data.InputData) –

Return type

bool

fedot.core.data.data.data_type_is_multi_ts(data)
Parameters

data (fedot.core.data.data.InputData) –

Return type

bool

fedot.core.data.data.get_indices_from_file(data_frame, file_path)
fedot.core.data.data.array_to_input_data(features_array, target_array, idx=None, task=Task(task_type=<TaskTypesEnum.classification: 'classification'>, task_params=None))
Parameters
fedot.core.data.data.autodetect_data_type(task)
Parameters

task (fedot.core.repository.tasks.Task) –

Return type

fedot.core.repository.dataset_types.DataTypesEnum