Data
- fedot.core.data.data.POSSIBLE_TABULAR_IDX_KEYWORDS = ['idx', 'index', 'id', 'unnamed: 0']
The list of keywords for auto-detecting the index column of tabular CSV data. Used in Data.from_csv() and MultiModalData.from_csv().
- fedot.core.data.data.POSSIBLE_TS_IDX_KEYWORDS = ['datetime', 'date', 'time', 'unnamed: 0']
The list of keywords for auto-detecting the index column of time-series CSV data. Used in Data.from_csv_time_series(), Data.from_csv_multi_time_series() and MultiModalData.from_csv_time_series().
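A rough sketch of how these keyword lists are presumably applied when reading a CSV (a simplified, hypothetical reimplementation using pandas only; `detect_index_column` is not part of FEDOT, and the actual logic lives inside the `from_csv*` methods):

```python
import pandas as pd

POSSIBLE_TABULAR_IDX_KEYWORDS = ['idx', 'index', 'id', 'unnamed: 0']

def detect_index_column(df: pd.DataFrame, keywords) -> bool:
    # Per the docs, the first column is used as the index if its
    # lowercased name contains one of the known index keywords.
    first_col = str(df.columns[0]).lower()
    return any(key in first_col for key in keywords)

df = pd.DataFrame({'ID': [0, 1, 2], 'feature': [0.1, 0.2, 0.3]})
print(detect_index_column(df, POSSIBLE_TABULAR_IDX_KEYWORDS))  # prints True: 'id' matches
```

Passing `index_col=False` to the loading methods skips this check entirely and a new integer index is created instead.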
- class fedot.core.data.data.Data(idx, task, data_type, features, categorical_features=None, categorical_idx=None, numerical_idx=None, encoded_idx=None, features_names=None, target=None, supplementary_data=<factory>)[source]
Bases:
object
Base Data type class
- Parameters
idx (np.ndarray) –
task (Task) –
data_type (DataTypesEnum) –
features (Union[np.ndarray, pd.DataFrame]) –
categorical_features (Optional[np.ndarray]) –
categorical_idx (Optional[np.ndarray]) –
numerical_idx (Optional[np.ndarray]) –
encoded_idx (Optional[np.ndarray]) –
features_names (Optional[np.ndarray[str]]) –
target (Optional[np.ndarray]) –
supplementary_data (SupplementaryData) –
- Return type
None
- classmethod from_numpy(features_array, target_array, features_names=None, categorical_idx=None, idx=None, task='classification', data_type=DataTypesEnum.table)[source]
Import data from numpy array.
- Parameters
features_array (np.ndarray) – numpy array with features.
target_array (np.ndarray) – numpy array with target.
features_names (np.ndarray[str]) – numpy array with names of features.
categorical_idx (Union[list[int, str], np.ndarray[int, str]]) – a list or numpy array with indexes (or names, if features_names is provided) of categorical features.
idx (Optional[np.ndarray]) – indices of arrays.
task (Union[Task, str]) – the Task to solve with the data.
data_type (Optional[DataTypesEnum]) – the type of the data. Possible values are listed at DataTypesEnum.
- Returns
InputData representation of the data in an internal data structure.
- Return type
InputData
- classmethod from_numpy_time_series(features_array, target_array=None, idx=None, task='ts_forecasting', data_type=DataTypesEnum.ts)[source]
Import time series from numpy array.
- Parameters
features_array (numpy.ndarray) – numpy array with features time series.
target_array (Optional[numpy.ndarray]) – numpy array with target time series (if None same as features).
idx (Optional[numpy.ndarray]) – indices of arrays.
task (Union[Task, str]) – the Task to solve with the data.
data_type (Optional[DataTypesEnum]) – the type of the data. Possible values are listed at DataTypesEnum.
- Returns
InputData representation of the data in an internal data structure.
- Return type
InputData
- classmethod from_dataframe(features_df, target_df, categorical_idx=None, task='classification', data_type=DataTypesEnum.table)[source]
Import data from pandas DataFrame.
- Parameters
features_df (Union[pd.DataFrame, pd.Series]) – loaded pandas DataFrame or Series with features.
target_df (Union[pd.DataFrame, pd.Series]) – loaded pandas DataFrame or Series with target.
categorical_idx (Union[list[int, str], np.ndarray[int, str]]) – a list or numpy array with indexes or names of categorical features.
task (Union[Task, str]) – the Task to solve with the data.
data_type (DataTypesEnum) – the type of the data. Possible values are listed at DataTypesEnum.
- Returns
InputData representation of the data in an internal data structure.
- Return type
InputData
- classmethod from_csv(file_path, delimiter=',', task='classification', data_type=DataTypesEnum.table, columns_to_drop=None, target_columns='', categorical_idx=None, index_col=None, possible_idx_keywords=None)[source]
Import data from a CSV file.
- Parameters
file_path (PathType) – the path to the CSV file with data.
columns_to_drop (Optional[List[Union[str, int]]]) – the names of columns that should be dropped.
delimiter (str) – the delimiter to separate the columns.
task (Union[Task, str]) – the Task to solve with the data.
data_type (DataTypesEnum) – the type of the data. Possible values are listed at DataTypesEnum.
target_columns (Union[str, List[Union[str, int]], None]) – name of the target column (the last column if empty, and no target if None).
categorical_idx (Union[list[int, str], np.ndarray[int, str]]) – a list or numpy array with indexes or names of categorical features.
index_col (Optional[Union[str, int]]) – name or index of the column to use as Data.idx. If None, the first column's name is checked and the column is used as the index on a match (see the possible_idx_keywords param). Set to False to skip the check and create a new integer index.
possible_idx_keywords (Optional[List[str]]) – lowercase keys to look for. If the first data column's name contains one of the keys, that column is used as the index. See POSSIBLE_TABULAR_IDX_KEYWORDS for the list of default keywords.
- Returns
InputData representation of the data in an internal data structure.
- Return type
InputData
- classmethod from_csv_time_series(file_path, delimiter=',', task='ts_forecasting', is_predict=False, columns_to_drop=None, target_column='', index_col=None, possible_idx_keywords=None)[source]
Forms InputData of the ts type from the columns of a CSV file.
- Parameters
file_path (Union[os.PathLike, str]) – path to the source CSV file.
delimiter (str) – delimiter for the pandas DataFrame.
task (Union[Task, str]) – the Task that should be solved with the data.
is_predict (bool) – indicator of the stage to prepare the data for: False means fit, True means predict.
columns_to_drop (Optional[List]) – list with names of columns to ignore.
target_column (Optional[str]) – name of the target column, used for the predict stage.
index_col (Optional[Union[str, int]]) – name or index of the column to use as Data.idx. If None, the first column's name is checked and the column is used as the index on a match (see the possible_idx_keywords param). Set to False to skip the check and create a new integer index.
possible_idx_keywords (Optional[List[str]]) – lowercase keys to look for. If the first data column's name contains one of the keys, that column is used as the index. See POSSIBLE_TS_IDX_KEYWORDS for the list of default keywords.
- Returns
An instance of InputData.
- Return type
InputData
- classmethod from_csv_multi_time_series(file_path, delimiter=',', task='ts_forecasting', is_predict=False, columns_to_use=None, target_column='', index_col=None, possible_idx_keywords=None)[source]
Forms InputData of the multi_ts type from columns containing different variants of the same variable.
- Parameters
file_path (Union[os.PathLike, str]) – path to the CSV file.
delimiter (str) – delimiter for the pandas DataFrame.
task (Union[Task, str]) – the Task that should be solved with the data.
is_predict (bool) – indicator of the stage to prepare the data for: False means fit, True means predict.
columns_to_use (Optional[list]) – list with names of the columns containing different variants of the same variable.
target_column (Optional[str]) – name of the target column, used for the predict stage.
index_col (Optional[Union[str, int]]) – name or index of the column to use as Data.idx. If None, the first column's name is checked and the column is used as the index on a match (see the possible_idx_keywords param). Set to False to skip the check and create a new integer index.
possible_idx_keywords (Optional[List[str]]) – lowercase keys to look for. If the first data column's name contains one of the keys, that column is used as the index. See POSSIBLE_TS_IDX_KEYWORDS for the list of default keywords.
- Returns
An instance of InputData.
- Return type
InputData
- static from_image(images=None, labels=None, task=Task(task_type=<TaskTypesEnum.classification: 'classification'>, task_params=None), target_size=None)[source]
Input data from images.
- Parameters
images (Optional[Union[str, numpy.ndarray]]) – the path to the directory with image data in np.ndarray format, or an array in np.ndarray format.
labels (Optional[Union[str, numpy.ndarray]]) – the path to the directory with image labels in np.ndarray format, or an array in np.ndarray format.
task (Task) – the Task that should be solved with the data.
target_size (Optional[Tuple[int, int]]) – size for resizing the images (if necessary).
- Returns
An instance of InputData.
- Return type
InputData
- static from_json_files(files_path, fields_to_use, label='label', task=Task(task_type=<TaskTypesEnum.classification: 'classification'>, task_params=None), data_type=DataTypesEnum.table, export_to_meta=False, is_multilabel=False, shuffle=True)[source]
Generates InputData from a set of JSON files with different fields.
- Parameters
files_path (str) – path to the folder with JSON files.
fields_to_use (List) – list of fields that will be considered as features.
label (str) – name of the field with the target variable.
task (Task) – the Task to solve.
data_type (DataTypesEnum) – data type of the fields (and of the obtained InputData).
export_to_meta – whether to combine the extracted fields and save them to CSV.
is_multilabel – if True, creates a multilabel target.
shuffle – if True, shuffles the data.
- Returns
An instance of InputData.
- Return type
InputData
- class fedot.core.data.data.InputData(idx, task, data_type, features, categorical_features=None, categorical_idx=None, numerical_idx=None, encoded_idx=None, features_names=None, target=None, supplementary_data=<factory>)[source]
Bases:
fedot.core.data.data.Data
Data class for input data for the nodes
- Parameters
idx (np.ndarray) –
task (Task) –
data_type (DataTypesEnum) –
features (Union[np.ndarray, pd.DataFrame]) –
categorical_features (Optional[np.ndarray]) –
categorical_idx (Optional[np.ndarray]) –
numerical_idx (Optional[np.ndarray]) –
encoded_idx (Optional[np.ndarray]) –
features_names (Optional[np.ndarray[str]]) –
target (Optional[np.ndarray]) –
supplementary_data (SupplementaryData) –
- Return type
None
- property num_classes: Optional[int]
Returns the number of classes present in the target. NB: if some labels are missing from this data, the number of classes can be smaller than in the full dataset!
- property class_labels: Optional[int]
Returns unique class labels that are present in the target
- subset_indices(selected_idx)[source]
Get a subset of InputData containing all items with the specified indices.
- Parameters
selected_idx (List) – list of indices for extraction.
- Returns
- subset_features(feature_ids)[source]
Return a new InputData with a subset of features based on a non-empty feature_ids list, or None otherwise.
- Parameters
feature_ids (numpy.array) –
- Return type
Optional[InputData]
- class fedot.core.data.data.OutputData(idx, task, data_type, features=None, categorical_features=None, categorical_idx=None, numerical_idx=None, encoded_idx=None, features_names=None, target=None, supplementary_data=<factory>, predict=None)[source]
Bases:
fedot.core.data.data.Data
Data type for predictions in a node.
- Parameters
idx (np.ndarray) –
task (Task) –
data_type (DataTypesEnum) –
features (Optional[Union[np.ndarray, pd.DataFrame]]) –
categorical_features (Optional[np.ndarray]) –
categorical_idx (Optional[np.ndarray]) –
numerical_idx (Optional[np.ndarray]) –
encoded_idx (Optional[np.ndarray]) –
features_names (Optional[np.ndarray[str]]) –
target (Optional[np.ndarray]) –
supplementary_data (SupplementaryData) –
predict (Optional[np.ndarray]) –
- Return type
None
- fedot.core.data.data._resize_image(file_path, target_size)[source]
Function resizes and rewrites the input image
- Parameters
file_path (str) –
target_size (Tuple[int, int]) –
- fedot.core.data.data.process_target_and_features(data_frame, target_column)[source]
Function that processes a pandas DataFrame with a single target column.
- Parameters
data_frame (pandas.core.frame.DataFrame) – loaded pandas DataFrame.
target_column (Optional[Union[str, List[str]]]) – names of the columns with the target, or None.
- Returns
(np.array (table) with features, np.array (column) with target)
- Return type
Tuple[numpy.ndarray, Optional[numpy.ndarray]]