Synthetic dataset generator

fedot.utilities.synth_dataset_generator.classification_dataset(samples_amount, features_amount, classes_amount, features_options, noise_fraction=0.1, full_shuffle=True, weights=None)
Generates a random dataset for an n-class classification problem using the scikit-learn API.
 Parameters
samples_amount (int) – Total number of samples in the resulting dataset.
features_amount (int) – Total number of features per sample.
classes_amount (int) – The number of classes in the dataset.
features_options (Dict) – Dictionary of feature options in key-value format:
    informative: the number of informative features;
    redundant: the number of redundant features;
    repeated: the number of features that repeat the informative features;
    clusters_per_class: the number of clusters for each class.
noise_fraction (float) – The fraction of noisy labels in the dataset.
full_shuffle (bool) – If True, all features and samples are shuffled.
weights (list) – The proportions of samples assigned to each class. If None, then classes are balanced.
 Returns
Features and target as NumPy arrays.
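
As an illustration, the sketch below shows how these arguments could map onto scikit-learn's make_classification. The mapping (in particular noise_fraction to flip_y and full_shuffle to shuffle), the fixed random_state, and the helper name classification_sketch are assumptions made for this example; they are not taken from the FEDOT source.

```python
from sklearn.datasets import make_classification


def classification_sketch(samples_amount, features_amount, classes_amount,
                          features_options, noise_fraction=0.1,
                          full_shuffle=True, weights=None):
    """Hypothetical stand-in; the argument mapping below is an assumption."""
    return make_classification(
        n_samples=samples_amount,
        n_features=features_amount,
        n_informative=features_options['informative'],
        n_redundant=features_options['redundant'],
        n_repeated=features_options['repeated'],
        n_classes=classes_amount,
        n_clusters_per_class=features_options['clusters_per_class'],
        weights=weights,
        flip_y=noise_fraction,  # assumed: fraction of labels reassigned at random
        shuffle=full_shuffle,
        random_state=0,         # fixed only to make the example reproducible
    )


options = {'informative': 4, 'redundant': 2, 'repeated': 1, 'clusters_per_class': 2}
features, target = classification_sketch(200, 10, 3, options)
print(features.shape, target.shape)
```

Note that make_classification itself requires informative + redundant + repeated to be no greater than the total number of features, and classes_amount * clusters_per_class to be no greater than 2 ** informative.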

fedot.utilities.synth_dataset_generator.regression_dataset(samples_amount, features_amount, features_options, n_targets, noise=0.0, shuffle=True)
Generates a random dataset for a regression problem using the scikit-learn API.
 Parameters
samples_amount (int) – Total number of samples in the resulting dataset.
features_amount (int) – Total number of features per sample.
features_options (Dict) – Dictionary of feature options in key-value format:
    informative: the number of informative features;
    bias: the bias term in the underlying linear model.
n_targets (int) – The number of target variables.
noise (float) – The standard deviation of the Gaussian noise applied to the output.
shuffle (bool) – If True, all features and samples are shuffled.
 Returns
Features and target as NumPy arrays.
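
A minimal sketch of the same idea for regression, assuming the arguments are passed through to scikit-learn's make_regression. The helper name regression_sketch, the argument mapping, and the fixed random_state are assumptions for illustration only.

```python
from sklearn.datasets import make_regression


def regression_sketch(samples_amount, features_amount, features_options,
                      n_targets, noise=0.0, shuffle=True):
    """Hypothetical stand-in; the argument mapping below is an assumption."""
    return make_regression(
        n_samples=samples_amount,
        n_features=features_amount,
        n_informative=features_options['informative'],
        n_targets=n_targets,
        bias=features_options['bias'],
        noise=noise,      # standard deviation of Gaussian noise on the output
        shuffle=shuffle,
        random_state=0,   # fixed only to make the example reproducible
    )


options = {'informative': 5, 'bias': 1.0}
features, target = regression_sketch(150, 8, options, n_targets=2, noise=0.5)
print(features.shape, target.shape)
```

With n_targets greater than 1, make_regression returns a two-dimensional target array with one column per target variable.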

fedot.utilities.synth_dataset_generator.gauss_quantiles_dataset(samples_amount, features_amount, classes_amount, full_shuffle=True, **kwargs)
Generates a random dataset for an n-class classification problem based on the quantiles of a multidimensional Gaussian distribution, using the scikit-learn API.
 Parameters
samples_amount (int) – Total number of samples in the resulting dataset.
features_amount (int) – Total number of features per sample.
classes_amount (int) – The number of classes in the dataset.
full_shuffle (bool) – If True, all features and samples are shuffled.
kwargs – Optional parameters:
    ‘gauss_params’: mean and covariance values of the distribution.
 Returns
Features and target as NumPy arrays.
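
The sketch below illustrates the quantile-based labelling with scikit-learn's make_gaussian_quantiles. It assumes that gauss_params is supplied as a (mean, covariance) pair, which the description above does not state explicitly; the helper name gauss_quantiles_sketch and the fixed random_state are likewise hypothetical.

```python
from sklearn.datasets import make_gaussian_quantiles


def gauss_quantiles_sketch(samples_amount, features_amount, classes_amount,
                           full_shuffle=True, **kwargs):
    """Hypothetical stand-in; the (mean, cov) layout of gauss_params is assumed."""
    mean, cov = kwargs.get('gauss_params', (None, 1.0))
    return make_gaussian_quantiles(
        mean=mean,   # None means a zero mean vector
        cov=cov,     # scalar multiplier of the identity covariance matrix
        n_samples=samples_amount,
        n_features=features_amount,
        n_classes=classes_amount,
        shuffle=full_shuffle,
        random_state=0,  # fixed only to make the example reproducible
    )


features, target = gauss_quantiles_sketch(
    300, 3, 3, gauss_params=([0.0, 0.0, 0.0], 2.0))
print(features.shape, sorted(set(target.tolist())))
```

In make_gaussian_quantiles the classes are separated by nested concentric spheres around the mean, so each class holds roughly the same number of samples.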

fedot.utilities.synth_dataset_generator.generate_synthetic_data(length=2200, periods=5)
Generates a synthetic one-dimensional array without gaps.
 Parameters
length (int) – The length of the array.
periods (int) – The number of periods in the sine wave.
 Returns
synthetic_data – An array without gaps.
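
A minimal sketch of such a series with NumPy, assuming a sine wave sampled over the requested number of periods with some additive Gaussian noise. The noise term, the noise_scale and seed parameters, and the helper name synthetic_series_sketch are assumptions for illustration and are not part of the documented signature.

```python
import numpy as np


def synthetic_series_sketch(length=2200, periods=5, noise_scale=0.1, seed=0):
    """Hypothetical stand-in: sine wave with `periods` full cycles plus noise."""
    # 2 * periods * pi of phase corresponds to `periods` complete sine cycles
    phase = np.linspace(-periods * np.pi, periods * np.pi, length)
    rng = np.random.default_rng(seed)
    noise = rng.normal(loc=0.0, scale=noise_scale, size=length)  # assumed noise
    return np.sin(phase) + noise


series = synthetic_series_sketch()
print(series.shape, np.isnan(series).any())
```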