Synthetic dataset generator

fedot.utilities.synth_dataset_generator.classification_dataset(samples_amount, features_amount, classes_amount, features_options, noise_fraction=0.1, full_shuffle=True, weights=None)

Generates a random dataset for an n-class classification problem using the scikit-learn API.

Parameters

samples_amount (int) – total number of samples in the resulting dataset
features_amount (int) – total number of features per sample
classes_amount (int) – the number of classes in the dataset
features_options (Dict) –
a dictionary of feature options in key-value format
possible features_options keys:
- informative -> the number of informative features
- redundant -> the number of redundant features
- repeated -> the number of features that repeat the informative features
- clusters_per_class -> the number of clusters for each class
noise_fraction (float) – the fraction of noisy labels in the dataset
full_shuffle (bool) – if True, all features and samples are shuffled
weights (Optional[list]) – the proportions of samples assigned to each class; if None, the classes are balanced

Returns

features and target as NumPy arrays

Return type

array
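
The parameters above read like a thin wrapper around scikit-learn's `make_classification`. The following is a minimal sketch under that assumption; the helper name `classification_sketch`, the fixed `random_state`, and the exact mapping of `noise_fraction` to `flip_y` are illustrative guesses, not FEDOT's confirmed implementation.

```python
from sklearn.datasets import make_classification


def classification_sketch(samples_amount, features_amount, classes_amount,
                          features_options, noise_fraction=0.1,
                          full_shuffle=True, weights=None):
    """Hypothetical stand-in showing how the documented parameters
    could be forwarded to scikit-learn (assumed mapping)."""
    features, target = make_classification(
        n_samples=samples_amount,
        n_features=features_amount,
        n_informative=features_options['informative'],
        n_redundant=features_options['redundant'],
        n_repeated=features_options['repeated'],
        n_classes=classes_amount,
        n_clusters_per_class=features_options['clusters_per_class'],
        weights=weights,
        flip_y=noise_fraction,   # assumed: fraction of randomly reassigned labels
        shuffle=full_shuffle,
        random_state=0,          # fixed here only to make the sketch reproducible
    )
    return features, target


options = {'informative': 4, 'redundant': 2,
           'repeated': 1, 'clusters_per_class': 1}
features, target = classification_sketch(samples_amount=200,
                                         features_amount=10,
                                         classes_amount=3,
                                         features_options=options)
print(features.shape, target.shape)
```

Note that scikit-learn requires the informative, redundant and repeated counts to sum to no more than the total number of features, and the number of classes times clusters per class to be at most 2 to the power of the informative count.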

fedot.utilities.synth_dataset_generator.regression_dataset(samples_amount, features_amount, features_options, n_targets, noise=0.0, shuffle=True)

Generates a random dataset for a regression problem using the scikit-learn API.

Parameters

samples_amount (int) – total number of samples in the resulting dataset
features_amount (int) – total number of features per sample
features_options (Dict) –
a dictionary of feature options in key-value format
possible features_options keys:
- informative -> the number of informative features
- bias -> the bias term in the underlying linear model
n_targets (int) – the number of target variables
noise (float) – the standard deviation of the Gaussian noise applied to the output
shuffle (bool) – if True, all features and samples are shuffled

Returns

features and target as NumPy arrays

Return type

array
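
As an illustration of the parameters above, here is a minimal sketch that forwards them to scikit-learn's `make_regression`. The helper name `regression_sketch` and the fixed `random_state` are hypothetical, and the parameter mapping is inferred from the descriptions rather than taken from FEDOT's source.

```python
from sklearn.datasets import make_regression


def regression_sketch(samples_amount, features_amount, features_options,
                      n_targets, noise=0.0, shuffle=True):
    """Hypothetical stand-in: assumed mapping onto make_regression."""
    features, target = make_regression(
        n_samples=samples_amount,
        n_features=features_amount,
        n_informative=features_options['informative'],
        bias=features_options['bias'],
        n_targets=n_targets,
        noise=noise,
        shuffle=shuffle,
        random_state=0,  # fixed here only to make the sketch reproducible
    )
    return features, target


options = {'informative': 3, 'bias': 0.5}
features, target = regression_sketch(samples_amount=100,
                                     features_amount=5,
                                     features_options=options,
                                     n_targets=2,
                                     noise=0.1)
print(features.shape, target.shape)
```

With more than one target, scikit-learn returns the target as a two-dimensional array with one column per target variable.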

fedot.utilities.synth_dataset_generator.gauss_quantiles_dataset(samples_amount, features_amount, classes_amount, full_shuffle=True, **kwargs)

Generates a random dataset for an n-class classification problem, based on the quantiles of a multi-dimensional Gaussian distribution, using the scikit-learn API.

Parameters

samples_amount (int) – total number of samples in the resulting dataset
features_amount (int) – total number of features per sample
classes_amount (int) – the number of classes in the dataset
full_shuffle (bool) – if True, all features and samples are shuffled
kwargs – optional 'gauss_params' entry with the mean and covariance values of the distribution

Returns

features and target as NumPy arrays

Return type

array
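
The description above matches scikit-learn's `make_gaussian_quantiles`, so a minimal sketch can forward the arguments to it. The helper name `gauss_quantiles_sketch` is hypothetical, and the layout of `gauss_params` as a `(mean, cov)` pair is an assumption; the documentation does not specify its exact structure.

```python
from sklearn.datasets import make_gaussian_quantiles


def gauss_quantiles_sketch(samples_amount, features_amount, classes_amount,
                           full_shuffle=True, **kwargs):
    """Hypothetical stand-in: assumed mapping onto make_gaussian_quantiles."""
    # Assumption: 'gauss_params' is a (mean, cov) pair; scikit-learn defaults otherwise.
    mean, cov = kwargs.get('gauss_params', (None, 1.0))
    features, target = make_gaussian_quantiles(
        mean=mean,
        cov=cov,
        n_samples=samples_amount,
        n_features=features_amount,
        n_classes=classes_amount,
        shuffle=full_shuffle,
        random_state=0,  # fixed here only to make the sketch reproducible
    )
    return features, target


features, target = gauss_quantiles_sketch(samples_amount=300,
                                          features_amount=2,
                                          classes_amount=3,
                                          gauss_params=([0.0, 0.0], 2.0))
print(features.shape, target.shape)
```

In scikit-learn, `cov` is a scalar that scales the identity covariance matrix, and `mean` is a vector with one entry per feature.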

fedot.utilities.synth_dataset_generator.generate_synthetic_data(length=2200, periods=5)

Generates a synthetic one-dimensional array without gaps.

Parameters

length (int) – the length of the array
periods (int) – the number of periods in the sine wave

Returns

an array without gaps

Return type

array
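
To show what a gap-free series with a given number of sine periods might look like, here is a minimal NumPy sketch. It is not FEDOT's implementation: the helper name `synthetic_series_sketch`, the `noise_scale` and `seed` parameters, and the choice of a single noisy sine wave are all assumptions made for illustration.

```python
import numpy as np


def synthetic_series_sketch(length=2200, periods=5, noise_scale=0.1, seed=0):
    """Hypothetical stand-in: a noisy sine wave spanning `periods` full cycles."""
    rng = np.random.default_rng(seed)
    # Phase runs over `periods` complete cycles across `length` points.
    phase = np.linspace(0.0, 2.0 * np.pi * periods, length)
    noise = rng.normal(loc=0.0, scale=noise_scale, size=length)
    return np.sin(phase) + noise


series = synthetic_series_sketch(length=2200, periods=5)
print(series.shape, bool(np.isnan(series).any()))
```

Because every point is computed directly, the result contains no missing values, which makes it a convenient baseline before gaps are introduced artificially.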