Synthetic dataset generator

fedot.utilities.synth_dataset_generator.classification_dataset(samples_amount, features_amount, classes_amount, features_options, noise_fraction=0.1, full_shuffle=True, weights=None)

Generates a random dataset for an n-class classification problem using the scikit-learn API.

Parameters
  • samples_amount (int) – Total number of samples in the resulting dataset.

  • features_amount (int) – Total number of features per sample.

  • classes_amount (int) – The number of classes in the dataset.

  • features_options (Dict) – A dictionary of feature options in key-value format:

      - informative: the number of informative features;
      - redundant: the number of redundant features;
      - repeated: the number of features that repeat the informative features;
      - clusters_per_class: the number of clusters for each class.

  • noise_fraction (float) – The fraction of noisy labels in the dataset.

  • full_shuffle (bool) – If True, all features and samples are shuffled.

  • weights (list) – The proportions of samples assigned to each class. If None, then classes are balanced.

Returns

Features and target as NumPy arrays.
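
To illustrate the kind of output described above, the sketch below builds a comparable dataset directly with scikit-learn's make_classification. The mapping of the features_options keys and noise_fraction onto make_classification arguments is an assumption inferred from the parameter descriptions, not a copy of FEDOT's implementation. The helper name sketch_classification_dataset and the random_state argument are hypothetical additions, the latter for reproducibility.

```python
from sklearn.datasets import make_classification


def sketch_classification_dataset(samples_amount, features_amount, classes_amount,
                                  features_options, noise_fraction=0.1,
                                  full_shuffle=True, weights=None, random_state=0):
    """Hypothetical stand-in that mirrors the documented signature."""
    # Assumed mapping of the documented options onto scikit-learn arguments.
    features, target = make_classification(
        n_samples=samples_amount,
        n_features=features_amount,
        n_informative=features_options['informative'],
        n_redundant=features_options['redundant'],
        n_repeated=features_options['repeated'],
        n_classes=classes_amount,
        n_clusters_per_class=features_options['clusters_per_class'],
        weights=weights,            # None keeps the classes balanced
        flip_y=noise_fraction,      # fraction of labels assigned at random
        shuffle=full_shuffle,
        random_state=random_state,  # not part of the documented signature
    )
    return features, target


options = {'informative': 4, 'redundant': 2, 'repeated': 1, 'clusters_per_class': 2}
clf_features, clf_target = sketch_classification_dataset(
    samples_amount=200, features_amount=10, classes_amount=3, features_options=options)

print(clf_features.shape, clf_target.shape)
```

When choosing options, note that scikit-learn requires informative + redundant + repeated not to exceed the total number of features, and classes_amount * clusters_per_class not to exceed 2 ** informative.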

fedot.utilities.synth_dataset_generator.regression_dataset(samples_amount, features_amount, features_options, n_targets, noise=0.0, shuffle=True)

Generates a random dataset for a regression problem using the scikit-learn API.

Parameters
  • samples_amount (int) – Total number of samples in the resulting dataset.

  • features_amount (int) – Total number of features per sample.

  • features_options (Dict) – A dictionary of feature options in key-value format:

      - informative: the number of informative features;
      - bias: the bias term in the underlying linear model.

  • n_targets (int) – The number of target variables.

  • noise (float) – The standard deviation of the Gaussian noise applied to the output.

  • shuffle (bool) – If True, all features and samples are shuffled.

Returns

Features and target as NumPy arrays.
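
A comparable regression dataset can be sketched with scikit-learn's make_regression. As before, the way features_options is unpacked is an assumption drawn from the parameter list, and sketch_regression_dataset and random_state are hypothetical names introduced only for this example.

```python
from sklearn.datasets import make_regression


def sketch_regression_dataset(samples_amount, features_amount, features_options,
                              n_targets, noise=0.0, shuffle=True, random_state=0):
    """Hypothetical stand-in that mirrors the documented signature."""
    # Assumed mapping of the documented options onto scikit-learn arguments.
    features, target = make_regression(
        n_samples=samples_amount,
        n_features=features_amount,
        n_informative=features_options['informative'],
        bias=features_options['bias'],
        n_targets=n_targets,
        noise=noise,                # std of Gaussian noise added to the output
        shuffle=shuffle,
        random_state=random_state,  # not part of the documented signature
    )
    return features, target


reg_options = {'informative': 3, 'bias': 1.5}
reg_features, reg_target = sketch_regression_dataset(
    samples_amount=100, features_amount=5, features_options=reg_options, n_targets=2)

print(reg_features.shape, reg_target.shape)
```

With n_targets greater than one, scikit-learn returns a two-dimensional target array with one column per target variable; with a single target it returns a one-dimensional array.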

fedot.utilities.synth_dataset_generator.gauss_quantiles_dataset(samples_amount, features_amount, classes_amount, full_shuffle=True, **kwargs)

Generates a random dataset for an n-class classification problem, based on quantiles of a multi-dimensional Gaussian distribution, using the scikit-learn API.

Parameters
  • samples_amount (int) – Total number of samples in the resulting dataset.

  • features_amount (int) – Total number of features per sample.

  • classes_amount (int) – The number of classes in the dataset.

  • full_shuffle (bool) – If True, all features and samples are shuffled.

  • kwargs – Optional parameters:

      - gauss_params: the mean and covariance values of the distribution.

Returns

Features and target as NumPy arrays.
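
The following sketch shows one way such a dataset could be produced with scikit-learn's make_gaussian_quantiles. It assumes that gauss_params is passed as a (mean, cov) pair and that a zero mean with unit covariance is used when it is omitted. Both points are assumptions, as are the helper name sketch_gauss_quantiles_dataset and the random_state argument.

```python
from sklearn.datasets import make_gaussian_quantiles


def sketch_gauss_quantiles_dataset(samples_amount, features_amount, classes_amount,
                                   full_shuffle=True, random_state=0, **kwargs):
    """Hypothetical stand-in that mirrors the documented signature."""
    # Assumption: 'gauss_params' is a (mean, cov) pair; the fallback is a
    # distribution centred at the origin with unit covariance.
    mean, cov = kwargs.get('gauss_params', (None, 1.0))
    features, target = make_gaussian_quantiles(
        mean=mean,
        cov=cov,
        n_samples=samples_amount,
        n_features=features_amount,
        n_classes=classes_amount,
        shuffle=full_shuffle,
        random_state=random_state,  # not part of the documented signature
    )
    return features, target


gq_features, gq_target = sketch_gauss_quantiles_dataset(
    samples_amount=90, features_amount=2, classes_amount=3,
    gauss_params=([1.0, -1.0], 2.0))

print(gq_features.shape, gq_target.shape)
```

In make_gaussian_quantiles the classes are separated by nested concentric spheres around the mean, so each class receives roughly the same number of samples.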

fedot.utilities.synth_dataset_generator.generate_synthetic_data(length=2200, periods=5)

Generates a synthetic one-dimensional array without gaps (missing values).

Parameters
  • length (int) – The length of the array.

  • periods (int) – The number of periods in the sine wave.

Returns

synthetic_data, a one-dimensional array without gaps.
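
A series of this kind can be sketched with NumPy as a sine wave spanning the requested number of periods. The added Gaussian noise, the noise_scale and seed arguments, and the helper name sketch_synthetic_series are assumptions made for illustration; the description above only states that the result is a gap-free one-dimensional array built on a sine wave.

```python
import numpy as np


def sketch_synthetic_series(length=2200, periods=5, noise_scale=0.1, seed=0):
    """Hypothetical stand-in: a noisy sine wave with `periods` full cycles."""
    # Evenly spaced points covering `periods` full cycles of the sine function.
    time_axis = np.linspace(-periods * np.pi, periods * np.pi, length)
    sine_wave = np.sin(time_axis)
    # Assumed noise model: small zero-mean Gaussian perturbation.
    rng = np.random.default_rng(seed)
    noise = rng.normal(loc=0.0, scale=noise_scale, size=length)
    return sine_wave + noise


series = sketch_synthetic_series()
print(series.shape, np.isnan(series).any())
```

An array like this is convenient as a clean baseline for gap-filling experiments, since gaps can be introduced artificially and the restored values compared with the original ones.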