Synthetic dataset generator
- fedot.utilities.synth_dataset_generator.classification_dataset(samples_amount: int, features_amount: int, classes_amount: int, features_options: Dict, noise_fraction: float = 0.1, full_shuffle: bool = True, weights: Optional[list] = None)[source]
Generates a random dataset for an n-class classification problem using scikit-learn API.
- Parameters
samples_amount – Total number of samples in the resulting dataset.
features_amount – Total amount of features per sample.
classes_amount – The amount of classes in the dataset.
features_options –
The dictionary containing features options in key-value format. Possible features_options variants:
informative -> the amount of informative features
redundant -> the amount of redundant features
repeated -> the amount of features that repeat the informative features
clusters_per_class -> the amount of clusters for each class
noise_fraction – the fraction of noisy labels in the dataset
full_shuffle – if true then all features and samples will be shuffled
weights – The proportions of samples assigned to each class. If None, then classes are balanced
- Returns
features and target as numpy-arrays
- Return type
array
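The parameters above resemble those of scikit-learn's make_classification. The block below is a minimal sketch of that correspondence, not FEDOT's actual implementation: the mapping of noise_fraction to flip_y and of full_shuffle to shuffle is an assumption, and the option values are hypothetical.

```python
from sklearn.datasets import make_classification

# Hypothetical option values using the keys documented above.
features_options = {
    "informative": 4,
    "redundant": 2,
    "repeated": 1,
    "clusters_per_class": 1,
}

# Assumed mapping of the documented parameters onto make_classification.
features, target = make_classification(
    n_samples=200,                 # samples_amount
    n_features=10,                 # features_amount
    n_classes=3,                   # classes_amount
    n_informative=features_options["informative"],
    n_redundant=features_options["redundant"],
    n_repeated=features_options["repeated"],
    n_clusters_per_class=features_options["clusters_per_class"],
    flip_y=0.1,                    # noise_fraction (assumed equivalent)
    shuffle=True,                  # full_shuffle (assumed equivalent)
    weights=None,                  # None keeps the classes balanced
    random_state=0,                # fixed seed, only for reproducibility here
)

print(features.shape, target.shape)
```

Note that scikit-learn requires classes_amount * clusters_per_class to be at most 2 ** informative, and the informative, redundant and repeated counts together must not exceed features_amount.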
- fedot.utilities.synth_dataset_generator.regression_dataset(samples_amount: int, features_amount: int, features_options: Dict, n_targets: int, noise: float = 0.0, shuffle: bool = True)[source]
Generates a random dataset for a regression problem using scikit-learn API.
- Parameters
samples_amount – total number of samples in the resulting dataset
features_amount – total amount of features per sample
features_options –
the dictionary containing features options in key-value format. Possible features_options variants:
informative -> the amount of informative features
bias -> bias term in the underlying linear model
n_targets – the amount of target variables
noise – the standard deviation of the gaussian noise applied to the output
shuffle – if True then all features and samples will be shuffled
- Returns
features and target as numpy-arrays
- Return type
array
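The regression parameters line up with scikit-learn's make_regression. The following is a sketch under that assumption, with hypothetical option values; it calls scikit-learn directly and is not FEDOT's own code.

```python
from sklearn.datasets import make_regression

# Hypothetical option values using the keys documented above.
features_options = {"informative": 3, "bias": 0.5}

# Assumed mapping of the documented parameters onto make_regression.
features, target = make_regression(
    n_samples=100,                                  # samples_amount
    n_features=5,                                   # features_amount
    n_informative=features_options["informative"],
    bias=features_options["bias"],
    n_targets=2,                                    # two target variables
    noise=0.0,                                      # std of Gaussian noise on the output
    shuffle=True,
    random_state=0,                                 # fixed seed, only for reproducibility here
)

print(features.shape, target.shape)
```

With n_targets greater than 1 the target is a two-dimensional array with one column per target variable; with n_targets=1 scikit-learn returns a one-dimensional target.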
- fedot.utilities.synth_dataset_generator.gauss_quantiles_dataset(samples_amount: int, features_amount: int, classes_amount: int, full_shuffle=True, **kwargs)[source]
Generates a random dataset for an n-class classification problem based on multi-dimensional Gaussian distribution quantiles using scikit-learn API.
- Parameters
samples_amount – total number of samples in the resulting dataset
features_amount – total amount of features per sample
classes_amount – the amount of classes in the dataset
full_shuffle – if True then all features and samples will be shuffled
kwargs – Optional['gauss_params'] mean and covariance values of the distribution
- Returns
features and target as numpy-arrays
- Return type
array
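The description matches scikit-learn's make_gaussian_quantiles, which labels samples by nested quantile shells of a multi-dimensional Gaussian. The sketch below calls that function directly. Treating the optional gauss_params as the mean and cov arguments is an assumption, and the values shown are scikit-learn's defaults.

```python
from sklearn.datasets import make_gaussian_quantiles

# Assumed stand-ins for the optional 'gauss_params' (mean, covariance).
mean, cov = None, 1.0   # None means a zero mean vector; cov scales the identity matrix

features, target = make_gaussian_quantiles(
    mean=mean,
    cov=cov,
    n_samples=150,      # samples_amount
    n_features=4,       # features_amount
    n_classes=3,        # classes_amount
    shuffle=True,       # full_shuffle (assumed equivalent)
    random_state=0,     # fixed seed, only for reproducibility here
)

print(features.shape, target.shape)
```

Classes produced this way are separated by concentric shells around the mean, so they are not linearly separable, which makes the dataset useful for checking non-linear classifiers.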
- fedot.utilities.synth_dataset_generator.generate_synthetic_data(length: int = 2200, periods: int = 5)[source]
Generates a synthetic one-dimensional array without gaps (missing values).
- Parameters
length – the length of the array
periods – the number of periods in the sine wave
- Returns
an array without gaps
- Return type
array
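A series of this kind can be sketched with NumPy as a sine wave with a given number of periods. The helper name sine_series, the added Gaussian noise and its scale are assumptions made for illustration. The documentation above states only the length and the number of periods.

```python
import numpy as np


def sine_series(length: int = 2200, periods: int = 5, noise_scale: float = 0.1,
                seed: int = 0) -> np.ndarray:
    """Hypothetical helper: a noisy sine wave with no missing values."""
    rng = np.random.default_rng(seed)
    # Evenly spaced grid spanning `periods` full sine cycles.
    grid = np.linspace(-periods * np.pi, periods * np.pi, length)
    # The noise term is an assumption; set noise_scale=0 for a clean sine wave.
    return np.sin(grid) + rng.normal(loc=0.0, scale=noise_scale, size=length)


series = sine_series()
print(series.shape, np.isnan(series).any())
```

Such a gap-free array is a convenient starting point for time series experiments, for example by masking selected values and checking how well a gap-filling method restores them.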