Time Series Forecasting
FEDOT allows you to automate machine learning pipeline design for timeseries forecasting.
FEDOT extracts features using lagged transformation to apply regression methods for forecasting. Therefore not only models specific for timeseries forecasting (such as ARIMA and AR) can be used but also any regression method (ridge, lasso, etc.).
Timeseries specific preprocessing methods, like moving average smoothing or Gaussian smoothing are used as well.
Simple examples
You can find all available API parameters in Fedot
reference.
Automated
Use FEDOT in automated mode to get pipeline with automatically composed architecture and tuned hyperparameters.
from fedot.api.main import Fedot
from fedot.core.data.data import InputData
from fedot.core.data.data_split import train_test_data_setup
from fedot.core.repository.tasks import Task, TaskTypesEnum, TsForecastingParams
# specify the task and the forecast length (required depth of forecast)
task = Task(TaskTypesEnum.ts_forecasting,
TsForecastingParams(forecast_length=10))
# load data from csv
train_input = InputData.from_csv_time_series(task=task,
file_path='time_series.csv',
delimiter=',',
target_column='value')
# split data for train and test
train_data, test_data = train_test_data_setup(train_input)
# init model for the timeseries forecasting
model = Fedot(problem='ts_forecasting', task_params=task.task_params)
# run AutoML model design
pipeline = model.fit(train_data)
# plot obtained pipeline
pipeline.show()
# use model to obtain outofsample forecast with one step
forecast = model.forecast(test_data)
print(model.get_metrics(metric_names=['rmse', 'mae', 'mape'], target=test_data.target))
# plot forecasting result
model.plot_prediction()
Sample output:
Pipeline plot from the example:
Here:
polyfit  polynomial interpolation,
lagged  lagged transformation,
ridge  ridge regression.
In the first branch polynomial interpolation is applied to obtain forecast. In the second branch lagged transformation is used to transform timeseries into table data and then ridge is applied. Finally, another ridge model uses forecasts of two branches to generate final prediction.
Obtained metrics:
{'rmse': 8.485, 'mae': 6.904, 'mape': 0.049}
Plot of the forecast:
Manual
Use FEDOT in manual mode to fit your own pipeline for timeseries forecasting.
Examples of timeseries pipelines can be found here.
See how to tune pipeline hyperparameters in Tuning of Hyperparameters.
from fedot.api.main import Fedot
from fedot.core.data.data import InputData
from fedot.core.data.data_split import train_test_data_setup
from fedot.core.pipelines.pipeline_builder import PipelineBuilder
from fedot.core.repository.tasks import Task, TaskTypesEnum, TsForecastingParams
# compose your own pipeline
pipeline = PipelineBuilder() \
.add_sequence('lagged', 'ridge', branch_idx=0) \
.add_sequence('glm', branch_idx=1).join_branches('ridge').build()
# specify the task and the forecast length (required depth of forecast)
task = Task(TaskTypesEnum.ts_forecasting,
TsForecastingParams(forecast_length=10))
# load data from csv
train_input = InputData.from_csv_time_series(task=task,
file_path='time_series.csv',
delimiter=',',
target_column='value')
# split data for train and test
train_data, test_data = train_test_data_setup(train_input)
# init model for the timeseries forecasting
model = Fedot(problem='ts_forecasting', task_params=task.task_params)
# fit pipeline
model.fit(train_data, predefined_model=pipeline)
# use model to obtain onestep outofsample forecast
forecast = model.forecast(test_data)
print(model.get_metrics(metric_names=['rmse', 'mae', 'mape'], target=test_data.target))
# plot forecasting result
model.plot_prediction()
Sample output:
Pipeline plot from the example:
Here:
glm  generalized linear model,
lagged  lagged transformation,
ridge  ridge regression.
In the first branch lagged transformation is used to transform timeseries into table data and then ridge is applied. In the second branch generalized linear model is applied to obtain forecast. Finally, another ridge model uses forecasts of two branches to generate final prediction.
Obtained metric:
{'rmse': 2.659, 'mae': 2.142, 'mape': 0.100}
Plot of the forecast:
Timeseries validation
FEDOT uses insample forecasting for timeseries validation.
While using FEDOT for forecasting you can set cv_folds
parameter that sets number
of folds for cross validation of a timeseries.
The size of validation sample in each fold is almost the maximum possible size and multiple of forecast_length
.
Train test split
To split InputData use train_test_data_setup
method.
split_ratio
and shuffle
, and stratify
are ignored for timeseries forecasting.
 fedot.core.data.data_split.train_test_data_setup(data, split_ratio=0.8, shuffle=False, shuffle_flag=False, stratify=True, random_seed=42, validation_blocks=None)[source]
Function for train and test split for both InputData and MultiModalData
 Parameters
data (Union[InputData, fedot.core.data.multi_modal.MultiModalData]) – InputData object to split
split_ratio (float) – share of train data between 0 and 1
shuffle (bool) – is data needed to be shuffled or not
shuffle_flag (bool) – same is shuffle, use for backward compatibility
stratify (bool) – make stratified sample or not
random_seed (int) – random_seed for shuffle
validation_blocks (Optional[int]) – validation blocks are used for test
 Returns
data for train, data for validation
 Return type
Tuple[Union[InputData, fedot.core.data.multi_modal.MultiModalData], Union[InputData, fedot.core.data.multi_modal.MultiModalData]]
The method uses forecast_length
specified in the data.task
.
train_data, test_data = train_test_data_setup(train_input)
This approach can be used to
obtain `test_data`
for outofsample forecast after training a model on the `train_data`
.
The resulting split:
train_data.features = data.features[:forecast_length]
train_data.target = data.target[:forecast_length]
test_data.features = data.features[:forecast_length]
train_data.target = data.target[forecast_length:]
train_data, test_data = train_test_data_setup(train_input, validation_blocks=3)
If you pass keyword argument validation_blocks
train data will be prepared for insample
validation with validation_blocks
number of steps. In these case:
train_data.features = data.features[:forecast_length * validation_blocks]
train_data.target = data.target[:forecast_length * validation_blocks]
test_data.features = data.features
train_data.target = data.target[forecast_length * validation_blocks:]
Prediction
There are two approaches to timeseries forecasting: insample and outofsample. For example, our trained model forecasts 10 values ahead and our training sample length is 100. With insample forecast we will predict 10 last values of our training sample using 90 first values as prehistory. With outofsample we will predict 10 future values of the training sample using the whole sample of 100 values as prehistory.
To obtain forecast with length exceeding the forecast length (depth) model was trained for, we use iterative prediction. For example, our trained model forecasts 10 values ahead, but we want to forecast 20 values. With outofsample approach we would predict 10 values and then use those values to forecast another 10 values. But with insample approach we forecast already known parts of timeseries. And after forecasting first 10 values we would use real values from timeseries to forecast next 10 values.
You can use two methods for timeseries forecasting: Fedot.predict
 insample forecast, Fedot.forecast

outofsample forecast.
Insample forecast
Fedot.predict
allows you to obtain iterative insample forecast with depth of forecast_length * validation_blocks
.
Method uses forecast_length
specified in the task parameters. This method uses features
as sample and gets forecast in the way
described in the picture (validation_blocks=3
).
Example of insample forecast.
from fedot.api.main import Fedot
from fedot.core.data.data import InputData
from fedot.core.data.data_split import train_test_data_setup
from fedot.core.repository.tasks import Task, TaskTypesEnum, TsForecastingParams
# set task type and forecast length
task = Task(TaskTypesEnum.ts_forecasting,
TsForecastingParams(forecast_length=10))
# load data from csv
train_input = InputData.from_csv_time_series(file_path='time_series.csv',
task=task,
target_column='value')
# split data for insample forecast
train_data, test_data = train_test_data_setup(train_input, validation_blocks=3)
# init model for the timeseries forecasting
model = Fedot(problem='ts_forecasting', task_params=task.task_params, cv_folds=2)
# run AutoML model design
pipeline = model.fit(train_data)
# use model to obtain threestep insample forecast
in_sample_forecast = model.predict(test_data, validation_blocks=3)
# get metrics for the prediction
print('Metrics for threestep insample forecast: ',
model.get_metrics(metric_names=['rmse', 'mae', 'mape'], validation_blocks=3))
# plot forecasting result
model.plot_prediction()
Plot of the forecast:
Another example of insample forecast.
...
# split data for insample forecast
train_data, test_data = train_test_data_setup(train_input, validation_blocks=2)
# init model for the timeseries forecasting
model = Fedot(problem='ts_forecasting', task_params=task.task_params,
cv_folds=2)
# run AutoML model design
pipeline = model.fit(train_data)
# use model to obtain twostep insample forecast while using 3 validation blocks for fit
in_sample_forecast = model.predict(test_data, validation_blocks=2)
# get metrics for the prediction
print('Metrics for twostep insample forecast: ',
model.get_metrics(metric_names=['rmse', 'mae', 'mape'], validation_blocks=2))
# plot forecasting result
model.plot_prediction()
Plot of the forecast:
Outofsample forecast
Fedot.forecast
can be used to obtain outofsample forecast with custom forecast horizon. If
horizon > forecast_length
forecast is obtained iteratively using previously forecasted values to
predict next ones at each step. If horizon < forecast_length
forecast is cutted according to the horison
.
By default horizon = forecast_length
.
Example of forecast with default horizon.
from fedot.api.main import Fedot
from fedot.core.data.data import InputData
from fedot.core.data.data_split import train_test_data_setup
from fedot.core.repository.tasks import Task, TaskTypesEnum, TsForecastingParams
# set task type and forecast length
task = Task(TaskTypesEnum.ts_forecasting,
TsForecastingParams(forecast_length=10))
# load data from csv
train_input = InputData.from_csv_time_series(file_path='time_series.csv',
task=task,
target_column='value')
# split data for outofsample forecast
train_data, test_data = train_test_data_setup(train_input)
# init model for the timeseries forecasting
model = Fedot(problem='ts_forecasting', task_params=task.task_params, cv_folds=2)
# run AutoML model design
pipeline = model.fit(train_data)
# use model to obtain onestep outofsample forecast
out_of_sample_forecast = model.forecast(test_data)
# get metrics for the prediction
print('Metrics for outofsample forecast: ',
model.get_metrics(metric_names=['rmse', 'mae', 'mape']))
# plot forecasting result
model.plot_prediction()
Plot of the forecast:
Example of forecast with horizon > forecast_length
.
...
# split data for outofsample forecast
train_data, test_data = train_test_data_setup(train_input)
# init model for the timeseries forecasting
model = Fedot(problem='ts_forecasting', task_params=task.task_params, cv_folds=2)
# run AutoML model design
pipeline = model.fit(train_data)
# use model to obtain outofsample forecast with horizon = 25 (> forecast_length)
in_sample_forecast = model.forecast(test_data, horizon=25)
# plot forecasting result
model.plot_prediction()
Plot of the forecast:
Multivariate timeseries forecasting
FEDOT allows you to forecast multivariate timeseries.
Use multi timeseries approach to forecast one timeseries, while model was trained on the number of timeseries of the same variable as target. At first, lagged transformation is applied to transform all of the timeseries to table data then regression models are used. (Using several timeseries allows you to expand training sample  after lagged transformation you get more train/target pairs).
from fedot.api.main import Fedot
from fedot.core.data.data import InputData
from fedot.core.data.data_split import train_test_data_setup
from fedot.core.repository.tasks import Task, TaskTypesEnum, TsForecastingParams
# name of the column with target timeseries
target = 'col_3'
# set task type and forecast length
task = Task(TaskTypesEnum.ts_forecasting,
TsForecastingParams(forecast_length=20)) # forecast_length  required depth of forecast
# load data from csv
data = InputData.from_csv_multi_time_series(
file_path='time_series.csv',
task=task,
target_column=target,
columns_to_use=['col_1', 'col_2', 'col_3', ..., 'col_n'])
# split data for insample forecast
train_data, test_data = train_test_data_setup(data, validation_blocks=2)
# init model for the timeseries forecasting
model = Fedot(problem='ts_forecasting',
task_params=task.task_params,
timeout=5,
n_jobs=1,
cv_folds=2)
# run AutoML model design
pipeline = model.fit(train_data)
# plot obtained pipeline
pipeline.show()
# use model to obtain twostep insample forecast
forecast = model.predict(test_data, validation_blocks=2)
print(model.get_metrics(metric_names=['rmse', 'mae', 'mape'],
target=test_data.target, validation_blocks=2))
Sample output:
Pipeline plot from the example:
Here:
smoothing  rolling mean,
lagged  lagged transformation,
ridge  ridge regression.
In the first branch timeseries is transformed using rolling mean, lagged transformation is used to transform timeseries into table data and then ridge is applied. In the second branch only lagged transformation is applied before using ridge. Finally, another ridge model uses forecasts of two branches to generate final prediction.
Obtained metric:
{'rmse': 0.105, 'mae': 0.087, 'mape': 0.469}
Plot of the forecast:
Use multimodal approach to forecast a timeseries using several other timeseries. In this case for every timeseries a data sourse node will be created.
import numpy as np
import pandas as pd
from fedot.api.main import Fedot
from fedot.core.repository.tasks import TsForecastingParams
from fedot.core.utils import fedot_project_root
forecast_length = 10
# load data from csv
df_train = pd.read_csv('data_train.csv'))
# get timeseries
ws_history = np.ravel(np.array(df_train['wind_speed']))[:200]
ssh_history = np.ravel(np.array(df_train['sea_height']))[:200]
# use dict as input data
historical_data = {
'ws': ws_history, # additional variable
'ssh': ssh_history, # target variable
}
# init model for the timeseries forecasting
fedot = Fedot(problem='ts_forecasting',
task_params=TsForecastingParams(forecast_length=forecast_length),
timeout=10)
# run AutoML model design
pipeline = fedot.fit(features=historical_data,
target=ssh_history) # specify target timeseries
# get insample forecast
fedot.predict(historical_data)
# get metrics
metric = fedot.get_metrics(ssh_history[forecast_length:])
pipeline.show()
fedot.plot_prediction(target='ssh')
Sample output:
Pipeline plot from the example:
Here:
glm  generalized linear model,
lagged  lagged transformation,
ridge  ridge regression,
data_source_ts/ws  data input node for ws timeseries,
data_source_ts/ssh  data input node for ssh timeseries
In the first branch timeseries is transformed using lagged transformation and then ridge is applied. In the second branch only generalized linear model is applied. Finally, another ridge model uses forecasts of two branches to generate final prediction.
Obtained metric:
{'rmse': 11.578, 'mae': 10.400}
Plot of the forecast:
Lagged transformation
To extract features FEDOT uses lagged transformation (windowing method) which allows to represent timeseries as trajectory matrix and apply regression methods for forecasting.
Examples
Simple
Advanced
Cases