Feature engineering for forecasting
Find out what you will learn throughout the course.
What you'll learn
👉 How to forecast with traditional machine learning models.
👉 How to convert time series into tabular data with predictors and target(s).
👉 How to predict multiple steps into the future with direct and recursive forecasting.
👉 How to impute missing data in time series.
👉 How to extract features from past data through lags and windows.
👉 How to identify and handle outliers.
👉 How to encode categorical variables.
👉 How to extract features from date and time.
👉 How to capture trend and seasonality in features.
What you'll get
Lifetime access
Instructor support
Certificate of completion
💬 English subtitles
Can't afford it? Get in touch.
30 days money back guarantee
If you're disappointed for whatever reason, you'll get a full refund.
So you can buy with confidence.
Instructors
Soledad Galli, PhD
Sole is a lead data scientist, instructor, and developer of open-source software. She created and maintains the Python library Feature-engine, which allows us to impute data, encode categorical variables, and transform, create, and select features. Sole is also the author of the "Python Feature Engineering Cookbook", published by Packt.
Kishan Manani, PhD
Kishan is a machine learning and data science lead, course instructor, and open-source software contributor. He has contributed to well-known Python packages including Statsmodels, Feature-engine, and Prophet, and he presents at data science conferences including ODSC and PyData. Kishan earned a PhD in Physics from Imperial College London, applying large-scale time-series analysis to the modelling of cardiac arrhythmias; during this time he taught and supervised undergraduate and master's students.
Course description
Welcome to the most comprehensive course on Feature Engineering for Time Series Forecasting available online. In this course, you will learn how to create and extract features from time series data for use in forecasting.
Master the Art of Feature Engineering for Time Series Forecasting
In this course, you will learn multiple feature engineering methods to create features from time series data that are suitable for forecasting with off-the-shelf regression models like linear regression, tree-based models, and even neural networks.
Specifically, you will learn:
- how to create lag features;
- how to create window features;
- how to create features that capture seasonality and trends;
- how to decompose time series with multiple seasonalities;
- how to extract features from the date and time;
- how to impute missing data in time series;
- how to encode categorical variables in time series;
- how to identify and remove outliers in time series;
- how to avoid data leakage and look-ahead bias in creating forecasting features;
- how to transform features and more.
The Challenges of Feature Engineering in Time Series Forecasting
Forecasting is the process of making predictions about the future based on past data. In the most traditional scenario, we have a time series and want to predict its future values. There are some challenges in creating forecasting features:
- we need to transform time series data into tabular data with a well-designed set of features and a target variable;
- when creating forecasting features we need to be extra careful to avoid data leakage via look-ahead bias;
- time series data, by its nature, changes over time; we need to take this into account when building forecasting features;
- predicting the target value at multiple timesteps in the future requires us to think carefully about how to extrapolate our features from the past into the future.
We can forecast future values of a time series using off-the-shelf regression models like linear regression, tree-based models, support vector machines, and more. However, these models require tabular data as input. In forecasting we don't start with a table of features and a target variable, but with a set of time series, perhaps just one. We need to transform the time series into tabular data with a target variable and a set of features that supervised learning models can use. The main challenge, therefore, is to create a well-designed target variable and specially designed features that allow us to predict the future values of the time series.
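As a minimal sketch of this transformation (the series values and lag choices below are illustrative, not taken from the course), a single time series can be turned into a feature table with pandas:

```python
import pandas as pd

# A single monthly time series (illustrative values).
y = pd.Series(
    [112, 118, 132, 129, 121, 135, 148, 148, 136, 119],
    index=pd.date_range("2020-01-01", periods=10, freq="MS"),
    name="y",
)

# Turn the series into a table: each row holds past values
# (lag features) and the value we want to predict (the target).
table = pd.DataFrame({
    "lag_1": y.shift(1),   # value one step in the past
    "lag_2": y.shift(2),   # value two steps in the past
    "target": y,           # value to predict
}).dropna()                # the first rows lack enough history
```

Each row of `table` is now a regular supervised-learning example, ready for any off-the-shelf regressor.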
Creating the target variable and features for time series forecasting comes with its own pitfalls. A major concern is a form of data leakage known as look-ahead bias: accidentally using information that only becomes known in the future, and is not available at prediction time, to make a prediction. This can give you the illusion of a great forecasting model that then fails to perform in practice. It is very easy to introduce look-ahead bias during feature engineering, and we show you how to avoid it.
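To see how easily look-ahead bias sneaks in, here is a small sketch with pandas (the series is illustrative): a rolling-mean feature computed directly on the series leaks the current value, whereas shifting the series first keeps only past information.

```python
import pandas as pd

# Illustrative series.
y = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0], name="y")

# Leaky: the window ending at time t includes y[t] itself,
# information not available when predicting y[t].
leaky_feature = y.rolling(window=3).mean()

# Safe: shift by one step first, so the window at time t
# only sees values up to and including t-1.
safe_feature = y.shift(1).rolling(window=3).mean()
```

The one-line difference (`shift(1)`) is exactly the kind of detail that separates a model that works in backtests from one that works in production.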
Time series data change over time; that is, future data may not have the same distribution and patterns as past data. This differs from the assumptions typically made about traditional tabular data. Such change in distribution and patterns over time is called non-stationarity, and in time series the mere presence of trend and seasonality can cause it. Creating features that capture these dynamics is thus a challenge in time series forecasting.
We very often want to forecast multiple timesteps into the future. There are several ways to do this, such as 1) recursively applying a model built to forecast one step ahead, and 2) building a model that directly forecasts the target at a later point in the future. A challenge is that the feature engineering required for these two methods is different. We discuss these differences in the course.
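The two strategies can be sketched as follows, using a plain least-squares fit on a single lag as a stand-in for any regression model (the trending series and three-step horizon are illustrative):

```python
import numpy as np

# Illustrative series with a clear linear trend.
y = [float(i) for i in range(20)]

# Recursive forecasting: fit ONE one-step-ahead model
# y[t] ~ a * y[t-1] + b, then feed each prediction back in
# as the lag to forecast the next step.
a, b = np.polyfit(np.array(y[:-1]), np.array(y[1:]), 1)
history = list(y)
recursive = []
for _ in range(3):
    y_hat = a * history[-1] + b
    recursive.append(y_hat)
    history.append(y_hat)

# Direct forecasting: fit a SEPARATE model per horizon h,
# each predicting y[t + h] straight from y[t].
direct = []
for h in (1, 2, 3):
    a_h, b_h = np.polyfit(np.array(y[:-h]), np.array(y[h:]), 1)
    direct.append(a_h * y[-1] + b_h)
```

On this toy trend both strategies agree; on real data they trade off differently (recursive forecasts compound errors, direct forecasts need one model and one feature set per horizon).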
How can we create a set of features that allow us to predict future values of a time series based on its past values? And how can we add additional information to create a richer dataset for our forecasts? In this course, you will learn all of that, and more.
A Comprehensive Feature Engineering Course for Time Series Forecasting
Creating useful features for forecasting has typically required carefully studying your time series to find predictive patterns, such as trend and seasonality, and integrating this with domain knowledge. Lately, there’s been a growing trend to try to automate the creation of features from time series.
In this course, you will learn how to create features from time series that allow you to train off-the-shelf machine learning models to predict future values of the time series. You will first learn to analyse time series and identify properties that you can use to create predictive features. For example, you will learn how to automatically identify and extract trend and seasonality using various algorithms, as well as how to transform your time series to make it easier to decompose and forecast. We show how you can use tools such as cross-correlation, autocorrelation, and partial autocorrelation plots to create suitable lag features. You will discover tips, tricks, and hacks to create features which model trends, change points, seasonality, calendar effects, outliers and more! Based on data analysis and domain knowledge, you will be able to carefully craft your features.
Then, you will learn how to automate the process of feature engineering to create tons of features for time series forecasting, and subsequently select the ones that are more predictive. Here, we will use open source libraries that allow us to create multiple features automatically or semi-automatically, and then select the most valuable ones. We will cover the Python library Feature-engine, and later on tsfresh and featuretools.
We'll take you step-by-step through engaging video tutorials and teach you everything you need to know to create meaningful features for time series forecasting. Throughout this comprehensive course, we will go through practically every possible methodology for engineering features for time series forecasting. We discuss their logic, Python implementation, advantages and drawbacks, and the things to keep in mind when using these methods.
Specifically, you will learn to:
- identify and isolate the components of a time series, including multi-seasonal time series, using state-of-the-art methods;
- create features that capture trends, change points, and seasonality;
- identify and create suitable lag and window features from the target time series and covariate predictors;
- create features from the date and timestamp itself;
- encode categorical variables for forecasting;
- create features to capture holidays and other special events;
- impute missing data in time series with backward and forward fill and interpolation methods;
- identify, remove, or capture the importance of outliers in forecasting;
- automate feature creation with open source Python libraries.
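For instance, the imputation methods listed above (forward and backward fill, and interpolation) can be sketched in pandas as follows, on an illustrative series with gaps:

```python
import numpy as np
import pandas as pd

# Illustrative daily series with missing values.
y = pd.Series(
    [1.0, np.nan, 3.0, np.nan, np.nan, 6.0],
    index=pd.date_range("2021-01-01", periods=6, freq="D"),
)

ffilled = y.ffill()          # carry the last known value forward
bfilled = y.bfill()          # carry the next known value backward
interpolated = y.interpolate(method="linear")  # straight line between known points
```

Which method is appropriate depends on the series: forward fill assumes the value persists, backward fill uses future information (a leakage risk for features), and interpolation assumes smooth change between observations.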
By the end of the course, you will be able to decide which techniques are best suited for your dataset and forecasting challenge. You will be able to apply all the techniques in Python and discover how to improve your forecasts.
Advance Your Data Science Career
You’ve taken your first steps into data science. You know about the most commonly used forecasting models. You've probably tried traditional algorithms like ARIMA or exponential smoothing for your forecasts. At this stage, you’re probably starting to find out that these models make a lot of assumptions about the data that simply do not hold in practice. You've thought about trying neural networks, but they provide very complex models for an otherwise simple problem.
You may be wondering whether this is it, or if there are more appropriate, versatile, and simple solutions. You may also wonder whether your code is efficient and performant or if there is a better way to program it. You search online, but you can’t find consolidated resources on feature engineering for forecasting. Maybe just blogs? So you may start thinking: how are things really done in the industry?
In this course, you will find answers to those questions. Throughout the course, you will learn multiple ways to create features for forecasting with traditional regression models and how to implement them elegantly using Python.
You will leverage the power of Python’s open-source ecosystem, including the libraries Pandas, Scipy, Statsmodels, Scikit-learn, and special packages for feature engineering like Feature-engine and Category encoders. Finally, we will show you how you can begin to automate this process with libraries like tsfresh and featuretools.
By the end of the course, you'll be able to combine all of your feature engineering steps into a single streamlined pipeline, allowing you to bring your predictive models into production with maximum efficiency.
Why take this course
There is no single place to go to learn about feature engineering for forecasting. Even after hours of searching on the web, it is hard to find consolidated methods and best practices.
That is why we created this course. It collates many techniques used worldwide for feature engineering, drawn from well-respected forecasting books, data competitions such as Kaggle and the KDD Cup, scientific articles, and the instructors’ experience as data scientists. This course is therefore a reference where you can learn new methods and revisit them along with their implementation in code, so you can always create the features that you need.
This course is taught by lead data scientists with experience applying machine learning in finance, insurance, health, and e-commerce. Sole is also a book author and the lead developer of an open-source Python library for feature engineering. Kishan is an experienced forecaster with a PhD in Physics, in which he applied large-scale time-series analysis to the modelling of cardiac arrhythmias.
This comprehensive feature engineering course contains over 100 lectures spread across approximately 10 hours of video, and ALL topics include hands-on Python code examples that you can use for reference, practice, and reuse in your own projects.
And there is more:
- The course is constantly updated to include new feature engineering methods.
- Notebooks are regularly refreshed to ensure all methods run with the latest releases of the Python libraries, so your code won't break.
- The course combines videos, presentations, and Jupyter notebooks to explain the methods and show their implementation in Python.
- The curriculum was developed over a period of two years with continuous research in the field of forecasting to bring you the latest technologies, tools, and trends.
Want to know more? Read on...
The course comes with a 30-day money-back guarantee, so you can sign up today with no risk.
So what are you waiting for? Enrol today and join the world's most comprehensive course on feature engineering for time series forecasting.
Course Curriculum
- Time series overview (9:48)
- Forecasting overview (7:42)
- Datasets, features and targets (6:55)
- Forecasting framework (6:58)
- Feature engineering overview (9:32)
- Quiz: tabularizing time series data
- Forecasting demo: data analysis (10:12)
- Forecasting demo: feature engineering (14:19)
- Forecasting demo: training the forecaster (6:21)
- Code assignment - tabularize time series (3:46)
- Summary (8:23)
- A word from your instructor (0:26)
- Challenges in feature engineering (4:26)
- Machine learning workflow (4:02)
- Feature engineering in tabular data (9:42)
- Feature engineering in forecasting - considerations (10:05)
- Feature engineering in forecasting - pipelines (4:43)
- Quiz: machine learning workflow
- Forecasting demo - intro (2:22)
- Feature engineering pipeline - demo (9:24)
- Forecasting one step ahead: demo (6:50)
- Code assignment - feature engineering pipeline (3:00)
- Multistep forecasting (4:02)
- Direct forecasting (5:28)
- Direct forecasting: demo (9:58)
- Recursive forecasting (5:41)
- Recursive forecasting: demo (11:47)
- Recursive forecasting: multiple horizons - demo (7:12)
- Summary (3:32)
- A word from your instructor (0:26)
- Extra Treat: Our Reading Suggestion 📕
- Components of a time series (7:10)
- White noise (9:01)
- Additive and multiplicative models (7:13)
- Log transform (5:27)
- Box-Cox transform (11:51)
- Box-Cox transform: Guerrero method (12:07)
- Box-Cox transform: demo (part 1) (8:50)
- Box-Cox transform: demo (part 2) (5:26)
- Moving average (13:59)
- Moving averages in Pandas: demo (7:03)
- Classical decomposition: trend (8:27)
- Classical decomposition: seasonality (9:58)
- Classical decomposition: demo (5:48)
- LOWESS: Theory (11:26)
- LOWESS: Practice (6:42)
- LOWESS to extract trend: demo (12:29)
- LOWESS vs LOESS (6:33)
- STL Overview (13:01)
- STL theory part 1: LOESS and cycle-subseries (8:43)
- STL theory part 2: the inner loop (13:21)
- STL theory part 3: the outer loop (4:30)
- STL to compute seasonality and trend: demo (7:37)
- Multi-seasonal time series (6:40)
- Multi-seasonal decomposition (5:57)
- MSTL (13:12)
- MSTL: demo (11:28)
- Summary (14:09)
- Added Treat: A Movie We Recommend 🍿
- Imputation overview (5:58)
- Forward and backward filling (3:03)
- Forward and backward filling: demo (5:58)
- Linear interpolation (4:36)
- Linear interpolation: demo (5:51)
- Spline interpolation (4:51)
- Spline interpolation: demo (3:43)
- Seasonal decomposition and interpolation (2:32)
- Seasonal decomposition and interpolation: demo (6:42)
- Summary (6:04)
- Outliers overview (9:38)
- Outliers in time series (5:21)
- Rolling statistics (8:27)
- Rolling mean for outlier detection (8:31)
- Rolling mean for outlier detection: demo (10:13)
- Rolling median for outlier detection (7:11)
- Rolling median for outlier detection: demo (9:16)
- Residuals for outlier detection (8:39)
- LOWESS for outlier detection (5:03)
- LOWESS and residuals for outlier detection: demo (10:07)
- STL for outlier detection (3:58)
- STL and residuals for outlier detection: demo (10:02)
- Dummy variables to handle outliers and special events (9:58)
- Summary (8:44)
- More Wisdom: Our Chosen Podcast Episode 🎧
- Lag features (8:41)
- Lag features: demo (5:04)
- How to choose the lags (11:17)
- Autoregressive (AR) processes (15:02)
- Lag plots (12:54)
- Lag plots: demo (6:32)
- Autocorrelation function (part 1) (10:58)
- Autocorrelation function (part 2) (13:51)
- Autocorrelation function: demo (8:57)
- Partial autocorrelation function (part 1) (10:20)
- Partial autocorrelation function (part 2) (13:59)
- Partial autocorrelation function: demo (13:30)
- Cross correlation function (part 1) (5:49)
- Cross correlation function (part 2) (14:28)
- Cross correlation function: demo (14:26)
- Distributed lag features (10:24)
- Creating good lag features demo: air pollution dataset (5:35)
- Creating good lag features demo: domain knowledge (13:50)
- Creating good lag features demo: feature selection & modelling (11:53)
- Creating good lag features demo: correlation methods (part 1) (11:20)
- Creating good lag features demo: correlation methods (part 2) (10:04)
- Summary (10:00)
- Window features overview (2:34)
- Rolling window features: definition (4:50)
- Rolling window features: picking the window size and statistics (8:37)
- Rolling window features: implementation in Python (9:17)
- Rolling window features: demo (12:21)
- Expanding window features: definition (2:30)
- Expanding window features: use cases (3:45)
- Expanding window features: implementation in Python (3:41)
- Expanding window features: demo (4:48)
- Weighted window functions: definition & use cases (11:08)
- Weighted window functions: implementation in Python (5:28)
- Weighted window functions: demo (11:49)
- Exponential weights: definition (5:21)
- Exponential weights: expanding windows and implementation (6:17)
- Exponential weights: demo (12:31)
- Selecting window features: demo (7:43)
- Summary (9:47)
- Trend features: overview (2:44)
- Types of trend (7:07)
- Linear trend: using time as a feature (12:39)
- Time feature: creating the feature demo (9:07)
- Time feature: forecasting demo (13:48)
- Non-linear trend: using time as a feature (10:30)
- Non-linear time features: creating the features demo (4:52)
- Non-linear time features: forecasting demo (6:27)
- Recursive forecasting with lags, windows, and trend (8:10)
- Trend features and recursive forecasting: demo (14:32)
- Piecewise regression and changepoints (part 1) (9:10)
- Piecewise regression and changepoints (part 2) (8:22)
- Changepoint features: creating the features demo (5:31)
- Changepoint features: forecasting demo (9:03)
- Tree-based models and trend (8:54)
- Tree-based models and trend: detrending with sktime demo (13:21)
- Tree-based models and trend: forecasting demo (7:08)
- Linear trees using LightGBM (9:05)
- Linear trees using LightGBM: demo (6:32)
- Summary (6:07)
- Seasonality and cyclical patterns overview (5:24)
- Seasonal lag features (6:38)
- Seasonal lag features: demo (11:26)
- Date and time features for seasonality (3:58)
- Date and time features: demo (part 1) (7:05)
- Date and time features: demo (part 2) (6:51)
- Why linear models struggle with date and time features (4:09)
- Seasonal dummy features (5:49)
- Seasonal dummy features: demo (part 1) (5:35)
- Seasonal dummy features: demo (part 2) (5:50)
- Fourier features: theory (7:41)
- Fourier features: how to create and use Fourier features (10:22)
- Fourier features: demo (part 1) (10:52)
- Fourier features: demo (part 2) (4:24)
- Summary (3:33)
- Categorical features - intro (3:13)
- One hot encoding (6:03)
- One hot encoding with open source (3:58)
- One hot encoding: demo (7:51)
- Ordinal encoding (1:49)
- Ordinal encoding with open source (1:45)
- Ordinal encoding: demo (3:34)
- Mean encoding (8:53)
- Mean encoding: demo with Feature-engine (3:18)
- Mean encoding: demo with expanding windows (5:03)
- Summary (5:34)
Frequently Asked Questions
When does the course begin and end?
You can start taking the course from the moment you enroll. The course is self-paced, so you can watch the tutorials and apply what you learn whenever you find it most convenient.
For how long can I access the course?
The course has lifetime access. This means that once you enroll, you will have unlimited access to the course for as long as you like.
What if I don't like the course?
There is a 30-day money back guarantee. If you don't find the course useful, contact us within the first 30 days of purchase and you will get a full refund.
Will I get a certificate?
Yes, you'll get a certificate of completion after completing all lectures, quizzes and assignments.