Time-series forecasting is the process of analyzing historical, time-ordered data to predict future data points or events. It is commonly used in finance, supply chain management, business, and sales.
The problems it addresses range from forecasting a company’s sales for the next quarter to predicting the fast-moving prices of assets such as stocks.
Time-series forecasting relies on time-series data: observations recorded at specific intervals of time. Such data usually exhibits temporal patterns like seasonality (a pattern that repeats at regular intervals) and trend (the general direction of the data, such as a series that decreases linearly or exponentially).
Visualization is a great way to understand time-series data, and line charts usually do the best job. The graph below shows an example time series of US retail sales over the years. In this graph, we can see:
- Seasonality - The regular spikes that look like a heartbeat indicate seasonality.
- An increasing trend - The general upward direction of the data indicates an increasing trend.
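To make the two patterns concrete, here is a minimal sketch that builds a synthetic monthly series (not the US retail sales data) and recovers a rough trend and seasonal profile with NumPy. The series, its parameters, and the variable names are all assumptions chosen for illustration.

```python
import numpy as np

# Synthetic example data (an assumption, not real retail sales):
# 10 years of monthly values with a linear trend and a yearly cycle.
rng = np.random.default_rng(0)
months = np.arange(120)
trend = 100 + 0.8 * months
seasonal = 10 * np.sin(2 * np.pi * months / 12)
series = trend + seasonal + rng.normal(scale=1.0, size=months.size)

# Estimate the trend with a straight-line fit.
slope, intercept = np.polyfit(months, series, deg=1)

# Remove the fitted trend, then average each calendar month across
# the 10 years to get a rough seasonal profile (12 values).
detrended = series - (slope * months + intercept)
seasonal_profile = detrended.reshape(-1, 12).mean(axis=0)

print(f"estimated slope per month: {slope:.2f}")
print("seasonal profile:", np.round(seasonal_profile, 1))
```

A positive slope corresponds to the increasing trend, and the repeating 12-value profile corresponds to the seasonality.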
Time series forecasting is about predicting future values from historical, time-ordered data. Forecasting models analyze the temporal dependencies in the data to make predictions. You need chronologically ordered data and a time-related problem to solve; the right technique then depends on the data’s volume, frequency, type, and nature (such as seasonality). In most cases, data scientists use autoregressive and machine-learning-based techniques (like XGBoost or RNNs) for time-series forecasting.
In this article, we provide an intro to Python for multivariate time series forecasting.
Multivariate time series forecasting key concepts
Key questions for choosing an algorithm
Before we dive into the specifics of Python for multivariate time series forecasting, let’s explore how to choose the right algorithm for the job. Different forecasting algorithms are used for different use cases, and for sequences of different data types and natures.
There is a wide variety of algorithms you can use for forecasting time data, ranging from simple line equations to very complex neural networks. Each algorithm has its own advantages and disadvantages.
To pick the right algorithm for your forecasting problem, you have to answer a few key questions, such as:
- What kind of time series are we dealing with? It could be a business time series where, for example, you’re forecasting sales. Or it could be a hardware time series, where you forecast voltage values for a machine part 15 minutes into the future. Different algorithms are designed for different kinds of time series.
- What are the intervals for the time series? Data can come in a wide range of intervals, from one millisecond to one month. The interval frequency directly affects data volume, and some algorithms cannot handle huge amounts of data well.
- How accurate do you need your model to be? The right algorithm may vary depending on your accuracy requirements. For example, you might be willing to trade speed and simplicity for accuracy in one use case, but not another.
- How fast do you need your model to be? Is your model going to be deployed in a live environment where speed is critical, or will it be used only once a quarter?
- How much data is there? Data size directly impacts decisions on the right algorithm for the job.
- What shape is your data? Is it a univariate time series or a multivariate time series? We’ll see why this is important shortly.
Python and real-world multivariate datasets
Real-world datasets are often complex and require cleaning, preprocessing, and exploring. They almost always come with multiple complex variables. The sections below will review different techniques for multivariate time-series forecasting of these complex data sets.
Multivariate time series forecasting Python
Multivariate time-series data has multiple time-ordered and time-dependent variables and is commonly found in time-series forecasting problems, such as data from multiple health-monitoring sensors.
TBATS is a time-series forecasting algorithm that combines exponential smoothing with a Box-Cox transformation to handle data with complex seasonal patterns, including multiple seasonalities.
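As a small illustration of one ingredient named above, the sketch below implements the Box-Cox transformation and its inverse with NumPy. It is not a TBATS implementation; the function names `box_cox` and `inverse_box_cox` and the sample values are assumptions made for this example.

```python
import numpy as np

def box_cox(y, lam):
    """Box-Cox transform: (y**lam - 1) / lam, or log(y) when lam == 0."""
    y = np.asarray(y, dtype=float)
    if np.any(y <= 0):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return np.log(y)
    return (y ** lam - 1.0) / lam

def inverse_box_cox(z, lam):
    """Undo the Box-Cox transform to return to the original scale."""
    z = np.asarray(z, dtype=float)
    if lam == 0:
        return np.exp(z)
    return (lam * z + 1.0) ** (1.0 / lam)

# Hypothetical values that grow quickly over time.
values = np.array([1.0, 4.0, 9.0, 16.0, 25.0])
transformed = box_cox(values, lam=0.5)       # compresses the large values
restored = inverse_box_cox(transformed, lam=0.5)
print(transformed)
print(restored)
```

The transformation stabilizes variance before a model is fitted, and forecasts are mapped back to the original scale with the inverse.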
Vector autoregression Python
Vector autoregression (VAR) is a time-series forecasting algorithm that is often used when you have multiple time series that affect each other. It models each variable as a linear combination of past values of itself and of the other variables.
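The idea can be sketched with a first-order model, y_t = c + A y_{t-1} + e_t, fitted by ordinary least squares using only NumPy. This is a minimal sketch on simulated data, not a full VAR workflow (no lag selection or diagnostics); the helper names `fit_var1` and `forecast_var1` and the simulated coefficients are assumptions for illustration.

```python
import numpy as np

def fit_var1(data):
    """Fit y_t = c + A @ y_{t-1} + e_t by least squares.

    data has shape (T, k): T time steps, k variables.
    Returns the intercept c (k,) and coefficient matrix A (k, k).
    """
    targets = data[1:]                                   # y_t
    lagged = data[:-1]                                   # y_{t-1}
    design = np.hstack([np.ones((len(lagged), 1)), lagged])
    coef, *_ = np.linalg.lstsq(design, targets, rcond=None)
    return coef[0], coef[1:].T

def forecast_var1(c, A, last_obs, steps):
    """Iterate the fitted equation forward from the last observation."""
    preds, current = [], np.asarray(last_obs, dtype=float)
    for _ in range(steps):
        current = c + A @ current
        preds.append(current)
    return np.array(preds)

# Simulate two series that influence each other (assumed parameters).
rng = np.random.default_rng(42)
A_true = np.array([[0.5, 0.2], [-0.3, 0.4]])
c_true = np.array([1.0, 0.5])
series = np.zeros((2000, 2))
for t in range(1, len(series)):
    series[t] = c_true + A_true @ series[t - 1] + rng.normal(scale=0.1, size=2)

c_hat, A_hat = fit_var1(series)
forecast = forecast_var1(c_hat, A_hat, series[-1], steps=3)
print(np.round(A_hat, 2))
print(forecast)
```

The off-diagonal entries of the estimated matrix capture how each series depends on the other's previous value, which is what separates VAR from fitting each series on its own.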
XGBoost predict probability
XGBoost is a widely used and powerful machine learning model that makes predictions with boosted decision trees. For classification, XGBoost can output a probability for each predicted class, and those probabilities can be calibrated with methods such as isotonic regression.
Python residual sum of squares
The residual sum of squares (RSS) measures how far a regression model’s predictions are from the ground-truth values: it is the sum of the squared differences between each observed value and its prediction. It is often used in time-series forecasting.
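A minimal NumPy sketch of the calculation follows. The function name `residual_sum_of_squares` and the sample values are assumptions made for this example.

```python
import numpy as np

def residual_sum_of_squares(y_true, y_pred):
    """Sum of squared differences between observed and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    return float(np.sum(residuals ** 2))

# Hypothetical observations and model predictions.
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.5, 5.5, 7.0, 8.0]

rss = residual_sum_of_squares(actual, predicted)
print(rss)  # 0.25 + 0.25 + 0.0 + 1.0 = 1.5
```

A lower RSS means the predictions sit closer to the observed values; an RSS of zero means a perfect fit on that data.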
Python moving average numpy
The moving average is a mathematical method, common in stock price analysis and prediction (itself a form of time-series forecasting), that smooths an asset’s price over time.
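One common way to compute a simple moving average with NumPy is to convolve the series with a uniform window, as sketched below. The function name `moving_average` and the sample prices are assumptions for illustration.

```python
import numpy as np

def moving_average(values, window):
    """Simple moving average over a sliding window of the given size."""
    values = np.asarray(values, dtype=float)
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    kernel = np.ones(window) / window
    # mode="valid" keeps only positions where the window fully overlaps.
    return np.convolve(values, kernel, mode="valid")

# Hypothetical daily closing prices.
prices = [10, 11, 12, 13, 14, 15]
smoothed = moving_average(prices, window=3)
print(smoothed)  # approximately [11. 12. 13. 14.]
```

With `mode="valid"`, the result has `len(values) - window + 1` entries, so the first smoothed value corresponds to the end of the first full window.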
This article presented a perspective and a brief overview of what time series forecasting is, why and where it is used, how companies make good use of it, and what algorithms are used in the field. The five chapters below explore the world of time series forecasting in detail and are a great starting point for diving deeper into the domain.
- Chapter 1: Vector Autoregression with Python: Learn to apply vector autoregression (VAR) to seasonal datasets using Python, as well as the theory behind how VAR works.
- Chapter 2: TBATS Python: Tutorial & Examples: Learn to do time series forecasting in Python using the TBATS model and follow examples to master its usage.
- Chapter 3: Using XGBoost to predict probability: XGBoost is a machine learning algorithm that helps you predict the class labels of data. Learn to leverage this powerful tool using Python.
- Chapter 4: Python residual sum of squares: Tutorial & Examples: Learn to calculate the residual sum of squares using practical, hands-on examples and Python code snippets.
- Chapter 5: Python Moving Average Numpy: Tutorial & Examples: Learn how to calculate the moving average using Python’s NumPy library and follow examples to master its usage.
More chapters are coming soon!