Materials for Applied Data Science profile course INFOMDA2 Battling the curse of dimensionality.
Throughout this practical you will make use of the R package fpp3.
Furthermore, we use the dataset ukcars, which is included in the
expsmooth package, and, towards the end, the fable.prophet package. If
you haven’t already, install these packages, then load them:
library(expsmooth)
library(fpp3)
library(fable.prophet)
1. Look at the data ukcars, and describe the structure of this data
file. What type of object is ukcars?
2. Before anything else, we will convert this object to a “time series
tibble” or tsibble object called ts_cars. Use the function as_tsibble()
for this. Describe what is different about ts_cars relative to ukcars.
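As a rough illustration of the conversion step, the sketch below inspects the original object and converts it. It assumes that expsmooth and fpp3 are installed and that as_tsibble() stores the time index in a column called index and the observations in a column called value (the names used later in this practical).

```r
library(expsmooth)  # provides the ukcars dataset
library(fpp3)       # attaches tsibble, dplyr, ggplot2, feasts, fable, ...

# Inspect the original object
class(ukcars)       # class of the original object
frequency(ukcars)   # number of observations per year
str(ukcars)

# Convert to a tsibble; the time index becomes an explicit column
ts_cars <- as_tsibble(ukcars)
ts_cars
```

Printing ts_cars shows the index column alongside the values, which is the main thing to compare against the printout of ukcars.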
Before analyzing the data, we consider a few visualizations of the data.
3. First, create a line plot of the data. You can do this yourself (by
mapping aesthetics and specifying geoms) or you can use the function
autoplot() for this. Are there any patterns visible in these data?
4. Create a line plot for the period between 1980 and 2000. This can
be done by first filtering the data based on year(index) and then
passing the result to the autoplot() function.
5. A second useful way to visualize the data is by plotting the
autocorrelation function. You can use the function ACF() to compute
the autocorrelation function and then use autoplot() on this object to
plot the ACF. Are there specific features to notice about the ACF of
these data?
6. Create an STL decomposition for the whole ts_cars data (trend,
season, remainder) using a window size of 15 for the trend. Then,
extract the components using the components() function and show the
first few rows of the resulting tsibble.
Hint: look at the example in the help file of the STL() function to see
how to estimate models using the fable workflow.
7. Use the autoplot() function to plot the individual components.
8. Use the ACF() and autoplot() functions to plot the autocorrelation
of the remainder. According to the Box-Jenkins methodology, can this
model be improved?
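The plotting and decomposition steps above could be sketched roughly as follows. This is one possible approach, not the only one: the object names cars_8000, fit_stl and cmp are made up for illustration, both 1980 and 2000 are assumed to be included in the filtered period, and the seasonal component of the STL model is left at the package defaults because only the trend window is specified in the exercise.

```r
library(expsmooth)
library(fpp3)

ts_cars <- as_tsibble(ukcars)

# Line plot of the full series
autoplot(ts_cars, value)

# Line plot restricted to 1980-2000 (both years included, an assumption)
cars_8000 <- ts_cars %>%
  filter(year(index) >= 1980, year(index) <= 2000)
autoplot(cars_8000, value)

# Autocorrelation function of the observed series
ts_cars %>%
  ACF(value) %>%
  autoplot()

# STL decomposition with a trend window of 15;
# the seasonal special is not given, so its defaults are used
fit_stl <- ts_cars %>%
  model(stl = STL(value ~ trend(window = 15)))

# Extract the components and look at the first rows
cmp <- components(fit_stl)
head(cmp)

# Plot the individual components
autoplot(cmp)

# Autocorrelation of the remainder
cmp %>%
  ACF(remainder) %>%
  autoplot()
```

If the remainder still shows clear autocorrelation at some lags, that is a sign that structure is left in the data which the decomposition did not capture.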
9. Create two ARIMA models for this data using the following syntax. Using the help files, explain the first ARIMA model in your own words. Then briefly explain how the second model came about.
models <-
  ts_cars %>%
  model(
    ARIMA  = ARIMA(value ~ pdq(1, 1, 1) + PDQ(0, 0, 0)),
    SARIMA = ARIMA(value)
  )
10. Create forecasts for these two models using the forecast()
function. Use a horizon of 5 years. Plot these forecasts using the
autoplot() function. Explain the main similarities and differences
between the two forecasts.
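A minimal sketch of the modelling and forecasting steps is given below. It refits the two models from the exercise so that the block is self-contained, and it assumes that a horizon of 5 years can be passed to forecast() as the string "5 years"; the object name fc is made up for illustration.

```r
library(expsmooth)
library(fpp3)

ts_cars <- as_tsibble(ukcars)

# The two models from the exercise: a fixed non-seasonal ARIMA(1,1,1)
# and an ARIMA whose orders are chosen automatically
models <- ts_cars %>%
  model(
    ARIMA  = ARIMA(value ~ pdq(1, 1, 1) + PDQ(0, 0, 0)),
    SARIMA = ARIMA(value)
  )

# Show which orders the automatic search selected
models %>%
  select(SARIMA) %>%
  report()

# Forecast 5 years ahead (20 quarters) for both models
fc <- models %>%
  forecast(h = "5 years")

# Plot the forecasts together with the observed series
autoplot(fc, ts_cars)
```

The report() output shows the selected orders, which helps to explain how the second model came about.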
The prophet model (implemented in the fable.prophet package) is a
model developed at Facebook to create fast and flexible forecasts for
many types of time series. For more info, see this section of fpp3.
11. Fit a data-driven seasonal ARIMA and a prophet model to the
ts_cars data between 1980 and 1995. Then, create forecasts until
the year 2000, and compare the model forecasts to the observed data for
this period. Which model works better in this setting?
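One way to set up this comparison is sketched below, under several assumptions. The training period is taken to run through the last quarter of 1995 and the test period from 1996 through 2000. The prophet seasonal specification (period 4, Fourier order 2, multiplicative) is borrowed from the quarterly example in fpp3 and is not prescribed by this practical. The object names train, test, fits, fc and acc are made up for illustration.

```r
library(expsmooth)
library(fpp3)
library(fable.prophet)

ts_cars <- as_tsibble(ukcars)

# Training data: 1980-1995; test data: 1996-2000 (assumed boundaries)
train <- ts_cars %>%
  filter(year(index) >= 1980, year(index) <= 1995)
test <- ts_cars %>%
  filter(year(index) > 1995, year(index) <= 2000)

# Automatically selected (seasonal) ARIMA and a prophet model;
# the season() settings are an assumption taken from the fpp3 example
fits <- train %>%
  model(
    sarima  = ARIMA(value),
    prophet = prophet(value ~ season(period = 4, order = 2,
                                     type = "multiplicative"))
  )

# Forecast over the test period
fc <- fits %>%
  forecast(new_data = test)

# Plot forecasts against the observed data up to 2000
autoplot(fc, ts_cars %>% filter(year(index) <= 2000))

# Out-of-sample accuracy measures per model
acc <- accuracy(fc, ts_cars)
acc
```

Comparing measures such as RMSE and MAE in acc, together with the plot, gives a basis for judging which model works better over this period.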