Materials for Applied Data Science profile course INFOMDA2 Battling the curse of dimensionality.
Throughout this practical you will make use of the R package fpp3.
Furthermore, we use the dataset ukcars, which is included in the
expsmooth package. If you haven’t already, install these packages (for
example with install.packages()), then load them:
library(expsmooth)
library(fpp3)
library(fable.prophet)
1. Look at the data
ukcars, and describe the structure of this data
file. What type of object is it?
2. Before anything else, we will convert this object to a “time series
tibble” (tsibble) object called
ts_cars. Use the function
as_tsibble() for this. Describe what is different about the new object.
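As a minimal sketch of the conversion step (assuming the expsmooth and fpp3 packages are installed), the code below inspects the class of the original object and then converts it. The column names index and value are what as_tsibble() produces by default for a univariate ts object.

```r
library(expsmooth)
library(fpp3)

# Inspect the original object before converting it
class(ukcars)
str(ukcars)

# Convert the ts object to a tsibble; the time index ends up in a column
# called `index` and the observations in a column called `value`
ts_cars <- as_tsibble(ukcars)
ts_cars
```

Printing both objects side by side is a quick way to see how the time information is stored in each.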
Before analyzing the data, we consider a few visualizations of the data.
3. First, create a line plot of the data. You can do this yourself (by
mapping aesthetics and specifying geoms) or you can use the function
autoplot() for this. Are there any patterns visible in these data?
4. Create a line plot for the period between 1980 and 2000. This can
be done by first filtering the data based on
year(index) and then
passing the result to the autoplot() function.
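One possible way to produce both line plots is sketched below. It assumes that "between 1980 and 2000" includes both end years; the object name cars_80_00 is only used for illustration.

```r
library(expsmooth)
library(fpp3)

ts_cars <- as_tsibble(ukcars)

# Line plot of the full series
autoplot(ts_cars, value)

# Keep only the observations from 1980 up to and including 2000
# (assumption: both end years are part of the requested period)
cars_80_00 <- ts_cars %>%
  filter(year(index) >= 1980, year(index) <= 2000)

# Line plot of the filtered series
autoplot(cars_80_00, value)
```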
5. A second useful way to visualize the data is by plotting the
autocorrelation function. You can use the function
ACF() to compute
the autocorrelation function and then use
autoplot() on this object to
plot the ACF. Are there specific features to notice about the ACF of
this series?
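A short sketch of the ACF step follows. The choice of lag_max = 20 (five years of quarterly lags) is an assumption made here for illustration; leaving it out uses the default.

```r
library(expsmooth)
library(fpp3)

ts_cars <- as_tsibble(ukcars)

# Compute the autocorrelation function of the series
# (lag_max = 20 is an illustrative choice, not prescribed by the exercise)
cars_acf <- ts_cars %>%
  ACF(value, lag_max = 20)

# Plot the autocorrelations against the lag
autoplot(cars_acf)
```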
6. Create an STL decomposition for the whole
ts_cars data (trend,
season, remainder) using a window size of 15 for the trend. Then,
extract the components using the
components() function and show the
first few rows of the resulting object.
Hint: look at the example in the help file of the
STL function to see
how to estimate models using the model() function.
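The decomposition could be set up roughly as follows. This is a sketch that follows the pattern of the example in the STL help file; the model name stl and the object names fit_stl and cmp are chosen here for illustration.

```r
library(expsmooth)
library(fpp3)

ts_cars <- as_tsibble(ukcars)

# Fit an STL decomposition with a trend window of 15;
# the seasonal component is left at its default settings
fit_stl <- ts_cars %>%
  model(stl = STL(value ~ trend(window = 15)))

# Extract trend, seasonal and remainder components
cmp <- components(fit_stl)

# Show the first few rows
head(cmp)
```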
7. Use the
autoplot function to plot the individual components.
8. Use the
ACF and the
autoplot functions to plot the autocorrelation
of the remainder. According to the Box-Jenkins methodology, can this
model be improved?
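The two plotting steps above can be sketched as follows. The decomposition is refitted so that the snippet runs on its own, and the call to as_tsibble() before ACF() is a cautious choice made here, not a requirement stated in the exercise.

```r
library(expsmooth)
library(fpp3)

ts_cars <- as_tsibble(ukcars)

# Refit the STL decomposition (trend window of 15) and extract components
cmp <- ts_cars %>%
  model(stl = STL(value ~ trend(window = 15))) %>%
  components()

# Plot the data together with the trend, seasonal and remainder components
autoplot(cmp)

# Autocorrelation of the remainder; clear remaining structure would suggest
# that the model has not captured all the signal
remainder_acf <- cmp %>%
  as_tsibble() %>%
  ACF(remainder)

autoplot(remainder_acf)
```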
9. Create two ARIMA models for this data using the following syntax. Using the help files, explain the first ARIMA model in your own words. Then briefly explain how the second model came about.
models <- ts_cars %>%
  model(
    ARIMA = ARIMA(value ~ pdq(1, 1, 1) + PDQ(0, 0, 0)),
    SARIMA = ARIMA(value)
  )
10. Create forecasts for these two models using the forecast()
function. Use a horizon of 5 years. Plot these forecasts using the
autoplot() function. Explain the main similarities and differences
between the two forecasts.
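A possible way to create and plot the forecasts is sketched below. The models are refitted with the syntax from exercise 9 so the snippet is self-contained; the object name fc is chosen here for illustration.

```r
library(expsmooth)
library(fpp3)

ts_cars <- as_tsibble(ukcars)

# The two models from exercise 9
models <- ts_cars %>%
  model(
    ARIMA = ARIMA(value ~ pdq(1, 1, 1) + PDQ(0, 0, 0)),
    SARIMA = ARIMA(value)
  )

# Forecast 5 years ahead (20 quarters per model)
fc <- models %>%
  forecast(h = "5 years")

# Plot the forecasts together with the observed series
autoplot(fc, ts_cars)
```

Passing the original data as the second argument to autoplot() draws the history in front of the forecasts, which makes the two models easier to compare.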
The prophet model (implemented in the
fable.prophet package) is a
model developed at Facebook to create fast and flexible forecasts for
many types of time series. For more info, see this section of
11. Fit a data-driven seasonal ARIMA and a prophet model for the
ts_cars dataset between 1980 and 1995. Then, create forecasts until
the year 2000, and compare the model forecasts to the observed data for
this period. Which model works better in this setting?
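The comparison could be set up along these lines. This is a sketch under several assumptions: the training period includes both 1980 and 1995, the forecasts run through the end of 2000, and the prophet seasonality specification season(period = 4, order = 2) is an illustrative choice for quarterly data, not something prescribed by the exercise. The object and model names are hypothetical.

```r
library(expsmooth)
library(fpp3)
library(fable.prophet)

ts_cars <- as_tsibble(ukcars)

# Training data: 1980 up to and including 1995 (assumed inclusive)
train <- ts_cars %>%
  filter(year(index) >= 1980, year(index) <= 1995)

# Fit an automatically selected seasonal ARIMA and a prophet model
# (the prophet seasonality settings are an assumption for quarterly data)
fit <- train %>%
  model(
    sarima  = ARIMA(value),
    prophet = prophet(value ~ season(period = 4, order = 2))
  )

# Forecast 5 years ahead, i.e. 1996 through 2000
fc <- fit %>%
  forecast(h = "5 years")

# Plot the forecasts against the observed data for 1980-2000
observed <- ts_cars %>%
  filter(year(index) >= 1980, year(index) <= 2000)
autoplot(fc, observed, level = NULL)

# Out-of-sample accuracy measures per model
acc <- accuracy(fc, ts_cars)
acc
```

Accuracy measures such as RMSE and MAE in the accuracy() output give a numerical counterpart to the visual comparison.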