Materials for Applied Data Science profile course INFOMDA2 Battling the curse of dimensionality.

1 Introduction

Throughout this practical you will make use of the R package fpp3. Furthermore, we use the dataset ukcars, which is included in the expsmooth package. If you haven’t already, install both packages before you start.
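One possible way to set this up is sketched below. It installs only the packages that are not yet present and then attaches them; the helper variable names pkgs and to_install are illustrative and not part of the course materials.

```r
# Packages named in the introduction of this practical.
pkgs <- c("fpp3", "expsmooth")

# Install only those that are not yet available on this machine.
to_install <- pkgs[!pkgs %in% rownames(installed.packages())]
if (length(to_install) > 0) install.packages(to_install)

library(fpp3)       # attaches tsibble, feasts, fable, dplyr, ggplot2, lubridate
library(expsmooth)  # provides the ukcars data
```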


2 Take-home exercises

2.1 Data exploration

1. Look at the data ukcars, and describe the structure of this data file. What type of object is ukcars?

2. Before anything else, we will convert this object to a “time series tibble” or tsibble object called ts_cars. Use the function as_tsibble() for this. Describe what is different about ts_cars relative to ukcars.

Before analyzing the data, we consider a few visualizations of the data.

3. First, create a line plot of the data. You can do this yourself (by mapping aesthetics and specifying geoms) or you can use the function autoplot() for this. Are there any patterns visible in these data?

4. Create a line plot for the period between 1980 and 2000. This can be done by first filtering the data based on year(index) and then passing the result to the autoplot() function.

5. A second useful way to visualize the data is by plotting the autocorrelation function. You can use the function ACF() to compute the autocorrelation function and then use autoplot() on this object to plot the ACF. Are there specific features to notice about the ACF of these data?
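The exploration steps in exercises 1 to 5 could be approached roughly as follows. This is a minimal sketch rather than a model answer; it assumes that as_tsibble() stores the time index in a column called index and the observations in a column called value, which is its default behaviour for a univariate ts object.

```r
library(fpp3)
library(expsmooth)

# Exercise 1: inspect the original object.
class(ukcars)
str(ukcars)

# Exercise 2: convert to a tsibble and look at the result.
ts_cars <- as_tsibble(ukcars)
ts_cars

# Exercise 3: line plot of the whole series.
autoplot(ts_cars, value)

# Exercise 4: line plot restricted to 1980 through 2000.
ts_cars %>%
  filter(year(index) >= 1980, year(index) <= 2000) %>%
  autoplot(value)

# Exercise 5: compute and plot the autocorrelation function.
ts_cars %>%
  ACF(value) %>%
  autoplot()
```

Interpreting the plots (trend, seasonality, slowly decaying or periodic autocorrelations) is left to you.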

3 Lab exercises

3.1 Decomposition

6. Create an STL decomposition for the whole ts_cars data (trend, season, remainder) using a window size of 15 for the trend. Then, extract the components using the components() function and show the first few rows of the resulting tsibble.

Hint: look at the example in the help file of the STL function to see how to estimate models using the fable workflow.

7. Use the autoplot() function to plot the individual components.

8. Use the ACF() and autoplot() functions to plot the autocorrelation function of the remainder. According to the Box-Jenkins methodology, can this model be improved?
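A sketch of the decomposition workflow in exercises 6 to 8 is given below. It follows the pattern of the example in the STL help file; the object names dcmp and cmp are illustrative, and the seasonal component is left at the function's defaults, which is an assumption rather than a requirement of the exercise.

```r
library(fpp3)
library(expsmooth)

ts_cars <- as_tsibble(ukcars)

# Exercise 6: STL decomposition with a trend window of 15.
dcmp <- ts_cars %>%
  model(stl = STL(value ~ trend(window = 15)))

# Extract the components and show the first few rows.
cmp <- components(dcmp)
head(cmp)

# Exercise 7: plot the original series and its components.
autoplot(cmp)

# Exercise 8: autocorrelation of the remainder.
cmp %>%
  ACF(remainder) %>%
  autoplot()
```

If the remainder still shows clear autocorrelation, that is a sign that some structure in the data has not yet been captured.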

3.2 ARIMA modeling

9. Create two ARIMA models for this data using the following syntax. Using the help files, explain the first ARIMA model in your own words. Then briefly explain how the second model came about.

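models <- 
  ts_cars %>%
  model(
    # Orders fixed by hand: non-seasonal ARIMA(1, 1, 1), no seasonal terms
    ARIMA = ARIMA(value ~ pdq(1, 1, 1) + PDQ(0, 0, 0)),
    # No orders given: ARIMA() selects them automatically
    SARIMA = ARIMA(value)
  )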

10. Create forecasts for these two models using the forecast() function. Use a horizon of 5 years. Plot these forecasts using the autoplot() function. Explain the main similarities and differences between the two forecasts.
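The forecasting step in exercise 10 might look like the sketch below. It refits both models so that the block runs on its own, and it interprets "a horizon of 5 years" as h = "5 years", which corresponds to 20 quarters for these quarterly data. The object name fc is illustrative.

```r
library(fpp3)
library(expsmooth)

ts_cars <- as_tsibble(ukcars)

# Fit the hand-specified and the automatically selected model.
models <- ts_cars %>%
  model(
    ARIMA  = ARIMA(value ~ pdq(1, 1, 1) + PDQ(0, 0, 0)),
    SARIMA = ARIMA(value)
  )

# Show which orders were selected automatically for the second model.
models %>%
  select(SARIMA) %>%
  report()

# Forecast five years ahead and plot the forecasts with the observed data.
fc <- models %>%
  forecast(h = "5 years")

autoplot(fc, ts_cars)
```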

3.3 Comparing SARIMA and Prophet forecasting methods

The prophet model (implemented in the fable.prophet package) is a model developed at Facebook to create fast and flexible forecasts for many types of time series. For more information, see the section on the Prophet model in the fpp3 textbook.

11. Fit a data-driven seasonal ARIMA model and a prophet model to the ts_cars data between 1980 and 1995. Then, create forecasts until the year 2000, and compare the model forecasts to the observed data for this period. Which model works better in this setting?
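One way to set up the comparison in exercise 11 is sketched below. Several choices here are assumptions and not prescribed by the exercise: the training period is taken to include all of 1995, "until the year 2000" is read as forecasting through the last quarter of 2000, and the prophet model uses a quarterly seasonal term with period = 4 and order = 2. The names train, fits and fc are illustrative.

```r
library(fpp3)
library(expsmooth)
library(fable.prophet)

ts_cars <- as_tsibble(ukcars)

# Training data: 1980 through 1995 (assumed to be inclusive).
train <- ts_cars %>%
  filter(year(index) >= 1980, year(index) <= 1995)

# Fit an automatically selected seasonal ARIMA and a prophet model.
# The seasonal specification for prophet is an assumption for quarterly data.
fits <- train %>%
  model(
    sarima  = ARIMA(value),
    prophet = prophet(value ~ season(period = 4, order = 2))
  )

# Forecast the five years after the training period (1996 through 2000).
fc <- fits %>%
  forecast(h = "5 years")

# Plot the forecasts against the observed data for 1980 through 2000.
fc %>%
  autoplot(
    ts_cars %>% filter(year(index) >= 1980, year(index) <= 2000),
    level = NULL
  )

# Compare forecast accuracy on the held-out period.
accuracy(fc, ts_cars)
```

Accuracy measures such as RMSE or MAE from accuracy() give one basis for deciding which model works better here; the plot shows where the forecasts diverge from the observations.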