Introduction

Throughout this practical you will make use of the R packages fpp3 and fable.prophet. Furthermore, we use the dataset ukcars, which is included in the expsmooth package. If you haven’t already, install these packages, then load them:

library(expsmooth)
library(fpp3)
library(fable.prophet)

Take-home exercises

Data exploration

1. Look at the data ukcars, and describe the structure of this data file. What type of object is ukcars?

ukcars
##         Qtr1    Qtr2    Qtr3    Qtr4
## 1977 330.371 371.051 270.670 343.880
## 1978 358.491 362.822 261.281 240.355
## 1979 325.382 316.700 171.153 257.217
## 1980 298.127 251.464 181.555 192.598
## 1981 245.652 245.526 225.261 238.211
## 1982 257.385 228.461 175.371 226.462
## 1983 266.150 287.251 225.883 265.313
## 1984 272.759 234.134 196.462 205.551
## 1985 291.283 284.422 221.571 250.697
## 1986 253.757 267.016 220.388 277.801
## 1987 283.233 302.072 259.720 297.658
## 1988 306.129 322.106 256.723 341.877
## 1989 356.004 361.540 270.433 311.105
## 1990 326.688 327.059 274.257 367.606
## 1991 346.163 348.211 250.008 292.518
## 1992 343.318 343.429 275.386 329.747
## 1993 364.521 378.448 300.798 331.757
## 1994 362.536 389.133 323.322 391.832
## 1995 421.646 416.823 311.713 381.902
## 1996 422.982 427.722 376.850 458.580
## 1997 436.225 441.487 369.566 450.723
## 1998 462.442 468.232 403.636 413.948
## 1999 460.496 448.932 407.787 469.408
## 2000 494.311 433.240 335.106 378.795
## 2001 387.100 372.395 335.790 397.080
## 2002 449.755 402.252 391.847 385.890
## 2003 424.325 433.280 391.213 408.740
## 2004 445.458 428.202 379.048 394.042
## 2005 432.796
# Rows are years, columns are quarters (seasons).

str(ukcars)
##  Time-Series [1:113] from 1977 to 2005: 330 371 271 344 358 ...
# This is a time series (ts) object: for these objects the correct time 
# order of the data points (including the different quarters in the
# columns) is stored in the object itself.
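As an aside (not part of the exercise), base R helpers such as frequency(), start(), and end() expose the time information stored in a ts object. A minimal sketch with a toy quarterly series (not the ukcars data):

```r
# Toy quarterly series: 8 observations starting in the first quarter of 1977.
x <- ts(1:8, start = c(1977, 1), frequency = 4)

frequency(x)  # number of observations per year: 4
start(x)      # first time point: 1977, quarter 1
end(x)        # last time point: 1978, quarter 4
```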

2. Before anything else, we will convert this object to a “time series tibble” or tsibble object called ts_cars. Use the function as_tsibble() for this. Describe what is different about ts_cars relative to ukcars.

ts_cars <- as_tsibble(ukcars)
head(ts_cars)
# ts_cars now has only 2 columns, an index and a value
# this format is more amenable to multivariate time 
# series, for example.
# ts_cars also has identified nicely that this is
# quarterly data!

# The function `as_tsibble()` can be used directly on time series 
# objects such as `ukcars` to create a valid tsibble. For other types 
# of objects you may need to specify more options to create a tsibble
# (or you can try making it into a time series with ts() first). 
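To illustrate that last point (a sketch, not part of the practical): for a plain data frame, the time column must be passed explicitly via the index argument of as_tsibble(). The data frame below is a made-up example.

```r
library(tsibble)

# A made-up data frame with an explicit quarterly time column.
df <- data.frame(
  quarter = yearquarter("1977 Q1") + 0:3,
  value   = c(330.4, 371.1, 270.7, 343.9)
)

# For data frames, the index must be specified explicitly
# (and a key as well, if the data contain multiple series).
ts_df <- as_tsibble(df, index = quarter)
ts_df
```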

Before analyzing the data, we consider a few visualizations of the data.

3. First, create a line plot of the data. You can do this yourself (by mapping aesthetics and specifying geoms) or you can use the function autoplot() for this. Are there any patterns visible in these data?

autoplot(ts_cars, value)

# There are clear trends over time:
# there is a decrease at the beginning,
# followed by an increase, then a sudden dip,
# followed by a partial recovery.
# At a shorter timescale, there also seems
# to be a seasonal pattern (i.e., an annual period).
# These data are not stationary.

4. Create a line plot for the period between 1980 and 2000. This can be done by first filtering the data based on year(index) and then passing the result to the autoplot() function.

ts_cars %>% 
  filter(year(index) > 1980, year(index) < 2000) %>% 
  autoplot(value)

5. A second useful way to visualize the data is by plotting the autocorrelation function. You can use the function ACF() to compute the autocorrelation function and then use autoplot() on this object to plot the ACF. Are there specific features to notice about the ACF of these data?

ACF(ts_cars, y = value) %>% 
  autoplot()

# The autocorrelations are quite high and stay like that
# for a long time (i.e., over larger lags as well). 
# This matches the fact that there are long-term
# trends visible in the data (see sequence plot we made 
# above).
# Furthermore, there is a peak at lags 4, 8, 12, etc., which
# is consistent with the idea that these data contain a 
# seasonal effect.
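As an additional check of this seasonal pattern (not asked for in the exercise), the feasts package loaded by fpp3 provides gg_season(), which overlays the quarters of each year:

```r
# Season plot: one line per year, quarters on the x-axis.
# The recurring third-quarter dip in the raw data should be
# visible as a consistent dip across the yearly lines.
ts_cars %>% gg_season(value)
```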

Lab exercises

Decomposition

6. Create an STL decomposition for the whole ts_cars data (trend, season, remainder) using a window size of 15 for the trend. Then, extract the components using the components() function and show the first few rows of the resulting tsibble.

Hint: look at the example in the help file of the STL function to see how to estimate models using the fable workflow.

stl_decomp <- 
  ts_cars %>% 
  model(STL(value ~ trend(window = 15))) %>% 
  components() 

head(stl_decomp)

7. Use the autoplot function to plot the individual components.

autoplot(stl_decomp)

8. Use ACF and the autoplot functions to plot the autocorrelation of the remainder. According to the Box-Jenkins methodology, can this model be improved?

stl_decomp %>% ACF(remainder) %>% autoplot()

# There is quite strong autocorrelation left in the remainder, including
# some cyclic behaviour, so according to the Box-Jenkins methodology the
# model can still be improved.

ARIMA modeling

9. Create two ARIMA models for this data using the following syntax. Using the help files, explain the first ARIMA model in your own words. Then briefly explain how the second model came about.

models <- 
  ts_cars %>%
  model(
    ARIMA = ARIMA(value ~ pdq(1, 1, 1) + PDQ(0, 0, 0)),
    SARIMA = ARIMA(value)
  )
models
# The first model is a non-seasonal ARIMA(1, 1, 1): one AR term, one order
# of differencing, and one MA term, with the seasonal part fixed at PDQ(0, 0, 0).
# The second model is a fully data-driven (seasonal) ARIMA model: the fable
# package searches over model orders and selects the model with the lowest AICc.
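To see which orders were selected and to compare the fits (a small addition, not required by the exercise), report() and glance() from fabletools can be used on the fitted models:

```r
# Print the full specification of the automatically selected model,
# including the chosen (seasonal) orders and the AICc.
models %>% select(SARIMA) %>% report()

# One-row summary per model, allowing a direct AICc comparison.
glance(models)
```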

10. Create forecasts for these two models using the forecast() function. Use a horizon of 5 years. Plot these forecasts using the autoplot() function. Explain the main similarities and differences between the two forecasts.

models %>%
  forecast(h = "5 years") %>%
  autoplot(ts_cars) +
  facet_grid(rows = vars(.model))

# Neither model forecasts a strong trend. The SARIMA model neatly
# reproduces the seasonality in the data; the ARIMA model does not. The
# prediction intervals are approximately equally wide and appropriately
# widen as the forecast horizon increases.

Comparing SARIMA and Prophet forecasting methods

The prophet model (implemented in the fable.prophet package) is a model developed at Facebook to create fast and flexible forecasts for many types of time series. For more info, see this section of fpp3.

11. Fit a data-driven seasonal ARIMA and a prophet model for the ts_cars dataset between 1980 and 1995. Then, create forecasts until the year 2000, and compare the model forecasts to the observed data for this period. Which model works better in this setting?

fit <- ts_cars %>% 
  filter(year(index) > 1980, year(index) < 1995) %>% 
  model(
    SARIMA = ARIMA(value),
    Prophet = prophet(value)
  ) 

fit %>% 
  forecast(h = "5 years") %>% 
  autoplot(ts_cars %>% filter(year(index) > 1980, year(index) < 2000)) +
  facet_grid(rows = vars(.model))

# Both models slightly underestimate the `value`. We could argue that the
# SARIMA model works better in this case because its prediction intervals
# include the observed data, whereas this is not the case for the prophet
# model: the prophet model is a bit "overconfident".
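To back up this visual comparison with numbers (a sketch, not part of the exercise), the accuracy() function from fabletools computes forecast-accuracy measures against the held-out observations:

```r
# Compare the two forecasts to the observed data using measures such as
# RMSE and MAE; the model with the lower values forecasts this period
# more accurately.
fit %>%
  forecast(h = "5 years") %>%
  accuracy(ts_cars)
```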