Throughout this practical you will make use of of the R package fpp3.
Furthermore, we use the dataset ukcars
that is included in
the expsmooth
package. If you haven’t already, install
these packages:
library(expsmooth)
library(fpp3)
library(fable.prophet)
1. Look at the data ukcars
, and describe the
structure of this data file. What type of object is
ukcars
?
ukcars
## Qtr1 Qtr2 Qtr3 Qtr4
## 1977 330.371 371.051 270.670 343.880
## 1978 358.491 362.822 261.281 240.355
## 1979 325.382 316.700 171.153 257.217
## 1980 298.127 251.464 181.555 192.598
## 1981 245.652 245.526 225.261 238.211
## 1982 257.385 228.461 175.371 226.462
## 1983 266.150 287.251 225.883 265.313
## 1984 272.759 234.134 196.462 205.551
## 1985 291.283 284.422 221.571 250.697
## 1986 253.757 267.016 220.388 277.801
## 1987 283.233 302.072 259.720 297.658
## 1988 306.129 322.106 256.723 341.877
## 1989 356.004 361.540 270.433 311.105
## 1990 326.688 327.059 274.257 367.606
## 1991 346.163 348.211 250.008 292.518
## 1992 343.318 343.429 275.386 329.747
## 1993 364.521 378.448 300.798 331.757
## 1994 362.536 389.133 323.322 391.832
## 1995 421.646 416.823 311.713 381.902
## 1996 422.982 427.722 376.850 458.580
## 1997 436.225 441.487 369.566 450.723
## 1998 462.442 468.232 403.636 413.948
## 1999 460.496 448.932 407.787 469.408
## 2000 494.311 433.240 335.106 378.795
## 2001 387.100 372.395 335.790 397.080
## 2002 449.755 402.252 391.847 385.890
## 2003 424.325 433.280 391.213 408.740
## 2004 445.458 428.202 379.048 394.042
## 2005 432.796
# Rows are years, columns are quarters (seasons).
str(ukcars)
## Time-Series [1:113] from 1977 to 2005: 330 371 271 344 358 ...
# This is a time series object: For these objects the correct time
# order of the data points (including the different quarters in the
# columns) is included in the object.
2. Before anything else, we will convert this object to a
“time series tibble” or tsibble
object called
ts_cars
. Use the function as_tsibble()
for
this. Describe what is different about ts_cars
relative to
ukcars
.
<- as_tsibble(ukcars)
ts_cars head(ts_cars)
# ts_cars now has only 2 columns, an index and a value
# this format is more amenable to multivariate time
# series, for example.
# ts_cars also has identified nicely that this is
# quarterly data!
# The function `as_tsibble()` can be used directly on time series
# objects such as `ukcars` to create a valid tsibble. For other types
# of objects you may need to specify more options to create a tsibble
# (or you can try making it into a time series with ts() first).
Before analyzing the data, we consider a few visualizations of the data.
3. First, create a line plot of the data. You can do this
yourself (by mapping aesthetics and specifying geoms) or you can use the
function autoplot()
for this. Are there any patterns
visible in these data?
autoplot(ts_cars, value)
# There are clear trends over time:
# there is a decrease at the beginning,
# followed by an increase, then a sudden dip,
# followed by a partial recovery.
# At a shorter timescale, there also seems
# a seasonal pattern (i.e., an annual period).
# These data are not stationary.
4. Create a line plot for the period between 1980 and 2000.
This can be done by first filtering the data based on
year(index)
and then passing the result to the
autoplot()
function.
%>%
ts_cars filter(year(index) > 1980, year(index) < 2000) %>%
autoplot(value)
5. A second useful way to visualize the data is by plotting
the autocorrelation function. You can use the function
ACF()
to compute the autocorrelation function and then use
autoplot()
on this object to plot the ACF. Are there
specific features to notice about the ACF of these data?
ACF(ts_cars, y = value) %>%
autoplot()
# The autocorrelations are quite high and stay like that
# for a long time (i.e., over larger lags as well).
# This matches the fact that there are long-term
# trends visible in the data (see sequence plot we made
# above).
# Furthermore, there is a peak at lags 4, 8, 12, etc., which
# is consistent with the idea that these data contain a
# seasonal effect.
6. Create an STL decomposition for the whole
ts_cars
data (trend, season, remainder) using a window size
of 15 for the trend. Then, extract the components using the
components()
function and show the first few rows of the
resulting tsibble
.
Hint: look at the example in the help file of the STL
function to see how to estimate models using the fable
workflow.
<-
stl_decomp %>%
ts_cars model(STL(value ~ trend(window = 15))) %>%
components()
head(stl_decomp)
7. Use the autoplot
function to plot the
individual components.
autoplot(stl_decomp)
8. Use ACF
and the autoplot
functions to plot the autocorrelation of the remainder. According to the
Box-Jenkins methodology, can this model be improved?
%>% ACF(remainder) %>% autoplot() stl_decomp
# there is quite strong autocorrelation left, even some cyclic behaviour
9. Create two ARIMA models for this data using the following syntax. Using the help files, explain the first ARIMA model in your own words. Then briefly explain how the second model came about.
<-
models %>%
ts_cars model(
ARIMA = ARIMA(value ~ pdq(1, 1, 1) + PDQ(0, 0, 0)),
SARIMA = ARIMA(value)
)
models
# The first model is a non-seasonal ARIMA(1, 1, 1) model, meaning a single AR, first difference, and a single MA parameter.
# The second model is a fully data-driven SARIMA model, chosen based on the AICc by the fable package.
10. Create forecasts for these two models using the
forecast()
function. Use a horizon of 5 years. Plot these
forecasts using the autoplot()
function. Explain the main
similarities and differences between the two forecasts.
%>%
models forecast(h = "5 years") %>%
autoplot(ts_cars) +
facet_grid(rows = vars(.model))
# Both models don't show a strong trend. the SARIMA model neatly shows the
# seasonality in the data, the ARIMA model does not. The confidence bands
# seem approximately equally wide and are appropriately increasing in size
The prophet
model (implemented in the
fable.prophet
package) is a model developed at Facebook to
create fast and flexible forecasts for many types of time series. For
more info, see this
section of fpp3.
11. Fit a data-driven seasonal ARIMA and a prophet model for
the ts_cars
datasets between 1980 and 1995. Then, create
forecasts until the year 2000, and compare the model forecasts to the
observed data for this period. Which model works better in this
setting?
<- ts_cars %>%
fit filter(year(index) > 1980, year(index) < 1995) %>%
model(
SARIMA = ARIMA(value),
Prophet = prophet(value)
)
%>%
fit forecast(h = "5 years") %>%
autoplot(ts_cars %>% filter(year(index) > 1980, year(index) < 2000)) +
facet_grid(rows = vars(.model))
# Both models slightly underestimate the `value`. We could argue that the
# SARIMA model works better in this case because the confidence bands include
# the real data, whereas this is not the case for the prophet model.
# The prophet model is a bit "overconfident".