# INFOMDA2

Materials for the Applied Data Science profile course INFOMDA2: Battling the curse of dimensionality.

# 1 Introduction

Throughout this practical you will make use of the R package `fpp3`, together with `fable.prophet` for the final exercise. Furthermore, we use the dataset `ukcars` that is included in the `expsmooth` package. If you haven’t already, install these packages, then load them:

```r
library(expsmooth)
library(fpp3)
library(fable.prophet)
```
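The `library()` calls above only load packages that are already installed. A minimal sketch for installing whatever is missing, assuming all three packages are available from your configured CRAN mirror (the variable names `pkgs` and `missing_pkgs` are illustrative):

```r
# Packages used in this practical
pkgs <- c("expsmooth", "fpp3", "fable.prophet")

# Check which of them cannot be found in the current library paths
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!installed]

# Install only the missing ones (assumes a working CRAN mirror)
if (length(missing_pkgs) > 0) {
  install.packages(missing_pkgs)
}
```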

# 2 Take-home exercises

## 2.1 Data exploration

1. Look at the data `ukcars`, and describe the structure of this data file. What type of object is `ukcars`?

2. Before anything else, we will convert this object to a “time series tibble” or `tsibble` object called `ts_cars`. Use the function `as_tsibble()` for this. Describe what is different about `ts_cars` relative to `ukcars`.
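As a rough illustration of the conversion step, the sketch below applies `as_tsibble()` to a different quarterly series, the built-in `UKgas` dataset, so that the exercise itself stays open. The object name `ts_gas` is an assumption made for this example only.

```r
library(fpp3)

# UKgas is a built-in quarterly `ts` object (not the exercise data)
class(UKgas)

# Convert to a tsibble: the time stamps become an explicit `index` column
# and the observations are stored in a `value` column
ts_gas <- as_tsibble(UKgas)
ts_gas
```

Printing both objects side by side is one way to see how the implicit time attributes of a `ts` object become ordinary columns in a `tsibble`.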

Before analyzing the data, we consider a few visualizations of the data.

3. First, create a line plot of the data. You can do this yourself (by mapping aesthetics and specifying geoms) or you can use the function `autoplot()` for this. Are there any patterns visible in these data?

4. Create a line plot for the period between 1980 and 2000. This can be done by first filtering the data based on `year(index)` and then passing the result to the `autoplot()` function.

5. A second useful way to visualize the data is by plotting the autocorrelation function. You can use the function `ACF()` to compute the autocorrelation function and then use `autoplot()` on this object to plot the ACF. Are there specific features to notice about the ACF of these data?
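The three visualizations described above can be sketched as follows, again on the built-in `UKgas` series rather than the exercise data. The year range and the object names (`ts_gas`, `ts_gas_70s`, `gas_acf`) are illustrative assumptions.

```r
library(fpp3)

ts_gas <- as_tsibble(UKgas)

# Line plot of the full series
autoplot(ts_gas, value)

# Line plot of a sub-period: filter on the year of the index first
ts_gas_70s <- ts_gas %>%
  filter(year(index) >= 1970, year(index) <= 1979)
autoplot(ts_gas_70s, value)

# Autocorrelation function: compute it with ACF(), then plot the result
gas_acf <- ts_gas %>% ACF(value)
autoplot(gas_acf)
```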

# 3 Lab exercises

## 3.1 Decomposition

6. Create an STL decomposition for the whole `ts_cars` data (trend, season, remainder) using a window size of 15 for the trend. Then, extract the components using the `components()` function and show the first few rows of the resulting `tsibble`.

Hint: look at the example in the help file of the `STL` function to see how to estimate models using the `fable` workflow.

7. Use the `autoplot` function to plot the individual components.

8. Use `ACF` and the `autoplot` functions to plot the autocorrelation of the remainder. According to the Box-Jenkins methodology, can this model be improved?
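The decomposition workflow in this section can be sketched as below. This is a minimal example on the built-in `UKgas` series, not a solution for `ts_cars`; the periodic seasonal window and the names `dcmp` and `gas_components` are assumptions made for illustration.

```r
library(fpp3)

ts_gas <- as_tsibble(UKgas)

# Estimate an STL decomposition inside model(), with a trend window of 15
# (season(window = "periodic") is an assumption: a fixed seasonal pattern)
dcmp <- ts_gas %>%
  model(stl = STL(value ~ trend(window = 15) + season(window = "periodic")))

# Extract trend, seasonal and remainder components
gas_components <- components(dcmp)
head(gas_components)

# Plot all components in one figure
autoplot(gas_components)

# Autocorrelation of the remainder: remaining structure suggests
# that the model has not captured all the signal
gas_components %>%
  as_tsibble() %>%
  ACF(remainder) %>%
  autoplot()
```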

## 3.2 ARIMA modeling

9. Create two ARIMA models for these data using the following syntax. Using the help files, explain the first ARIMA model in your own words. Then briefly explain how the second model came about.

```r
models <-
  ts_cars %>%
  model(
    ARIMA = ARIMA(value ~ pdq(1, 1, 1) + PDQ(0, 0, 0)),
    SARIMA = ARIMA(value)
  )
```

10. Create forecasts for these two models using the `forecast()` function. Use a horizon of 5 years. Plot these forecasts using the `autoplot()` function. Explain the main similarities and differences between the two forecasts.
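A hedged sketch of the forecasting step, shown on the built-in `UKgas` series so the exercise on `ts_cars` is left to you. The model labels `fixed` and `auto` and the object names are assumptions for this example.

```r
library(fpp3)

ts_gas <- as_tsibble(UKgas)

# One ARIMA with a fixed, non-seasonal specification and one chosen automatically
gas_models <- ts_gas %>%
  model(
    fixed = ARIMA(value ~ pdq(1, 1, 1) + PDQ(0, 0, 0)),
    auto  = ARIMA(value)
  )

# Forecast 5 years ahead (20 quarters per model)
gas_fc <- gas_models %>% forecast(h = "5 years")

# Plot the forecasts together with the observed series
autoplot(gas_fc, ts_gas)
```

Passing the original data as the second argument to `autoplot()` draws the history alongside the forecasts, which makes differences between the models easier to judge.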

## 3.3 Comparing SARIMA and Prophet forecasting methods

The `prophet` model (implemented in the `fable.prophet` package) is a model developed at Facebook to create fast and flexible forecasts for many types of time series. For more info, see the section on the Prophet model in fpp3.

11. Fit a data-driven (automatically selected) seasonal ARIMA model and a prophet model to the `ts_cars` data between 1980 and 1995. Then, create forecasts until the year 2000, and compare the model forecasts to the observed data for this period. Which model works better in this setting?
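One possible shape for such a train/test comparison is sketched below on the built-in `UKgas` series. The split year, the prophet seasonal specification `season(period = 4, order = 2)` and all object names are assumptions for illustration, not a prescribed solution.

```r
library(fpp3)
library(fable.prophet)

ts_gas <- as_tsibble(UKgas)

# Hold out the last 5 years (1982-1986) as a test period
gas_train <- ts_gas %>% filter(year(index) <= 1981)

# Fit an automatically selected seasonal ARIMA and a prophet model
gas_fits <- gas_train %>%
  model(
    sarima  = ARIMA(value),
    prophet = prophet(value ~ season(period = 4, order = 2))
  )

# Forecast over the held-out period
gas_fc <- gas_fits %>% forecast(h = "5 years")

# Visual comparison against the observed data
autoplot(gas_fc, ts_gas)

# Numerical comparison: error measures on the held-out observations
gas_acc <- accuracy(gas_fc, ts_gas)
gas_acc
```

Lower values of measures such as RMSE or MAE in `gas_acc` indicate forecasts that were closer to the held-out observations.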