In this practical, we will apply model-based clustering to a data set of banknote measurements.

We use the following packages:

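As a minimal sketch, the code in this practical needs mclust for the data and the clustering functions and, by assumption here, the tidyverse for `as_tibble()`, `%>%` and `ggplot()`. Other packages may be needed for later exercises.

```r
# Assumed setup: mclust supplies the banknote data and the model-based
# clustering functions; tidyverse (an assumption) supplies as_tibble(),
# the %>% pipe and ggplot().
library(mclust)
library(tidyverse)
```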

The data is built into the mclust package and can be loaded as a tibble by running the following code:

df <- as_tibble(banknote)

Take-home exercises

Data exploration

1. Read the help file of the banknote data set to understand what it’s all about.
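One way to do this from the console, sketched under the assumption that mclust is installed:

```r
library(mclust)

# Open the documentation page for the banknote data set
?banknote

# Equivalent call that works without attaching the package first
help("banknote", package = "mclust")

# A quick look at the structure of the data alongside the help text
str(banknote)
```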


2. Create a scatter plot of the Left (x-axis) and Right (y-axis) measurements in the data set. Map the Status column to colour and jitter the points to avoid overplotting. Are the classes easy to distinguish based on these two features?

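# Left vs. right edge width, coloured by banknote status
df %>%
  ggplot(aes(x = Left, y = Right, colour = Status)) +
  # jitter the points so that overlapping measurements stay visible
  geom_point(position = position_jitter())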