INFOMDA2

Materials for the Applied Data Science profile course INFOMDA2: Battling the curse of dimensionality.

1 Introduction

In this practical, we will create a feed-forward neural network as well as a convolutional neural network to analyze the famous MNIST dataset.

library(tidyverse)
library(keras)

2 Take-home exercises: deep feed-forward neural network

In this section, we will develop a deep feed-forward neural network for MNIST.

2.1 Data preparation

1. Load the built-in MNIST dataset by running the following code. Then, describe the structure and contents of the mnist object.

mnist <- dataset_mnist()
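The mnist object is a nested list with a train and a test split, each containing an array of images x and a vector of labels y. For example:

str(mnist, max.level = 2) # list of 2: train and test, each with x and y
dim(mnist$train$x) # 60000 28 28: 60000 grayscale images of 28 x 28 pixels
dim(mnist$test$x)  # 10000 28 28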

Plotting is very important when working with image data. We have defined a convenient plotting function for you.

2. Use the plot_img() function below to plot the first training image (an example call is shown after the function). The img parameter has to be a matrix with dimensions (28, 28). NB: indexing a 3-dimensional array works the same as indexing a matrix, but you need an extra comma, e.g., x[1, , ].

plot_img <- function(img, col = gray.colors(255, start = 1, end = 0), ...) {
  # transpose and flip the y-axis so the image appears in the usual orientation
  image(t(img), asp = 1, ylim = c(1.1, -0.1), col = col, bty = "n", axes = FALSE, ...)
}
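For example, the first training image can be displayed like this (the first index selects the example; the two empty indices keep the full 28 x 28 matrix):

plot_img(mnist$train$x[1, , ])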

It is usually a good idea to normalize your features to a manageable, standard range before feeding them into a neural network.

3. As a preprocessing step, ensure the brightness values of the images in the training and test set are in the range [0, 1].
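Since the raw pixel values are integers between 0 and 255, one way to achieve this is to divide by the maximum value:

mnist$train$x <- mnist$train$x / 255
mnist$test$x  <- mnist$test$x / 255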

2.2 Multinomial logistic regression

The simplest model is a multinomial logistic regression model, where we have no hidden layers and 10 outputs, one for each digit (0-9). That model is shown below.

4. Display a summary of the multinomial model using the summary() function. Describe why this model has 7850 parameters.

multinom <- 
  keras_model_sequential(input_shape = c(28, 28)) %>% # initialize a sequential model
  layer_flatten() %>% # flatten the 28*28 matrix into a single vector
  layer_dense(10, activation = "softmax") # softmax output == logistic regression for each of the 10 classes

multinom %>% compile(
  loss = "sparse_categorical_crossentropy", # loss function for a multinomial outcome
  optimizer = "adam", # adam is a sensible default optimizer
  metrics = list("accuracy") # also track classification accuracy during training
)
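As a hint for the parameter count: each of the 10 output units receives all 784 flattened pixel values plus one bias term.

summary(multinom) # one dense layer: (784 + 1) * 10 = 7850 parameters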

5. Train the model for 5 epochs using the code below. What accuracy do we obtain in the validation set? (NB: the multinom object is changed “in-place”, which means you don’t have to assign it to another variable)

multinom %>% fit(x = mnist$train$x, y = mnist$train$y, epochs = 5, validation_split = 0.2, verbose = 1)

6. Train the model for another 5 epochs. What accuracy do we obtain in the validation set?
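Because the model object is updated in place, running the exact same fit() call once more continues training from the current weights:

multinom %>% fit(x = mnist$train$x, y = mnist$train$y, epochs = 5, validation_split = 0.2, verbose = 1)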

2.3 Deep feed-forward neural networks

7. Create and compile a feed-forward neural network with the following properties. Ensure that the model has 50890 parameters.

You may reuse code from the multinomial model
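As a sketch, assuming a single hidden layer (the name ffnn is arbitrary): 64 relu units yield (784 + 1) * 64 + (64 + 1) * 10 = 50890 parameters.

ffnn <- 
  keras_model_sequential(input_shape = c(28, 28)) %>% 
  layer_flatten() %>% # 28*28 = 784 inputs
  layer_dense(64, activation = "relu") %>% # (784 + 1) * 64 = 50240 parameters
  layer_dense(10, activation = "softmax") # (64 + 1) * 10 = 650 parameters

ffnn %>% compile(
  loss = "sparse_categorical_crossentropy",
  optimizer = "adam",
  metrics = list("accuracy")
)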

8. Train the model for 10 epochs. What do you see in terms of validation accuracy, also compared to the multinomial model?

9. Create predictions for the test data using the two trained models (using the function below). Create a confusion matrix and compute test accuracy for these two models. Write down any observations you have.

# Predicted class = position of the largest predicted probability, minus 1 for the 0-9 labels
class_predict <- function(model, x_train) predict(model, x = x_train) %>% apply(1, which.max) - 1
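A sketch of how this function might be used, assuming the models were trained as above:

pred_multinom <- class_predict(multinom, mnist$test$x)
table(predicted = pred_multinom, true = mnist$test$y) # confusion matrix
mean(pred_multinom == mnist$test$y) # test accuracy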

10. OPTIONAL: if you have time, create and train (10 epochs) a deep feed-forward model with the following properties. Compare this model to the previous models on the test data.

3 Lab exercises: convolutional neural network

Convolution layers in Keras need a specific form of data input. For each example, they expect a (width, height, channels) array (tensor). For a 28 x 28 colour image, that shape would be (28, 28, 3), where the channels represent red, green, and blue. MNIST has no colour information, but we still need a channel dimension to enter the data into a convolution layer, so each image gets the shape (28, 28, 1). The training data x_train should thus have the shape (60000, 28, 28, 1).

11. Add a “channel” dimension to the training and test data using the following code. Then plot an image using the first channel of the 314th training example (this is a 9).

# add channel dimension to input (required for convolution layers)
dim(mnist$train$x) <- c(dim(mnist$train$x), 1)
dim(mnist$test$x)  <- c(dim(mnist$test$x), 1)
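A minimal sketch for the plot, reusing plot_img() from above:

plot_img(mnist$train$x[314, , , 1]) # first (and only) channel of the 314th training example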

12. Create and compile a convolutional neural network using the following code. Describe the different layers in your own words.

cnn <- 
  keras_model_sequential(input_shape = c(28, 28, 1)) %>% 
  layer_conv_2d(filters = 6, kernel_size = c(5, 5)) %>% 
  layer_max_pooling_2d(pool_size = c(4, 4)) %>%
  layer_flatten() %>% 
  layer_dense(units = 32, activation = "relu") %>% 
  layer_dense(10, activation = "softmax")

cnn %>% 
  compile(
    loss = "sparse_categorical_crossentropy",
    optimizer = "adam", 
    metrics = c("accuracy")
  )
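When describing the layers, summary(cnn) is a helpful starting point, because it shows how the output shape changes after each layer:

summary(cnn) # e.g., 24 x 24 x 6 after the convolution, 6 x 6 x 6 after pooling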

13. Fit this model on the training data (10 epochs) and compare it to the previous models.
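The same fit() pattern as before applies here, for example:

cnn %>% fit(x = mnist$train$x, y = mnist$train$y, epochs = 10, validation_split = 0.2, verbose = 1)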

14. Create another CNN which has better validation performance within 10 epochs. Compare your validation accuracy to that of your peers.

Here are some things you could try: add more filters or an additional convolution layer, reduce the pooling size, add a dropout layer for regularization, or increase the number of units in the dense layer.